ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data (2016)

Colaprico, A., Silva, T. C., Olsen, C., Garofano, L., Cava, C., Garolini, D., Sabedot, T. S., Malta, T. M., Pagnotta, S. M., Castiglioni, I., Ceccarelli, M., Bontempi, G., Noushmehr, H.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-05-06

Description: The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses (2015)

Liu, R., Holik, A. Z., Su, S., Jansz, N., Chen, K., Leong, H. S., Blewitt, M. E., Asselin-Labat, M.-L., Smyth, G. K., Ritchie, M. E.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-08-29

Description: Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

Mixed Integer Linear Programming based machine learning approach identifies regulators of telomerase in yeast (2016)

Poos, A. M., Maicher, A., Dieckmann, A. K., Oswald, M., Eils, R., Kupiec, M., Luke, B., König, R.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-03

Description: Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We compiled our machine learning method leading to a user friendly package for R which can straightforwardly be applied to similar problems integrating gene regulator binding information and expression profiles of samples of e.g. different phenotypes, diseases or treatments.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

Gene integrated set profile analysis: a context-based approach for inferring biological endpoints (2016)

Kowalski, J., Dwivedi, B., Newman, S., Switchenko, J. M., Pauly, R., Gutman, D. A., Arora, J., Gandhi, K., Ainslie, K., Doho, G., Qin, Z., Moreno, C. S., Rossi, M. R., Vertino, P. M., Lonial, S., Bernal-Mizrachi, L., Boise, L. H.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-04-21

Description: The identification of genes with specific patterns of change (e.g. down-regulated and methylated) as phenotype drivers or samples with similar profiles for a given gene set as drivers of clinical outcome, requires the integration of several genomic data types for which an ‘integrate by intersection’ (IBI) approach is often applied. In this approach, results from separate analyses of each data type are intersected, which has the limitation of a smaller intersection with more data types. We introduce a new method, GISPA (Gene Integrated Set Profile Analysis) for integrated genomic analysis and its variation, SISPA (Sample Integrated Set Profile Analysis) for defining respective genes and samples with the context of similar, a priori specified molecular profiles. With GISPA, the user defines a molecular profile that is compared among several classes and obtains ranked gene sets that satisfy the profile as drivers of each class. With SISPA, the user defines a gene set that satisfies a profile and obtains sample groups of profile activity. Our results from applying GISPA to human multiple myeloma (MM) cell lines contained genes of known profiles and importance, along with several novel targets, and their further SISPA application to MM coMMpass trial data showed clinical relevance.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

RNA2DNAlign: nucleotide resolution allele asymmetries through quantitative assessment of RNA and DNA paired sequencing data (2016)

Movassagh, M., Alomran, N., Mudvari, P., Dede, M., Dede, C., Kowsari, K., Restrepo, P., Cauley, E., Bahl, S., Li, M., Waterhouse, W., Tsaneva-Atanasova, K., Edwards, N., Horvath, A.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-12-17

Description: We introduce RNA2DNAlign, a computational framework for quantitative assessment of allele counts across paired RNA and DNA sequencing datasets. RNA2DNAlign is based on quantitation of the relative abundance of variant and reference read counts, followed by binomial tests for genotype and allelic status at SNV positions between compatible sequences. RNA2DNAlign detects positions with differential allele distribution, suggesting asymmetries due to regulatory/structural events. Based on the type of asymmetry, RNA2DNAlign outlines positions likely to be implicated in RNA editing, allele-specific expression or loss, somatic mutagenesis or loss-of-heterozygosity (the first three also in a tumor-specific setting). We applied RNA2DNAlign on 360 matching normal and tumor exomes and transcriptomes from 90 breast cancer patients from TCGA. Under high-confidence settings, RNA2DNAlign identified 2038 distinct SNV sites associated with one of the aforementioned asymetries, the majority of which have not been linked to functionality before. The performance assessment shows very high specificity and sensitivity, due to the corroboration of signals across multiple matching datasets. RNA2DNAlign is freely available from http://github.com/HorvathLab/NGS as a self-contained binary package for 64-bit Linux systems.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

Graph-based modeling of tandem repeats improves global multiple sequence alignment (2013)

Szalkowski, A. M., Anisimova, M.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2013-09-26

Description: Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries, and yet are of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in agriculturally important bacteria Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism (2015)

Blatti, C., Kazemian, M., Wolfe, S., Brodsky, M., Sinha, S.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-03

Description: Characterization of cell type specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult/expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely on only data of gene expression, low-resolution chromatin accessibility, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ~200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain specific enhancers genome-wide. Throughout this work, we apply our motif and accessibility based approach to comprehensively characterize the regulatory network of fruitfly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus (2014)

Prestat, E., David, M. M., Hultman, J., Ta F;, N., Lamendella, R., Dvornik, J., Mackelprang, R., Myrold, D. D., Jumpponen, A., Tringe, S. G., Holman, E., Mavromatis, K., Jansson, J. K.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2014-11-07

Description: A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/ .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

Following Up on Smallholder Farmers and Supermarkets in Kenya (2015)

Andersson, C. I. M., Chege, C. G. K., Rao, E. J. O., Qaim, M.

Oxford University Press

In: American Journal of Agricultural Economics

add to mindlist on the mindlist

Details

Publication Date: 2015-07-10

Description: In many developing countries, supermarkets are expanding rapidly. This affects farmers’ marketing options. Previous studies have analyzed welfare effects of smallholder participation in supermarket channels from a static perspective, using cross-section data. We develop a conceptual framework and use panel data to better understand participation and impact dynamics. The analysis focuses on vegetable producers in Kenya. Participation in supermarket channels is associated with income gains. However, many farmers have dropped out of the supermarket channel due to various constraints. The initial income gains cannot be sustained when returning to the traditional market. Organizational support may be needed to avoid widening income disparities.

Keywords: L24 - Contracting Out ; Joint Ventures ; Technology Licensing, O12 - Microeconomic Analyses of Economic Development, O13 - Agriculture ; Natural Resources ; Energy ; Environment ; Other Primary Products, Q12 - Micro Analysis of Farm Firms, Farm Households, and Farm Input Markets, Q13 - Agricultural Markets and Marketing ; Cooperatives ; Agribusiness, Q18 - Agricultural Policy ; Food Policy

Print ISSN: 0002-9092

Electronic ISSN: 1467-8276

Topics: Agriculture, Forestry, Horticulture, Fishery, Domestic Science, Nutrition , Economics

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

A genome-wide approach for detecting novel insertion-deletion variants of mid-range size (2016)

Xia, L. C., Sakshuwong, S., Hopmans, E. S., Bell, J. M., Grimes, S. M., Siegmund, D. O., Ji, H. P., Zhang, N. R.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-09-03

Description: We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertion and deletions (〈10 Kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a series of novel set of mid-range insertion/deletion detection that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext