ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data (2015)

Butyaev, A., Mavlyutov, R., Blanchette, M., Cudre-Mauroux, P., Waldispuhl, J.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-09-19

Description: Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data ( 3DBG ), and a 3D genome browser to visualize and explore 3D genome structures ( 3DGB ). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/ .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum (2015)

Chou, W.-C., Ma, Q., Yang, S., Cao, S., Klingeman, D. M., Brown, S. D., Xu, Y.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-29

Description: Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/ . We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

A thesaurus of genetic variation for interrogation of repetitive genomic regions (2015)

Kerzendorfer, C., Konopka, T., Nijman, S. M. B.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-29

Description: Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hereto hidden and under-studied parts of the genome.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

A new molecular signature method for prediction of driver cancer pathways from transcriptional data (2016)

Rykunov, D., Beckmann, N. D., Li, H., Uzilov, A., Schadt, E. E., Reva, B.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach for predicting the status of driver cancer pathways based on signature functions derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian and colon cancer tumors. The activation status of these driver pathways were then characterized using RNA sequencing data by constructing classification signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions differentiate well tumors with nominated pathway activation from tumors with no signs of activation: average AUC equals to 0.83. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that the transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences (2016)

Quang, D., Xie, X.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

Phylogeny-aware identification and correction of taxonomically mislabeled sequences (2016)

Kozlov, A. M., Zhang, J., Yilmaz, P., Glöckner, F. O., Stamatakis, A.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

iGEMS: an integrated model for identification of alternative exon usage events (2016)

Sood, S., Szkop, K. J., Nakhuda, A., Gallagher, I. J., Murie, C., Brogan, R. J., Kaprio, J., Kainulainen, H., Atherton, P. J., Kujala, U. M., Gustafsson, T., Larsson, O., Timmons, J. A.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: DNA microarrays and RNAseq are complementary methods for studying RNA molecules. Current computational methods to determine alternative exon usage (AEU) using such data require impractical visual inspection and still yield high false-positive rates. Integrated Gene and Exon Model of Splicing (iGEMS) adapts a gene-level residuals model with a gene size adjusted false discovery rate and exon-level analysis to circumvent these limitations. iGEMS was applied to two new DNA microarray datasets, including the high coverage Human Transcriptome Arrays 2.0 and performance was validated using RT-qPCR. First, AEU was studied in adipocytes treated with ( n = 9) or without ( n = 8) the anti-diabetes drug, rosiglitazone. iGEMS identified 555 genes with AEU, and robust verification by RT-qPCR (~90%). Second, in a three-way human tissue comparison (muscle, adipose and blood, n = 41) iGEMS identified 4421 genes with at least one AEU event, with excellent RT-qPCR verification (95%, n = 22). Importantly, iGEMS identified a variety of AEU events, including 3'UTR extension, as well as exon inclusion/exclusion impacting on protein kinase and extracellular matrix domains. In conclusion, iGEMS is a robust method for identification of AEU while the variety of exon usage between human tissues is 5–10 times more prevalent than reported by the Genotype-Tissue Expression consortium using RNA sequencing.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

Quantitative reconstruction of thermal and dynamic characteristics of lava flow from surface thermal measurements (2016)

Korotkii, A., Kovtunov, D., Ismail-Zadeh, A., Tsepelev, I., Melnik, O.

Oxford University Press

In: Geophysical Journal International

add to mindlist on the mindlist

Details

Publication Date: 2016-05-05

Description: We study a model of lava flow to determine its thermal and dynamic characteristics from thermal measurements of the lava at its surface. Mathematically this problem is reduced to solving an inverse boundary problem. Namely, using known conditions at one part of the model boundary we determine the missing condition at the remaining part of the boundary. We develop a numerical approach to the mathematical problem in the case of steady-state flow. Assuming that the temperature and the heat flow are prescribed at the upper surface of the model domain, we determine the flow characteristics in the entire model domain using a variational (adjoint) method. We have performed computations of model examples and showed that in the case of smooth input data the lava temperature and the flow velocity can be reconstructed with a high accuracy. As expected, a noise imposed on the smooth input data results in a less accurate solution, but still acceptable below some noise level. Also we analyse the influence of optimization methods on the solution convergence rate. The proposed method for reconstruction of physical parameters of lava flows can also be applied to other problems in geophysical fluid flows.

Keywords: Mineral Physics, Rheology, Heat Flow and Volcanology

Print ISSN: 0956-540X

Electronic ISSN: 1365-246X

Topics: Geosciences

Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

Seismic-acoustic energy partitioning during a paroxysmal eruptive phase of Tungurahua volcano, Ecuador (2016)

Palacios, P. B., Diez, M., Kendall, J.-M., Mader, H. M.

Oxford University Press

In: Geophysical Journal International

add to mindlist on the mindlist

Details

Publication Date: 2016-05-06

Description: Studies of discrete volcanic explosions, that usually last less than 2 or 3 min, have suggested that the partitioning of seismic-acoustic energy is likely related to a range of physical mechanisms that depend on magma properties and other physical constraints such as the location of the fragmentation surface. In this paper, we explore the energy partition of a paroxysmal eruptive phase of Tungurahua volcano that lasted for over 4 hr, on 2006 July 14–15, using seismic-acoustic information recorded by stations on its flanks (near field). We find evidence of a linear scaling between seismic and acoustic energies, with time-dependent intensities, during the sustained explosive phase of the eruption. Furthermore, we argue that this scaling can be explained by two different processes: (1) the fragmentation region ultimately acts as the common source of energy producing both direct seismic waves, that travel through the volcanic edifice, and direct acoustic waves coming from a disturbed atmosphere above the summit; (2) the coupling of acoustic waves with the ground to cause seismic waves. Both processes are concurrent, however we have found that the first one is dominant for seismic records below 4 Hz. Here we use the linear scaling of intensities to construct seismic and acoustic indices, which, we argue, could be used to track an ongoing eruption. Thus, especially in strong paroxysms that can produce pyroclastic flows, the index correlation and their levels can be used as quantitative monitoring parameters to assess the volcanic hazard in real time. Additionally, we suggest from the linear scaling that the source type for both cases, seismic and acoustic, is dipolar and dominant in the near field.

Keywords: Mineral Physics, Rheology, Heat Flow and Volcanology

Print ISSN: 0956-540X

Electronic ISSN: 1365-246X

Topics: Geosciences

Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

TCGAbiolinks: an R/Bioconductor package for integrative analysis of TCGA data (2016)

Colaprico, A., Silva, T. C., Olsen, C., Garofano, L., Cava, C., Garolini, D., Sabedot, T. S., Malta, T. M., Pagnotta, S. M., Castiglioni, I., Ceccarelli, M., Bontempi, G., Noushmehr, H.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-05-06

Description: The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext