ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

High speed BLASTN: an accelerated MegaBLAST search tool (2015)

Chen, Y., Ye, W., Zhang, Y., Xu, Y.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-09-19

Description: Sequence alignment is a long standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST—the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST and its computational speed is much faster than MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.

Keywords: Computational Methods

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data (2015)

Butyaev, A., Mavlyutov, R., Blanchette, M., Cudre-Mauroux, P., Waldispuhl, J.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-09-19

Description: Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer to the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data ( 3DBG ), and a 3D genome browser to visualize and explore 3D genome structures ( 3DGB ). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions, and importantly gracefully scales with the size of data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/ .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

3dRNAscore: a distance and torsion angle dependent evaluation function of 3D RNA structures (2015)

Wang, J., Zhao, Y., Zhu, C., Xiao, Y.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-29

Description: Model evaluation is a necessary step for better prediction and design of 3D RNA structures. For proteins, this has been widely studied and the knowledge-based statistical potential has been proved to be one of effective ways to solve this problem. Currently, a few knowledge-based statistical potentials have also been proposed to evaluate predicted models of RNA tertiary structures. The benchmark tests showed that they can identify the native structures effectively but further improvements are needed to identify near-native structures and those with non-canonical base pairs. Here, we present a novel knowledge-based potential, 3dRNAscore, which combines distance-dependent and dihedral-dependent energies. The benchmarks on different testing datasets all show that 3dRNAscore are more efficient than existing evaluation methods in recognizing native state from a pool of near-native states of RNAs as well as in ranking near-native states of RNA models.

Keywords: Computational Methods

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum (2015)

Chou, W.-C., Ma, Q., Yang, S., Cao, S., Klingeman, D. M., Brown, S. D., Xu, Y.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-29

Description: Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/ . We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

A thesaurus of genetic variation for interrogation of repetitive genomic regions (2015)

Kerzendorfer, C., Konopka, T., Nijman, S. M. B.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2015-05-29

Description: Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hereto hidden and under-studied parts of the genome.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

A new molecular signature method for prediction of driver cancer pathways from transcriptional data (2016)

Rykunov, D., Beckmann, N. D., Li, H., Uzilov, A., Schadt, E. E., Reva, B.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach for predicting the status of driver cancer pathways based on signature functions derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian and colon cancer tumors. The activation status of these driver pathways were then characterized using RNA sequencing data by constructing classification signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions differentiate well tumors with nominated pathway activation from tumors with no signs of activation: average AUC equals to 0.83. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that the transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

SPATIAL: A System-level PAThway Impact AnaLysis approach (2016)

Bokanizad, B., Tagett, R., Ansari, S., Helmi, B. H., Draghici, S.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: The goal of pathway analysis is to identify the pathways that are significantly impacted when a biological system is perturbed, e.g. by a disease or drug. Current methods treat pathways as independent entities. However, many signals are constantly sent from one pathway to another, essentially linking all pathways into a global, system-wide complex. In this work, we propose a set of three pathway analysis methods based on the impact analysis, that performs a system-level analysis by considering all signals between pathways, as well as their overlaps. Briefly, the global system is modeled in two ways: (i) considering the inter-pathway interaction exchange for each individual pathways, and (ii) combining all individual pathways to form a global, system-wide graph. The third analysis method is a hybrid of these two models. The new methods were compared with DAVID, GSEA, GSA, PathNet, Crosstalk and SPIA on 23 GEO data sets involving 19 tissues investigated in 12 conditions. The results show that both the ranking and the P -values of the target pathways are substantially improved when the analysis considers the system-wide dependencies and interactions between pathways.

Keywords: Computational Methods

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences (2016)

Quang, D., Xie, X.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the github repository http://github.com/uci-cbcl/DanQ .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

Phylogeny-aware identification and correction of taxonomically mislabeled sequences (2016)

Kozlov, A. M., Zhang, J., Yilmaz, P., Glöckner, F. O., Stamatakis, A.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa .

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

iGEMS: an integrated model for identification of alternative exon usage events (2016)

Sood, S., Szkop, K. J., Nakhuda, A., Gallagher, I. J., Murie, C., Brogan, R. J., Kaprio, J., Kainulainen, H., Atherton, P. J., Kujala, U. M., Gustafsson, T., Larsson, O., Timmons, J. A.

Oxford University Press

In: Nucleic Acids Research

add to mindlist on the mindlist

Details

Publication Date: 2016-06-21

Description: DNA microarrays and RNAseq are complementary methods for studying RNA molecules. Current computational methods to determine alternative exon usage (AEU) using such data require impractical visual inspection and still yield high false-positive rates. Integrated Gene and Exon Model of Splicing (iGEMS) adapts a gene-level residuals model with a gene size adjusted false discovery rate and exon-level analysis to circumvent these limitations. iGEMS was applied to two new DNA microarray datasets, including the high coverage Human Transcriptome Arrays 2.0 and performance was validated using RT-qPCR. First, AEU was studied in adipocytes treated with ( n = 9) or without ( n = 8) the anti-diabetes drug, rosiglitazone. iGEMS identified 555 genes with AEU, and robust verification by RT-qPCR (~90%). Second, in a three-way human tissue comparison (muscle, adipose and blood, n = 41) iGEMS identified 4421 genes with at least one AEU event, with excellent RT-qPCR verification (95%, n = 22). Importantly, iGEMS identified a variety of AEU events, including 3'UTR extension, as well as exon inclusion/exclusion impacting on protein kinase and extracellular matrix domains. In conclusion, iGEMS is a robust method for identification of AEU while the variety of exon usage between human tissues is 5–10 times more prevalent than reported by the Genotype-Tissue Expression consortium using RNA sequencing.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext