ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1


A low-latency, big database system and browser for storage, querying and visualization of 3D genomic data (2015)

Butyaev, A., Mavlyutov, R., Blanchette, M., Cudre-Mauroux, P., Waldispuhl, J.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2015-09-19

Description: Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, the storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We introduce simultaneously a database system to store and query 3D genomic data (3DBG), and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press


2


Analysis of strand-specific RNA-seq data using machine learning reveals the structures of transcription units in Clostridium thermocellum (2015)

Chou, W.-C., Ma, Q., Yang, S., Cao, S., Klingeman, D. M., Brown, S. D., Xu, Y.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2015-05-29

Description: Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press


3


A thesaurus of genetic variation for interrogation of repetitive genomic regions (2015)

Kerzendorfer, C., Konopka, T., Nijman, S. M. B.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2015-05-29

Description: Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press


4


Thinking beside the box: Should we care about the non-coding strand of the 16S rRNA gene? (2016)

Garcia-Mazcorro, J. F., Barcenas-Walls, J. R.

Oxford University Press

In: FEMS Microbiology Letters


Publication Date: 2016-07-31

Description: The 16S rRNA gene (16S rDNA) codes for RNA that plays a fundamental role during translation in the ribosome and is used extensively as a marker gene to establish relationships among bacteria. However, the complementary non-coding 16S rDNA (nc16S rDNA) has been ignored. An idea emerged in the course of analyzing bacterial 16S rDNA sequences in search of nucleotide composition and substitution patterns: Does the nc16S rDNA code? If so, what does it code for? More importantly: Does 16S rDNA evolution reflect its own evolution or the evolution of its counterpart nc16S rDNA? The objective of this minireview is to discuss these thoughts. nc strands often encode small RNAs (sRNAs), ancient components of gene regulation. nc16S rDNA sequences from different bacterial groups were used to search for possible matches in the Bacterial Small Regulatory RNA Database. Intriguingly, the sequence of one published sRNA obtained from Legionella pneumophila (GenBank: AE017354.1) showed high non-random similarity with nc16S rDNA corresponding in part to the V5 region, especially from Legionella and relatives. While the target(s) of this sRNA is unclear at the moment, its mere existence might open up a new chapter in the use of the 16S rDNA to study relationships among bacteria.

Keywords: Physiology & Biochemistry

Print ISSN: 0378-1097

Electronic ISSN: 1574-6968

Topics: Biology

Published by Oxford University Press


5


Morphological and enzymatic response of the thermotolerant fungus Fomes sp. EUM1 in solid state fermentation under thermal stress (2016)

Ordaz-Hernandez, A., Ortega-Sanchez, E., Montesinos-Matias, R., Hernandez-Martinez, R., Torres-Martinez, D., Loera, O.

Oxford University Press

In: FEMS Microbiology Letters


Publication Date: 2016-08-05

Description: Thermotolerance of the fungus Fomes sp. EUM1 was evaluated in solid state fermentation (SSF). This thermotolerant strain improved both hyphal invasiveness (38%) and length (17%) in adverse thermal conditions exceeding 30°C and to a maximum of 40°C. In contrast, hyphal branching decreased by 46% at 45°C. The production of cellulases over corn stover increased 1.6-fold in 30°C culture conditions, xylanases increased 2.8-fold at 40°C, while laccase production improved 2.7-fold at 35°C. Maximum production of lignocellulolytic enzymes was obtained at elevated temperatures in shorter fermentation times (8–6 days), although the proteases appeared as a thermal stress response associated with a drop in lignocellulolytic activities. Novel and multiple isoenzymes of xylanase (four bands) and cellulase (six bands) were secreted in the range of 20–150 kDa during growth in adverse temperature conditions. However, only a single laccase isoenzyme (46 kDa) was detected. This is the first report describing the advantages of a thermotolerant white-rot fungus in SSF. These results have important implications for large-scale SSF, where effects of metabolic heat are detrimental to growth and enzyme production, which are severely affected by the formation of high temperature gradients.

Keywords: Physiology & Biochemistry

Print ISSN: 0378-1097

Electronic ISSN: 1574-6968

Topics: Biology

Published by Oxford University Press


6


A new molecular signature method for prediction of driver cancer pathways from transcriptional data (2016)

Rykunov, D., Beckmann, N. D., Li, H., Uzilov, A., Schadt, E. E., Reva, B.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2016-06-21

Description: Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach for predicting the status of driver cancer pathways based on signature functions derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian and colon cancer tumors. The activation status of these driver pathways was then characterized using RNA sequencing data by constructing classification signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions differentiate well between tumors with nominated pathway activation and tumors with no signs of activation: the average AUC equals 0.83. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that the transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press


7


Analysis of a new cluster of genes involved in the synthesis of the unique volatile organic compound sodorifen of Serratia plymuthica 4Rx13 (2016)

Domik, D., Magnus, N., Piechulla, B.

Oxford University Press

In: FEMS Microbiology Letters


Publication Date: 2016-06-23

Description: The rhizobacterium Serratia plymuthica 4Rx13 emits the novel and unique volatile sodorifen (C16H26), which has a polymethylated bicyclic structure. Transcriptome analysis revealed that gene SOD_c20750 (annotated as terpene cyclase) is involved in the biosynthesis of sodorifen. Here we show that this gene is located in a small cluster of four genes (SOD_c20750–SOD_c20780), and the analysis of the knockout mutants demonstrated that SOD_c20760 (annotated as methyltransferase) and SOD_c20780 (annotated as isopentenyl pyrophosphate (IPP) isomerase) are needed for the biosynthesis of sodorifen, while a sodorifen-negative phenotype was not achieved with the SOD_c20770 (annotated as deoxy-xylulose-5-phosphate (DOXP) synthase) mutant. Altogether, the function of this new gene cluster was assigned to the biosynthesis of this structurally unusual volatile compound sodorifen.

Keywords: Physiology & Biochemistry

Print ISSN: 0378-1097

Electronic ISSN: 1574-6968

Topics: Biology

Published by Oxford University Press


8


D-serine transporter in Staphylococcus saprophyticus identified (2016)

Marlinghaus, L., Huss, M., Korte-Berwanger, M., Sakinc-Güler, T., Gatermann, S. G.

Oxford University Press

In: FEMS Microbiology Letters


Publication Date: 2016-06-23

Description: Among staphylococci, Staphylococcus saprophyticus is the only species that is typically uropathogenic and an important cause of urinary tract infections in young women. The amino acid D-serine occurs in relatively high concentrations in human urine and has a bacteriostatic or toxic effect on many bacteria. In uropathogenic Escherichia coli and S. saprophyticus, the amino acid regulates the expression of virulence factors and can be used as a nutrient. The ability of uropathogens to respond to or to metabolize D-serine has been suggested as a factor that enables colonization of the urinary tract. Until now, nothing has been known about D-serine transport in S. saprophyticus. We generated mutants of putative transporter genes in S. saprophyticus 7108 that show homology to the D-serine transporter cycA of E. coli and tested them in a D-serine depletion assay to analyze the D-serine uptake rate of the cells. The mutant of SSP1070 showed a strong decrease in D-serine uptake. Therefore, SSP1070 was identified as a major D-serine transporter in S. saprophyticus 7108 and was named D-serine transporter A (DstA). D-serine caused a prolonged lag phase of S. saprophyticus in a chemically defined medium. This negative effect was dependent on the presence of DstA.

Keywords: Physiology & Biochemistry

Print ISSN: 0378-1097

Electronic ISSN: 1574-6968

Topics: Biology

Published by Oxford University Press


9


DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences (2016)

Quang, D., Xie, X.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2016-06-21

Description: Modeling the properties and functions of DNA sequences is an important, but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the GitHub repository http://github.com/uci-cbcl/DanQ.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press


10


Phylogeny-aware identification and correction of taxonomically mislabeled sequences (2016)

Kozlov, A. M., Zhang, J., Yilmaz, P., Glöckner, F. O., Stamatakis, A.

Oxford University Press

In: Nucleic Acids Research


Publication Date: 2016-06-21

Description: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
