ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

Description: Background Effective bioinformatics solutions are needed to tackle challenges posed by industrial-scale genome annotation. We present a wrapper tool that predicts RNase P RNA genes by combining the speed of pattern matching with the sensitivity of covariance models. The core of the tool is a library of subfamily-specific descriptor models and covariance models. Results Scanning all microbial genomes in GenBank identifies RNase P RNA genes in 98% of 1024 microbial chromosomal sequences within just 4 hours on a single CPU. Compared to existing annotations found in 387 of the GenBank files, the tool's predictions have more intact structure and are automatically classified by subfamily membership. For eukaryotic chromosomes, the tool could identify the known RNase P RNA genes in 84 out of 85 metazoan genomes and 19 out of 21 fungal genomes. It predicted 37 novel eukaryotic RNase P RNA genes, 32 of which are from fungi. Gene duplication events are observed in at least 20 metazoan organisms. Scanning of metagenomic data from the Global Ocean Sampling Expedition, comprising over 10 million sample sequences (18 gigabases), predicted 2909 unique genes, 98% of which fall into the ancestral bacterial A type of RNase P RNA and 66% of which have no close homolog among known prokaryotic RNase P RNAs. Conclusions The combination of efficient filtering by means of a descriptor-based search and subsequent construction of a high-quality gene model by means of a covariance model provides an efficient method for the detection of RNase P RNA genes in large-scale sequencing data. The tool is implemented as a webserver and can also be downloaded for local use from http://rna.tbi.univie.ac.at/bcheck

Electronic ISSN: 1471-2164

Topics: Biology

Published by BioMed Central

Detection of RNA structures in porcine EST data and related mammals (2007)

Seemann, Stefan E ; Gilchrist, Michael J ; Hofacker, Ivo L ; [et al.]

BioMed Central

In: BMC Genomics. 2007; 8(1): 316. Published 2007 Sep 10. doi: 10.1186/1471-2164-8-316.

Publication Date: 2007-09-10

Description: Background Non-coding RNAs (ncRNAs) are involved in a wide spectrum of regulatory functions. Within recent years, there have been increasing reports of observed polyadenylated ncRNAs and mRNA like ncRNAs in eukaryotes. To investigate this further, we examined the large data set in the Sino-Danish PigEST resource http://pigest.ku.dk which also contains expression information distributed on 97 non-normalized cDNA libraries. Results We constructed a pipeline, EST2ncRNA, to search for known and novel ncRNAs. The pipeline utilises sequence similarity to ncRNA databases (blast), structure similarity to Rfam (RaveNnA) as well as multiple alignments to predict conserved novel putative RNA structures (RNAz). EST2ncRNA was fed with 48,000 contigs and 73,000 singletons available from the PigEST resource. Using the pipeline we identified known RNA structures in 137 contigs and single reads (conreads), and predicted high confidence RNA structures in non-protein coding regions of additional 1,262 conreads. Of these, structures in 270 conreads overlap with existing predictions in human. To sum up, the PigEST resource comprises trans-acting elements (ncRNAs) in 715 contigs and 340 singletons as well as cis-acting elements (inside UTRs) in 311 contigs and 51 singletons, of which 18 conreads contain both predictions of trans- and cis-acting elements. The predicted RNAz candidates were compared with the PigEST expression information and we identify 114 contigs with an RNAz prediction and expression in at least ten of the non-normalised cDNA libraries. We conclude that the contigs with RNAz and known predictions are in general expressed at a much lower level than protein coding transcripts. In addition, we also observe that our ncRNA candidates constitute about one to two percent of the genes expressed in the cDNA libraries. Intriguingly, the cDNA libraries from developmental (brain) tissues contain the highest amount of ncRNA candidates, about two percent. 
These observations are related to existing knowledge and hypotheses about the role of ncRNAs in higher organisms. Furthermore, about 80% of the porcine coding transcripts (of 18,600 identified) as well as less than one-third of the ORF-free transcripts are conserved at least in the closely related bovine genome. Approximately one percent of the coding and 10% of the remaining matches are unique between the PigEST data and the cow genome. Based on the pig-cow alignments, we searched for similarities to 16 other organisms using alignments available from UCSC, which resulted, for instance, in an 87% coverage by the human genome. Conclusion Besides recovering several of the already annotated functional RNA structures, we predicted a large number of high confidence conserved secondary structures in polyadenylated porcine transcripts. Our observations of relatively low expression levels of predicted ncRNA candidates together with the observations of higher relative amount in cDNA libraries from developmental stages are in agreement with the current paradigm of ncRNA roles in higher organisms and support the idea of polyadenylated ncRNAs.

Electronic ISSN: 1471-2164

Topics: Biology

Published by BioMed Central

Multiple sequence alignments of partially coding nucleic acid sequences (2005)

Stocsits, Roman R ; Hofacker, Ivo L ; Fried, Claudia ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2005; 6(1): 160. Published 2005 Jun 28. doi: 10.1186/1471-2105-6-160.

Publication Date: 2005-06-28

Description: Background High quality sequence alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Nucleic acid sequences, however, exhibit a much larger sequence heterogeneity compared to their encoded protein sequences due to the redundancy of the genetic code. It is desirable, therefore, to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only a part of the sequence of interest is translated. On the other hand, overlapping reading frames may encode multiple alternative proteins, possibly with intermittent non-coding parts. Examples are, in particular, RNA virus genomes. Results The standard scoring scheme for nucleic acid alignments can be extended to incorporate simultaneously information on translation products in one or more reading frames. Here we present a multiple alignment tool, codaln, that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments that allows arbitrary weighting for almost all scoring parameters. Resource requirements of codaln are comparable with those of standard tools such as ClustalW. Conclusion We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus and Vertebrate Hox clusters) and show that the combination of nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments such as methods for detecting conserved RNA secondary structure elements.
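
The idea of a combined nucleic acid plus amino acid score can be illustrated with a minimal sketch. This is not codaln's scoring model: the match/mismatch values, the `aa_weight` parameter, and all function names are assumptions chosen for illustration, and a real tool would use substitution matrices and handle gaps and multiple reading frames.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_score(codon_a, codon_b, match=1.0, mismatch=-1.0):
    """Sum of per-position match/mismatch scores (illustrative values)."""
    return sum(match if x == y else mismatch for x, y in zip(codon_a, codon_b))


def amino_acid_score(codon_a, codon_b, match=2.0, mismatch=-1.0):
    """Score the translated residues (illustrative values, no matrix)."""
    same = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
    return match if same else mismatch


def combined_score(codon_a, codon_b, aa_weight=1.0):
    """Nucleotide score plus a weighted amino acid score for one codon pair."""
    return (nucleotide_score(codon_a, codon_b)
            + aa_weight * amino_acid_score(codon_a, codon_b))
```

With `aa_weight=0` the score reduces to a plain nucleotide comparison; a positive weight rewards synonymous codon pairs such as TTA and CTG, which both encode leucine but differ at two of three positions.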

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Algebraic comparison of metabolic networks, phylogenetic inference, and metabolic innovation (2006)

Forst, Christian V ; Flamm, Christoph ; Hofacker, Ivo L ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2006; 7(1): 67. Published 2006 Feb 14. doi: 10.1186/1471-2105-7-67.

Publication Date: 2006-02-14

Description: Background Comparison of metabolic networks is typically performed based on the organisms' enzyme contents. This approach disregards functional replacements as well as orthologies that are misannotated. Direct comparison of the structure of metabolic networks can circumvent these problems. Results Metabolic networks are naturally represented as directed hypergraphs in such a way that metabolites are nodes and enzyme-catalyzed reactions form (hyper)edges. The familiar operations from set algebra (union, intersection, and difference) form a natural basis for both the pairwise comparison of networks and the identification of distinct metabolic features of a set of organisms. We report here on an implementation of this approach and its application to the procaryotes. Conclusion We demonstrate that metabolic networks contain valuable phylogenetic information by comparing phylogenies obtained from network comparisons with 16S RNA phylogenies. The algebraic approach to metabolic networks is suitable to study metabolic innovations in two sets of organisms, free-living microbes and Pyrococci, as well as obligate intracellular pathogens.
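
The set-algebra view of metabolic networks can be sketched in a few lines. This is an illustrative toy, not the authors' implementation: the `reaction` helper, the example metabolite names, and the Jaccard-style `network_distance` are assumptions introduced here, and the record does not state which distance the authors used for phylogenetic inference.

```python
def reaction(substrates, products):
    """A directed hyperedge: a set of substrate nodes mapped to a set of product nodes."""
    return (frozenset(substrates), frozenset(products))


def metabolites(network):
    """All nodes (metabolites) touched by any hyperedge in the network."""
    nodes = set()
    for substrates, products in network:
        nodes |= substrates | products
    return nodes


def network_distance(net_a, net_b):
    """Jaccard-style distance on hyperedge sets (an assumed, illustrative measure)."""
    union = net_a | net_b
    if not union:
        return 0.0
    return 1.0 - len(net_a & net_b) / len(union)


# Two hypothetical toy networks sharing one reaction.
net_a = {reaction(["A"], ["B"]), reaction(["B", "C"], ["D"])}
net_b = {reaction(["A"], ["B"]), reaction(["B"], ["E"])}

shared = net_a & net_b      # reactions present in both networks
only_in_a = net_a - net_b   # reactions specific to the first network
combined = net_a | net_b    # union network
```

Because each hyperedge is a hashable pair of frozensets, Python's built-in set operators apply directly to whole networks, and a pairwise distance matrix over many organisms could be assembled from `network_distance`.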

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

The expansion of the metazoan microRNA repertoire (2006)

Hertel, Jana ; Lindemeyer, Manuela ; Missal, Kristin ; [et al.]

BioMed Central

In: BMC Genomics. 2006; 7(1): 25. Published 2006 Feb 15. doi: 10.1186/1471-2164-7-25.

Publication Date: 2006-02-15

Description: Background MicroRNAs have been identified as crucial regulators in both animals and plants. Here we report on a comprehensive comparative study of all known miRNA families in animals. We expand the MicroRNA Registry 6.0 by more than 1000 new homologs of miRNA precursors whose expression has been verified in at least one species. Using this uniform data basis we analyze their evolutionary history in terms of individual gene phylogenies and in terms of preservation of genomic nearness across species. This allows us to reliably identify microRNA clusters that are derived from a common transcript. Results We identify three episodes of microRNA innovation that correspond to major developmental innovations: A class of about 20 miRNAs is common to protostomes and deuterostomes and might be related to the advent of bilaterians. A second large wave of innovations maps to the branch leading to the vertebrates. The third significant outburst of miRNA innovation coincides with placental (eutherian) mammals. In addition, we observe the expected expansion of the microRNA inventory due to genome duplications in early vertebrates and in an ancestral teleost. The non-local duplications in the vertebrate ancestor are predated by local (tandem) duplications leading to the formation of about a dozen ancient microRNA clusters. Conclusion Our results suggest that microRNA innovation is an ongoing process. Major expansions of the metazoan miRNA repertoire coincide with the advent of bilaterians, vertebrates, and (placental) mammals.

Electronic ISSN: 1471-2164

Topics: Biology

Published by BioMed Central

Strategies for measuring evolutionary conservation of RNA secondary structures (2008)

Gruber, Andreas R ; Bernhart, Stephan H ; Hofacker, Ivo L ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2008; 9(1): 122. Published 2008 Feb 26. doi: 10.1186/1471-2105-9-122.

Publication Date: 2008-02-26

Description: Background Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential. Results We systematically assessed the ability of various measures to detect conserved RNA structures in multiple sequence alignments. We tested three existing and eight novel strategies that are based on metrics of folding energies, metrics of single optimal structure predictions, and metrics of structure ensembles. We find that the folding-energy-based SCI score used in the RNAz program and a simple base-pair distance metric are by far the most accurate. The use of more complex metrics, such as tree editing, does not improve performance. A variant of the SCI performed particularly well on highly conserved alignments and is thus a viable alternative when only little evolutionary information is available. Surprisingly, ensemble-based methods that, in principle, could benefit from the additional information contained in sub-optimal structures perform particularly poorly. As a general trend, we observed that methods that include a consensus structure prediction outperformed equivalent methods that only consider pairwise comparisons. Conclusion Structural conservation can be measured accurately with relatively simple and intuitive metrics. They have the potential to form the basis of future RNA gene finders that face new challenges like finding lineage-specific structures or detecting misaligned sequences.
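
The two simple measures singled out above can be sketched as follows. The base-pair distance here is the usual count of pairs present in exactly one of two dot-bracket structures, and the SCI is taken as the consensus folding energy divided by the mean single-sequence folding energy, as in RNAz. The function names are assumptions, the energies are assumed to come from an external folding program, and the paper's exact variants for alignments may differ.

```python
def base_pairs(structure):
    """Parse a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def base_pair_distance(struct_a, struct_b):
    """Number of base pairs found in exactly one of the two structures."""
    if len(struct_a) != len(struct_b):
        raise ValueError("structures must have equal length")
    return len(base_pairs(struct_a) ^ base_pairs(struct_b))


def structure_conservation_index(consensus_energy, single_energies):
    """SCI: consensus folding energy over the mean single-sequence energy."""
    mean_single = sum(single_energies) / len(single_energies)
    if mean_single == 0:
        raise ValueError("mean single-sequence energy is zero")
    return consensus_energy / mean_single
```

An SCI near 1 indicates that the sequences fold into the consensus structure about as well as they fold individually, while values near 0 indicate no shared structure.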

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central
