ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
Collection
  • Books
  • Articles  (1,590)
  • Research data
Publisher
  • Oxford University Press  (1,590)
  • Macmillan Magazines Ltd.
  • Springer
Publication period
  • 2010-2014  (1,405)
  • 2005-2009
  • 1995-1999  (185)
  • 1980-1984
  • 1975-1979
  • 1935-1939
  • 1925-1929
Year
  • 2012  (1,405)
  • 1997  (91)
  • 1995  (94)
  • 1984
  • 1982
  • 1981
  • 1979
  • 1978
  • 1977
  • 1938
  • 1928
Subject
  • Bioinformatics  (682)
  • 2184
  • Medicine  (1,590)
  • Chemistry and Pharmacy
  • 1
    Publication date: 2012-02-17
    Description: Motivation: High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just number, of sequencing experiments has also taken place including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids. Results: We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation. Availability: Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute website: http://www.sanger.ac.uk/resources/software/artemis/. Contact: artemis@sanger.ac.uk; tjc@sanger.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 2
    Publication date: 2012-02-17
    Description: MicroRNAs (miRNAs) are small regulatory molecules that act by mRNA degradation or via translational repression. Although many miRNAs are ubiquitously expressed, a small subset have differential expression patterns that may give rise to tissue-specific complexes. Motivation: This work studies gene targeting patterns amongst miRNAs with differential expression profiles, and links this to control and regulation of protein complexes. Results: We find that, when a pair of miRNAs are not expressed in the same tissues, there is a higher tendency for them to target the direct partners of the same hub proteins. At the same time, they also avoid targeting the same set of hub-spokes. Moreover, the complexes corresponding to these hub-spokes tend to be specific and nonoverlapping. This suggests that the effect of miRNAs on the formation of complexes is specific. Contact: wongls@comp.nus.edu.sg Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 3
    Publication date: 2012-02-17
    Description: Motivation: Small interfering RNAs (siRNAs) are produced from much longer sequences of double-stranded RNA precursors through cleavage by Dicer or a Dicer-like protein. These small RNAs play a key role in genetic and epigenetic regulation; however, a full understanding of the mechanisms by which they operate depends on the characterization of the precursors from which they are derived. Results: High-throughput sequencing of small RNA populations allows the locations of the double-stranded RNA precursors to be inferred. We have developed methods to analyse small RNA sequencing data from multiple biological sources, taking into account replicate information, to identify robust sets of siRNA precursors. Our methods show good performance on both a set of small RNA sequencing data in Arabidopsis thaliana and simulated datasets. Availability: Our methods are available as the Bioconductor (www.bioconductor.org) package segmentSeq (version 1.5.6 and above). Contact: tjh48@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 4
    Publication date: 2012-02-17
    Description: Motivation: Intrinsically disordered regions are key for the function of numerous proteins, and the scant available experimental annotations suggest the existence of different disorder flavors. While efficient predictions are required to annotate entire genomes, most existing methods require sequence profiles for disorder prediction, making them cumbersome for high-throughput applications. Results: In this work, we present an ensemble of protein disorder predictors called ESpritz. These are based on bidirectional recursive neural networks and trained on three different flavors of disorder, including a novel NMR flexibility predictor. ESpritz can produce fast and accurate sequence-only predictions, annotating entire genomes in the order of hours on a single processor core. Alternatively, a slower but slightly more accurate ESpritz variant using sequence profiles can be used for applications requiring maximum performance. Two levels of prediction confidence make it possible either to maximize reasonable disorder detection or to limit expected false positives to 5%. ESpritz performs consistently well on the recent CASP9 data, reaching an S_w measure of 54.82 and an area under the receiver operator curve of 0.856. The fast predictor is four orders of magnitude faster and remains better than most publicly available CASP9 methods, making it ideal for genomic scale predictions. Conclusions: ESpritz predicts three flavors of disorder at two distinct false positive rates, either with a fast or slower and slightly more accurate approach. Given its state-of-the-art performance, it can be especially useful for high-throughput applications. Availability: Both a web server for high-throughput analysis and a Linux executable version of ESpritz are available from: http://protein.bio.unipd.it/espritz/ Contact: silvio.tosatto@unipd.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 5
    Publication date: 2012-02-17
    Description: Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors (introduced by Røgen and co-workers) and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes. Contact: thamelry@binf.ku.dk; harder@binf.ku.dk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
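The clustering step named in the abstract above (K-means on fixed-length vectors) can be illustrated with a minimal sketch. This is not the Pleiades implementation: the computation of Gauss integral vectors is not reproduced here, the input is assumed to be a list of already-computed numeric vectors, and the function name `kmeans` and its parameters are hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means on equal-length tuples of numbers.

    Assumption: `points` stands in for per-structure feature vectors
    (for example Gauss integral vectors computed elsewhere).
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # pick k distinct points as initial centers

    def nearest(p):
        # Index of the center with the smallest squared Euclidean distance.
        return min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
        )

    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Recompute each center as the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(members) for col in zip(*members))
            if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers

    labels = [nearest(p) for p in points]
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
```

Because each iteration only needs distances between vectors and k centers, the cost grows linearly with the number of structures, which is consistent with the abstract's claim that tens of thousands of structures can be clustered quickly.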
  • 6
    Publication date: 2012-02-17
    Description: Motivation: Transmembrane β barrel proteins (TMBs) are found in the outer membrane of Gram-negative bacteria, chloroplasts and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important. Results: Here, we present BOCTOPUS, an improved method for the topology prediction of TMBs by employing a combination of support vector machines (SVMs) and Hidden Markov Models (HMMs). The SVMs and HMMs account for local and global residue preferences, respectively. Based on a 10-fold cross-validation test, BOCTOPUS performs better than all existing methods, reaching a Q3 accuracy of 87%. Further, BOCTOPUS predicted the correct number of strands for 83% of the proteins in the dataset. BOCTOPUS might also help in reliable identification of TMBs by using it as an additional filter to methods specialized in this task. Availability: BOCTOPUS is freely available as a web server at: http://boctopus.cbr.su.se/. The datasets used for training and evaluations are also available from this site. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 7
    Publication date: 2012-02-17
    Description: Motivation: High-dimensional data such as microarrays have created new challenges to traditional statistical methods. One such example is class prediction with high-dimension, low-sample size data. Due to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of the class prediction methods using the sample mean may also be unsatisfactory. To obtain more accurate estimation of parameters, some statistical methods, such as regularizations through shrinkage, are often desired. Results: In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed under the scenario when the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean by the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings. Contact: tongt@hkbu.edu.hk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 8
    Publication date: 2012-02-17
    Description: Motivation: The advent of high-throughput sequencing technologies is revolutionizing our ability to discover and genotype DNA copy number variants (CNVs). Read count-based approaches are able to detect CNV regions with an unprecedented resolution. Although this computational strategy has been recently introduced in the literature, much work has already been done for the preparation, normalization and analysis of this kind of data. Results: Here we address the many aspects of CNV detection using the read count approach. We first study the characteristics and systematic biases of read count distributions, focusing on the normalization methods designed for removing these biases. Subsequently, we compare the algorithms designed to detect the boundaries of CNVs and we investigate the ability of read count data to predict the exact DNA copy number. Finally, we review the tools publicly available for analysing read count data. To better understand the state of the art of read count approaches, we compare the performance of the three most widely used sequencing technologies (Illumina Genome Analyzer, Roche 454 and Life Technologies SOLiD) in all the analyses that we perform. Contact: albertomagi@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 9
    Publication date: 2012-02-17
    Description: Motivation: We investigate and quantify the generalizability of the white blood cell (WBC) transcriptome to the general, multiorgan transcriptome. We use data from the NCBI's Gene Expression Omnibus (GEO) public repository to define two datasets for comparison, WBC and OO (Other Organ) sets. Results: Comprehensive pair-wise correlation and expression level profiles are calculated for both datasets (with sizes of 81 and 1463, respectively). We have used mapping and ranking across the Gene Ontology (GO) categories to quantify similarity between the two sets. GO mappings of the most correlated and highly expressed genes from the two datasets tightly match, with the notable exceptions of components of the ribosome, cell adhesion and immune response. That is, 10 877 or 48.8% of all measured genes do not change >10% of rank range between WBC and OO; only 878 (3.9%) change rank >50%. Two trans-tissue gene lists are defined, the most changing and the least changing genes in expression rank. We also provide a general, quantitative measure of the probability of expression rank and correlation profile in the OO system given the expression rank and correlation profile in the WBC dataset. Contact: vvaltchinov@partners.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
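The rank-change figures quoted in the abstract above (genes whose expression rank moves by more than 10% or 50% of the rank range between two datasets) can be made concrete with a small sketch. The authors' exact ranking and normalization procedure is not given in the record, so everything here is an assumption for illustration: the function names, the use of one expression value per gene per dataset, and ranking by descending expression.

```python
def rank_positions(values):
    """Map each gene to its 0-based rank, highest expression first."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {gene: i for i, gene in enumerate(ordered)}


def fraction_shifted(expr_a, expr_b, threshold):
    """Fraction of shared genes whose rank differs between the two
    datasets by more than `threshold` (as a fraction of the rank range)."""
    genes = sorted(set(expr_a) & set(expr_b))
    if len(genes) < 2:
        return 0.0  # no meaningful rank range with fewer than two genes
    ranks_a = rank_positions({g: expr_a[g] for g in genes})
    ranks_b = rank_positions({g: expr_b[g] for g in genes})
    span = len(genes) - 1  # largest possible rank difference
    shifted = sum(
        1 for g in genes if abs(ranks_a[g] - ranks_b[g]) / span > threshold
    )
    return shifted / len(genes)


# Toy data (hypothetical values): only g4 and g5 swap places.
wbc = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 2.0, "g5": 1.0}
other = {"g1": 5.0, "g2": 4.0, "g3": 3.0, "g4": 1.0, "g5": 2.0}
more_than_10 = fraction_shifted(wbc, other, 0.10)
more_than_50 = fraction_shifted(wbc, other, 0.50)
```

In the toy data two of five genes move by a quarter of the rank range, so they count as shifted at the 10% threshold but not at the 50% threshold.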
  • 10
    Publication date: 2012-02-17
    Description: Motivation: The understanding of the molecular sources for diseases like cancer can be significantly improved by computational models. Recently, Boolean networks have become very popular for modeling signaling and regulatory networks. However, such models rely on a set of Boolean functions that are in general not known. Unfortunately, while detailed information on the molecular interactions becomes available in large scale through electronic databases, the information on the Boolean functions does not become available simultaneously and has to be included manually into the models, if at all known. Results: We propose a new Boolean approach which can directly utilize the mechanistic network information available through modern databases. The Boolean function is implicitly defined by the reaction mechanisms. Special care has been taken for the treatment of kinetic features like inhibition. The method has been applied to a signaling model combining the Wnt and MAPK pathway. Availability: A sample C++ implementation of the proposed method is available for Linux and compatible systems through http://code.google.com/p/libscopes/wiki/Paper2011 Contact: handorf@physik.hu-berlin.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 11
    Publication date: 2012-02-17
    Description: Motivation: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. Results: We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. Availability: MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/. Contact: simon.whelan@manchester.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 12
    Publication date: 2012-02-17
    Description: Motivation: Peptide detection is a crucial step in mass spectrometry (MS) based proteomics. Most existing algorithms are based upon greedy isotope template matching and thus may be prone to error propagation and ineffective at detecting overlapping peptides. In addition, existing algorithms usually work at different charge states separately, isolating useful information that can be drawn from other charge states, which may lead to poor detection of low abundance peptides. Results: BPDA2d models spectra as a mixture of candidate peptide signals and systematically evaluates all possible combinations of possible peptide candidates to interpret the given spectra. For each candidate, BPDA2d takes into account its elution profile, charge state distribution and isotope pattern, and it combines all evidence to infer the candidate's signal and existence probability. By piecing all evidence together, especially by deriving information across charge states, low abundance peptides can be better identified and peptide detection rates can be improved. Instead of local template matching, BPDA2d performs global optimization for all candidates and systematically optimizes their signals. Since BPDA2d looks for the optimal among all possible interpretations of the given spectra, it is capable of handling complex spectra where features overlap. BPDA2d estimates the posterior existence probability of detected peptides, which can be directly used for probability-based evaluation in subsequent processing steps. Our experiments indicate that BPDA2d outperforms state-of-the-art detection methods on both simulated data and real liquid chromatography–mass spectrometry data, according to sensitivity and detection accuracy. Availability: The BPDA2d software package is available at http://gsp.tamu.edu/Publications/supplementary/sun11a/ Contact: Michelle.Zhang@utsa.edu; edward@ece.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 13
    Publication date: 2012-02-17
    Description: Motivation: The continued progress in developing technological platforms, availability of many published experimental datasets, as well as different statistical methods to analyze those data have allowed approaching the same research question using various methods simultaneously. To get the best out of all these alternatives, we need to integrate their results in an unbiased manner. Prioritized gene lists are a common result presentation method in genomic data analysis applications. Thus, the rank aggregation methods can become a useful and general solution for the integration task. Results: Standard rank aggregation methods are often ill-suited for biological settings where the gene lists are inherently noisy. As a remedy, we propose a novel robust rank aggregation (RRA) method. Our method detects genes that are ranked consistently better than expected under the null hypothesis of uncorrelated inputs and assigns a significance score for each gene. The underlying probabilistic model makes the algorithm parameter free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make our approach robust and compelling for many settings. Availability: All the methods are implemented as a GNU R package RobustRankAggreg, freely available at the Comprehensive R Archive Network http://cran.r-project.org/. Contact: vilo@ut.ee Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
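The idea in the abstract above, scoring a gene by how unlikely its ranks would be if the input lists were uncorrelated, can be sketched with order statistics of uniform variables. This is a simplified illustration, not the RobustRankAggreg package code: the function names, the normalization of ranks by list length, the rank of 1.0 for genes missing from a list, and the Bonferroni-style correction are all assumptions made here.

```python
from math import comb


def order_stat_pvalue(x, k, n):
    """P(k-th smallest of n independent Uniform(0,1) values <= x).

    Equals the binomial tail: at least k of the n values fall below x.
    """
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(k, n + 1))


def rho_score(normalized_ranks):
    """Smallest order-statistic p-value over all k, multiplied by n
    (assumed correction for taking the minimum) and capped at 1."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    best = min(order_stat_pvalue(x, k, n) for k, x in enumerate(ranks, start=1))
    return min(1.0, best * n)


def aggregate(rank_lists):
    """rank_lists: lists of gene names, best first.

    Returns (gene, score) pairs sorted by ascending score.
    """
    genes = {g for lst in rank_lists for g in lst}
    scores = {}
    for gene in genes:
        ranks = [
            (lst.index(gene) + 1) / len(lst) if gene in lst else 1.0
            for lst in rank_lists
        ]
        scores[gene] = rho_score(ranks)
    return sorted(scores.items(), key=lambda item: item[1])


# Toy input (hypothetical): gene A is ranked first in every list.
lists = [
    ["A", "B", "C", "D", "E"],
    ["A", "C", "B", "E", "D"],
    ["A", "D", "E", "B", "C"],
]
result = aggregate(lists)
```

In the toy input, gene A is ranked first in all three lists and receives the smallest score, while genes whose positions vary from list to list are capped at 1.0. No tuning parameter is involved, which matches the "parameter free" property the abstract describes.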
  • 14
    Publication date: 2012-02-17
    Description: CLARE is a computational method designed to reveal sequence encryption of tissue-specific regulatory elements. Starting with a set of regulatory elements known to be active in a particular tissue/process, it learns the sequence code of the input set and builds a predictive model from features specific to those elements. The resulting model can then be applied to user-supplied genomic regions to identify novel candidate regulatory elements. CLARE's model also provides a detailed analysis of transcription factors that most likely bind to the elements, making it an invaluable tool for understanding mechanisms of tissue-specific gene regulation. Availability: CLARE is freely accessible at http://clare.dcode.org/. Contact: taherl@ncbi.nlm.nih.gov; ovcharen@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 15
    Publication date: 2012-02-17
    Description: Motivation: We present a pipeline for the pre-processing, quality assessment, read distribution and methylation estimation for methylated DNA immunoprecipitation (MeDIP)-sequence datasets. This is the first MeDIP-seq-specific analytic pipeline that starts at the output of the sequencers. This pipeline will reduce the data analysis load on staff and allow the easy and straightforward analysis of sequencing data for DNA methylation. The pipeline integrates customized scripting and several existing tools, which can deal with both paired and single end data. Availability: The package, extensive documentation and a comparison to public data are available at http://life.tongji.edu.cn/meqa/ Contact: jhuang@cephb.fr
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 16
    Publication date: 2012-02-17
    Description: Motivation: A plethora of bioinformatics analysis has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. Results: We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a ‘search and score’ network learning strategy and output a directed network, they can be compared in terms of performance and computational time. In the second study, we compared SA and Bayesian network methods using four benchmark datasets from DREAM. In our final study, we showcased two context-specific signaling pathways activated in breast cancer. Availability: Source codes are available from http://dl.dropbox.com/u/16000775/sa_sc.zip Contact: dzhu@wayne.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 17
    Publication date: 2012-02-17
    Description: We provide a Bioconductor package with quality assessment, processing and visualization tools for high-throughput sequencing data, with emphasis on ChIP-seq and RNA-seq studies. It includes detection of outliers and biases, inefficient immuno-precipitation and overamplification artifacts, de novo identification of read-rich genomic regions and visualization of the location and coverage of genomic region lists. Availability: www.bioconductor.org Contact: david.rossell@irbbarcelona.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 18
    Publication date: 2012-02-17
    Description: Motivation: We study a stochastic method for approximating the set of local minima in partial RNA folding landscapes associated with a bounded-distance neighbourhood of folding conformations. The conformations are limited to RNA secondary structures without pseudoknots. The method aims at exploring partial energy landscapes p_L induced by folding simulations and their underlying neighbourhood relations. It combines an approximation of the number of local optima devised by Garnier and Kallel (2002) with a run-time estimation for identifying sets of local optima established by Reeves and Eremeev (2004). Results: The method is tested on nine sequences of length between 50 nt and 400 nt, which allows us to compare the results with data generated by RNAsubopt and subsequent barrier tree calculations. On the nine sequences, the method captures on average 92% of local minima with settings designed for a target of 95%. The run-time of the heuristic can be estimated by O(n^2 D ν ln ν), where n is the sequence length, ν is the number of local minima in the partial landscape p_L under consideration and D is the maximum number of steepest descent steps in attraction basins associated with p_L. Contact: a.albrecht@qub.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 19
    Publication date: 2012-02-17
    Description: Motivation: RNA-Seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. Of the two public RNA-Seq datasets, PASSion predicted ~137 000 and 173 000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 20
    Publikationsdatum: 2012-02-17
    Beschreibung: Motivation: The completion of 168 genome sequences from a single population of Drosophila melanogaster provides a global view of genomic variation and an understanding of the evolutionary forces shaping the patterns of DNA polymorphism and divergence along the genome. Results: We present the ‘Population Drosophila Browser’ (PopDrowser), a new genome browser specially designed for the automatic analysis and representation of genetic variation across the D. melanogaster genome sequence. PopDrowser allows estimating and visualizing the values of a number of DNA polymorphism and divergence summary statistics, linkage disequilibrium parameters and several neutrality tests. PopDrowser also allows performing custom analyses on-the-fly using user-selected parameters. Availability: PopDrowser is freely available from http://PopDrowser.uab.cat . Contact: miquel.ramia@uab.cat
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 21
    Publikationsdatum: 2012-02-17
    Beschreibung: Motivation: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles hampered their synthesis into a single consistent probabilistic framework. Results: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. Availability: Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch . Contact: erik.vannimwegen@unibas.ch Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 22
    Publikationsdatum: 2012-02-17
    Beschreibung: We present LaTcOm, a new web tool, which offers several alternative methods for ‘rare codon cluster’ (RCC) identification from a single and simple graphical user interface. In the current version, three RCC detection schemes are implemented: the recently described %MinMax algorithm and a simplified sliding window approach, along with a novel modification of a linear-time algorithm for the detection of maximally scoring subsequences tailored to the RCC detection problem. Among a number of user tunable parameters, several codon-based scales relevant for RCC detection are available, including tRNA abundance values from Escherichia coli and several codon usage tables from a selection of genomes. Furthermore, useful scale transformations may be performed upon user request (e.g. linear, sigmoid). Users may choose to visualize RCC positions within the submitted sequences either with graphical representations or in textual form for further processing. Availability: LaTcOm is freely available online at the URL http://troodos.biol.ucy.ac.cy/latcom.html . Contact: vprobon@ucy.ac.cy Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
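    The abstract above mentions a linear-time algorithm for maximally scoring subsequences. As a rough illustration only, the sketch below finds the single highest-scoring contiguous run in a list of per-codon scores (the classic Kadane scan). It is not LaTcOm's modified algorithm; the function name and the convention that positive scores mark rare codons are assumptions made for this example.

```python
def max_scoring_segment(scores):
    """Return (best_sum, start, end) for the highest-scoring contiguous run.

    `end` is exclusive. Hypothetical convention: positive scores mark rare
    codons, negative scores mark common ones.
    """
    best_sum, best_start, best_end = float("-inf"), 0, 0
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive prefix can never help; restart the run here.
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best_start, best_end = cur_sum, cur_start, i + 1
    return best_sum, best_start, best_end
```

    The scan visits each position once, so the run time grows linearly with sequence length.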
  • 23
    Publikationsdatum: 2012-02-17
    Beschreibung: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe—SNP database of effects, with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms; ‘human’ being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as an MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. Availability: http://www.rostlab.org/services/snpdbe Contact: schaefer@rostlab.org ; snpdbe@rostlab.org
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 24
    Publikationsdatum: 2012-02-17
    Beschreibung: We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. The program allows, among other things, to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. Availability and implementation: JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet . The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available. Contact: pazos@cnb.csic.es
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 25
    Publikationsdatum: 2012-02-17
    Beschreibung: VarSifter is a graphical software tool for desktop computers that allows investigators of varying computational skills to easily and quickly sort, filter, and sift through sequence variation data. A variety of filters and a custom query framework allow filtering based on any combination of sample and annotation information. By simplifying visualization and analyses of exome-scale sequence variation data, this program will help bring the power and promise of massively-parallel DNA sequencing to a broader group of researchers. Availability and Implementation: VarSifter is written in Java, and is freely available in source and binary versions, along with a User Guide, at http://research.nhgri.nih.gov/software/VarSifter/ . Contact: mullikin@mail.nih.gov Supplementary Information: Additional figures and methods available online at the journal's website.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 26
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: Given the current costs of next-generation sequencing, large studies carry out low-coverage sequencing followed by application of methods that leverage linkage disequilibrium to infer genotypes. We propose a novel method that assumes study samples are sequenced at low coverage and genotyped on a genome-wide microarray, as in the 1000 Genomes Project (1KGP). We assume polymorphic sites have been detected from the sequencing data and that genotype likelihoods are available at these sites. We also assume that the microarray genotypes have been phased to construct a haplotype scaffold. We then phase each polymorphic site using an MCMC algorithm that iteratively updates the unobserved alleles based on the genotype likelihoods at that site and local haplotype information. We use a multivariate normal model to capture both allele frequency and linkage disequilibrium information around each site. When sequencing data are available from trios, Mendelian transmission constraints are easily accommodated into the updates. The method is highly parallelizable, as it analyses one position at a time. Results: We illustrate the performance of the method compared with other methods using data from Phase 1 of the 1KGP in terms of genotype accuracy, phasing accuracy and downstream imputation performance. We show that the haplotype panel we infer in African samples, which was based on a trio-phased scaffold, increases downstream imputation accuracy for rare variants (R² increases by >0.05 for minor allele frequency <1%), and this will translate into a boost in power to detect associations. These results highlight the value of incorporating microarray genotypes when calling variants from next-generation sequence data. Availability: The method (called MVNcall) is implemented in a C++ program and is available from http://www.stats.ox.ac.uk/~marchini/#software . Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 27
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as a standalone C++ code. STAR is free open source software distributed under GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/ . Contact: dobin@cshl.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 28
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. Results: This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. Availability: http://bioinf.scmb.uq.edu.au/dlocalmotif/ Contact: m.boden@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 29
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: Protein–protein interaction (PPI) plays an important role in understanding gene functions, and many computational PPI prediction methods have been proposed in recent years. Despite the extensive efforts, PPI prediction still has much room to improve. Sequence-based co-evolution methods include the substitution rate method and the mirror tree method, which compare sequence substitution rates and topological similarity of phylogenetic trees, respectively. Although they have been used to predict PPI in species with small genomes like Escherichia coli, such methods have not been tested in large-scale proteomes like that of Homo sapiens. Result: In this study, we propose a novel sequence-based co-evolution method, co-evolutionary divergence (CD), for human PPI prediction. Built on the basic assumption that protein pairs with similar substitution rates are likely to interact with each other, the CD method converts the evolutionary information from 14 species of vertebrates into likelihood ratios and combines them to infer PPI. We showed that the CD method outperformed the mirror tree method in three independent human PPI datasets by a large margin. With the arrival of more species genome information generated by next generation sequencing, the performance of the CD method can be further improved. Availability: Source code and support are available at http://mib.stat.sinica.edu.tw/LAP/tmp/CD.rar . Contact: syuan@stat.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 30
    Publikationsdatum: 2012-12-21
    Beschreibung: In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is for A.t. (90.75%/92.2%), for Vitis vinifera (66.8%/94.4%) and for Populus trichocarpa (81.6%/94.4%), which suggests that our tool can be used in annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of mechanisms of the translation initiation. Availability and implementation: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts . Contact: vladimir.bajic@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 31
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: Pathway or gene set analysis has been widely applied to genomic data. Many current pathway testing methods use univariate test statistics calculated from individual genomic markers, which ignores the correlations and interactions between candidate markers. Random forests-based pathway analysis is a promising approach for incorporating complex correlation and interaction patterns, but one limitation of previous approaches is that pathways have been considered separately, thus pathway cross-talk information was not considered. Results: In this article, we develop a new pathway hunting algorithm for survival outcomes using random survival forests, which prioritize important pathways by accounting for gene correlation and genomic interactions. We show that the proposed method performs favourably compared with five popular pathway testing methods using both synthetic and real data. We find that the proposed methodology provides an efficient and powerful pathway modelling framework for high-dimensional genomic data. Availability: The R code for the analysis used in this article is available upon request. Contact: xi.steven.chen@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 32
    Publikationsdatum: 2012-12-21
    Beschreibung: Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results. Availability: PBSIM is freely available from the web under the GNU GPL v2 license ( http://code.google.com/p/pbsim/ ). Contact: mhamada@k.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
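    The abstract above reports that PacBio read lengths follow a log-normal distribution. As a small illustration of that one observation, the sketch below draws read lengths from a log-normal distribution with Python's standard library. It is not PBSIM's model; the function name and the `mu`/`sigma` values in the usage note are made-up placeholders, not fitted parameters.

```python
import random


def sample_read_lengths(n, mu, sigma, seed=None):
    """Draw n read lengths (in bases) from a log-normal distribution.

    mu and sigma are the mean and standard deviation of log(length);
    they are hypothetical inputs, not values taken from the paper.
    """
    rng = random.Random(seed)
    # Round to whole bases and enforce a minimum length of 1.
    return [max(1, round(rng.lognormvariate(mu, sigma))) for _ in range(n)]
```

    For example, `sample_read_lengths(5, mu=8.0, sigma=0.5, seed=1)` returns five reproducible lengths clustered around a few thousand bases.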
  • 33
    Publikationsdatum: 2012-12-21
    Beschreibung: Drugster is a fully interactive pipeline designed to break the command line barrier and introduce a new user-friendly environment to perform drug design, lead and structure optimization experiments through an efficient combination of the PDB2PQR, Ligbuilder, Gromacs and Dock suites. Our platform features a novel workflow that guides the user through each logical step of the iterative 3D structural optimization setup and drug design process, by providing a seamless interface to all incorporated packages. Availability: Drugster can be freely downloaded via our dedicated server system at http://www.bioacademy.gr/bioinformatics/drugster/ . Contact: dvlachakis@bioacademy.gr
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 34
    Publikationsdatum: 2012-12-21
    Beschreibung: XiP (eXtensible integrative Pipeline) is a flexible, editable and modular environment with a user-friendly interface that does not require previous advanced programming skills to run, construct and edit workflows. XiP allows the construction of workflows by linking components written in both R and Java, the analysis of high-throughput data in grid engine systems and also the development of customized pipelines that can be encapsulated in a package and distributed. XiP already comes with several ready-to-use pipeline flows for the most common genomic and transcriptomic analysis and ~300 computational components. Availability: XiP is open source, freely available under the Lesser General Public License (LGPL) and can be downloaded from http://xip.hgc.jp . Contact: nagasaki@megabank.tohoku.ac.jp
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 35
    Publikationsdatum: 2012-12-21
    Beschreibung: Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented using Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam . Contact: m.woodbridge@imperial.ac.uk
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 36
    Publikationsdatum: 2012-12-21
    Beschreibung: Drug versus Disease (DvD) provides a pipeline, available through R or Cytoscape, for the comparison of drug and disease gene expression profiles from public microarray repositories. Negatively correlated profiles can be used to generate hypotheses of drug-repurposing, whereas positively correlated profiles may be used to infer side effects of drugs. DvD allows users to compare drug and disease signatures with dynamic access to databases Array Express, Gene Expression Omnibus and data from the Connectivity Map. Availability and implementation: R package (submitted to Bioconductor) under GPL 3 and Cytoscape plug-in freely available for download at www.ebi.ac.uk/saezrodriguez/DVD/ . Contact: saezrodriguez@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 37
    Publikationsdatum: 2012-11-11
    Beschreibung: Motivation: RNA-Seq uses the high-throughput sequencing technology to identify and quantify transcriptome at an unprecedented high resolution and low cost. However, RNA-Seq reads are usually not uniformly distributed and biases in RNA-Seq data pose great challenges in many applications including transcriptome assembly and the expression level estimation of genes or isoforms. Much effort has been made in the literature to calibrate the expression level estimation from biased RNA-Seq data, but the effect of biases on transcriptome assembly remains largely unexplored. Results: Here, we propose a statistical framework for both transcriptome assembly and isoform expression level estimation from biased RNA-Seq data. Using a quasi-multinomial distribution model, our method is able to capture various types of RNA-Seq biases, including positional, sequencing and mappability biases. Our experimental results on simulated and real RNA-Seq datasets exhibit interesting effects of RNA-Seq biases on both transcriptome assembly and isoform expression level estimation. The advantage of our method is clearly shown in the experimental analysis by its high sensitivity and precision in transcriptome assembly and the high concordance of its estimated expression levels with quantitative reverse transcription–polymerase chain reaction data. Availability: CEM is freely available at http://www.cs.ucr.edu/~liw/cem.html . Contact: liw@cs.ucr.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 38
    Publikationsdatum: 2012-11-11
    Beschreibung: Motivation: A number of studies of individual proteins have shown that post-translational modifications (PTMs) are associated with structural rearrangements of their target proteins. Although such studies provide critical insights into the mechanics behind the dynamic regulation of protein function, they usually feature examples with relatively large conformational changes. However, with the steady growth of Protein Data Bank (PDB) and available PTM sites, it is now possible to more systematically characterize the role of PTMs as conformational switches. In this study, we ask (1) what is the expected extent of structural change upon PTM, (2) how often are those changes in fact substantial, (3) whether the structural impact is spatially localized or global and (4) whether different PTMs have different signatures. Results: We exploit redundancy in PDB and, using root-mean-square deviation, study the conformational heterogeneity of groups of protein structures corresponding to identical sequences in their unmodified and modified forms. We primarily focus on the two most abundant PTMs in PDB, glycosylation and phosphorylation, but show that acetylation and methylation have similar tendencies. Our results provide evidence that PTMs induce conformational changes at both local and global level. However, the proportion of large changes is unexpectedly small; only 7% of glycosylated and 13% of phosphorylated proteins undergo global changes >2 Å. Further analysis suggests that phosphorylation stabilizes protein structure by reducing global conformational heterogeneity by 25%. Overall, these results suggest a subtle but common role of allostery in the mechanisms through which PTMs affect regulatory and signaling pathways. Contact: predrag@indiana.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
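    The abstract above uses root-mean-square deviation (RMSD) to compare structures. For readers unfamiliar with the measure, the sketch below computes RMSD between two equal-length lists of 3D coordinates. It assumes the two structures are already superimposed and atom-matched; the abstract does not describe the authors' exact protocol, and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched sets of (x, y, z) points.

    Assumes both structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

    With coordinates in ångströms, a result above 2 would correspond to the ">2 Å" threshold quoted in the abstract.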
  • 39
    Publikationsdatum: 2012-11-11
    Beschreibung: Motivation: Current methods in diagnostic microbiology typically focus on the detection of a single genomic locus or protein in a candidate agent. The presence of the entire microbe is then inferred from this isolated result. Problematically, the presence of recombination in microbial genomes would go undetected unless other genomic loci or protein components were specifically assayed. Microarrays lend themselves well to the detection of multiple loci from a given microbe; furthermore, the inherent nature of microarrays facilitates highly parallel interrogation of multiple microbes. However, none of the existing methods for analyzing diagnostic microarray data has the capacity to specifically identify recombinant microbes. In previous work, we developed a novel algorithm, VIPR, for analyzing diagnostic microarray data. Results: We have expanded upon our previous implementation of VIPR by incorporating a hidden Markov model (HMM) to detect recombinant genomes. We trained our HMM on a set of non-recombinant parental viruses and applied our method to 11 recombinant alphaviruses and 4 recombinant flaviviruses hybridized to a diagnostic microarray in order to evaluate performance of the HMM. VIPR HMM correctly identified 95% of the 62 inter-species recombination breakpoints in the validation set and only two false-positive breakpoints were predicted. This study represents the first description and validation of an algorithm capable of detecting recombinant viruses based on diagnostic microarray hybridization patterns. Availability: VIPR HMM is freely available for academic use and can be downloaded from http://ibridgenetwork.org/wustl/vipr . Contact: davewang@borcim.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 40
    Publikationsdatum: 2012-11-11
    Beschreibung: Motivation: The boost of next-generation sequencing technologies provides us with an unprecedented opportunity for elucidating genetic mysteries, yet the short-read length hinders us from better assembling the genome from scratch. New protocols now exist that can generate overlapping pair-end reads. By joining the 3' ends of each read pair, one is able to construct longer reads for assembling. However, effectively joining two overlapped pair-end reads remains a challenging task. Result: In this article, we present an efficient tool called Connecting Overlapped Pair-End (COPE) reads, to connect overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads. Availability and implementation: COPE is implemented in C++ and is freely available as open-source code at ftp://ftp.genomics.org.cn/pub/cope . Contact: twlam@cs.hku.hk or luoruibang@genomics.org.cn
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
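    The idea of joining the 3' ends of an overlapping read pair, as described in the abstract above, can be illustrated with a minimal sketch. It is not COPE's algorithm: COPE uses k-mer frequencies, whereas this sketch only searches for the longest suffix/prefix overlap within a mismatch tolerance. The function names and the `min_overlap` and `max_mismatch_rate` parameters are hypothetical.

```python
# Illustrative sketch only: plain suffix/prefix overlap joining, NOT COPE's
# k-mer-frequency method. All names and default values are hypothetical.

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def join_pair(read1: str, read2: str, min_overlap: int = 10,
              max_mismatch_rate: float = 0.1):
    """Join a read pair whose 3' ends overlap.

    read2 is reverse-complemented so that both reads lie on the same strand.
    The longest suffix of read1 that matches a prefix of the mate (within the
    mismatch tolerance) is taken as the overlap. Returns the joined read, or
    None when no acceptable overlap exists.
    """
    mate = reverse_complement(read2)
    longest = min(len(read1), len(mate))
    for size in range(longest, min_overlap - 1, -1):
        suffix, prefix = read1[-size:], mate[:size]
        mismatches = sum(a != b for a, b in zip(suffix, prefix))
        if mismatches <= max_mismatch_rate * size:
            # Keep read1 whole and append the non-overlapping tail of the mate.
            return read1 + mate[size:]
    return None


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTACCGATAGGCTAACGT"
    read1 = fragment[:20]
    read2 = reverse_complement(fragment[10:])
    print(join_pair(read1, read2))
```

    Trying the longest candidate overlap first favours the most complete merge, but in repetitive sequence it can pick a spurious overlap, which is presumably where additional evidence such as k-mer frequencies helps.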
  • 41
    Publication date: 2012-11-11
    Description: Motivation: It has become widely accepted that human cancer is a disease involving dynamic changes in the genome and that missense mutations constitute the bulk of human genetic variations. A multitude of computational algorithms, especially machine learning-based ones, has consequently been proposed to distinguish missense changes that contribute to cancer progression (‘driver’ mutations) from those that do not (‘passenger’ mutations). However, the existing methods have multifaceted shortcomings, in the sense that they either adopt an incomplete feature space or depend on protein structural databases, which are usually far from integrated. Results: In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that distinguishes cancer-associated driver mutations from passenger ones well. An index (DX score) was proposed to evaluate the discriminating capability of each feature, and the top-ranked subset of these features was selected to build the SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms the existing ones in both precision and robustness. We applied our method to several datasets of missense mutations culled from published databases and the literature and obtained more reasonable results than previous studies. Availability: The software is available online at http://www.methodisthealth.com/software and https://sites.google.com/site/drivermutationidentification/. Contact: xzhou@tmhs.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 42
    Publication date: 2012-11-11
    Description: Motivation: Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. Results: In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a formula in terms of the linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. Availability: The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/. Contact: zyhe@dlut.edu.cn Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 43
    Publication date: 2012-11-11
    Description: Motivation: Methylation of cytosines in DNA is an important epigenetic mechanism involved in transcriptional regulation and preservation of genome integrity in a wide range of eukaryotes. Immunoprecipitation of methylated DNA followed by hybridization to genomic tiling arrays (MeDIP-chip) is a cost-effective and sensitive method for methylome analyses. However, existing bioinformatics methods only enable a binary classification into unmethylated and methylated genomic regions, which limits biological interpretations. Indeed, DNA methylation levels can vary substantially within a given DNA fragment depending on the number and degree of methylated cytosines. Therefore, a method for the identification of more than two methylation states is highly desirable. Results: Here, we present a three-state hidden Markov model (MeDIP-HMM) for analyzing MeDIP-chip data. MeDIP-HMM uses a higher-order state-transition process improving modeling of spatial dependencies between chromosomal regions, allows a simultaneous analysis of replicates and enables a differentiation between unmethylated, methylated and highly methylated genomic regions. We train MeDIP-HMM using a Bayesian Baum–Welch algorithm, integrating prior knowledge on methylation levels. We apply MeDIP-HMM to the analysis of the Arabidopsis root methylome and systematically investigate the benefit of using higher-order HMMs. Moreover, we also perform an in-depth comparison study with existing methods and demonstrate the value of using MeDIP-HMM by comparisons to current knowledge on the Arabidopsis methylome. We find that MeDIP-HMM is a fast and precise method for the analysis of methylome data, enabling the identification of distinct DNA methylation levels. Finally, we provide evidence for the general applicability of MeDIP-HMM by analyzing promoter DNA methylation data obtained for chicken. Availability: MeDIP-HMM is available as part of the open-source Java library Jstacs (www.jstacs.de/index.php/MeDIP-HMM). Data files are available from the Jstacs website. Contact: seifert@ipk-gatersleben.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 44
    Publication date: 2012-11-11
    Description: Motivation: Automated annotation of neuroanatomical connectivity statements from the neuroscience literature would enable accessible and large-scale connectivity resources. Unfortunately, the connectivity findings are not formally encoded and occur as natural language text. This hinders aggregation, indexing, searching and integration of the reports. We annotated a set of 1377 abstracts for connectivity relations to facilitate automated extraction of connectivity relationships from neuroscience literature. We tested several baseline measures based on co-occurrence and lexical rules. We compare results from seven machine learning methods adapted from the protein interaction extraction domain that employ part-of-speech, dependency and syntax features. Results: Co-occurrence based methods provided high recall with weak precision. The shallow linguistic kernel recalled 70.1% of the sentence-level connectivity statements at 50.3% precision. Owing to its speed and simplicity, we applied the shallow linguistic kernel to a large set of new abstracts. To evaluate the results, we compared 2688 extracted connections with the Brain Architecture Management System (an existing database of rat connectivity). The extracted connections were connected in the Brain Architecture Management System at a rate of 63.5%, compared with 51.1% for co-occurring brain region pairs. We found that precision increases with the recency and frequency of the extracted relationships. Availability and implementation: The source code, evaluations, documentation and other supplementary materials are available at http://www.chibi.ubc.ca/WhiteText. Contact: paul@chibi.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 45
    Publication date: 2012-11-11
    Description: Motivation: The first step for clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify driver mutations, driver genes and driver pathways promoting cancer proliferation and to filter out the non-functional and passenger ones. Results: In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to identify mutated driver pathways de novo from mutation data in cancer. The first one is an exact method that can be helpful for assessing other approximate and/or heuristic algorithms. The second one is a stochastic and flexible method that can be employed to incorporate other types of information to improve the first method. In particular, we propose an integrative model to combine mutation and expression data. We first apply our methods to simulated data to show their efficiency. We further apply the proposed methods to several real biological datasets, such as the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. The gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods and made a package called mutated driver pathway finder (MDPFinder), which can be easily used by other researchers. Availability: A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang Contact: zsh@amss.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
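    The abstract above does not spell out the objective of the maximum weight submatrix problem. The sketch below assumes the weight commonly used for this problem in the literature: for a gene set M, W(M) = 2·|samples mutated in at least one gene of M| minus the sum over genes of their mutated-sample counts, which rewards high coverage and penalizes overlapping (non-exclusive) mutations. The brute-force search is a stand-in for illustration only; it is neither of the two methods the authors propose, and the function names and toy data are hypothetical.

```python
from itertools import combinations


def submatrix_weight(mutations, genes):
    """Assumed weight: 2 * |covered samples| - sum of per-gene sample counts.

    mutations maps each gene name to the set of samples in which it is mutated.
    """
    covered = set().union(*(mutations[g] for g in genes))
    total = sum(len(mutations[g]) for g in genes)
    return 2 * len(covered) - total


def best_gene_set(mutations, k):
    """Score every k-gene subset exhaustively (feasible only for small inputs)."""
    return max(combinations(sorted(mutations), k),
               key=lambda genes: submatrix_weight(mutations, genes))


if __name__ == "__main__":
    # Hypothetical toy data: gene -> samples carrying a mutation in that gene.
    toy = {"A": {1, 2, 3}, "B": {4, 5}, "C": {1, 2, 4}, "D": {6}}
    best = best_gene_set(toy, 2)
    print(best, submatrix_weight(toy, best))
```

    In the toy data, genes A and B together cover five samples without overlap, so they score higher than any pair that shares mutated samples.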
  • 46
    Publication date: 2012-11-11
    Description: Motivation: Biologistics provides data for quantitative analysis of transport (diffusion) processes and their spatio-temporal correlations in cells. Mobility of proteins is one of the few parameters necessary to describe reaction rates for gene regulation. Although understanding of diffusion-limited biochemical reactions in vivo requires mobility data for the largest possible number of proteins in their native forms, there is currently no database that contains complete information about the diffusion coefficients (DCs) of proteins in a given cell type. Results: We demonstrate a method for the determination of in vivo DCs for any molecule, regardless of its molecular weight, size and structure, in any type of cell. We exemplify the method with the database of in vivo DCs for all proteins (4302 records) from the proteome of the K12 strain of Escherichia coli, together with examples of DCs of amino acids, sugars, RNA and DNA. The database follows from the scale-dependent viscosity reference curve (sdVRC). Construction of the sdVRC for a prokaryotic or eukaryotic cell requires ~20 in vivo measurements using techniques such as fluorescence correlation spectroscopy (FCS), fluorescence recovery after photobleaching (FRAP), nuclear magnetic resonance (NMR) or particle tracking. The shape of the sdVRC would be different for each organism, but the mathematical form of the curve remains the same. The presented method has high predictive power, as measuring the DCs of several inert, properly chosen probes in a single cell type allows the DCs of thousands of proteins to be determined. Additionally, the obtained mobility data allow quantitative study of biochemical interactions in vivo. Contact: rholyst@ichf.edu.pl Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 47
    Publication date: 2012-11-11
    Description: Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 48
    Publication date: 2012-11-11
    Description: Phylogenetics, likelihood, evolution and complexity (PLEX) is a flexible and fast Bayesian Markov chain Monte Carlo software program for large-scale analysis of nucleotide and amino acid data using complex evolutionary models in a phylogenetic framework. The program gains large speed improvements over standard approaches by implementing ‘partial sampling of substitution histories’, a data augmentation approach that can reduce data analysis times from months to minutes on large comparative datasets. A variety of nucleotide and amino acid substitution models are currently implemented, including non-reversible and site-heterogeneous mixture models. Due to efficient algorithms that scale well with data size and model complexity, PLEX can be used to make inferences from hundreds to thousands of taxa in only minutes on a desktop computer. It also performs probabilistic ancestral sequence reconstruction. Future versions will support detection of co-evolutionary interactions between sites, probabilistic tests of convergent evolution and rigorous testing of evolutionary hypotheses in a Bayesian framework. Availability and implementation: PLEX v1.0 is licensed under GPL. Source code and documentation will be available for download at www.evolutionarygenomics.com/ProgramsData/PLEX. PLEX is implemented in C++ and supported on Linux, Mac OS X and other platforms supporting standard C++ compilers. Example data, control files, documentation and accessory Perl scripts are available from the website. Contact: David.Pollock@UCDenver.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 49
    Publication date: 2012-11-11
    Description: PrIME-DLRS (or colloquially: ‘Delirious’) is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. Availability and implementation: PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. Supplementary information: PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 50
    Publication date: 2012-11-11
    Description: Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Availability: Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project’s website http://csb.codeplex.com. Contacts: ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 51
    Publication date: 2012-11-11
    Description: GREVE has been developed to assist with the identification of recurrent genomic aberrations across cancer samples. The exact characterization of such aberrations remains a challenge despite the availability of an increasing amount of data, from SNP arrays to next-generation sequencing. Furthermore, genomic aberrations in cancer are especially difficult to handle because they are, by nature, unique to the patients. However, their recurrence in specific regions of the genome has been shown to reflect their relevance in the development of tumors. GREVE makes use of previously characterized events to identify such regions and focus any further analysis. Availability: GREVE is available through a web interface and as an open-source application (http://www.well.ox.ac.uk/GREVE).
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 52
    Publication date: 2012-11-11
    Description: Protein interaction networks are widely used to depict the relationships between proteins. These networks often lack information on physical binary interactions, and they do not indicate whether there is structural incompatibility between binding partners. Here, we introduce SAPIN, a framework dedicated to the structural analysis of protein interaction networks. SAPIN first identifies the protein parts that could be involved in the interaction and provides template structures. Next, SAPIN performs structural superimpositions to identify compatible and mutually exclusive interactions. Finally, the results are displayed using Cytoscape Web. Availability: The SAPIN server is available at http://sapin.crg.es. Contact: jae-seong.yang@crg.eu or christina.kiel@crg.eu Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 53
    Publication date: 2012-11-11
    Description: ChemBioServer is a publicly available web application for effectively mining and filtering chemical compounds used in drug discovery. It provides researchers with the ability to (i) browse and visualize compounds along with their properties, (ii) filter chemical compounds for a variety of properties such as steric clashes and toxicity, (iii) apply perfect match substructure search, (iv) cluster compounds according to their physicochemical properties providing representative compounds for each cluster, (v) build custom compound mining pipelines and (vi) quantify through property graphs the top ranking compounds in drug discovery procedures. ChemBioServer allows for pre-processing of compounds prior to an in silico screen, as well as for post-processing of top-ranked molecules resulting from a docking exercise, with the aim of increasing the efficiency and the quality of the compound selection that will pass to the experimental test phase. Availability: The ChemBioServer web application is available at: http://bioserver-3.bioacademy.gr/Bioserver/ChemBioServer/. Contact: gspyrou@bioacademy.gr
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 54
    Publication date: 2012-11-11
    Description: Motivation: Cell growth and division affect the kinetics of internal cellular processes and the phenotype diversity of cell populations. Since the effects are complex, e.g. different cellular components are partitioned differently in cell division, to account for them in silico, one needs to simulate these processes in great detail. Results: We present SGNS2, a simulator of chemical reaction systems according to the Stochastic Simulation Algorithm with multi-delayed reactions within hierarchical, interlinked compartments which can be created, destroyed and divided at runtime. In division, molecules are randomly segregated into the daughter cells following a specified distribution corresponding to one of several partitioning schemes, applicable on a per-molecule-type basis. We exemplify its use with six models including a stochastic model of the disposal mechanism of unwanted protein aggregates in Escherichia coli, a model of phenotypic diversity in populations with different levels of synchrony, a model of a bacteriophage’s infection of a cell population and a model of prokaryotic gene expression at the nucleotide and codon levels. Availability: SGNS2, instructions and examples are available at www.cs.tut.fi/~lloydpri/sgns2/ (open source under New BSD license). Contact: jason.lloyd-price@tut.fi Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
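    For readers unfamiliar with the Stochastic Simulation Algorithm named in the abstract above, the following is a minimal sketch of its direct method for a single-species birth-death system, together with one possible partitioning scheme at division (independent, binomial segregation). It omits the delayed reactions and compartment hierarchy that SGNS2 provides; the function names, rate constants and the choice of binomial partitioning are assumptions made for illustration and are not taken from SGNS2.

```python
import random


def gillespie_birth_death(k_birth, k_death, x0, t_end, rng):
    """Direct-method SSA for 0 -> X (propensity k_birth) and X -> 0 (k_death * X).

    Returns the trajectory as a list of (time, copy number) pairs.
    """
    t, x = 0.0, x0
    trajectory = [(t, x)]
    while True:
        a_birth = k_birth
        a_death = k_death * x
        a_total = a_birth + a_death
        if a_total == 0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a_total.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Choose the reaction with probability proportional to its propensity.
        if rng.random() * a_total < a_birth:
            x += 1
        else:
            x -= 1
        trajectory.append((t, x))
    return trajectory


def partition_binomial(n, rng, p=0.5):
    """Split n molecules between two daughter cells, each molecule going to
    the first daughter independently with probability p (assumed scheme)."""
    first = sum(rng.random() < p for _ in range(n))
    return first, n - first


if __name__ == "__main__":
    rng = random.Random(1)
    path = gillespie_birth_death(10.0, 0.1, 0, 50.0, rng)
    final_count = path[-1][1]
    print(final_count, partition_binomial(final_count, rng))
```

    Passing an explicit `random.Random` instance keeps runs reproducible, which is useful when comparing partitioning schemes on otherwise identical trajectories.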
  • 55
    Publication date: 2012-11-11
    Description: There is an immediate need for tools to both analyse and visualize in real time single-nucleotide polymorphisms, insertions and deletions, and other structural variants from new sequence file formats. We have developed VarB software that can be used to visualize variant call format files in real time, as well as identify regions under balancing selection and informative markers to differentiate user-defined groups (e.g. populations). We demonstrate its utility using sequence data from 50 Plasmodium falciparum isolates from two different continents and confirm known signals from genomic regions that contain important antigenic and anti-malarial drug-resistance genes. Availability and implementation: The C++-based software VarB and user manual are available from www.pathogenseq.org/varb. Contact: taane.clark@lshtm.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 56
    Publication date: 2012-11-11
    Description: comb-p is a command-line tool and a Python library that manipulates BED files of possibly irregularly spaced P-values and (1) calculates auto-correlation, (2) combines adjacent P-values, (3) performs false discovery adjustment, (4) finds regions of enrichment (i.e. series of adjacent low P-values) and (5) assigns significance to those regions. In addition, tools are provided for visualization and assessment. We provide validation and example uses on bisulfite-seq with P-values from Fisher’s exact test, tiled methylation probes using a linear model and Dam-ID for chromatin binding using moderated t-statistics. Because the library accepts input in a simple, standardized format and is unaffected by the origin of the P-values, it can be used for a wide variety of applications. Availability: comb-p is maintained under the BSD license. The documentation and implementation are available at https://github.com/brentp/combined-pvalues. Contact: bpederse@gmail.com
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
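    Steps (3) and (4) of the abstract above can be illustrated with a small standalone sketch. It does not use or reproduce the comb-p library: the Benjamini-Hochberg procedure is assumed here as one common false discovery adjustment (the abstract does not name the method), and the region finder, its `threshold` and `max_gap` parameters and the record layout are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P-value down so the adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def find_regions(records, threshold=0.05, max_gap=100):
    """Group nearby low P-value intervals into regions.

    records: (chrom, start, end, p) tuples sorted by chromosome and start,
    as in a BED file with a P-value column. Intervals with p >= threshold are
    skipped; a region is extended while the next low-P interval starts within
    max_gap bases of the current region end. Returns (chrom, start, end,
    number of sites) tuples.
    """
    regions = []
    current = None
    for chrom, start, end, p in records:
        if p >= threshold:
            continue
        if current and current[0] == chrom and start - current[2] <= max_gap:
            current[2] = max(current[2], end)
            current[3] += 1
        else:
            if current:
                regions.append(tuple(current))
            current = [chrom, start, end, 1]
    if current:
        regions.append(tuple(current))
    return regions


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    print(find_regions([("chr1", 100, 101, 0.001), ("chr1", 150, 151, 0.01)]))
```

    Working on sorted (chrom, start, end, p) tuples mirrors the simple, origin-agnostic input format that the abstract emphasizes.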
  • 57
    Publication date: 2012-11-11
    Description: ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. Availability: ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary information: Supplementary data are available at Bioinformatics Online. Contact: saalfeld@mpi-cbg.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 58
    Publication date: 2012-11-11
    Description: We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions, and is growing rapidly. Availability: Freely available on the web at http://rmdb.stanford.edu Contact: rhiju@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 59
    Publication date: 2012-11-11
    Description: Spoligotyping is a well-established genotyping technique based on the presence of unique DNA sequences in Mycobacterium tuberculosis (Mtb), the causal agent of tuberculosis disease (TB). Although advances in sequencing technologies are leading to whole-genome bacterial characterization, tens of thousands of isolates have been spoligotyped, giving a global view of Mtb strain diversity. To bridge the gap, we have developed SpolPred, software to predict the spoligotype from raw sequence reads. Our approach is compared with strain types determined experimentally and by de novo assembly in a set of 44 Mtb isolates. In silico and experimental results are identical for almost all isolates (39/44). However, SpolPred detected five experimentally false spoligotypes and was more accurate and faster than the assembling strategy. Application of SpolPred to an additional seven isolates with no laboratory data led to types that clustered with identical experimental types in a phylogenetic analysis using single-nucleotide polymorphisms. Our results demonstrate the usefulness of the tool and its role in revealing experimental limitations. Availability and implementation: SpolPred is written in C and is available from www.pathogenseq.org/spolpred. Contact: francesc.coll@lshtm.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 60
    Publication date: 2012-11-11
    Description: NetworkView is an application for the display and analysis of protein·RNA interaction networks derived from structure and/or dynamics. These networks typically model individual protein residues and nucleic acid monomers as nodes and their pairwise contacts as edges with associated weights. NetworkView projects the network onto the underlying 3D molecular structure so that visualization and analysis of the network can be coupled to physical and biological properties. NetworkView is implemented as a plugin to the molecular visualization software VMD. Availability and implementation: NetworkView is included with VMD, which is available at http://www.ks.uiuc.edu/Research/vmd/. Documentation, tutorials and supporting programs are available at http://www.scs.illinois.edu/schulten/software/. Contact: networkview@scs.illinois.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 61
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Hepatitis B virus (HBV) and hepatitis C virus (HCV) are the two leading causes of hepatocellular carcinoma (HCC). It has been observed that HCV induces HCC less readily than HBV. This motivates us to investigate the reasons from the viewpoint of immune genes. Results: To distinguish the immune genes with low-level expression in HBV-induced HCC but high-level expression in HCV-induced HCC, we propose the concept of a distinction immune gene and design a filter to screen for these genes. Using a gene positive network with strong correlations between genes, the genes are further filtered to form the set of key distinction immune genes. The 23 key distinction immune genes screened fall into four clusters: T cells, B cells, immune signalling and major histocompatibility complex. The screened genes are evidently important immune genes that are activated in HCV-induced HCC but inactivated in HBV-induced HCC. In HCV-induced HCC, the structures of HCV update adaptively, so that they are difficult for antigens to identify. Therefore, the clinical advice is either to increase the update speed of antigens or to reduce the update speed of the viruses during the treatment of HCV-induced HCC. It is also advised to add T cells or raise the expression levels of T cells to strengthen the ability to kill cancer cells. In contrast, HBV updates slowly, but the immune system in HBV-induced HCC has been seriously damaged. As a result, the clinical advice is to improve the immune ability of patients with HBV-induced HCC, for example by increasing immunoglobulin, T cells and B cells. Contact: zhiwei.gao@northumbria.ac.uk
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 62
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Evolutionary expansion of gene regulatory circuits seems to boost morphological complexity. However, the expansion patterns and the quantification relationships have not yet been identified. In this study, we focus on the regulatory circuits at the post-transcriptional level, investigating whether and how this principle may apply. Results: By analysing the structure of mRNA transcripts in multiple metazoan species, we observed a striking exponential correlation between the length of 3' untranslated regions (3'UTR) and morphological complexity as measured by the number of cell types in each organism. Cellular diversity was similarly associated with the accumulation of microRNA genes and their putative targets. We propose that the lengthening of 3'UTRs together with a commensurate exponential expansion in post-transcriptional regulatory circuits can contribute to the emergence of new cell types during animal evolution. Contact: yukijuan@ntu.edu.tw or hsuancheng@ym.edu.tw . Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 63
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Comparing genomes of individual organisms using next-generation sequencing data is, until now, mostly performed using a reference genome. This is challenging when the reference is distant and introduces bias towards the exact sequence present in the reference. Recent improvements in both sequencing read length and efficiency of assembly algorithms have brought direct comparison of individual genomes by de novo assembly, rather than through a reference genome, within reach. Results: Here, we develop and test an algorithm, named Magnolya, that uses a Poisson mixture model for copy number estimation of contigs assembled from sequencing data. We combine this with co-assembly to allow de novo detection of copy number variation (CNV) between two individual genomes, without mapping reads to a reference genome. In co-assembly, multiple sequencing samples are combined, generating a single contig graph with different traversal counts for the nodes and edges between the samples. In the resulting ‘coloured’ graph, the contigs have integer copy numbers; this negates the need to segment genomic regions based on depth of coverage, as required for mapping-based detection methods. Magnolya is then used to assign integer copy numbers to contigs, after which CNV probabilities are easily inferred. The copy number estimator and CNV detector perform well on simulated data. Application of the algorithms to hybrid yeast genomes showed allotriploid content from different origin in the wine yeast Y12, and extensive CNV in aneuploid brewing yeast genomes. Integer CNV was also accurately detected in a short-term laboratory-evolved yeast strain. Availability: Magnolya is implemented in Python and available at: http://bioinformatics.tudelft.nl/ Contact: d.deridder@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
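The copy number idea in the Magnolya abstract above can be illustrated with a much simplified sketch. It is not the Magnolya implementation: it assumes the expected number of reads per base for a single copy is already known (the parameter `reads_per_base_per_copy` is hypothetical), and it picks the most likely integer copy number for one contig under a Poisson model instead of fitting a full mixture.

```python
import math


def poisson_log_pmf(k: int, rate: float) -> float:
    """Log of the Poisson probability of observing k events at the given rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def estimate_copy_number(read_count: int, contig_length: int,
                         reads_per_base_per_copy: float,
                         max_copies: int = 6) -> int:
    """Return the integer copy number in 1..max_copies with the highest likelihood.

    Assumption (not from the abstract): a contig present in c copies attracts
    reads at rate c * reads_per_base_per_copy * contig_length.
    """
    best_copy, best_ll = 1, -math.inf
    for copies in range(1, max_copies + 1):
        rate = copies * reads_per_base_per_copy * contig_length
        ll = poisson_log_pmf(read_count, rate)
        if ll > best_ll:
            best_copy, best_ll = copies, ll
    return best_copy
```

For instance, with a single-copy rate of 0.1 reads per base, a 1000-base contig covered by 310 reads is assigned three copies. A real mixture model would also estimate the per-copy rate and the mixing weights from the data.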
  • 64
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA sequences. To tackle this dilemma, we require a fast and accurate aligner that takes structural information into consideration to yield reliable structural alignments, which are suitable for common secondary structure prediction. Results: We develop DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment. DAFS decomposes the pairwise structural alignment problem into two independent secondary structure prediction problems and one pairwise (non-structural) alignment problem by the dual decomposition technique, and maintains the consistency of a pairwise structural alignment by imposing penalties on inconsistent base pairs and alignment columns that are iteratively updated. Furthermore, we extend DAFS to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure. The experiments on publicly available datasets showed that DAFS can produce reliable structural alignments from unaligned sequences in terms of accuracy of common secondary structure prediction. Availability: The program of DAFS and the datasets are available at http://www.ncrna.org/software/dafs/. Contact: satoken@bio.keio.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 65
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment. Results: Here, we develop an alternative discriminative approach to predict sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for Basic Local Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) to BLAST and the previous context-specific version (CS-BLASTgen). On a dataset filtered to 20% maximum sequence identity, CS-BLASTdis was 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologues at 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21 and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical. Availability: Source code (GNU General Public License, version 3) and benchmark data are available at ftp://toolkit.genzentrum.lmu.de/pub/csblast/. Contact: soeding@genzentrum.lmu.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 66
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples. Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beast-mcmc/. The method will be made available in the next public release of the package, including support to set up analyses in BEAUti. Contact: philippe.lemey@rega.kuleuven.be or msuchard@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 67
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic analysis by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Comparative metagenomics studies the interrelationships between metagenomes from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences. Results: Here, we introduce crAss, a novel bioinformatic tool that enables fast simple analysis of cross-assembly files, yielding distances between all metagenomic sample pairs and an insightful image displaying the similarities. Availability and implementation: crAss is available as a web server at http://edwards.sdsu.edu/crass/ , and the Perl source code can be downloaded to run as a stand-alone command line tool. Contact: dutilh@cmbi.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 68
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: The discovery of novel gene fusions can lead to a better comprehension of cancer progression and development. The emergence of deep sequencing of the transcriptome, known as RNA-seq, has opened many opportunities for the identification of this class of genomic alterations, leading to the discovery of novel chimeric transcripts in melanomas, breast cancers and lymphomas. To date, a few computational approaches have been developed for the detection of chimeric transcripts. Although all of these computational methods show good sensitivity, much work remains to reduce the huge number of false-positive calls that arises from this analysis. Results: We propose a novel computational framework, named chimEric tranScript detection algorithm (EricScript), for the identification of gene fusion products in paired-end RNA-seq data. Our simulation study on synthetic data demonstrates that EricScript achieves higher sensitivity and specificity than existing methods with noticeably lower running times. We also applied our method to publicly available RNA-seq tumour datasets and showed its capability in rediscovering known gene fusions. Availability: The EricScript package is freely available under GPL v3 license at http://ericscript.sourceforge.net. Contact: matteo.benelli@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 69
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/ . Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 70
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Gene selection for cancer classification is one of the most important topics in the biomedical field. However, microarray data pose a severe challenge for computational techniques. We need dimension reduction techniques that identify a small set of genes to achieve better learning performance. From the perspective of machine learning, the selection of genes can be considered to be a feature selection problem that aims to find a small subset of features that has the most discriminative information for the target. Results: In this article, we proposed an Ensemble Correlation-Based Gene Selection algorithm based on symmetrical uncertainty and Support Vector Machine. In our method, symmetrical uncertainty was used to analyze the relevance of the genes, the different starting points of the relevant subset were used to generate the gene subsets and the Support Vector Machine was used as an evaluation criterion of the wrapper. The efficiency and effectiveness of our method were demonstrated through comparisons with other feature selection techniques, and the results show that our method outperformed other methods published in the literature. Availability: By request from the author. Contact: pyz@dblab.chungbuk.ac.kr ; khryu@dblab.cbnu.ac.kr
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
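Symmetrical uncertainty, the relevance measure named in the abstract above, is commonly defined as SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)). The sketch below computes it for two discrete variables, for example a discretized gene expression vector and the class labels. It shows only this standard definition and makes no claim about how the authors' ensemble or the Support Vector Machine wrapper is implemented. The function names are illustrative.

```python
import math
from collections import Counter


def entropy(values) -> float:
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        # Both variables are constant, so there is no information to share.
        return 0.0
    h_xy = entropy(list(zip(x, y)))      # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy       # I(X; Y)
    return 2.0 * mutual_info / (h_x + h_y)
```

A gene whose discretized expression matches the class labels exactly scores 1, and a gene that is statistically independent of the labels scores 0. Continuous expression values would need to be discretized first, and that step is not shown here.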
  • 71
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations. Results: Olorin is a tool that integrates gene flow within families with next-generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency. Availability: http://www.sanger.ac.uk/resources/software/olorin Contact: olorin@sanger.ac.uk
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 72
    Publikationsdatum: 2012-12-08
    Beschreibung: Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography or NMR spectroscopy provide atomic resolution structures but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from high false-positive rate, rarely resulting in a unique solution. Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases depending on the included experimental datasets; in comparison to 10% for standard computational docking. We also collected SAXS, 2D class average images and 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure. Availability: http://salilab.org/idock Contact: dina@salilab.org or sali@salilab.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 73
    Publikationsdatum: 2012-12-08
    Beschreibung: MicroRNA (miRNA) target prediction is an important problem. Given an miRNA sequence, the task is to determine the identity of the messenger RNAs targeted by it, the locations within them where the interactions happen and the specifics of the formed heteroduplexes. Here, we describe a web-based application, RNA22-GUI, which we have designed and implemented for the interactive exploration and in-context visualization of predictions by RNA22, one of the popular miRNA target prediction algorithms. Central to our design has been the requirement to provide informative and comprehensive visualization that is integrated with interactive search capabilities and permits one to selectively isolate and focus on relevant information that is distilled on-the-fly from a large repository of pre-compiled predictions. RNA22-GUI is currently available for Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans. Availability: http://cm.jefferson.edu/rna22v1.0/. Contact: Isidore.Rigoutsos@jefferson.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 74
    Publikationsdatum: 2012-09-30
    Beschreibung: We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data and applicable to many other applications. MolBioLib is designed to work with common file formats and data types used both in genomic analysis and general data analysis. A central relational-database-like Table class is a flexible and powerful object to intuitively represent and work with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative single-nucleotide polymorphisms in whole genome sequencing, detect balanced chromosomal rearrangements and compute enrichment of messenger RNAs (mRNAs) on microtubules, typically requiring applications of under 200 lines of code. MolBioLib includes programs to perform a wide variety of analysis tasks, such as computing read coverage, annotating genomic intervals and novel peak calling with a wavelet algorithm. Although MolBioLib was designed primarily for bioinformatics purposes, much of its functionality is applicable to a wide range of problems. Complete documentation and an extensive automated test suite are provided. Availability: MolBioLib is available for download at: http://sourceforge.net/projects/molbiolib Contact: ohsumit@molbio.mgh.harvard.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 75
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. Results: We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA . Contact: imh4y@virginia.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
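The YAHA abstract above describes choosing an optimal set of alignments that cover a query by searching a directed acyclic graph with a breakpoint penalty. A toy version of that idea is sketched below. It is not YAHA's algorithm or scoring scheme: the sketch assumes alignments are half-open query intervals with a precomputed score, allows an edge only between alignments that do not overlap on the query, and charges one fixed, hypothetical `breakpoint_penalty` per junction.

```python
def best_alignment_set(alignments, breakpoint_penalty):
    """Pick the highest-scoring chain of non-overlapping alignments.

    alignments: list of (query_start, query_end, score) tuples, with
    query_start < query_end (half-open intervals on the query).
    Returns (total_score, chain), where chain is in query order.
    """
    # Processing nodes by query end gives a valid topological order:
    # any allowed predecessor ends at or before the current start.
    order = sorted(range(len(alignments)), key=lambda i: alignments[i][1])
    best = {}  # index -> (best chain score ending here, predecessor index)
    for pos, i in enumerate(order):
        start, _end, score = alignments[i]
        best[i] = (score, None)
        for j in order[:pos]:
            if alignments[j][1] <= start:  # j lies entirely before i
                candidate = best[j][0] + score - breakpoint_penalty
                if candidate > best[i][0]:
                    best[i] = (candidate, j)
    if not best:
        return 0, []
    node = max(best, key=lambda i: best[i][0])
    total = best[node][0]
    chain = []
    while node is not None:  # walk predecessors back to the chain start
        chain.append(alignments[node])
        node = best[node][1]
    chain.reverse()
    return total, chain
```

With a small penalty, two split alignments that together explain the query better than one weak full-length alignment are preferred. With a large penalty, the single alignment wins. Real split-read aligners also have to handle overlapping segments and strand or chromosome changes, which this sketch ignores.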
  • 76
    Publikationsdatum: 2012-10-11
    Beschreibung: Identification of metabolites using high-resolution multi-stage mass spectrometry (MSn) data is a significant challenge demanding access to all sorts of computational infrastructures. MetiTree is a user-friendly web application dedicated to organizing, processing, sharing, visualizing and comparing MSn data. It integrates several features to export and visualize complex MSn data, facilitating the exploration and interpretation of metabolomics experiments. A dedicated spectral tree viewer allows the simultaneous presentation of three related types of MSn data, namely, the spectral data, the fragmentation tree and the fragmentation reactions. MetiTree stores the data in an internal database to enable searching for similar fragmentation trees and matching against other MSn data. As such, MetiTree contains much functionality that will make the difficult task of identifying unknown metabolites much easier. Availability: MetiTree is accessible at http://www.MetiTree.nl. The source code is available at https://github.com/NetherlandsMetabolomicsCentre/metitree/wiki. Contact: m.rojas@lacdr.leidenuniv.nl or t.reijmers@lacdr.leidenuniv.nl
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 77
    Publikationsdatum: 2012-10-11
    Beschreibung: Motivation: Currently, there is great interest in detecting complex trait rare variant associations using next-generation sequence data. On a monthly basis, new rare variant association methods are published. It is difficult to evaluate these methods because there is no standard to generate data and often comparisons are biased. In order to fairly compare rare variant association methods, it is necessary to generate data using realistic population demographic and phenotypic models. Result: SimRare is an interactive program that integrates generation of rare variant genotype/phenotype data and evaluation of association methods using a unified platform. Variant data are generated for gene regions using forward-time simulation that incorporates realistic population demographic and evolutionary scenarios. Phenotype data can be obtained for both case–control and quantitative traits. SimRare has a user-friendly interface that allows for easy entry of genetic and phenotypic parameters. Novel rare variant association methods implemented in R can also be imported into SimRare, to evaluate their performance and compare results, e.g. power and Type I error, with other currently available methods both numerically and graphically. Availability: http://code.google.com/p/simrare/ Contact: sleal@bcm.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 78
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: We previously reported the development of a highly accurate statistical algorithm for identifying β-barrel outer membrane proteins, or transmembrane β-barrels (TMBBs), from genomic sequence data of Gram-negative bacteria (Freeman, T.C. and Wimley, W.C. (2010) Bioinformatics, 26, 1965–1974). We have now applied this identification algorithm to all available Gram-negative bacterial genomes (over 600 chromosomes) and have constructed a publicly available, searchable, up-to-date database of all proteins in these genomes. Results: For each protein in the database, there is information on (i) β-barrel membrane protein probability for identification of β-barrels, (ii) β-strand and β-hairpin propensity for structure and topology prediction, (iii) signal sequence score, because most TMBBs are secreted through the inner membrane translocon and thus have a signal sequence, and (iv) transmembrane α-helix predictions, for reducing false positive predictions. This information is sufficient for the accurate identification of most β-barrel membrane proteins in these genomes. In the database there are nearly 50 000 predicted TMBBs (out of 1.9 million total putative proteins). Of those, more than 15 000 are ‘hypothetical’ or ‘putative’ proteins, not previously identified as TMBBs. This wealth of genomic information is not available anywhere else. Availability: The TMBB genomic database is available at http://beta-barrel.tulane.edu/. Contact: wwimley@tulane.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 79
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics data that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to meta-analysis of breast cancer studies, where iBBiG extracted novel gene set–phenotype associations that predicted tumor metastases within tumor subtypes. Availability: Implemented in the Bioconductor package iBBiG. Contact: aedin@jimmy.harvard.edu
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 80
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: The prediction of a protein’s contact map has become, in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. Results: The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions. Availability: http://icos.cs.nott.ac.uk/servers/psp.html . Contact: natalio.krasnogor@nottingham.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 81
    Publikationsdatum: 2012-09-30
    Beschreibung: Gibberellic acids (GAs) are key plant hormones, regulating various aspects of growth and development, which have been at the center of the ‘green revolution’. GRAS family proteins, the primary players in GA signaling pathways, remain poorly understood. Using sequence-profile searches, structural comparisons and phylogenetic analysis, we establish that the GRAS family first emerged in bacteria and belongs to the Rossmann fold methyltransferase superfamily. All bacterial and a subset of plant GRAS proteins are likely to function as small-molecule methylases. The remaining plant versions have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. We predict that GRAS proteins might either modify or bind small molecules such as GAs or their derivatives. Contact: aravind@ncbi.nlm.nih.gov Supplementary Information: Supplementary Material for this article is available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 82
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Results: Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3–5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline providing ready-to-publish results. Availability: Nebula is available at http://nebula.curie.fr/ Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 83
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: Eukaryotic gene expression (GE) is subject to precisely coordinated multi-layer controls, across the levels of epigenetic, transcriptional and post-transcriptional regulation. Recently, emerging multi-dimensional genomic datasets have provided unprecedented opportunities to study the cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activities, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently sparse. Results: In this article, we introduce a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local ‘gene expression factory’. We demonstrate the performance of our method on simulated data as well as on The Cancer Genome Atlas Ovarian Cancer datasets, including the CNV, DM, ME and GE data measured on 230 samples. We show that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules revealed that CNV, DM and microRNA can have a coupled impact on the expression of important oncogenes and tumor suppressor genes. Availability and implementation: The source code, implemented in MATLAB, is freely available at: http://zhoulab.usc.edu/sMBPLS/ . Contact: xjzhou@usc.edu Supplementary information: Supplementary material is available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 84
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: In many situations, genome-wide association studies are performed in populations presenting stratification. Mixed models including a kinship matrix accounting for genetic relatedness among individuals have been shown to correct for population and/or family structure. Here we extend this methodology to generalized linear mixed models, which properly model data under various distributions. In addition, we perform association with ancestral haplotypes inferred using a hidden Markov model. Results: The method was shown to properly account for stratification under various simulated scenarios presenting population and/or family structure. Use of ancestral haplotypes resulted in higher power than SNPs on simulated datasets. Application to real data demonstrates the usefulness of the developed model. Full analysis of a dataset with 4600 individuals and 500 000 SNPs was performed in 2 h 36 min and required 2.28 Gb of RAM. Availability: The software GLASCOW can be freely downloaded from www.giga.ulg.ac.be/jcms/prod_381171/software . Contact: francois.guillaume@jouy.inra.fr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 85
    Publikationsdatum: 2012-09-30
    Beschreibung: zCall is a variant caller specifically designed for calling rare single-nucleotide polymorphisms from array-based technology. This caller is implemented as a post-processing step after a default calling algorithm has been applied. The algorithm uses the intensity profile of the common allele homozygote cluster to define the location of the other two genotype clusters. We demonstrate improved detection of rare alleles when applying zCall to samples that have both Illumina Infinium HumanExome BeadChip and exome sequencing data available. Availability: http://atguweb.mgh.harvard.edu/apps/zcall . Contact: bneale@broadinstitute.org Supplementary Information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 86
    Publikationsdatum: 2012-09-30
    Beschreibung: The open access, comprehensive GlycoCD database application is for representation and retrieval of carbohydrate-related clusters of differentiation (CDs). The main objective of this database platform is to provide information about interactions of carbohydrate moieties with proteins that are important for identification of specific cell surface molecules, with a focus on the integration of data from carbohydrate microarray databases. The GlycoCD database comprises two sections: the carbohydrate recognition CD and glycan CD. It allows easy access through a user-friendly web interface to all carbohydrate-defined CDs and those that interact with carbohydrates, along with other relevant information. Availability: The database is freely available at http://glycosciences.de/glycocd/index.php Contact: r.s-albiez@dkfz.de
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 87
    Publikationsdatum: 2012-09-30
    Beschreibung: The International Society for Computational Biology, ISCB, organizes the largest event in the field of computational biology and bioinformatics, namely the annual international conference on Intelligent Systems for Molecular Biology, the ISMB. This year at ISMB 2012 in Long Beach, ISCB celebrated the 20th anniversary of its flagship meeting. ISCB is a young, lean and efficient society that aspires to make a significant impact with only limited resources. Many constraints make the choice of venues for ISMB a tough challenge. Here, we describe those challenges and invite the contribution of ideas for solutions. Contact: assistant@rostlab.org
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 88
    Publikationsdatum: 2012-09-30
    Beschreibung: Motivation: RNA sequencing is becoming a standard for expression profiling experiments and many tools have been developed in the past few years to analyze RNA-Seq data. Numerous ‘Bioconductor’ packages are available for loading next-generation sequencing data in R, e.g. ShortRead and Rsamtools, as well as for performing differential gene expression analyses, e.g. DESeq and edgeR. However, the processing tasks lying in between these require the precise interplay of many Bioconductor packages, e.g. Biostrings and IRanges, or external solutions must be sought. Results: We developed ‘easyRNASeq’, an R package that simplifies the processing of RNA sequencing data, hiding the complex interplay of the required packages behind a single functionality. Availability: The package is implemented in R (as of version 2.15) and is available from Bioconductor (as of version 2.10) at the URL: http://bioconductor.org/packages/release/bioc/html/easyRNASeq.html , where installation and usage instructions can be found. Contact: delhomme@embl.de
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 89
    Publikationsdatum: 2012-10-20
    Beschreibung: Motivation: It has been recently suggested that atomic burials, as expressed by molecular central distances, contain sufficient information to determine the tertiary structure of small globular proteins. A possible approach to structural determination from sequence could therefore involve a sequence-to-burial intermediate prediction step whose accuracy, however, is theoretically limited by the mutual information between these two variables. We use a non-redundant set of globular protein structures to estimate the mutual information between local amino acid sequence and atomic burials. Discretizing central distances of or atoms in equiprobable burial levels, we estimate relevant mutual information measures that are compared with actual predictions obtained from a Naive Bayesian Classifier (NBC) and a Hidden Markov Model (HMM). Results: Mutual information density for 20 amino acids and two or three burial levels were estimated to be roughly 15% of the unconditional burial entropy density. Lower estimates for the mutual information between local amino acid sequence and burial of a single residue indicated an increase in mutual information with the number of burial levels up to at least five or six levels. Prediction schemes were found to efficiently extract the available burial information from local sequence. Lower estimates for the mutual information involving single burials are consistently approached by predictions from the NBC and actually surpassed by predictions from the HMM. Near-optimal prediction for the HMM is indicated by the agreement between its density of prediction information and the corresponding density of mutual information between input and output representations. Availability: The dataset of protein structures and the prediction implementations are available at http://www.btc.unb.br/ (in ‘Software’). Contact: aaraujo@unb.br Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
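The quantities named in the abstract above (equiprobable burial levels, mutual information, burial entropy) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the estimator is the simple plug-in (frequency-count) estimate, and it pairs a single residue with a single burial level rather than modelling local sequence windows.

```python
import math
from collections import Counter

def equiprobable_levels(values, n_levels):
    """Rank the values and split the ranks into n_levels bins of (nearly) equal size."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    levels = [0] * len(values)
    for rank, idx in enumerate(order):
        levels[idx] = rank * n_levels // len(values)
    return levels

def entropy(xs):
    """Plug-in estimate of the entropy H(X) in bits."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired discrete observations."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Toy data (made up for illustration): residue types and central distances.
residues = ["A", "A", "G", "G"]
distances = [0.1, 2.0, 5.0, 9.0]
levels = equiprobable_levels(distances, 2)
ratio = mutual_information(residues, levels) / entropy(levels)
```

The ratio in the last line corresponds to the kind of figure the abstract reports (mutual information as a fraction of burial entropy); on real data the plug-in estimate is biased upward for small samples, so it should be read as a rough guide only.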
  • 90
    Publikationsdatum: 2012-10-20
    Beschreibung: Motivation: The Ensembl Project provides release-specific Perl APIs for efficient high-level programmatic access to data stored in various Ensembl database schema. Although Perl scripts are perfectly suited for processing large volumes of text-based data, Perl is not ideal for developing large-scale software applications nor embedding in graphical interfaces. The provision of a novel Java API would facilitate type-safe, modular, object-orientated development of new Bioinformatics tools with which to access, analyse and visualize Ensembl data. Results: The JEnsembl API implementation provides basic data retrieval and manipulation functionality from the Core, Compara and Variation databases for all species in Ensembl and EnsemblGenomes and is a platform for the development of a richer API to Ensembl datasources. The JEnsembl architecture uses a text-based configuration module to provide evolving, versioned mappings from database schema to code objects. A single installation of the JEnsembl API can therefore simultaneously and transparently connect to current and previous database instances (such as those in the public archive) thus facilitating better analysis repeatability and allowing ‘through time’ comparative analyses to be performed. Availability: Project development, released code libraries, Maven repository and documentation are hosted at SourceForge ( http://jensembl.sourceforge.net ). Contact: jensembl-develop@lists.sf.net , andy.law@roslin.ed.ac.uk , trevor.paterson@roslin.ed.ac.uk
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 91
    Publikationsdatum: 2012-10-20
    Beschreibung: Motivation: Regulatory, non-coding RNAs often function by forming a duplex with other RNAs. It is therefore of interest to predict putative RNA–RNA duplexes in silico on a genome-wide scale. Current computational methods for predicting these interactions range from fast complementary-based searches to those that take intramolecular binding into account. Together these methods constitute a trade-off between speed and accuracy, while leaving room for improvement within the context of genome-wide screens. A fast pre-filtering of putative duplexes would therefore be desirable. Results: We present RIsearch, an implementation of a simplified Turner energy model for fast computation of hybridization, which significantly reduces runtime while maintaining accuracy. Its time complexity for sequences of lengths m and n is O(m·n), with a much smaller pre-factor than other tools. We show that this energy model is an accurate approximation of the full energy model for near-complementary RNA–RNA duplexes. RIsearch uses a Smith–Waterman-like algorithm with a dinucleotide scoring matrix that approximates the Turner nearest-neighbor energies. We show in benchmarks that we achieve a speed improvement of at least 2.4× compared with RNAplex, currently the fastest method for searching near-complementary regions. RIsearch shows a prediction accuracy similar to RNAplex on two datasets of known bacterial short RNA (sRNA)–messenger RNA (mRNA) and eukaryotic microRNA (miRNA)–mRNA interactions. Using RIsearch as a pre-filter in genome-wide screens reduces the number of binding site candidates reported by miRNA target prediction programs, such as TargetScanS and miRanda, by up to 70%. Likewise, substantial filtering was performed on bacterial RNA–RNA interaction data. Availability: The source code for RIsearch is available at: http://rth.dk/resources/risearch . Contact: gorodkin@rth.dk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
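To make the idea of a Smith–Waterman-like duplex search concrete, here is a minimal Python sketch of a local dynamic program over two RNA sequences. It is not RIsearch: it scores individual base pairs with made-up values (`PAIR_SCORE`, `MISMATCH` and `GAP` are assumptions) instead of the dinucleotide matrix approximating Turner nearest-neighbor energies, and the function name is hypothetical. It does show why such a search needs time proportional to the product of the two sequence lengths.

```python
# Illustrative per-base-pair scores (assumed values, not Turner energies).
PAIR_SCORE = {
    ("A", "U"): 2, ("U", "A"): 2,
    ("G", "C"): 3, ("C", "G"): 3,
    ("G", "U"): 1, ("U", "G"): 1,  # wobble pairs
}
MISMATCH = -3  # penalty for two opposing bases that cannot pair
GAP = -4       # penalty for an unpaired (bulged) base

def best_duplex_score(query, target):
    """Best local score for query (5'->3') pairing antiparallel with target."""
    rev = target[::-1]  # read the target 3'->5' so paired bases line up
    prev = [0] * (len(rev) + 1)
    best = 0
    for q in query:                      # m rows
        cur = [0]
        for j, t in enumerate(rev, 1):   # n columns, so m*n cells in total
            diag = prev[j - 1] + PAIR_SCORE.get((q, t), MISMATCH)
            cell = max(0, diag, prev[j] + GAP, cur[j - 1] + GAP)
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

Only two rows of the table are kept at a time, so memory grows with the target length alone; recovering the duplex coordinates would require storing the full table or a traceback structure.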
  • 92
    Publikationsdatum: 2012-10-20
    Beschreibung: Motivation: Determining the best sampling rates (which maximize information yield and minimize cost) for time-series high-throughput gene expression experiments is a challenging optimization problem. Although existing approaches provide insight into the design of optimal sampling rates, our ability to utilize existing differential gene expression data to discover optimal timepoints is compelling. Results: We present a new data-integrative model, Optimal Timepoint Selection (OTS), to address the sampling rate problem. Three experiments were run on two different datasets in order to test the performance of OTS, including iterative-online and top-up sampling approaches. In all of the experiments, OTS outperformed the best existing timepoint selection approaches, suggesting that it can optimize the distribution of a limited number of timepoints, potentially leading to better biological insights about the resulting gene expression patterns. Availability: OTS is available at www.msu.edu/~jinchen/OTS . Contact: wqin@lakeheadu.ca ; jinchen@msu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 93
    Publikationsdatum: 2012-10-20
    Beschreibung: Motivation: Modelling the regulation of gene expression can provide insight into the regulatory roles of individual transcription factors (TFs) and histone modifications. Recently, Ouyang et al. in 2009 modelled gene expression levels in mouse embryonic stem (mES) cells using in vivo ChIP-seq measurements of TF binding. ChIP-seq TF binding data, however, are tissue-specific and relatively difficult to obtain. This limits the applicability of gene expression models that rely on ChIP-seq TF binding data. Results: In this study, we build regression-based models that relate gene expression to the binding of 12 different TFs, 7 histone modifications and chromatin accessibility (DNase I hypersensitivity) in two different tissues. We find that expression models based on computationally predicted TF binding can achieve similar accuracy to those using in vivo TF binding data and that including binding at weak sites is critical for accurate prediction of gene expression. We also find that incorporating histone modification and chromatin accessibility data results in additional accuracy. Surprisingly, we find that models that use no TF binding data at all, but only histone modification and chromatin accessibility data, can be as (or more) accurate than those based on in vivo TF binding data. Availability and implementation: All scripts, motifs and data presented in this article are available online at http://research.imb.uq.edu.au/t.bailey/supplementary_data/McLeay2011a . Contact: t.bailey@imb.uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 94
    Publikationsdatum: 2012-04-08
    Beschreibung: Motivation: High-throughput sequencing has made the analysis of new model organisms more affordable. Although assembling a new genome can still be costly and difficult, it is possible to use RNA-seq to sequence mRNA. In the absence of a known genome, it is necessary to assemble these sequences de novo , taking into account possible alternative isoforms and the dynamic range of expression values. Results: We present a software package named Oases designed to heuristically assemble RNA-seq reads in the absence of a reference genome, across a broad spectrum of expression values and in presence of alternative isoforms. It achieves this by using an array of hash lengths, a dynamic filtering of noise, a robust resolution of alternative splicing events and the efficient merging of multiple assemblies. It was tested on human and mouse RNA-seq data and is shown to improve significantly on the transABySS and Trinity de novo transcriptome assemblers. Availability and implementation: Oases is freely available under the GPL license at www.ebi.ac.uk/~zerbino/oases/ Contact: dzerbino@ucsc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 95
    Publikationsdatum: 2012-04-08
    Beschreibung: Motivation: Mathematical modelling is used to further our understanding of the mechanisms underlying biochemical pathways. Since many parameter values are unknown, they need to be estimated using experimental observations. The complexity of the models necessary to describe biological pathways, in combination with the limited amount of quantitative data, results in large parameter uncertainty, which propagates into model predictions. Therefore, prediction uncertainty analysis is an important topic that needs to be addressed in Systems Biology modelling. Results: We propose a strategy for model prediction uncertainty analysis by integrating profile likelihood analysis with Bayesian estimation. Our method is illustrated with an application to a model of the JAK-STAT signalling pathway. The analysis identified predictions on unobserved variables that could be made with a high level of confidence, even though some parameters were non-identifiable. Availability and implementation: Source code is available at: http://bmi.bmt.tue.nl/sysbio/software/pua.html . Contact: j.vanlier@tue.nl Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 96
    Publikationsdatum: 2012-04-08
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 97
    Publikationsdatum: 2012-04-08
    Beschreibung: Motivation: microRNAs are short non-coding RNAs that regulate gene expression by inhibiting target mRNA genes. Next-generation sequencing combined with bioinformatics analyses provide an opportunity to predict numerous novel miRNAs. The efficiency of these predictions relies on the set of positive and negative controls used. We demonstrate that commonly used positive and negative controls may be unreliable and provide a rational methodology with which to replace them. Contact: w.ritchie@centenary.org.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 98
    Publikationsdatum: 2012-04-08
    Beschreibung: Motivation: Chromatin structure, including post-translational modifications of histones, regulates gene expression, alternative splicing and cell identity. ChIP-seq is an increasingly used assay to study chromatin function. However, tools for downstream bioinformatics analysis are limited and are only based on the evaluation of signal intensities. We reasoned that new methods taking into account other signal characteristics such as peak shape, location and frequencies might reveal new insights into chromatin function, particularly in situations where differences in read intensities are subtle. Results: We introduce an analysis pipeline, based on linear predictive coding (LPC), which allows the capture and comparison of ChIP-seq histone profiles. First, we show that the modeled signal profiles distinguish differentially expressed genes with comparable accuracy to signal intensities. The method was robust against parameter variations and performed well up to a signal-to-noise ratio of 0.55. Additionally, we show that LPC profiles of activating and repressive histone marks cluster into distinct groups and can be used to predict their function. Availability and implementation: A Matlab implementation along with usage instructions and an example input file are available from: http://www.cancerresearch.unsw.edu.au/crcweb.nsf/page/LPCHP Contact: d.beck@student.unsw.edu.au ; jpimanda@unsw.edu.au ; jason.wong@unsw.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 99
    Publikationsdatum: 2012-04-08
    Beschreibung: Motivation: Explosive growth of short-read sequencing technologies in recent years has resulted in rapid development of many new alignment algorithms and programs. But most of them are not efficient or not applicable for reads 200 bp because these algorithms are specifically designed to process short queries with relatively low sequencing error rates. However, the current trend to increase reliability of detection of structural variations in assembled genomes, as well as to facilitate de novo sequencing, demands complementing high-throughput short-read platforms with long-read mapping. Thus, algorithms and programs for efficient mapping of longer reads are becoming crucial. However, the choice of long-read aligners effective in terms of both performance and memory is limited and includes only a handful of hash table (BLAT, SSAHA2) or trie (Burrows-Wheeler Transform - Smith-Waterman (BWT-SW), Burrows-Wheeler Aligner - Smith-Waterman (BWA-SW)) based algorithms. Results: A new O(n) algorithm that combines the advantages of both hash and trie-based methods has been designed to effectively align long biological sequences (200 bp) against a large sequence database with a small memory footprint (e.g. ~2 GB for the human genome). The algorithm is accurate and significantly faster than BLAT or BWT-SW, but similar to BWT-SW it can find all local alignments. It is as accurate as SSAHA2 or BWA-SW, but uses 3+ times less memory and is 10+ times faster than SSAHA2, and is several times faster than BWA-SW at low error rates while using almost two times less memory. Availability and implementation: The prototype implementation of the algorithm will be available upon request for non-commercial use in academia (local hit table binary and indices are at ftp://styx.ucsd.edu ). Contact: vit@ucsd.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
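The hash-table side of the approach mentioned in the abstract above can be illustrated with a generic k-mer seeding sketch. This is not the authors' algorithm and does not include the trie component; it only shows, under simplifying assumptions, how a hash index of reference k-mers lets a long query vote for candidate alignment diagonals. The function names `build_kmer_index` and `candidate_diagonals` and the toy sequences are hypothetical.

```python
from collections import defaultdict


def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)


def candidate_diagonals(query: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Count exact k-mer hits per diagonal (reference position minus query position).

    Returns (diagonal, hit_count) pairs, best-supported diagonal first.
    A real aligner would then run a local alignment around the top diagonals.
    """
    votes = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for rpos in index.get(query[qpos:qpos + k], ()):
            votes[rpos - qpos] += 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy data (illustrative only, not from the paper).
reference = "ACGTACGTTTGACCA"
query = "GTTTGAC"
index = build_kmer_index(reference, k=4)
print(candidate_diagonals(query, index, k=4))
```

Grouping hits by diagonal is one common way to narrow the search before a more expensive local alignment step; the memory and speed figures quoted in the abstract depend on the authors' own index design, which this sketch does not reproduce.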
  • 100
    Publication date: 2012-04-08
    Description: Motivation: Proteins can be naturally classified into families of homologous sequences that derive from a common ancestor. The comparison of homologous sequences and the analysis of their phylogenetic relationships provide useful information regarding the function and evolution of genes. One important difficulty for clustering methods is to distinguish highly divergent homologous sequences from sequences that only share partial homology due to evolution by protein domain rearrangements. Existing clustering methods require parameters that have to be set a priori. Given the variability in the evolution pattern among proteins, these parameters cannot be optimal for all gene families. Results: We propose a strategy that aims at clustering sequences homologous over their entire length, and that takes into account the pattern of substitution specific to each gene family. Sequences are first all compared with each other and clustered into pre-families, based on pairwise similarity criteria, with permissive parameters to optimize sensitivity. Pre-families are then divided into homogeneous clusters, based on the topology of the similarity network. Finally, clusters are progressively merged into families, for which we compute multiple alignments, and we use a model selection technique to find the optimal tradeoff between the number of families and multiple alignment likelihood. To evaluate this method, called HiFiX, we analyzed simulated sequences and manually curated datasets. These tests showed that HiFiX is the only method robust to both sequence divergence and domain rearrangements. HiFiX is fast enough to be used on very large datasets. Availability and implementation: The Python software HiFiX is freely available at http://lbbe.univ-lyon1.fr/hifix Contact: vincent.miele@univ-lyon1.fr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
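The first step described in the abstract above (grouping sequences into pre-families from pairwise similarity with a permissive cutoff) can be sketched as connected components of a thresholded similarity graph. This is an illustrative assumption, not HiFiX's actual implementation: the function name `pre_families`, the score scale, and the threshold values are hypothetical, and the later splitting and merging steps are not shown.

```python
def pre_families(names, similarities, threshold):
    """Group sequences into connected components of the similarity graph.

    names:        iterable of sequence identifiers
    similarities: dict mapping (name_a, name_b) pairs to a similarity score
    threshold:    pairs scoring at or above this value are linked
    """
    parent = {name: name for name in names}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in similarities.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Sort members and groups so the result is deterministic.
    return sorted(sorted(group) for group in groups.values())


# Toy scores (illustrative only).
names = ["p1", "p2", "p3", "p4"]
scores = {("p1", "p2"): 0.8, ("p2", "p3"): 0.35, ("p3", "p4"): 0.1}
print(pre_families(names, scores, threshold=0.3))  # permissive cutoff
print(pre_families(names, scores, threshold=0.5))  # stricter cutoff
```

A permissive cutoff links weakly similar sequences into one large group, which favors sensitivity; that is why the abstract follows this step with a division into homogeneous clusters.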