ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Artemis: an integrated platform for visualization and analysis of high-throughput sequence-based experimental data (2012)

Carver, T., Harris, S. R., Berriman, M., Parkhill, J., McQuillan, J. A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: High-throughput sequencing (HTS) technologies have made low-cost sequencing of large numbers of samples commonplace. An explosion in the type, not just number, of sequencing experiments has also taken place including genome re-sequencing, population-scale variation detection, whole transcriptome sequencing and genome-wide analysis of protein-bound nucleic acids. Results: We present Artemis as a tool for integrated visualization and computational analysis of different types of HTS datasets in the context of a reference genome and its corresponding annotation. Availability: Artemis is freely available (under a GPL licence) for download (for MacOSX, UNIX and Windows) at the Wellcome Trust Sanger Institute websites: http://www.sanger.ac.uk/resources/software/artemis/ . Contact: artemis@sanger.ac.uk ; tjc@sanger.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

2

Unknown

The role of miRNAs in complex formation and control (2012)

Goh, W. W. B., Oikawa, H., Sng, J. C. G., Sergot, M., Wong, L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: MicroRibonucleic acids (miRNAs) are small regulatory molecules that act by mRNA degradation or via translational repression. Although many miRNAs are ubiquitously expressed, a small subset have differential expression patterns that may give rise to tissue-specific complexes. Motivation: This work studies gene targeting patterns amongst miRNAs with differential expression profiles, and links this to control and regulation of protein complexes. Results: We find that, when a pair of miRNAs are not expressed in the same tissues, there is a higher tendency for them to target the direct partners of the same hub proteins. At the same time, they also avoid targeting the same set of hub-spokes. Moreover, the complexes corresponding to these hub-spokes tend to be specific and nonoverlapping. This suggests that the effect of miRNAs on the formation of complexes is specific. Contact: wongls@comp.nus.edu.sg Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

3

Unknown

Identifying small interfering RNA loci from high-throughput sequencing data (2012)

Hardcastle, T. J., Kelly, K. A., Baulcombe, D. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Small interfering RNAs (siRNAs) are produced from much longer sequences of double-stranded RNA precursors through cleavage by Dicer or a Dicer-like protein. These small RNAs play a key role in genetic and epigenetic regulation; however, a full understanding of the mechanisms by which they operate depends on the characterization of the precursors from which they are derived. Results: High-throughput sequencing of small RNA populations allows the locations of the double-stranded RNA precursors to be inferred. We have developed methods to analyse small RNA sequencing data from multiple biological sources, taking into account replicate information, to identify robust sets of siRNA precursors. Our methods show good performance on both a set of small RNA sequencing data in Arabidopsis thaliana and simulated datasets. Availability: Our methods are available as the Bioconductor ( www.bioconductor.org ) package segmentSeq (version 1.5.6 and above). Contact: tjh48@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

4

Unknown

ESpritz: accurate and fast prediction of protein disorder (2012)

Walsh, I., Martin, A. J. M., Di Domenico, T., Tosatto, S. C. E.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Intrinsically disordered regions are key for the function of numerous proteins, and the scant available experimental annotations suggest the existence of different disorder flavors. While efficient predictions are required to annotate entire genomes, most existing methods require sequence profiles for disorder prediction, making them cumbersome for high-throughput applications. Results: In this work, we present an ensemble of protein disorder predictors called ESpritz. These are based on bidirectional recursive neural networks and trained on three different flavors of disorder, including a novel NMR flexibility predictor. ESpritz can produce fast and accurate sequence-only predictions, annotating entire genomes in the order of hours on a single processor core. Alternatively, a slower but slightly more accurate ESpritz variant using sequence profiles can be used for applications requiring maximum performance. Two levels of prediction confidence allow either to maximize reasonable disorder detection or to limit expected false positives to 5%. ESpritz performs consistently well on the recent CASP9 data, reaching a S_w measure of 54.82 and area under the receiver operator curve of 0.856. The fast predictor is four orders of magnitude faster and remains better than most publicly available CASP9 methods, making it ideal for genomic scale predictions. Conclusions: ESpritz predicts three flavors of disorder at two distinct false positive rates, either with a fast or slower and slightly more accurate approach. Given its state-of-the-art performance, it can be especially useful for high-throughput applications. Availability: Both a web server for high-throughput analysis and a Linux executable version of ESpritz are available from: http://protein.bio.unipd.it/espritz/ Contact: silvio.tosatto@unipd.it Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

5

Unknown

Fast large-scale clustering of protein structures using Gauss integrals (2012)

Harder, T., Borg, M., Boomsma, W., Rogen, P., Hamelryck, T.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Clustering protein structures is an important task in structural bioinformatics. De novo structure prediction, for example, often involves a clustering step for finding the best prediction. Other applications include assigning proteins to fold families and analyzing molecular dynamics trajectories. Results: We present Pleiades, a novel approach to clustering protein structures with a rigorous mathematical underpinning. The method approximates clustering based on the root mean square deviation by first mapping structures to Gauss integral vectors—which were introduced by Røgen and co-workers—and subsequently performing K-means clustering. Conclusions: Compared to current methods, Pleiades dramatically improves on the time needed to perform clustering, and can cluster a significantly larger number of structures, while providing state-of-the-art results. The number of low energy structures generated in a typical folding study, which is in the order of 50 000 structures, can be clustered within seconds to minutes. Contact: thamelry@binf.ku.dk ; harder@binf.ku.dk Supplementary Information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

6

Unknown

BOCTOPUS: improved topology prediction of transmembrane β barrel proteins (2012)

Hayat, S., Elofsson, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Transmembrane β barrel proteins (TMBs) are found in the outer membrane of Gram-negative bacteria, chloroplasts and mitochondria. They play a major role in the translocation machinery, pore formation, membrane anchoring and ion exchange. TMBs are also promising targets for antimicrobial drugs and vaccines. Given the difficulty in membrane protein structure determination, computational methods to identify TMBs and predict the topology of TMBs are important. Results: Here, we present BOCTOPUS, an improved method for the topology prediction of TMBs by employing a combination of support vector machines (SVMs) and Hidden Markov Models (HMMs). The SVMs and HMMs account for local and global residue preferences, respectively. Based on a 10-fold cross-validation test, BOCTOPUS performs better than all existing methods, reaching a Q3 accuracy of 87%. Further, BOCTOPUS predicted the correct number of strands for 83% of the proteins in the dataset. BOCTOPUS might also help in reliable identification of TMBs when used as an additional filter alongside methods specialized in this task. Availability: BOCTOPUS is freely available as a web server at: http://boctopus.cbr.su.se/ . The datasets used for training and evaluations are also available from this site. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

7

Unknown

Improved mean estimation and its application to diagonal discriminant analysis (2012)

Tong, T., Chen, L., Zhao, H.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: High-dimensional data such as microarrays have created new challenges to traditional statistical methods. One such example is on class prediction with high-dimension, low-sample size data. Due to the small sample size, the sample mean estimates are usually unreliable. As a consequence, the performance of the class prediction methods using the sample mean may also be unsatisfactory. To obtain more accurate estimation of parameters some statistical methods, such as regularizations through shrinkage, are often desired. Results: In this article, we investigate the family of shrinkage estimators for the mean value under the quadratic loss function. The optimal shrinkage parameter is proposed under the scenario when the sample size is fixed and the dimension is large. We then construct a shrinkage-based diagonal discriminant rule by replacing the sample mean by the proposed shrinkage mean. Finally, we demonstrate via simulation studies and real data analysis that the proposed shrinkage-based rule outperforms its original competitor in a wide range of settings. Contact: tongt@hkbu.edu.hk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

8

Unknown

Read count approach for DNA copy number variants detection (2012)

Magi, A., Tattini, L., Pippucci, T., Torricelli, F., Benelli, M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: The advent of high-throughput sequencing technologies is revolutionizing our ability to discover and genotype DNA copy number variants (CNVs). Read count-based approaches are able to detect CNV regions with an unprecedented resolution. Although this computational strategy has only recently been introduced in the literature, much work has already been done on the preparation, normalization and analysis of this kind of data. Results: Here we address the many aspects of detecting CNVs with the read count approach. We first study the characteristics and systematic biases of read count distributions, focusing on the normalization methods designed for removing these biases. Subsequently, we compare the algorithms designed to detect the boundaries of CNVs and we investigate the ability of read count data to predict the exact DNA copy number. Finally, we review the tools publicly available for analysing read count data. To better understand the state of the art of read count approaches, we compare the performance of the three most widely used sequencing technologies (Illumina Genome Analyzer, Roche 454 and Life Technologies SOLiD) in all the analyses that we perform. Contact: albertomagi@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

9

Unknown

Quantifying the white blood cell transcriptome as an accessible window to the multiorgan transcriptome (2012)

Kohane, I. S., Valtchinov, V. I.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: We investigate and quantify the generalizability of the white blood cell (WBC) transcriptome to the general, multiorgan transcriptome. We use data from the NCBI's Gene Expression Omnibus (GEO) public repository to define two datasets for comparison, WBC and OO (Other Organ) sets. Results: Comprehensive pair-wise correlation and expression level profiles are calculated for both datasets (with sizes of 81 and 1463, respectively). We have used mapping and ranking across the Gene Ontology (GO) categories to quantify similarity between the two sets. GO mappings of the most correlated and highly expressed genes from the two datasets tightly match, with the notable exceptions of components of the ribosome, cell adhesion and immune response. That is, 10 877 or 48.8% of all measured genes do not change >10% of rank range between WBC and OO; only 878 (3.9%) change rank >50%. Two trans-tissue gene lists are defined, the most changing and the least changing genes in expression rank. We also provide a general, quantitative measure of the probability of expression rank and correlation profile in the OO system given the expression rank and correlation profile in the WBC dataset. Contact: vvaltchinov@partners.org Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

10

Unknown

Modeling mechanistic biological networks: An advanced Boolean approach (2012)

Handorf, T., Klipp, E.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: The understanding of the molecular sources for diseases like cancer can be significantly improved by computational models. Recently, Boolean networks have become very popular for modeling signaling and regulatory networks. However, such models rely on a set of Boolean functions that are in general not known. Unfortunately, while detailed information on the molecular interactions becomes available in large scale through electronic databases, the information on the Boolean functions does not become available simultaneously and has to be included manually into the models, if at all known. Results: We propose a new Boolean approach which can directly utilize the mechanistic network information available through modern databases. The Boolean function is implicitly defined by the reaction mechanisms. Special care has been taken for the treatment of kinetic features like inhibition. The method has been applied to a signaling model combining the Wnt and MAPK pathway. Availability: A sample C++ implementation of the proposed method is available for Linux and compatible systems through http://code.google.com/p/libscopes/wiki/Paper2011 Contact: handorf@physik.hu-berlin.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

11

Unknown

Measuring the distance between multiple sequence alignments (2012)

Blackburne, B. P., Whelan, S.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Multiple sequence alignment (MSA) is a core method in bioinformatics. The accuracy of such alignments may influence the success of downstream analyses such as phylogenetic inference, protein structure prediction, and functional prediction. The importance of MSA has led to the proliferation of MSA methods, with different objective functions and heuristics to search for the optimal MSA. Different methods of inferring MSAs produce different results in all but the most trivial cases. By measuring the differences between inferred alignments, we may be able to develop an understanding of how these differences (i) relate to the objective functions and heuristics used in MSA methods, and (ii) affect downstream analyses. Results: We introduce four metrics to compare MSAs, which include the position in a sequence where a gap occurs or the location on a phylogenetic tree where an insertion or deletion (indel) event occurs. We use both real and synthetic data to explore the information given by these metrics and demonstrate how the different metrics in combination can yield more information about MSA methods and the differences between them. Availability: MetAl is a free software implementation of these metrics in Haskell. Source and binaries for Windows, Linux and Mac OS X are available from http://kumiho.smith.man.ac.uk/whelan/software/metal/ . Contact: simon.whelan@manchester.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

12

Unknown

BPDA2d--a 2D global optimization-based Bayesian peptide detection algorithm for liquid chromatograph-mass spectrometry (2012)

Sun, Y., Zhang, J., Braga-Neto, U., Dougherty, E. R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Peptide detection is a crucial step in mass spectrometry (MS) based proteomics. Most existing algorithms are based upon greedy isotope template matching and thus may be prone to error propagation and ineffective to detect overlapping peptides. In addition, existing algorithms usually work at different charge states separately, isolating useful information that can be drawn from other charge states, which may lead to poor detection of low abundance peptides. Results: BPDA2d models spectra as a mixture of candidate peptide signals and systematically evaluates all possible combinations of possible peptide candidates to interpret the given spectra. For each candidate, BPDA2d takes into account its elution profile, charge state distribution and isotope pattern, and it combines all evidence to infer the candidate's signal and existence probability. By piecing all evidence together—especially by deriving information across charge states—low abundance peptides can be better identified and peptide detection rates can be improved. Instead of local template matching, BPDA2d performs global optimization for all candidates and systematically optimizes their signals. Since BPDA2d looks for the optimal among all possible interpretations of the given spectra, it has the capability in handling complex spectra where features overlap. BPDA2d estimates the posterior existence probability of detected peptides, which can be directly used for probability-based evaluation in subsequent processing steps. Our experiments indicate that BPDA2d outperforms state-of-the-art detection methods on both simulated data and real liquid chromatography–mass spectrometry data, according to sensitivity and detection accuracy. Availability: The BPDA2d software package is available at http://gsp.tamu.edu/Publications/supplementary/sun11a/ Contact: Michelle.Zhang@utsa.edu ; edward@ece.tamu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

13

Unknown

Robust rank aggregation for gene list integration and meta-analysis (2012)

Kolde, R., Laur, S., Adler, P., Vilo, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: The continued progress in developing technological platforms, availability of many published experimental datasets, as well as different statistical methods to analyze those data have allowed approaching the same research question using various methods simultaneously. To get the best out of all these alternatives, we need to integrate their results in an unbiased manner. Prioritized gene lists are a common result presentation method in genomic data analysis applications. Thus, the rank aggregation methods can become a useful and general solution for the integration task. Results: Standard rank aggregation methods are often ill-suited for biological settings where the gene lists are inherently noisy. As a remedy, we propose a novel robust rank aggregation (RRA) method. Our method detects genes that are ranked consistently better than expected under null hypothesis of uncorrelated inputs and assigns a significance score for each gene. The underlying probabilistic model makes the algorithm parameter free and robust to outliers, noise and errors. Significance scores also provide a rigorous way to keep only the statistically relevant genes in the final list. These properties make our approach robust and compelling for many settings. Availability: All the methods are implemented as a GNU R package RobustRankAggreg, freely available at the Comprehensive R Archive Network http://cran.r-project.org/ . Contact: vilo@ut.ee Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

14

Unknown

CLARE: Cracking the LAnguage of Regulatory Elements (2012)

Taher, L., Narlikar, L., Ovcharenko, I.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: CLARE is a computational method designed to reveal sequence encryption of tissue-specific regulatory elements. Starting with a set of regulatory elements known to be active in a particular tissue/process, it learns the sequence code of the input set and builds a predictive model from features specific to those elements. The resulting model can then be applied to user-supplied genomic regions to identify novel candidate regulatory elements. CLARE's model also provides a detailed analysis of transcription factors that most likely bind to the elements, making it an invaluable tool for understanding mechanisms of tissue-specific gene regulation. Availability: CLARE is freely accessible at http://clare.dcode.org/ . Contact: taherl@ncbi.nlm.nih.gov ; ovcharen@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

15

Unknown

MeQA: a pipeline for MeDIP-seq data quality assessment and analysis (2012)

Huang, J., Renault, V., Sengenes, J., Touleimat, N., Michel, S., Lathrop, M., Tost, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: We present a pipeline for the pre-processing, quality assessment, read distribution and methylation estimation for methylated DNA immunoprecipitation (MeDIP)-sequence datasets. This is the first MeDIP-seq-specific analytic pipeline that starts at the output of the sequencers. This pipeline will reduce the data analysis load on staff and allow easy and straightforward analysis of sequencing data for DNA methylation. The pipeline integrates customized scripting and several existing tools, which can deal with both paired and single end data. Availability: The package, extensive documentation and a comparison to public data are available at http://life.tongji.edu.cn/meqa/ Contact: jhuang@cephb.fr

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

16

Unknown

Optimal structural inference of signaling pathways from unordered and overlapping gene sets (2012)

Acharya, L. R., Judeh, T., Wang, G., Zhu, D.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: A plethora of bioinformatics analysis has led to the discovery of numerous gene sets, which can be interpreted as discrete measurements emitted from latent signaling pathways. Their potential to infer signaling pathway structures, however, has not been sufficiently exploited. Existing methods accommodating discrete data do not explicitly consider signal cascading mechanisms that characterize a signaling pathway. Novel computational methods are thus needed to fully utilize gene sets and broaden the scope from focusing only on pairwise interactions to the more general cascading events in the inference of signaling pathway structures. Results: We propose a gene set based simulated annealing (SA) algorithm for the reconstruction of signaling pathway structures. A signaling pathway structure is a directed graph containing up to a few hundred nodes and many overlapping signal cascades, where each cascade represents a chain of molecular interactions from the cell surface to the nucleus. Gene sets in our context refer to discrete sets of genes participating in signal cascades, the basic building blocks of a signaling pathway, with no prior information about gene orderings in the cascades. From a compendium of gene sets related to a pathway, SA aims to search for signal cascades that characterize the optimal signaling pathway structure. In the search process, the extent of overlap among signal cascades is used to measure the optimality of a structure. Throughout, we treat gene sets as random samples from a first-order Markov chain model. We evaluated the performance of SA in three case studies. In the first study conducted on 83 KEGG pathways, SA demonstrated a significantly better performance than Bayesian network methods. Since both SA and Bayesian network methods accommodate discrete data, use a ‘search and score’ network learning strategy and output a directed network, they can be compared in terms of performance and computational time. In the second study, we compared SA and Bayesian network methods using four benchmark datasets from DREAM. In our final study, we showcased two context-specific signaling pathways activated in breast cancer. Availability: Source code is available from http://dl.dropbox.com/u/16000775/sa_sc.zip Contact: dzhu@wayne.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

17

Unknown

htSeqTools: high-throughput sequencing quality control, processing and visualization in R (2012)

Planet, E., Attolini, C. S.-O., Reina, O., Flores, O., Rossell, D.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: We provide a Bioconductor package with quality assessment, processing and visualization tools for high-throughput sequencing data, with emphasis on ChIP-seq and RNA-seq studies. It includes detection of outliers and biases, inefficient immuno-precipitation and overamplification artifacts, de novo identification of read-rich genomic regions and visualization of the location and coverage of genomic region lists. Availability: www.bioconductor.org Contact: david.rossell@irbbarcelona.org Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

18

Unknown

Approximating the set of local minima in partial RNA folding landscapes (2012)

Sahoo, S., Albrecht, A. A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: We study a stochastic method for approximating the set of local minima in partial RNA folding landscapes associated with a bounded-distance neighbourhood of folding conformations. The conformations are limited to RNA secondary structures without pseudoknots. The method aims at exploring partial energy landscapes pL induced by folding simulations and their underlying neighbourhood relations. It combines an approximation of the number of local optima devised by Garnier and Kallel (2002) with a run-time estimation for identifying sets of local optima established by Reeves and Eremeev (2004). Results: The method is tested on nine sequences of length between 50 nt and 400 nt, which allows us to compare the results with data generated by RNAsubopt and subsequent barrier tree calculations. On the nine sequences, the method captures on average 92% of local minima with settings designed for a target of 95%. The run-time of the heuristic can be estimated by O(n^2 D ln), where n is the sequence length, is the number of local minima in the partial landscape pL under consideration and D is the maximum number of steepest descent steps in attraction basins associated with pL. Contact: a.albrecht@qub.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

19

Unknown

PASSion: a pattern growth algorithm-based pipeline for splice junction detection in paired-end RNA-Seq data (2012)

Zhang, Y., Lameijer, E.-W., 't Hoen, P. A. C., Ning, Z., Slagboom, P. E., Ye, K.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: RNA-seq is a powerful technology for the study of transcriptome profiles that uses deep-sequencing technologies. Moreover, it may be used for cellular phenotyping and help establish the etiology of diseases characterized by abnormal splicing patterns. In RNA-Seq, the exact nature of splicing events is buried in the reads that span exon–exon boundaries. The accurate and efficient mapping of these reads to the reference genome is a major challenge. Results: We developed PASSion, a pattern growth algorithm-based pipeline for splice site detection in paired-end RNA-Seq reads. Comparing the performance of PASSion to three existing RNA-Seq analysis pipelines, TopHat, MapSplice and HMMSplicer, revealed that PASSion is competitive with these packages. Moreover, the performance of PASSion is not affected by read length and coverage. It performs better than the other three approaches when detecting junctions in highly abundant transcripts. PASSion has the ability to detect junctions that do not have known splicing motifs, which cannot be found by the other tools. In the two public RNA-Seq datasets, PASSion predicted ~137 000 and 173 000 splicing events, of which on average 82% are known junctions annotated in the Ensembl transcript database and 18% are novel. In addition, our package can discover differential and shared splicing patterns among multiple samples. Availability: The code and utilities can be freely downloaded from https://trac.nbic.nl/passion and ftp://ftp.sanger.ac.uk/pub/zn1/passion Contact: y.zhang@lumc.nl ; k.ye@lumc.nl Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

20

Unknown

PopDrowser: the Population Drosophila Browser (2012)

Ramia, M., Librado, P., Casillas, S., Rozas, J., Barbadilla, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: The completion of 168 genome sequences from a single population of Drosophila melanogaster provides a global view of genomic variation and an understanding of the evolutionary forces shaping the patterns of DNA polymorphism and divergence along the genome. Results: We present the ‘Population Drosophila Browser’ (PopDrowser), a new genome browser specially designed for the automatic analysis and representation of genetic variation across the D. melanogaster genome sequence. PopDrowser allows estimating and visualizing the values of a number of DNA polymorphism and divergence summary statistics, linkage disequilibrium parameters and several neutrality tests. PopDrowser also allows performing custom analyses on-the-fly using user-selected parameters. Availability: PopDrowser is freely available from http://PopDrowser.uab.cat . Contact: miquel.ramia@uab.cat

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

21

Unknown

MotEvo: integrated Bayesian probabilistic methods for inferring regulatory sites and motifs on multiple alignments of DNA sequences (2012)

Arnold, P., Erb, I., Pachkov, M., Molina, N., van Nimwegen, E.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Motivation: Probabilistic approaches for inferring transcription factor binding sites (TFBSs) and regulatory motifs from DNA sequences have been developed for over two decades. Previous work has shown that prediction accuracy can be significantly improved by incorporating features such as the competition of multiple transcription factors (TFs) for binding to nearby sites, the tendency of TFBSs for co-regulated TFs to cluster and form cis-regulatory modules and explicit evolutionary modeling of conservation of TFBSs across orthologous sequences. However, currently available tools only incorporate some of these features, and significant methodological hurdles have hampered their synthesis into a single consistent probabilistic framework. Results: We present MotEvo, an integrated suite of Bayesian probabilistic methods for the prediction of TFBSs and inference of regulatory motifs from multiple alignments of phylogenetically related DNA sequences, which incorporates all features just mentioned. In addition, MotEvo incorporates a novel model for detecting unknown functional elements that are under evolutionary constraint, and a new robust model for treating gain and loss of TFBSs along a phylogeny. Rigorous benchmarking tests on ChIP-seq datasets show that MotEvo's novel features significantly improve the accuracy of TFBS prediction, motif inference and enhancer prediction. Availability: Source code, a user manual and files with several example applications are available at www.swissregulon.unibas.ch . Contact: erik.vannimwegen@unibas.ch Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

22

Unknown

LaTcOm: a web server for visualizing rare codon clusters in coding sequences (2012)

Theodosiou, A., Promponas, V. J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: We present LaTcOm, a new web tool, which offers several alternative methods for ‘rare codon cluster’ (RCC) identification from a single and simple graphical user interface. In the current version, three RCC detection schemes are implemented: the recently described %MinMax algorithm and a simplified sliding window approach, along with a novel modification of a linear-time algorithm for the detection of maximally scoring subsequences tailored to the RCC detection problem. Among a number of user-tunable parameters, several codon-based scales relevant for RCC detection are available, including tRNA abundance values from Escherichia coli and several codon usage tables from a selection of genomes. Furthermore, useful scale transformations may be performed upon user request (e.g. linear, sigmoid). Users may choose to visualize RCC positions within the submitted sequences either with graphical representations or in textual form for further processing. Availability: LaTcOm is freely available online at the URL http://troodos.biol.ucy.ac.cy/latcom.html . Contact: vprobon@ucy.ac.cy Supplementary information: Supplementary data are available at Bioinformatics online.
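
Illustration: the sliding window idea mentioned in this abstract can be sketched roughly as follows. This is not LaTcOm's implementation; the function names, the codon scale values, the window size and the threshold are all hypothetical and chosen only to make the example run.

```python
from typing import Dict, List, Tuple


def split_codons(cds: str) -> List[str]:
    """Split a coding sequence into codons, dropping a trailing partial codon."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_windows(
    cds: str,
    scale: Dict[str, float],
    window: int = 3,
    threshold: float = 0.3,
    default: float = 1.0,
) -> List[Tuple[int, float]]:
    """Return (start codon index, mean score) for every window whose mean
    score is at or below the threshold.

    `scale` maps codons to a score where low values mean "rare" (a
    hypothetical scale; codons missing from it get `default`).
    """
    scores = [scale.get(codon, default) for codon in split_codons(cds)]
    hits = []
    for start in range(len(scores) - window + 1):
        mean = sum(scores[start:start + window]) / window
        if mean <= threshold:
            hits.append((start, mean))
    return hits


# Hypothetical two-codon scale: AGG is treated as rare, CTG as common.
toy_scale = {"AGG": 0.1, "CTG": 1.0}
toy_hits = rare_codon_windows("CTGCTGAGGAGGAGGCTGCTG", toy_scale)
```

In this toy input only the window covering the three consecutive AGG codons falls below the threshold; a real analysis would use one of the codon scales the abstract lists, such as tRNA abundance or codon usage tables.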

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

23

Unknown

SNPdbe: constructing an nsSNP functional impacts database (2012)

Schaefer, C., Meier, A., Rost, B., Bromberg, Y.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: Many existing databases annotate experimentally characterized single nucleotide polymorphisms (SNPs). Each non-synonymous SNP (nsSNP) changes one amino acid in the gene product (single amino acid substitution; SAAS). This change can either affect protein function or be neutral in that respect. Most polymorphisms lack experimental annotation of their functional impact. Here, we introduce SNPdbe (SNP database of effects), with predictions of computationally annotated functional impacts of SNPs. Database entries represent nsSNPs in dbSNP and the 1000 Genomes collection, as well as variants from UniProt and PMD. SAASs come from >2600 organisms, ‘human’ being the most prevalent. The impact of each SAAS on protein function is predicted using the SNAP and SIFT algorithms and augmented with experimentally derived function/structure information and disease associations from PMD, OMIM and UniProt. SNPdbe is consistently updated and easily augmented with new sources of information. The database is available as a MySQL dump and via a web front end that allows searches with any combination of organism names, sequences and mutation IDs. Availability: http://www.rostlab.org/services/snpdbe Contact: schaefer@rostlab.org ; snpdbe@rostlab.org

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

24

Unknown

JDet: interactive calculation and visualization of function-related conservation patterns in multiple sequence alignments and structures (2012)

Muth, T., Garcia-Martin, J. A., Rausell, A., Juan, D., Valencia, A., Pazos, F.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: We have implemented in a single package all the features required for extracting, visualizing and manipulating fully conserved positions as well as those with a family-dependent conservation pattern in multiple sequence alignments. Among other things, the program allows users to run different methods for extracting these positions, combine the results and visualize them in protein 3D structures and sequence spaces. Availability and implementation: JDet is a multiplatform application written in Java. It is freely available, including the source code, at http://csbg.cnb.csic.es/JDet . The package includes two of our recently developed programs for detecting functional positions in protein alignments (Xdet and S3Det), and support for other methods can be added as plug-ins. A help file and a guided tutorial for JDet are also available. Contact: pazos@cnb.csic.es

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

25

Unknown

VarSifter: Visualizing and analyzing exome-scale sequence variation data on a desktop computer (2012)

Teer, J. K., Green, E. D., Mullikin, J. C., Biesecker, L. G.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-02-17

Description: VarSifter is a graphical software tool for desktop computers that allows investigators of varying computational skills to easily and quickly sort, filter, and sift through sequence variation data. A variety of filters and a custom query framework allow filtering based on any combination of sample and annotation information. By simplifying visualization and analyses of exome-scale sequence variation data, this program will help bring the power and promise of massively-parallel DNA sequencing to a broader group of researchers. Availability and Implementation: VarSifter is written in Java, and is freely available in source and binary versions, along with a User Guide, at http://research.nhgri.nih.gov/software/VarSifter/ . Contact: mullikin@mail.nih.gov Supplementary Information: Additional figures and methods available online at the journal's website.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

26

Unknown

Genotype calling and phasing using next-generation sequencing reads and a haplotype scaffold (2012)

Menelaou, A., Marchini, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: Given the current costs of next-generation sequencing, large studies carry out low-coverage sequencing followed by application of methods that leverage linkage disequilibrium to infer genotypes. We propose a novel method that assumes study samples are sequenced at low coverage and genotyped on a genome-wide microarray, as in the 1000 Genomes Project (1KGP). We assume polymorphic sites have been detected from the sequencing data and that genotype likelihoods are available at these sites. We also assume that the microarray genotypes have been phased to construct a haplotype scaffold. We then phase each polymorphic site using an MCMC algorithm that iteratively updates the unobserved alleles based on the genotype likelihoods at that site and local haplotype information. We use a multivariate normal model to capture both allele frequency and linkage disequilibrium information around each site. When sequencing data are available from trios, Mendelian transmission constraints are easily accommodated into the updates. The method is highly parallelizable, as it analyses one position at a time. Results: We illustrate the performance of the method compared with other methods using data from Phase 1 of the 1KGP in terms of genotype accuracy, phasing accuracy and downstream imputation performance. We show that the haplotype panel we infer in African samples, which was based on a trio-phased scaffold, increases downstream imputation accuracy for rare variants (R² increases by >0.05 for minor allele frequency <1%), and this will translate into a boost in power to detect associations. These results highlight the value of incorporating microarray genotypes when calling variants from next-generation sequence data. Availability: The method (called MVNcall) is implemented in a C++ program and is available from http://www.stats.ox.ac.uk/~marchini/#software . Contact: marchini@stats.ox.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

27

Unknown

STAR: ultrafast universal RNA-seq aligner (2012)

Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., Batut, P., Chaisson, M., Gingeras, T. R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: Accurate alignment of high-throughput RNA-seq data is a challenging and yet unsolved problem because of the non-contiguous transcript structure, relatively short read lengths and constantly increasing throughput of the sequencing technologies. Currently available RNA-seq aligners suffer from high mapping error rates, low mapping speed, read length limitation and mapping biases. Results: To align our large (>80 billion reads) ENCODE Transcriptome RNA-seq dataset, we developed the Spliced Transcripts Alignment to a Reference (STAR) software based on a previously undescribed RNA-seq alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by a seed clustering and stitching procedure. STAR outperforms other aligners by a factor of >50 in mapping speed, aligning to the human genome 550 million 2 × 76 bp paired-end reads per hour on a modest 12-core server, while at the same time improving alignment sensitivity and precision. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. Using Roche 454 sequencing of reverse transcription polymerase chain reaction amplicons, we experimentally validated 1960 novel intergenic splice junctions with an 80–90% success rate, corroborating the high precision of the STAR mapping strategy. Availability and implementation: STAR is implemented as standalone C++ code. STAR is free open source software distributed under the GPLv3 license and can be downloaded from http://code.google.com/p/rna-star/ . Contact: dobin@cshl.edu .

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

28

Unknown

DLocalMotif: a discriminative approach for discovering local motifs in protein sequences (2012)

Mehdi, A. M., Sehgal, M. S. B., Kobe, B., Bailey, T. L., Boden, M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: Local motifs are patterns of DNA or protein sequences that occur within a sequence interval relative to a biologically defined anchor or landmark. Current protein motif discovery methods do not adequately consider such constraints to identify biologically significant motifs that are only weakly over-represented but spatially confined. Using negatives, i.e. sequences known to not contain a local motif, can further increase the specificity of their discovery. Results: This article introduces the method DLocalMotif that makes use of positional information and negative data for local motif discovery in protein sequences. DLocalMotif combines three scoring functions, measuring degrees of motif over-representation, entropy and spatial confinement, specifically designed to discriminatively exploit the availability of negative data. The method is shown to outperform current methods that use only a subset of these motif characteristics. We apply the method to several biological datasets. The analysis of peroxisomal targeting signals uncovers several novel motifs that occur immediately upstream of the dominant peroxisomal targeting signal-1 signal. The analysis of proline-tyrosine nuclear localization signals uncovers multiple novel motifs that overlap with C2H2 zinc finger domains. We also evaluate the method on classical nuclear localization signals and endoplasmic reticulum retention signals and find that DLocalMotif successfully recovers biologically relevant sequence properties. Availability: http://bioinf.scmb.uq.edu.au/dlocalmotif/ Contact: m.boden@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

29

Unknown

Human protein-protein interaction prediction by a novel sequence-based co-evolution method: co-evolutionary divergence (2012)

Hsin Liu, C., Li, K.-C., Yuan, S.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: Protein–protein interaction (PPI) plays an important role in understanding gene functions, and many computational PPI prediction methods have been proposed in recent years. Despite the extensive efforts, PPI prediction still has much room to improve. Sequence-based co-evolution methods include the substitution rate method and the mirror tree method, which compare sequence substitution rates and topological similarity of phylogenetic trees, respectively. Although they have been used to predict PPI in species with small genomes like Escherichia coli, such methods have not been tested on large-scale proteomes such as that of Homo sapiens. Results: In this study, we propose a novel sequence-based co-evolution method, co-evolutionary divergence (CD), for human PPI prediction. Built on the basic assumption that protein pairs with similar substitution rates are likely to interact with each other, the CD method converts the evolutionary information from 14 species of vertebrates into likelihood ratios and combines them to infer PPI. We showed that the CD method outperformed the mirror tree method in three independent human PPI datasets by a large margin. With the arrival of more species genome information generated by next generation sequencing, the performance of the CD method can be further improved. Availability: Source code and support are available at http://mib.stat.sinica.edu.tw/LAP/tmp/CD.rar . Contact: syuan@stat.sinica.edu.tw Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

30

Unknown

Dragon TIS Spotter: an Arabidopsis-derived predictor of translation initiation sites in plants (2012)

Magana-Mora, A., Ashoor, H., Jankovic, B. R., Kamau, A., Awara, K., Chowdhary, R., Archer, J. A. C., Bajic, V. B.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: In higher eukaryotes, the identification of translation initiation sites (TISs) has been focused on finding these signals in cDNA or mRNA sequences. Using Arabidopsis thaliana (A.t.) information, we developed a prediction tool for signals within genomic sequences of plants that correspond to TISs. Our tool requires only genome sequence, not expressed sequences. Its sensitivity/specificity is 90.75%/92.2% for A.t., 66.8%/94.4% for Vitis vinifera and 81.6%/94.4% for Populus trichocarpa, which suggests that our tool can be used in annotation of different plant genomes. We provide a list of features used in our model. Further study of these features may improve our understanding of the mechanisms of translation initiation. Availability and implementation: Our tool is implemented as an artificial neural network. It is available as a web-based tool and, together with the source code, the list of features, and data used for model development, is accessible at http://cbrc.kaust.edu.sa/dts . Contact: vladimir.bajic@kaust.edu.sa Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

31

Unknown

Pathway hunting by random survival forests (2012)

Chen, X., Ishwaran, H.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: Pathway or gene set analysis has been widely applied to genomic data. Many current pathway testing methods use univariate test statistics calculated from individual genomic markers, which ignores the correlations and interactions between candidate markers. Random forests-based pathway analysis is a promising approach for incorporating complex correlation and interaction patterns, but one limitation of previous approaches is that pathways have been considered separately, so pathway cross-talk information was not taken into account. Results: In this article, we develop a new pathway hunting algorithm for survival outcomes using random survival forests, which prioritizes important pathways by accounting for gene correlation and genomic interactions. We show that the proposed method performs favourably compared with five popular pathway testing methods using both synthetic and real data. We find that the proposed methodology provides an efficient and powerful pathway modelling framework for high-dimensional genomic data. Availability: The R code for the analysis used in this article is available upon request. Contact: xi.steven.chen@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

32

Unknown

PBSIM: PacBio reads simulator--toward accurate genome assembly (2012)

Ono, Y., Asai, K., Hamada, M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Motivation: PacBio sequencers produce two types of characteristic reads (continuous long reads: long and high error rate and circular consensus sequencing: short and low error rate), both of which could be useful for de novo assembly of genomes. Currently, there is no available simulator that targets the specific generation of PacBio libraries. Results: Our analysis of 13 PacBio datasets showed characteristic features of PacBio reads (e.g. the read length of PacBio reads follows a log-normal distribution). We have developed a read simulator, PBSIM, that captures these features using either a model-based or sampling-based method. Using PBSIM, we conducted several hybrid error correction and assembly tests for PacBio reads, suggesting that a continuous long reads coverage depth of at least 15 in combination with a circular consensus sequencing coverage depth of at least 30 achieved extensive assembly results. Availability: PBSIM is freely available from the web under the GNU GPL v2 license ( http://code.google.com/p/pbsim/ ). Contact: mhamada@k.u-tokyo.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
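
Illustration: the abstract notes that PacBio read lengths follow a log-normal distribution. A minimal sketch of drawing read lengths from such a distribution is given here; it is not PBSIM's code, and the function name and the mean and standard deviation used in the example are hypothetical values, not figures from the paper.

```python
import math
import random
from typing import List, Optional


def simulate_read_lengths(
    n_reads: int,
    mean_len: float,
    sd_len: float,
    min_len: int = 1,
    seed: Optional[int] = None,
) -> List[int]:
    """Draw read lengths (in bases) from a log-normal distribution.

    `mean_len` and `sd_len` describe the read lengths themselves; they are
    converted to the mu and sigma of the underlying normal distribution.
    """
    sigma_sq = math.log(1.0 + (sd_len / mean_len) ** 2)
    mu = math.log(mean_len) - sigma_sq / 2.0
    sigma = math.sqrt(sigma_sq)
    rng = random.Random(seed)  # private generator so results are reproducible
    return [
        max(min_len, round(rng.lognormvariate(mu, sigma)))
        for _ in range(n_reads)
    ]


# Hypothetical parameters: 20 000 reads, mean 3000 bases, sd 2000 bases.
lengths = simulate_read_lengths(20000, mean_len=3000, sd_len=2000, seed=1)
sample_mean = sum(lengths) / len(lengths)
```

The conversion from mean and standard deviation to mu and sigma uses the standard log-normal moment identities, so the sample mean should land close to the requested mean for large samples.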

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

33

Unknown

Introducing Drugster: a comprehensive and fully integrated drug design, lead and structure optimization toolkit (2012)

Vlachakis, D., Tsagrasoulis, D., Megalooikonomou, V., Kossida, S.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Drugster is a fully interactive pipeline designed to break the command line barrier and introduce a new user-friendly environment to perform drug design, lead and structure optimization experiments through an efficient combination of the PDB2PQR, Ligbuilder, Gromacs and Dock suites. Our platform features a novel workflow that guides the user through each logical step of the iterative 3D structural optimization setup and drug design process, by providing a seamless interface to all incorporated packages. Availability: Drugster can be freely downloaded via our dedicated server system at http://www.bioacademy.gr/bioinformatics/drugster/ . Contact: dvlachakis@bioacademy.gr .

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

34

Unknown

XiP: a computational environment to create, extend and share workflows (2012)

Nagasaki, M., Fujita, A., Sekiya, Y., Saito, A., Ikeda, E., Li, C., Miyano, S.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: XiP (eXtensible integrative Pipeline) is a flexible, editable and modular environment with a user-friendly interface that does not require previous advanced programming skills to run, construct and edit workflows. XiP allows the construction of workflows by linking components written in both R and Java, the analysis of high-throughput data in grid engine systems and also the development of customized pipelines that can be encapsulated in a package and distributed. XiP already comes with several ready-to-use pipeline flows for the most common genomic and transcriptomic analysis and ~300 computational components. Availability: XiP is open source, freely available under the Lesser General Public License (LGPL) and can be downloaded from http://xip.hgc.jp . Contact: nagasaki@megabank.tohoku.ac.jp

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

35

Unknown

ADAM: automated data management for research datasets (2012)

Woodbridge, M., Tomlinson, C. D., Butcher, S. A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Existing repositories for experimental datasets typically capture snapshots of data acquired using a single experimental technique and often require manual population and continual curation. We present a storage system for heterogeneous research data that performs dynamic automated indexing to provide powerful search, discovery and collaboration features without the restrictions of a structured repository. ADAM is able to index many commonly used file formats generated by laboratory assays and therefore offers specific advantages to the experimental biology community. However, it is not domain specific and can promote sharing and re-use of working data across scientific disciplines. Availability and implementation: ADAM is implemented using Java and supported on Linux. It is open source under the GNU General Public License v3.0. Installation instructions, binary code, a demo system and a virtual machine image are available at http://www.imperial.ac.uk/bioinfsupport/resources/software/adam . Contact: m.woodbridge@imperial.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

36

Unknown

DvD: An R/Cytoscape pipeline for drug repurposing using public repositories of gene expression data (2012)

Pacini, C., Iorio, F., Goncalves, E., Iskar, M., Klabunde, T., Bork, P., Saez-Rodriguez, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-21

Description: Drug versus Disease (DvD) provides a pipeline, available through R or Cytoscape, for the comparison of drug and disease gene expression profiles from public microarray repositories. Negatively correlated profiles can be used to generate hypotheses of drug repurposing, whereas positively correlated profiles may be used to infer side effects of drugs. DvD allows users to compare drug and disease signatures with dynamic access to the databases ArrayExpress and Gene Expression Omnibus and to data from the Connectivity Map. Availability and implementation: R package (submitted to Bioconductor) under GPL 3 and Cytoscape plug-in freely available for download at www.ebi.ac.uk/saezrodriguez/DVD/ . Contact: saezrodriguez@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
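
Illustration: the reasoning in this abstract (negative correlation suggests repurposing, positive correlation suggests side effects) can be sketched with a plain Pearson correlation. The abstract does not describe DvD's actual scoring method, so Pearson correlation is a stand-in here; the function names, the cutoff and the toy profiles are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / math.sqrt(var_x * var_y)


def interpret(r: float, cutoff: float = 0.5) -> str:
    """Map a correlation to the hypothesis types named in the abstract.

    The cutoff is an arbitrary illustrative value.
    """
    if r <= -cutoff:
        return "repurposing hypothesis"
    if r >= cutoff:
        return "possible side effect"
    return "no clear relation"


# Toy log fold-change profiles over the same four genes (made-up numbers).
disease_profile = [2.0, -1.0, 0.5, -2.0]
drug_profile = [-2.0, 1.0, -0.5, 2.0]
r_value = pearson(disease_profile, drug_profile)
label = interpret(r_value)
```

Here the drug profile is the exact mirror image of the disease profile, so the correlation is at its minimum and the pair is flagged as a repurposing hypothesis.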

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

37

Unknown

The vasohibin family: a novel family for angiogenesis regulation (2012)

Sato, Y.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: Angiogenesis, the formation of neovessels, is regulated by the local balance between angiogenesis stimulators and inhibitors. A number of such endogenous regulators of angiogenesis have been found in the body. Recently, vasohibin-1 (VASH1) was isolated as a negative feedback regulator of angiogenesis produced by endothelial cells (ECs) and subsequently vasohibin-2 (VASH2) as a homologue of VASH1. It was then shown that VASH1 is expressed in ECs to terminate angiogenesis, whereas VASH2 is expressed in cells other than ECs to promote angiogenesis in the mouse model of angiogenesis. This review will focus on the vasohibin family members, which are novel regulators of angiogenesis.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

38

Unknown

Activation of the Wnt/β-catenin pathway and tissue inhibitor of metalloprotease 1 during tertiary dentinogenesis (2012)

Yoshioka, S., Takahashi, Y., Abe, M., Michikami, I., Imazato, S., Wakisaka, S., Hayashi, M., Ebisu, S.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: Tertiary dentin is deposited inside teeth after various stimuli and serves as a major defensive wall to preserve pulp cells. However, the molecular mechanisms of the activation of quiescent odontoblasts, immature pulp cells and tertiary dentin formation are still unclear. Therefore, we performed a comprehensive gene expression analysis of pulp cells after cavity preparation of 9-week-old rat molars to clarify the critical molecules in tertiary dentinogenesis. As a result, mRNA expression of various molecules was up- or down-regulated. Notably, several members of the matrix metalloprotease family and their endogenous inhibitors were up-regulated after cavity preparation. In situ hybridization showed that tissue inhibitor of metalloprotease 1 (Timp1) was widely and continuously distributed in the pulp beneath the cavity in vivo. We also observed accumulation of β-catenin in the pulp cells beneath the cavity by fluorescence immunohistochemistry. Furthermore, Timp1 transcription was repressed by a dominant-negative TCF4 in immature undifferentiated mesenchymal cells, but not altered in mature odontoblast-like cells. These results indicate that cavity preparation may activate the Wnt/β-catenin pathway, and that the Wnt/β-catenin pathway and Timp1 may be correlatively involved in pulp repair. Timp1 might play crucial roles in reactivation of immature pulp cells for tertiary dentinogenesis.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

39

Unknown

Phosphoinositide 5-phosphatases: how do they affect tumourigenesis? (2012)

Miyazawa, K.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: The activity of biological molecules is often affected by their phosphorylation state. Regulatory phosphorylation operates as a binary switch and is usually controlled by counteracting kinases and phosphatases. However, phosphatidylinositol (PtdIns) has three phosphorylation sites on its inositol ring. The phosphorylation status of PtdIns is controlled by multiple kinases and phosphatases with distinct substrate specificities, serving as a ‘lipid code’ or ‘phosphoinositide code’. Class I phosphoinositide 3-kinase (PI3K) converts PtdIns(4,5)P2 to PtdIns(3,4,5)P3, which plays a pivotal role in signals controlling glucose uptake, cytoskeletal reorganization, cell proliferation and apoptosis. PI3K is pro-oncogenic, whereas phosphoinositide phosphatases that degrade PtdIns(3,4,5)P3 are not always anti-oncogenic. Recent studies have revealed the unique characteristics of phosphoinositide 5-phosphatases.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

40

Unknown

Vascular endothelial growth factor and its receptor system: physiological functions in angiogenesis and pathological roles in various diseases (2012)

Shibuya, M.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: Vascular endothelial growth factors (VEGFs) belong to the platelet-derived growth factor supergene family, and they play central roles in the regulation of angiogenesis and lymphangiogenesis. VEGF-A, the major factor for angiogenesis, binds to two tyrosine kinase (TK) receptors, VEGFR-1 (Flt-1) and VEGFR-2 (KDR/Flk-1), and regulates endothelial cell proliferation, migration, vascular permeability, secretion and other endothelial functions. VEGFR-2 exhibits a strong TK activity towards pro-angiogenic signals, whereas the soluble VEGFR-1 (sFlt-1) functions as an endogenous VEGF inhibitor. sFlt-1 is abnormally overexpressed in the placenta of preeclampsia patients, resulting in the major symptoms of the disease due to abnormal trapping of VEGFs. The VEGF-VEGFR system is crucial for tumour angiogenesis, and anti-VEGF-VEGFR molecules are now widely used in the clinical field to treat cancer patients. The efficacy of these molecules in prolonging the overall survival of patients has been established; however, some cancers do not respond well and reduced tumour sensitivity to anti-VEGF signals may occur after long-term treatment. The molecular basis of tumour refractoriness should be determined to improve anti-angiogenic therapy.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

41

Unknown

Accurate determination of tissue steroid hormones, precursors and conjugates in adult male rat (2012)

Maeda, N., Tanaka, E., Suzuki, T., Okumura, K., Nomura, S., Miyasho, T., Haeno, S., Yokota, H.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: The actual levels of steroid hormones in organs are vital for endocrine, reproductive and neuronal health and disorders. We developed an accurate method to determine the levels of steroid hormones and steroid conjugates in various organs by an efficient preparation using a solid-phase-extraction cartridge. Each steroid was identified by the precursor ion spectra using liquid chromatography–electrospray ionization time-of-flight mass spectrometry, and the respective steroids were quantitatively analysed in the selected reaction monitoring mode by liquid chromatography-mass spectrometry/mass spectrometry (LC-MS/MS). The data showed that significant levels of testosterone, corticosterone and precursors of both hormones were detected in all organs except liver. The glucuronide conjugates of steroid hormones and the precursors were detected in all organs except liver, but sulfate conjugates of these steroids were observed only in the target organs of the hormones and kidney. Interestingly, these steroids and the conjugates were not observed in the liver except pregnenolone. In conclusion, an accurate determination of tissue steroids was developed using LC-MS analysis. Biosynthesis of steroid hormones from the precursors was estimated even in the target organs, and the delivery of these steroid conjugates was also suggested via the circulation without any significant hepatic participation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

42

Unknown

Enzymatic characterization of germination-specific cysteine protease-1 expressed transiently in cotyledons during the early phase of germination (2012)

Tsuji, A., Tsukamoto, K., Iwamoto, K., Ito, Y., Yuasa, K.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-12-22

Description: Papain-like cysteine protease activity that shows a unique transient expression profile in cotyledons of daikon radish during germination was detected. The enzyme showed a distinct elution pattern on DEAE-cellulose compared with cathepsin B-like and Responsive to desiccation-21 cysteine protease. Although this activity was not detected in seed prior to imbibition, the activity increased markedly and reached a maximum at 2 days after imbibition and then decreased rapidly and completely disappeared after 5 days. Using cystatin-Sepharose, the 26 kDa cysteine protease (DRCP26) was isolated from cotyledons at 2 days after imbibition. The deduced amino acid sequence from the cDNA nucleotide sequence indicated that DRCP26 is an orthologue of Arabidopsis unidentified protein, germination-specific cysteine protease-1, belonging to the C1 family of cysteine protease predicted from genetic information. In an effort to characterize the enzymatic properties of DRCP26, the enzyme was purified to homogeneity from cotyledons at 48 h after imbibition. The best synthetic substrate for the enzyme was carbobenzoxy-Phe-Arg-4-methylcoumaryl-7-amide. All model peptides were digested to small peptides by the enzyme, suggesting that DRCP26 possesses broad cleavage specificity. These results indicated that DRCP26 plays a role in the mobilization of storage proteins in the early phase of seed germination.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

43

Unknown

Transcriptome assembly and isoform expression level estimation from biased RNA-Seq reads (2012)

Li, W., Jiang, T.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: RNA-Seq uses high-throughput sequencing technology to identify and quantify the transcriptome at an unprecedentedly high resolution and low cost. However, RNA-Seq reads are usually not uniformly distributed, and biases in RNA-Seq data pose great challenges in many applications including transcriptome assembly and the expression level estimation of genes or isoforms. Much effort has been made in the literature to calibrate the expression level estimation from biased RNA-Seq data, but the effect of biases on transcriptome assembly remains largely unexplored. Results: Here, we propose a statistical framework for both transcriptome assembly and isoform expression level estimation from biased RNA-Seq data. Using a quasi-multinomial distribution model, our method is able to capture various types of RNA-Seq biases, including positional, sequencing and mappability biases. Our experimental results on simulated and real RNA-Seq datasets exhibit interesting effects of RNA-Seq biases on both transcriptome assembly and isoform expression level estimation. The advantage of our method is clearly shown in the experimental analysis by its high sensitivity and precision in transcriptome assembly and the high concordance of its estimated expression levels with quantitative reverse transcription–polymerase chain reaction data. Availability: CEM is freely available at http://www.cs.ucr.edu/~liw/cem.html . Contact: liw@cs.ucr.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

44

Unknown

Post-translational modifications induce significant yet not extreme changes to protein structure (2012)

Xin, F., Radivojac, P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: A number of studies of individual proteins have shown that post-translational modifications (PTMs) are associated with structural rearrangements of their target proteins. Although such studies provide critical insights into the mechanics behind the dynamic regulation of protein function, they usually feature examples with relatively large conformational changes. However, with the steady growth of Protein Data Bank (PDB) and available PTM sites, it is now possible to more systematically characterize the role of PTMs as conformational switches. In this study, we ask (1) what is the expected extent of structural change upon PTM, (2) how often are those changes in fact substantial, (3) whether the structural impact is spatially localized or global and (4) whether different PTMs have different signatures. Results: We exploit redundancy in PDB and, using root-mean-square deviation, study the conformational heterogeneity of groups of protein structures corresponding to identical sequences in their unmodified and modified forms. We primarily focus on the two most abundant PTMs in PDB, glycosylation and phosphorylation, but show that acetylation and methylation have similar tendencies. Our results provide evidence that PTMs induce conformational changes at both local and global level. However, the proportion of large changes is unexpectedly small; only 7% of glycosylated and 13% of phosphorylated proteins undergo global changes >2 Å. Further analysis suggests that phosphorylation stabilizes protein structure by reducing global conformational heterogeneity by 25%. Overall, these results suggest a subtle but common role of allostery in the mechanisms through which PTMs affect regulatory and signaling pathways. Contact: predrag@indiana.edu Supplementary information: Supplementary data are available at Bioinformatics online.
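The root-mean-square deviation mentioned in the abstract can be sketched in a few lines of Python. The sketch assumes the two structures have identical atom ordering and have already been superimposed; the superposition step, the function name and the toy coordinates are assumptions and do not come from the paper.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """RMSD between two equally sized lists of (x, y, z) coordinates.

    Assumes the structures are already optimally superimposed and that
    atom i in coords_a corresponds to atom i in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return sqrt(squared / len(coords_a))


# Toy example: every atom is displaced by 2 Å along z.
unmodified = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
modified = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
print(rmsd(unmodified, modified))
```

In practice the coordinates would be C-alpha positions parsed from PDB entries and aligned first, for example with the Kabsch algorithm.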

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

45

Unknown

VIPR HMM: a hidden Markov model for detecting recombination with microbial detection microarrays (2012)

Allred, A. F., Renshaw, H., Weaver, S., Tesh, R. B., Wang, D.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: Current methods in diagnostic microbiology typically focus on the detection of a single genomic locus or protein in a candidate agent. The presence of the entire microbe is then inferred from this isolated result. Problematically, the presence of recombination in microbial genomes would go undetected unless other genomic loci or protein components were specifically assayed. Microarrays lend themselves well to the detection of multiple loci from a given microbe; furthermore, the inherent nature of microarrays facilitates highly parallel interrogation of multiple microbes. However, none of the existing methods for analyzing diagnostic microarray data has the capacity to specifically identify recombinant microbes. In previous work, we developed a novel algorithm, VIPR, for analyzing diagnostic microarray data. Results: We have expanded upon our previous implementation of VIPR by incorporating a hidden Markov model (HMM) to detect recombinant genomes. We trained our HMM on a set of non-recombinant parental viruses and applied our method to 11 recombinant alphaviruses and 4 recombinant flaviviruses hybridized to a diagnostic microarray in order to evaluate performance of the HMM. VIPR HMM correctly identified 95% of the 62 inter-species recombination breakpoints in the validation set and only two false-positive breakpoints were predicted. This study represents the first description and validation of an algorithm capable of detecting recombinant viruses based on diagnostic microarray hybridization patterns. Availability: VIPR HMM is freely available for academic use and can be downloaded from http://ibridgenetwork.org/wustl/vipr . Contact: davewang@borcim.wustl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
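How a hidden Markov model can locate a recombination breakpoint may be easier to see with a toy example. The Python sketch below is a generic two-state Viterbi decoder, not the VIPR HMM itself: the states (`A`, `B` for two parental genomes), the observation symbols and all probabilities are invented for illustration.

```python
from math import log


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state path, computed in log space."""
    scores = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    backpointers = []
    for obs in observations[1:]:
        row, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            prev = max(states, key=lambda p: scores[-1][p] + log(trans_p[p][s]))
            row[s] = scores[-1][prev] + log(trans_p[prev][s]) + log(emit_p[s][obs])
            pointers[s] = prev
        scores.append(row)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def breakpoints(path):
    """Positions where the decoded parental state changes."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]


# Hypothetical model: each probe along the genome "looks like" parent a or b.
states = ("A", "B")
start_p = {"A": 0.5, "B": 0.5}
trans_p = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.1, "B": 0.9}}
emit_p = {"A": {"a": 0.9, "b": 0.1}, "B": {"a": 0.1, "b": 0.9}}

probes = ["a", "a", "a", "a", "b", "b", "b", "b"]
path = viterbi(probes, states, start_p, trans_p, emit_p)
print("".join(path), breakpoints(path))
```

The sticky transition probabilities make the decoder ignore an isolated discordant probe while still switching state when a run of probes supports the other parent, which is the behaviour one wants from a breakpoint detector.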

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

46

Unknown

COPE: an accurate k-mer-based pair-end reads connection tool to facilitate genome assembly (2012)

Liu, B., Yuan, J., Yiu, S.-M., Li, Z., Xie, Y., Chen, Y., Shi, Y., Zhang, H., Li, Y., Lam, T.-W., Luo, R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: The boost of next-generation sequencing technologies provides us with an unprecedented opportunity for elucidating genetic mysteries, yet the short-read length hinders us from better assembling the genome from scratch. New protocols now exist that can generate overlapping pair-end reads. By joining the 3' ends of each read pair, one is able to construct longer reads for assembling. However, effectively joining two overlapped pair-end reads remains a challenging task. Result: In this article, we present an efficient tool called Connecting Overlapped Pair-End (COPE) reads, to connect overlapping pair-end reads using k-mer frequencies. We evaluated our tool on 30× simulated pair-end reads from Arabidopsis thaliana with 1% base error. COPE connected over 99% of reads with 98.8% accuracy, which is, respectively, 10 and 2% higher than the recently published tool FLASH. When COPE is applied to real reads for genome assembly, the resulting contigs are found to have fewer errors and give a 14-fold improvement in the N50 measurement when compared with the contigs produced using unconnected reads. Availability and implementation: COPE is implemented in C++ and is freely available as open-source code at ftp://ftp.genomics.org.cn/pub/cope . Contact: twlam@cs.hku.hk or luoruibang@genomics.org.cn
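The basic operation of joining an overlapping read pair can be sketched as follows. This is a naive longest-overlap search written in Python for illustration; it does not use k-mer frequencies and is not COPE's algorithm. The function names, the minimum overlap and the mismatch tolerance are assumptions.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse-complement a DNA sequence (upper-case A, C, G, T, N)."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=5, max_mismatch_rate=0.1):
    """Join a read pair whose 3' ends overlap; return None if no overlap fits.

    read2 is reverse-complemented onto the strand of read1, then the longest
    suffix of read1 that matches a prefix of the mate (within the mismatch
    tolerance) is taken as the overlap.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_rate * overlap:
            return read1 + mate[overlap:]
    return None


# Toy fragment of 24 bases sequenced as two 16-base reads (8-base overlap).
fragment = "ACGTTGCATGCCAGTACGGATCCA"
read1 = fragment[:16]
read2 = reverse_complement(fragment[-16:])
print(merge_pair(read1, read2) == fragment)
```

A plain overlap search like this becomes ambiguous in repetitive or low-complexity regions, which is the kind of case where extra evidence such as k-mer frequencies is needed to choose among candidate overlaps.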

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

47

Unknown

A novel missense-mutation-related feature extraction scheme for 'driver' mutation identification (2012)

Tan, H., Bao, J., Zhou, X.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: It has become widely accepted that human cancer is a disease involving dynamic changes in the genome and that missense mutations constitute the bulk of human genetic variations. A multitude of computational algorithms, especially machine learning-based ones, has consequently been proposed to distinguish missense changes that contribute to cancer progression (‘driver’ mutations) from those that do not (‘passenger’ mutations). However, the existing methods have multifaceted shortcomings, in the sense that they either adopt an incomplete feature space or depend on protein structural databases which are usually far from integrated. Results: In this article, we investigated multiple aspects of a missense mutation and identified a novel feature space that distinguishes cancer-associated driver mutations from passenger ones well. An index (DX score) was proposed to evaluate the discriminating capability of each feature, and the top-ranking subset of these features was selected to build the SVM classifier. Cross-validation showed that the classifier trained on our selected features significantly outperforms the existing ones in both precision and robustness. We applied our method to several datasets of missense mutations culled from published databases and literature and obtained more reasonable results than previous studies. Availability: The software is available online at http://www.methodisthealth.com/software and https://sites.google.com/site/drivermutationidentification/ . Contact: xzhou@tmhs.org Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

48

Unknown

A linear programming model for protein inference problem in shotgun proteomics (2012)

Huang, T., He, Z.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: Assembling peptides identified from tandem mass spectra into a list of proteins, referred to as protein inference, is an important issue in shotgun proteomics. The objective of protein inference is to find a subset of proteins that are truly present in the sample. Although many methods have been proposed for protein inference, several issues such as peptide degeneracy still remain unsolved. Results: In this article, we present a linear programming model for protein inference. In this model, we use a transformation of the joint probability that each peptide/protein pair is present in the sample as the variable. Then, both the peptide probability and protein probability can be expressed as a linear combination of these variables. Based on this simple fact, the protein inference problem is formulated as an optimization problem: minimize the number of proteins with non-zero probabilities under the constraint that the difference between the calculated peptide probability and the peptide probability generated from peptide identification algorithms should be less than some threshold. This model addresses the peptide degeneracy issue by forcing some joint probability variables involving degenerate peptides to be zero in a rigorous manner. The corresponding inference algorithm is named ProteinLP. We test the performance of ProteinLP on six datasets. Experimental results show that our method is competitive with the state-of-the-art protein inference algorithms. Availability: The source code of our algorithm is available at: https://sourceforge.net/projects/prolp/ . Contact: zyhe@dlut.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

49

Unknown

MeDIP-HMM: genome-wide identification of distinct DNA methylation states from high-density tiling arrays (2012)

Seifert, M., Cortijo, S., Colome-Tatche, M., Johannes, F., Roudier, F., Colot, V.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: Methylation of cytosines in DNA is an important epigenetic mechanism involved in transcriptional regulation and preservation of genome integrity in a wide range of eukaryotes. Immunoprecipitation of methylated DNA followed by hybridization to genomic tiling arrays (MeDIP-chip) is a cost-effective and sensitive method for methylome analyses. However, existing bioinformatics methods only enable a binary classification into unmethylated and methylated genomic regions, which limits biological interpretations. Indeed, DNA methylation levels can vary substantially within a given DNA fragment depending on the number and degree of methylated cytosines. Therefore, a method for the identification of more than two methylation states is highly desirable. Results: Here, we present a three-state hidden Markov model (MeDIP-HMM) for analyzing MeDIP-chip data. MeDIP-HMM uses a higher-order state-transition process improving modeling of spatial dependencies between chromosomal regions, allows a simultaneous analysis of replicates and enables a differentiation between unmethylated, methylated and highly methylated genomic regions. We train MeDIP-HMM using a Bayesian Baum–Welch algorithm, integrating prior knowledge on methylation levels. We apply MeDIP-HMM to the analysis of the Arabidopsis root methylome and systematically investigate the benefit of using higher-order HMMs. Moreover, we also perform an in-depth comparison study with existing methods and demonstrate the value of using MeDIP-HMM by comparisons to current knowledge on the Arabidopsis methylome. We find that MeDIP-HMM is a fast and precise method for the analysis of methylome data, enabling the identification of distinct DNA methylation levels. Finally, we provide evidence for the general applicability of MeDIP-HMM by analyzing promoter DNA methylation data obtained for chicken. Availability: MeDIP-HMM is available as part of the open-source Java library Jstacs ( www.jstacs.de/index.php/MeDIP-HMM ). Data files are available from the Jstacs website. Contact: seifert@ipk-gatersleben.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

50

Unknown

Application and evaluation of automated methods to extract neuroanatomical connectivity statements from free text (2012)

French, L., Lane, S., Xu, L., Siu, C., Kwok, C., Chen, Y., Krebs, C., Pavlidis, P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: Automated annotation of neuroanatomical connectivity statements from the neuroscience literature would enable accessible and large-scale connectivity resources. Unfortunately, the connectivity findings are not formally encoded and occur as natural language text. This hinders aggregation, indexing, searching and integration of the reports. We annotated a set of 1377 abstracts for connectivity relations to facilitate automated extraction of connectivity relationships from neuroscience literature. We tested several baseline measures based on co-occurrence and lexical rules. We compare results from seven machine learning methods adapted from the protein interaction extraction domain that employ part-of-speech, dependency and syntax features. Results: Co-occurrence based methods provided high recall with weak precision. The shallow linguistic kernel recalled 70.1% of the sentence-level connectivity statements at 50.3% precision. Owing to its speed and simplicity, we applied the shallow linguistic kernel to a large set of new abstracts. To evaluate the results, we compared 2688 extracted connections with the Brain Architecture Management System (an existing database of rat connectivity). The extracted connections were connected in the Brain Architecture Management System at a rate of 63.5%, compared with 51.1% for co-occurring brain region pairs. We found that precision increases with the recency and frequency of the extracted relationships. Availability and implementation: The source code, evaluations, documentation and other supplementary materials are available at http://www.chibi.ubc.ca/WhiteText . Contact: paul@chibi.ubc.ca Supplementary information: Supplementary data are available at Bioinformatics Online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

51

Unknown

Efficient methods for identifying mutated driver pathways in cancer (2012)

Zhao, J., Zhang, S., Wu, L.-Y., Zhang, X.-S.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: The first step for clinical diagnostics, prognostics and targeted therapeutics of cancer is to comprehensively understand its molecular mechanisms. Large-scale cancer genomics projects are providing a large volume of data about genomic, epigenomic and gene expression aberrations in multiple cancer types. One of the remaining challenges is to identify driver mutations, driver genes and driver pathways promoting cancer proliferation and filter out the non-functional and passenger ones. Results: In this study, we propose two methods to solve the so-called maximum weight submatrix problem, which is designed to de novo identify mutated driver pathways from mutation data in cancer. The first one is an exact method that can be helpful for assessing other approximate and/or heuristic algorithms. The second one is a stochastic and flexible method that can be employed to incorporate other types of information to improve the first method. In particular, we propose an integrative model to combine mutation and expression data. We first apply our methods to simulated data to show their efficiency. We further apply the proposed methods to several real biological datasets, such as the mutation profiles of 74 head and neck squamous cell carcinoma samples, 90 glioblastoma tumor samples and 313 ovarian carcinoma samples. The gene expression profiles were also considered for the latter two datasets. The results show that our integrative model can identify more biologically relevant gene sets. We have implemented all these methods and made a package called mutated driver pathway finder, which can be easily used by other researchers. Availability: A MATLAB package of MDPFinder is available at http://zhangroup.aporc.org/ShiHuaZhang . Contact: zsh@amss.ac.cn Supplementary information: Supplementary data are available at Bioinformatics online.
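The abstract does not define the weight being maximized. A common formulation of the maximum weight submatrix problem scores a gene set by its coverage (samples with at least one mutated gene) minus its coverage overlap (extra mutations in already covered samples); the Python sketch below assumes that definition and solves a toy instance by brute force. It is neither of the two methods proposed in the paper nor the MDPFinder package, and the gene names and sample sets are invented.

```python
from itertools import combinations


def weight(genes, mutations):
    """Assumed weight: 2 * (covered samples) - (total mutations in the set).

    Equivalent to coverage minus overlap, so it rewards gene sets that are
    mutated in many samples but rarely together in the same sample.
    """
    covered = set().union(*(mutations[g] for g in genes))
    total = sum(len(mutations[g]) for g in genes)
    return 2 * len(covered) - total


def best_gene_set(mutations, k):
    """Exhaustively search all gene sets of size k (feasible only for toy data)."""
    return max(combinations(sorted(mutations), k), key=lambda gs: weight(gs, mutations))


# Hypothetical mutation data: gene -> set of sample IDs carrying a mutation.
mutations = {
    "G1": {1, 2, 3},
    "G2": {4, 5},
    "G3": {1, 2, 4},
    "G4": {6},
}
best = best_gene_set(mutations, 2)
print(best, weight(best, mutations))
```

Exhaustive enumeration grows combinatorially with the number of genes, which is why exact optimization or stochastic search is needed for real cohorts with thousands of genes.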

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

52

Unknown

Biologistics--Diffusion coefficients for complete proteome of Escherichia coli (2012)

Kalwarczyk, T., Tabaka, M., Holyst, R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-11-11

Description: Motivation: Biologistics provides data for quantitative analysis of transport (diffusion) processes and their spatio-temporal correlations in cells. Mobility of proteins is one of the few parameters necessary to describe reaction rates for gene regulation. Although understanding of diffusion-limited biochemical reactions in vivo requires mobility data for the largest possible number of proteins in their native forms, currently, there is no database that would contain the complete information about the diffusion coefficients (DCs) of proteins in a given cell type. Results: We demonstrate a method for the determination of in vivo DCs for any molecule, regardless of its molecular weight, size and structure, in any type of cell. We exemplify the method with the database of in vivo DCs for all proteins (4302 records) from the proteome of the K12 strain of Escherichia coli, together with examples of DCs of amino acids, sugars, RNA and DNA. The database follows from the scale-dependent viscosity reference curve (sdVRC). Construction of the sdVRC for a prokaryotic or eukaryotic cell requires ~20 in vivo measurements using techniques such as fluorescence correlation spectroscopy (FCS), fluorescence recovery after photobleaching (FRAP), nuclear magnetic resonance (NMR) or particle tracking. The shape of the sdVRC would be different for each organism, but the mathematical form of the curve remains the same. The presented method has a high predictive power, as measuring the DCs of several inert, properly chosen probes in a single cell type allows the DCs of thousands of proteins to be determined. Additionally, the obtained mobility data allow quantitative study of biochemical interactions in vivo. Contact: rholyst@ichf.edu.pl Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

53

Mendel-GPU: haplotyping and genotype imputation on graphics processing units (2012)

Chen, G. K., Wang, K., Stram, A. H., Sobel, E. M., Lange, K.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Motivation: In modern sequencing studies, one can improve the confidence of genotype calls by phasing haplotypes using information from an external reference panel of fully typed unrelated individuals. However, the computational demands are so high that they prohibit researchers with limited computational resources from haplotyping large-scale sequence data. Results: Our graphics processing unit based software delivers haplotyping and imputation accuracies comparable to competing programs at a fraction of the computational cost and peak memory demand. Availability: Mendel-GPU, our OpenCL software, runs on Linux platforms and is portable across AMD and nVidia GPUs. Users can download both code and documentation at http://code.google.com/p/mendel-gpu/. Contact: gary.k.chen@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

54

Phylogenetics, likelihood, evolution and complexity (2012)

de Koning, A. P. J., Gu, W., Castoe, T. A., Pollock, D. D.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Phylogenetics, likelihood, evolution and complexity (PLEX) is a flexible and fast Bayesian Markov chain Monte Carlo software program for large-scale analysis of nucleotide and amino acid data using complex evolutionary models in a phylogenetic framework. The program gains large speed improvements over standard approaches by implementing ‘partial sampling of substitution histories’, a data augmentation approach that can reduce data analysis times from months to minutes on large comparative datasets. A variety of nucleotide and amino acid substitution models are currently implemented, including non-reversible and site-heterogeneous mixture models. Due to efficient algorithms that scale well with data size and model complexity, PLEX can be used to make inferences from hundreds to thousands of taxa in only minutes on a desktop computer. It also performs probabilistic ancestral sequence reconstruction. Future versions will support detection of co-evolutionary interactions between sites, probabilistic tests of convergent evolution and rigorous testing of evolutionary hypotheses in a Bayesian framework. Availability and implementation: PLEX v1.0 is licensed under GPL. Source code and documentation will be available for download at www.evolutionarygenomics.com/ProgramsData/PLEX. PLEX is implemented in C++ and supported on Linux, Mac OS X and other platforms supporting standard C++ compilers. Example data, control files, documentation and accessory Perl scripts are available from the website. Contact: David.Pollock@UCDenver.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

55

DLRS: gene tree evolution in light of a species tree (2012)

Sjostrand, J., Sennblad, B., Arvestad, L., Lagergren, J.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: PrIME-DLRS (or colloquially: ‘Delirious’) is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events, a relaxed molecular clock and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters. Availability and implementation: PrIME-DLRS is available for Java SE 6+ under the New BSD License, and JAR files and source code can be downloaded from http://code.google.com/p/jprime/. There is also a slightly older C++ version available as a binary package for Ubuntu, with download instructions at http://prime.sbc.su.se. The C++ source code is available upon request. Contact: joel.sjostrand@scilifelab.se or jens.lagergren@scilifelab.se. Supplementary Information: PrIME-DLRS is based on a sound probabilistic model (Åkerborg et al., 2009) and has been thoroughly validated on synthetic and biological datasets (Supplementary Material online).

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

56

CSB: a Python framework for structural bioinformatics (2012)

Kalev, I., Mechelke, M., Kopec, K. O., Holder, T., Carstens, S., Habeck, M.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Computational Structural Biology Toolbox (CSB) is a cross-platform Python class library for reading, storing and analyzing biomolecular structures with rich support for statistical analyses. CSB is designed for reusability and extensibility and comes with a clean, well-documented API following good object-oriented engineering practice. Availability: Stable release packages are available for download from the Python Package Index (PyPI) as well as from the project’s website http://csb.codeplex.com. Contacts: ivan.kalev@gmail.com or michael.habeck@tuebingen.mpg.de

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

57

GREVE: Genomic Recurrent Event ViEwer to assist the identification of patterns across individual cancer samples (2012)

Cazier, J.-B., Holmes, C. C., Broxholme, J.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: GREVE has been developed to assist with the identification of recurrent genomic aberrations across cancer samples. The exact characterization of such aberrations remains a challenge despite the availability of an increasing amount of data, from SNP arrays to next-generation sequencing. Furthermore, genomic aberrations in cancer are especially difficult to handle because they are, by nature, unique to the patients. However, their recurrence in specific regions of the genome has been shown to reflect their relevance in the development of tumors. GREVE makes use of previously characterized events to identify such regions and focus any further analysis. Availability: GREVE is available through a web interface and open-source application (http://www.well.ox.ac.uk/GREVE).

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

58

SAPIN: A framework for the structural analysis of protein interaction networks (2012)

Yang, J.-S., Campagna, A., Delgado, J., Vanhee, P., Serrano, L., Kiel, C.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Protein interaction networks are widely used to depict the relationships between proteins. These networks often lack the information on physical binary interactions, and they do not inform whether there is incompatibility of structure between binding partners. Here, we introduce SAPIN, a framework dedicated to the structural analysis of protein interaction networks. SAPIN first identifies the protein parts that could be involved in the interaction and provides template structures. Next, SAPIN performs structural superimpositions to identify compatible and mutually exclusive interactions. Finally, the results are displayed using Cytoscape Web. Availability: The SAPIN server is available at http://sapin.crg.es. Contact: jae-seong.yang@crg.eu or christina.kiel@crg.eu Supplementary information: Supplementary data are available at Bioinformatics Online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

59

ChemBioServer: a web-based pipeline for filtering, clustering and visualization of chemical compounds used in drug discovery (2012)

Athanasiadis, E., Cournia, Z., Spyrou, G.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: ChemBioServer is a publicly available web application for effectively mining and filtering chemical compounds used in drug discovery. It provides researchers with the ability to (i) browse and visualize compounds along with their properties, (ii) filter chemical compounds for a variety of properties such as steric clashes and toxicity, (iii) apply perfect match substructure search, (iv) cluster compounds according to their physicochemical properties providing representative compounds for each cluster, (v) build custom compound mining pipelines and (vi) quantify through property graphs the top ranking compounds in drug discovery procedures. ChemBioServer allows for pre-processing of compounds prior to an in silico screen, as well as for post-processing of top-ranked molecules resulting from a docking exercise with the aim to increase the efficiency and the quality of compound selection that will pass to the experimental test phase. Availability: The ChemBioServer web application is available at: http://bioserver-3.bioacademy.gr/Bioserver/ChemBioServer/. Contact: gspyrou@bioacademy.gr

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

60

SGNS2: a compartmentalized stochastic chemical kinetics simulator for dynamic cell populations (2012)

Lloyd-Price, J., Gupta, A., Ribeiro, A. S.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Motivation: Cell growth and division affect the kinetics of internal cellular processes and the phenotype diversity of cell populations. Since the effects are complex, e.g. different cellular components are partitioned differently in cell division, to account for them in silico, one needs to simulate these processes in great detail. Results: We present SGNS2, a simulator of chemical reaction systems according to the Stochastic Simulation Algorithm with multi-delayed reactions within hierarchical, interlinked compartments which can be created, destroyed and divided at runtime. In division, molecules are randomly segregated into the daughter cells following a specified distribution corresponding to one of several partitioning schemes, applicable on a per-molecule-type basis. We exemplify its use with six models including a stochastic model of the disposal mechanism of unwanted protein aggregates in Escherichia coli, a model of phenotypic diversity in populations with different levels of synchrony, a model of a bacteriophage’s infection of a cell population and a model of prokaryotic gene expression at the nucleotide and codon levels. Availability: SGNS2, instructions and examples available at www.cs.tut.fi/~lloydpri/sgns2/ (open source under New BSD license). Contact: jason.lloyd-price@tut.fi Supplementary information: Supplementary data are available at Bioinformatics online.
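
As a rough illustration of the random segregation step mentioned in the SGNS2 abstract, the Python sketch below splits molecule counts between two daughter cells by independent (binomial) partitioning. This is only an assumed example of what one partitioning scheme might look like: the function name `partition_binomial`, the parameter `p` and the species names are hypothetical and are not taken from SGNS2 itself.

```python
import random


def partition_binomial(counts, p=0.5, rng=None):
    """Split molecule counts between two daughter cells.

    Each molecule independently goes to daughter A with probability p
    (hypothetical parameter), otherwise to daughter B.
    """
    if rng is None:
        rng = random.Random()
    daughter_a, daughter_b = {}, {}
    for species, n in counts.items():
        # Draw one Bernoulli trial per molecule; the sum is binomial(n, p).
        to_a = sum(1 for _ in range(n) if rng.random() < p)
        daughter_a[species] = to_a
        daughter_b[species] = n - to_a
    return daughter_a, daughter_b


# Hypothetical mother cell with two molecule types.
mother = {"protein": 100, "aggregate": 7}
cell_a, cell_b = partition_binomial(mother, p=0.5, rng=random.Random(0))
print(cell_a, cell_b)
```

Every molecule ends up in exactly one daughter, so the per-species counts of the two daughters always add up to the count in the mother cell.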

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

61

VarB: a variation browsing and analysis tool for variants derived from next-generation sequencing data (2012)

Preston, M. D., Manske, M., Horner, N., Assefa, S., Campino, S., Auburn, S., Zongo, I., Ouedraogo, J.-B., Nosten, F., Anderson, T., Clark, T. G.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: There is an immediate need for tools to both analyse and visualize in real-time single-nucleotide polymorphisms, insertions and deletions, and other structural variants from new sequence file formats. We have developed VarB software that can be used to visualize variant call format files in real time, as well as identify regions under balancing selection and informative markers to differentiate user-defined groups (e.g. populations). We demonstrate its utility using sequence data from 50 Plasmodium falciparum isolates comprising two different continents and confirm known signals from genomic regions that contain important antigenic and anti-malarial drug-resistance genes. Availability and implementation: The C++-based software VarB and user manual are available from www.pathogenseq.org/varb. Contact: taane.clark@lshtm.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

62

Comb-p: software for combining, analyzing, grouping and correcting spatially correlated P-values (2012)

Pedersen, B. S., Schwartz, D. A., Yang, I. V., Kechris, K. J.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: comb-p is a command-line tool and a Python library that manipulates BED files of possibly irregularly spaced P-values and (1) calculates auto-correlation, (2) combines adjacent P-values, (3) performs false discovery adjustment, (4) finds regions of enrichment (i.e. series of adjacent low P-values) and (5) assigns significance to those regions. In addition, tools are provided for visualization and assessment. We provide validation and example uses on bisulfite-seq with P-values from Fisher’s exact test, tiled methylation probes using a linear model and Dam-ID for chromatin binding using moderated t-statistics. Because the library accepts input in a simple, standardized format and is unaffected by the origin of the P-values, it can be used for a wide variety of applications. Availability: comb-p is maintained under the BSD license. The documentation and implementation are available at https://github.com/brentp/combined-pvalues. Contact: bpederse@gmail.com
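
To make step (3) of the comb-p abstract more concrete, here is a minimal Python sketch of a false discovery adjustment. The abstract does not say which procedure comb-p uses, so the sketch assumes the standard Benjamini-Hochberg method; the function name `benjamini_hochberg` and the example values are hypothetical and do not come from the comb-p code base.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so that adjusted values
    # never increase as the raw P-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P-values, e.g. one per genomic interval in a BED file.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

The adjustment depends only on the P-values themselves, which matches the abstract's point that the origin of the P-values does not matter.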

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

63

ImgLib2--generic image processing in Java (2012)

Pietzsch, T., Preibisch, S., Tomancak, P., Saalfeld, S.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: ImgLib2 is an open-source Java library for n-dimensional data representation and manipulation with focus on image processing. It aims at minimizing code duplication by cleanly separating pixel-algebra, data access and data representation in memory. Algorithms can be implemented for classes of pixel types and generic access patterns by which they become independent of the specific dimensionality, pixel type and data representation. ImgLib2 illustrates that an elegant high-level programming interface can be achieved without sacrificing performance. It provides efficient implementations of common data types, storage layouts and algorithms. It is the data model underlying ImageJ2, the KNIME Image Processing toolbox and an increasing number of Fiji-Plugins. Availability: ImgLib2 is licensed under BSD. Documentation and source code are available at http://imglib2.net and in a public repository at https://github.com/imagej/imglib. Supplementary Information: Supplementary data are available at Bioinformatics Online. Contact: saalfeld@mpi-cbg.de

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

64

An RNA Mapping DataBase for curating RNA structure mapping experiments (2012)

Cordero, P., Lucks, J. B., Das, R.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: We have established an RNA mapping database (RMDB) to enable structural, thermodynamic and kinetic comparisons across single-nucleotide-resolution RNA structure mapping experiments. The volume of structure mapping data has greatly increased since the development of high-throughput sequencing techniques, accelerated software pipelines and large-scale mutagenesis. For scientists wishing to infer relationships between RNA sequence/structure and these mapping data, there is a need for a database that is curated, tagged with error estimates and interfaced with tools for sharing, visualization, search and meta-analysis. Through its on-line front-end, the RMDB allows users to explore single-nucleotide-resolution mapping data in heat-map, bar-graph and colored secondary structure graphics; to leverage these data to generate secondary structure hypotheses; and to download the data in standardized and computer-friendly files, including the RDAT and community-consensus SNRNASM formats. At the time of writing, the database houses 53 entries, describing more than 2848 experiments of 1098 RNA constructs in several solution conditions and is growing rapidly. Availability: Freely available on the web at http://rmdb.stanford.edu Contact: rhiju@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics Online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

65

SpolPred: rapid and accurate prediction of Mycobacterium tuberculosis spoligotypes from short genomic sequences (2012)

Coll, F., Mallard, K., Preston, M. D., Bentley, S., Parkhill, J., McNerney, R., Martin, N., Clark, T. G.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: Spoligotyping is a well-established genotyping technique based on the presence of unique DNA sequences in Mycobacterium tuberculosis (Mtb), the causal agent of tuberculosis disease (TB). Although advances in sequencing technologies are leading to whole-genome bacterial characterization, tens of thousands of isolates have been spoligotyped, giving a global view of Mtb strain diversity. To bridge the gap, we have developed SpolPred, software to predict the spoligotype from raw sequence reads. Our approach is compared with experimentally and de novo assembly determined strain types in a set of 44 Mtb isolates. In silico and experimental results are identical for almost all isolates (39/44). However, SpolPred detected five experimentally false spoligotypes and was more accurate and faster than the assembling strategy. Application of SpolPred to an additional seven isolates with no laboratory data led to types that clustered with identical experimental types in a phylogenetic analysis using single-nucleotide polymorphisms. Our results demonstrate the usefulness of the tool and its role in revealing experimental limitations. Availability and implementation: SpolPred is written in C and is available from www.pathogenseq.org/spolpred. Contact: francesc.coll@lshtm.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

66

NetworkView: 3D display and analysis of protein·RNA interaction networks (2012)

Eargle, J., Luthey-Schulten, Z.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-11-11

Description: NetworkView is an application for the display and analysis of protein·RNA interaction networks derived from structure and/or dynamics. These networks typically model individual protein residues and nucleic acid monomers as nodes and their pairwise contacts as edges with associated weights. NetworkView projects the network onto the underlying 3D molecular structure so that visualization and analysis of the network can be coupled to physical and biological properties. NetworkView is implemented as a plugin to the molecular visualization software VMD. Availability and implementation: NetworkView is included with VMD, which is available at http://www.ks.uiuc.edu/Research/vmd/. Documentation, tutorials and supporting programs are available at http://www.scs.illinois.edu/schulten/software/. Contact: networkview@scs.illinois.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

67

Distinction immune genes of hepatitis-induced hepatocellular carcinoma (2012)

Hu, J., Gao, D. Z.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-12-08

Description: Motivation: Hepatitis B virus and hepatitis C virus are the two leading causes of hepatocellular carcinoma (HCC). It is observed that hepatitis C virus (HCV) induces HCC less readily than hepatitis B virus (HBV). This motivates us to reveal the reasons behind this from the viewpoint of immune genes. Results: To distinguish the immune genes with low-level expression in HBV-induced HCC, but high-level expression in HCV-induced HCC, the concept of distinction immune gene is proposed. A filter is then designed to screen these genes. By using a gene positive network with strong correlations between genes, the genes are further filtered to form the set of key distinction immune genes. Twenty-three key distinction immune genes are screened and divided into four clusters: T cells, B cells, immune signalling and major histocompatibility complex. It is evident that the screened genes are important immune genes, which are activated in HCV-induced HCC, but inactivated in HBV-induced HCC. In HCV-induced HCC, the structures of HCV adaptively update, so that they are difficult to be identified by antigens. Therefore, the clinical advice is either to increase the update speed of antigens or reduce the update speed of the viruses during the treatment of HCV-induced HCC. Moreover, it is also advised to add T cells or raise the expression levels of T cells to strengthen the ability to kill cancer cells. In contrast, HBV updates slowly, but the immune system in HBV-induced HCC has been damaged seriously. As a result, the clinical advice is to improve the immune ability of patients with HBV-induced HCC, for example by increasing immunoglobulin, T cells and B cells. Contact: zhiwei.gao@northumbria.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

68

Lengthening of 3'UTR increases with morphological complexity in animal evolution (2012)

Chen, C.-Y., Chen, S.-T., Juan, H.-F., Huang, H.-C.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-12-08

Description: Motivation: Evolutionary expansion of gene regulatory circuits seems to boost morphological complexity. However, the expansion patterns and the quantification relationships have not yet been identified. In this study, we focus on the regulatory circuits at the post-transcriptional level, investigating whether and how this principle may apply. Results: By analysing the structure of mRNA transcripts in multiple metazoan species, we observed a striking exponential correlation between the length of 3' untranslated regions (3'UTR) and morphological complexity as measured by the number of cell types in each organism. Cellular diversity was similarly associated with the accumulation of microRNA genes and their putative targets. We propose that the lengthening of 3'UTRs together with a commensurate exponential expansion in post-transcriptional regulatory circuits can contribute to the emergence of new cell types during animal evolution. Contact: yukijuan@ntu.edu.tw or hsuancheng@ym.edu.tw. Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

69

De novo detection of copy number variation by co-assembly (2012)

Nijkamp, J. F., van den Broek, M. A., Geertman, J.-M. A., Reinders, M. J. T., Daran, J.-M. G., de Ridder, D.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-12-08

Description: Motivation: Comparing genomes of individual organisms using next-generation sequencing data is, until now, mostly performed using a reference genome. This is challenging when the reference is distant and introduces bias towards the exact sequence present in the reference. Recent improvements in both sequencing read length and efficiency of assembly algorithms have brought direct comparison of individual genomes by de novo assembly, rather than through a reference genome, within reach. Results: Here, we develop and test an algorithm, named Magnolya, that uses a Poisson mixture model for copy number estimation of contigs assembled from sequencing data. We combine this with co-assembly to allow de novo detection of copy number variation (CNV) between two individual genomes, without mapping reads to a reference genome. In co-assembly, multiple sequencing samples are combined, generating a single contig graph with different traversal counts for the nodes and edges between the samples. In the resulting ‘coloured’ graph, the contigs have integer copy numbers; this negates the need to segment genomic regions based on depth of coverage, as required for mapping-based detection methods. Magnolya is then used to assign integer copy numbers to contigs, after which CNV probabilities are easily inferred. The copy number estimator and CNV detector perform well on simulated data. Application of the algorithms to hybrid yeast genomes showed allotriploid content from different origin in the wine yeast Y12, and extensive CNV in aneuploid brewing yeast genomes. Integer CNV was also accurately detected in a short-term laboratory-evolved yeast strain. Availability: Magnolya is implemented in Python and available at: http://bioinformatics.tudelft.nl/ Contact: d.deridder@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

70

DAFS: simultaneous aligning and folding of RNA sequences via dual decomposition (2012)

Sato, K., Kato, Y., Akutsu, T., Asai, K., Sakakibara, Y.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-12-08

Description: Motivation: It is well known that the accuracy of RNA secondary structure prediction from a single sequence is limited, and thus a comparative approach that predicts a common secondary structure from aligned sequences is a better choice if homologous sequences with reliable alignments are available. However, correct secondary structure information is needed to produce reliable alignments of RNA sequences. To tackle this dilemma, we require a fast and accurate aligner that takes structural information into consideration to yield reliable structural alignments, which are suitable for common secondary structure prediction. Results: We develop DAFS, a novel algorithm that simultaneously aligns and folds RNA sequences based on maximizing expected accuracy of a predicted common secondary structure and its alignment. DAFS decomposes the pairwise structural alignment problem into two independent secondary structure prediction problems and one pairwise (non-structural) alignment problem by the dual decomposition technique, and maintains the consistency of a pairwise structural alignment by imposing penalties on inconsistent base pairs and alignment columns that are iteratively updated. Furthermore, we extend DAFS to consider pseudoknots in RNA structural alignments by integrating IPknot for predicting a pseudoknotted structure. The experiments on publicly available datasets showed that DAFS can produce reliable structural alignments from unaligned sequences in terms of accuracy of common secondary structure prediction. Availability: The program of DAFS and the datasets are available at http://www.ncrna.org/software/dafs/. Contact: satoken@bio.keio.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

71

Discriminative modelling of context-specific amino acid substitution probabilities (2012)

Angermuller, C., Biegert, A., Soding, J.

Oxford University Press

In: Bioinformatics

Publication Date: 2012-12-08

Description: Motivation: Protein sequence searching and alignment are fundamental tools of modern biology. Alignments are assessed using their similarity scores, essentially the sum of substitution matrix scores over all pairs of aligned amino acids. We previously proposed a generative probabilistic method that yields scores that take the sequence context around each aligned residue into account. This method showed drastically improved sensitivity and alignment quality compared with standard substitution matrix-based alignment. Results: Here, we develop an alternative discriminative approach to predict sequence context-specific substitution scores. We applied our approach to compute context-specific sequence profiles for Basic Local Alignment Search Tool (BLAST) and compared the new tool (CS-BLASTdis) to BLAST and the previous context-specific version (CS-BLASTgen). On a dataset filtered to 20% maximum sequence identity, CS-BLASTdis was 51% more sensitive than BLAST and 17% more sensitive than CS-BLASTgen in detecting remote homologues at 10% false discovery rate. At 30% maximum sequence identity, its alignments contain 21 and 12% more correct residue pairs than those of BLAST and CS-BLASTgen, respectively. Clear improvements are also seen when the approach is combined with PSI-BLAST and HHblits. We believe the context-specific approach should replace substitution matrices wherever sensitivity and alignment quality are critical. Availability: Source code (GNU General Public License, version 3) and benchmark data are available at ftp://toolkit.genzentrum.lmu.de/pub/csblast/. Contact: soeding@genzentrum.lmu.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

72

Unknown

A counting renaissance: combining stochastic mapping and empirical Bayes to quickly detect amino acid sites under positive selection (2012)

Lemey, P., Minin, V. N., Bielejec, F., Kosakovsky Pond, S. L., Suchard, M. A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: Statistical methods for comparing relative rates of synonymous and non-synonymous substitutions maintain a central role in detecting positive selection. To identify selection, researchers often estimate the ratio of these relative rates (dN/dS) at individual alignment sites. Fitting a codon substitution model that captures heterogeneity in dN/dS across sites provides a reliable way to perform such estimation, but it remains computationally prohibitive for massive datasets. By using crude estimates of the numbers of synonymous and non-synonymous substitutions at each site, counting approaches scale well to large datasets, but they fail to account for ancestral state reconstruction uncertainty and to provide site-specific dN/dS estimates. Results: We propose a hybrid solution that borrows the computational strength of counting methods, but augments these methods with empirical Bayes modeling to produce a relatively fast and reliable method capable of estimating site-specific dN/dS values in large datasets. Importantly, our hybrid approach, set in a Bayesian framework, integrates over the posterior distribution of phylogenies and ancestral reconstructions to quantify uncertainty about site-specific dN/dS estimates. Simulations demonstrate that this method competes well with more-principled statistical procedures and, in some cases, even outperforms them. We illustrate the utility of our method using human immunodeficiency virus, feline panleukopenia and canine parvovirus evolution examples. Availability: Renaissance counting is implemented in the development branch of BEAST, freely available at http://code.google.com/p/beast-mcmc/ . The method will be made available in the next public release of the package, including support to set up analyses in BEAUti. Contact: philippe.lemey@rega.kuleuven.be or msuchard@ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

73

Unknown

Reference-independent comparative metagenomics using cross-assembly: crAss (2012)

Dutilh, B. E., Schmieder, R., Nulton, J., Felts, B., Salamon, P., Edwards, R. A., Mokili, J. L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic analysis by creating a cross-assembly of all reads, i.e. a single assembly of reads from different samples. Comparative metagenomics studies the interrelationships between metagenomes from different samples. Using an assembly algorithm is a fast and intuitive way to link (partially) homologous reads without requiring a database of reference sequences. Results: Here, we introduce crAss, a novel bioinformatic tool that enables fast simple analysis of cross-assembly files, yielding distances between all metagenomic sample pairs and an insightful image displaying the similarities. Availability and implementation: crAss is available as a web server at http://edwards.sdsu.edu/crass/ , and the Perl source code can be downloaded to run as a stand-alone command line tool. Contact: dutilh@cmbi.ru.nl Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

74

Unknown

Discovering chimeric transcripts in paired-end RNA-seq data by using EricScript (2012)

Benelli, M., Pescucci, C., Marseglia, G., Severgnini, M., Torricelli, F., Magi, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: The discovery of novel gene fusions can lead to a better comprehension of cancer progression and development. The emergence of deep sequencing of the transcriptome, known as RNA-seq, has opened many opportunities for the identification of this class of genomic alterations, leading to the discovery of novel chimeric transcripts in melanomas, breast cancers and lymphomas. To date, only a few computational approaches have been developed for the detection of chimeric transcripts. Although all of these computational methods show good sensitivity, much work remains to reduce the huge number of false-positive calls that arise from this analysis. Results: We propose a novel computational framework, named chimEric tranScript detection algorithm (EricScript), for the identification of gene fusion products in paired-end RNA-seq data. Our simulation study on synthetic data demonstrates that EricScript achieves higher sensitivity and specificity than existing methods with noticeably lower running times. We also applied our method to publicly available RNA-seq tumour datasets and showed that it can rediscover known gene fusions. Availability: The EricScript package is freely available under GPL v3 license at http://ericscript.sourceforge.net . Contact: matteo.benelli@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

75

Unknown

Kirk, P., Griffin, J. E., Savage, R. S., Ghahramani, Z., Wild, D. L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct—but often complementary—information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. Results: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI’s performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation–chip and protein–protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques—as well as to non-integrative approaches—demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods. Availability: A Matlab implementation of MDI is available from http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/ . Contact: D.L.Wild@warwick.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

76

Unknown

An ensemble correlation-based gene selection algorithm for cancer classification with gene expression data (2012)

Piao, Y., Piao, M., Park, K., Ryu, K. H.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: Gene selection for cancer classification is one of the most important topics in the biomedical field. However, microarray data pose a severe challenge for computational techniques. We need dimension reduction techniques that identify a small set of genes to achieve better learning performance. From the perspective of machine learning, the selection of genes can be considered to be a feature selection problem that aims to find a small subset of features that has the most discriminative information for the target. Results: In this article, we proposed an Ensemble Correlation-Based Gene Selection algorithm based on symmetrical uncertainty and Support Vector Machine. In our method, symmetrical uncertainty was used to analyze the relevance of the genes, the different starting points of the relevant subset were used to generate the gene subsets and the Support Vector Machine was used as an evaluation criterion of the wrapper. The efficiency and effectiveness of our method were demonstrated through comparisons with other feature selection techniques, and the results show that our method outperformed other methods published in the literature. Availability: By request from the author. Contact: pyz@dblab.chungbuk.ac.kr ; khryu@dblab.cbnu.ac.kr

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

77

Unknown

Olorin: combining gene flow with exome sequencing in large family studies of complex disease (2012)

Morris, J. A., Barrett, J. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: The existence of families with many individuals affected by the same complex disease has long suggested the possibility of rare alleles of high penetrance. In contrast to Mendelian diseases, however, linkage studies have identified very few reproducibly linked loci in diseases such as diabetes and autism. Genome-wide association studies have had greater success with such diseases, but these results explain neither the extreme disease load nor the within-family linkage peaks of some large pedigrees. Combining linkage information with exome or genome sequencing from large complex disease pedigrees might finally identify family-specific, high-penetrance mutations. Results: Olorin is a tool that integrates gene flow within families with next generation sequencing data to enable the analysis of complex disease pedigrees. Users can interactively filter and prioritize variants based on haplotype sharing across selected individuals and other measures of importance, including predicted functional consequence and population frequency. Availability: http://www.sanger.ac.uk/resources/software/olorin Contact: olorin@sanger.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

78

Unknown

A method for integrative structure determination of protein-protein complexes (2012)

Schneidman-Duhovny, D., Rossi, A., Avila-Sakar, A., Kim, S. J., Velazquez-Muriel, J., Strop, P., Liang, H., Krukenberg, K. A., Liao, M., Kim, H. M., Sobhanifar, S., Dotsch, V., Rajpal, A., Pons, J., Agard, D. A., Cheng, Y., Sali, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: Motivation: Structural characterization of protein interactions is necessary for understanding and modulating biological processes. On one hand, X-ray crystallography or NMR spectroscopy provide atomic resolution structures, but the data collection process is typically long and the success rate is low. On the other hand, computational methods for modeling assembly structures from individual components frequently suffer from a high false-positive rate, rarely resulting in a unique solution. Results: Here, we present a combined approach that computationally integrates data from a variety of fast and accessible experimental techniques for rapid and accurate structure determination of protein–protein complexes. The integrative method uses atomistic models of two interacting proteins and one or more datasets from five accessible experimental techniques: a small-angle X-ray scattering (SAXS) profile, 2D class average images from negative-stain electron microscopy micrographs (EM), a 3D density map from single-particle negative-stain EM, residue type content of the protein–protein interface from NMR spectroscopy and chemical cross-linking detected by mass spectrometry. The method is tested on a docking benchmark consisting of 176 known complex structures and simulated experimental data. The near-native model is the top scoring one for up to 61% of benchmark cases, depending on the included experimental datasets, compared with 10% for standard computational docking. We also collected SAXS, 2D class average images and a 3D density map from negative-stain EM to model the PCSK9 antigen–J16 Fab antibody complex, followed by validation of the model by a subsequently available X-ray crystallographic structure. Availability: http://salilab.org/idock Contact: dina@salilab.org or sali@salilab.org Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

79

Unknown

Interactive exploration of RNA22 microRNA target predictions (2012)

Loher, P., Rigoutsos, I.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-12-08

Description: MicroRNA (miRNA) target prediction is an important problem. Given an miRNA sequence, the task is to determine the identity of the messenger RNAs targeted by it, the locations within them where the interactions happen and the specifics of the formed heteroduplexes. Here, we describe a web-based application, RNA22-GUI, which we have designed and implemented for the interactive exploration and in-context visualization of predictions by RNA22, one of the popular miRNA target prediction algorithms. Central to our design has been the requirement to provide informative and comprehensive visualization that is integrated with interactive search capabilities and permits one to selectively isolate and focus on relevant information that is distilled on-the-fly from a large repository of pre-compiled predictions. RNA22-GUI is currently available for Homo sapiens, Mus musculus, Drosophila melanogaster and Caenorhabditis elegans. Availability: http://cm.jefferson.edu/rna22v1.0/ . Contact: Isidore.Rigoutsos@jefferson.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

80

Unknown

Single-molecule imaging with a tagged ribosome to explore trans-translation (2012)

Imataka, H.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Single-molecule imaging is a powerful technique to visualize molecular interactions and movements. Translation is one of the most interesting targets for researchers with molecular-imaging skills, since mRNA, tRNA and translation factors interact with or move inside or on the ribosome in an ordered manner. Trans-translation is a bacterial quality control system to rescue the ribosomes stalled at the 3' end of the mRNA, and this phenomenon is recapitulated in vitro with defined factors including two trans-translation-specific entities, tmRNA and SmpB. Zhou et al. (Single molecule imaging of the trans-translation entry process via anchoring of the tagged ribosome. J Biochem 2011;149:609-618.) successfully visualized the interaction of the tmRNA–SmpB complex with the ribosome by immobilizing the ribosome on the quartz surface with the HaloTag technology. This ribosome-anchoring system may be useful for the imaging analysis of other processes of translation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

81

Unknown

The functional roles of S1P in immunity (2012)

Hisano, Y., Nishi, T., Kawahara, A.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: The lipid mediator sphingosine-1-phosphate (S1P) is generated within cells from sphingosine by two sphingosine kinases (SPHK1 and SPHK2). Intracellularly synthesized S1P is released into the extracellular fluid by S1P transporters, including SPNS2. Released S1P binds specifically to the G protein-coupled S1P receptors (S1PR1/S1P1–S1PR5/S1P5), which activate a diverse range of downstream signalling pathways. Recent studies have proposed that one of the central physiological functions of intercellular S1P signalling is in lymphocyte trafficking in vivo because genetic disruption of SPHK1/2, SPNS2 or S1PR1/S1P1 in mice induces a lymphopenia phenotype. In this review, we discuss the current understanding of intercellular S1P signalling in the context of immunity.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

82

Unknown

Cdc6: a trifunctional AAA+ ATPase that plays a central role in controlling the G1-S transition and cell survival (2012)

Okayama, H.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Cdc6 is the AAA+ ATPase that assembles prereplicative complexes on replication origins in eukaryotic chromosomes. Recently, the same Cdc6 protein was found to exert two more functions in mammalian cells to promote cell proliferation and survival: ATP-dependent activation of p21CIP1- or p27KIP1-bound Cdk2-cyclin A/E complexes and obstruction of apoptosome assembly and consequent cell death by forming stable complexes with activated Apaf-1 molecules. These findings not only redefined the biological role of mammalian Cdc6 but also led to the discovery of entirely new mechanisms controlling Cdk2 activity and apoptosis. This review focuses on this multi-functional AAA+ ATPase and the newly discovered mechanisms by which it controls the G1–S transition and cell survival during proliferation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

83

Unknown

Pseudomonas putida PydR, a RutR-like transcriptional regulator, represses the dihydropyrimidine dehydrogenase gene in the pyrimidine reductive catabolic pathway (2012)

Hidese, R., Mihara, H., Kurihara, T., Esaki, N.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: The pyrimidine reductive catabolic pathway is important for the utilization of uracil and thymine as sources of nitrogen and carbon. The pathway is controlled by three enzymes: dihydropyrimidine dehydrogenase (DPD), dihydropyrimidinase and β-alanine synthase. The putative DPD genes, pydX and pydA, are tandemly arranged in the Pseudomonas putida genome. Intriguingly, a putative transcriptional regulator, PydR, homologous to Escherichia coli RutR, a repressor of the Rut-dependent pyrimidine degradation pathway, is located downstream of pydX and pydA. In this study, we show that a ΔpydA strain of P. putida fails to grow on a minimal medium containing uracil or thymine as a sole nitrogen source, demonstrating the physiological importance of DPD in the reductive pathway. The expression of pydA and DPD activity in the absence of uracil were significantly higher in a ΔpydR strain than in the wild-type strain, indicating that PydR acts as a repressor of the pyrimidine reductive pathway in P. putida. Phylogenetic analysis of RutR and PydR suggests that these homologous repressors may have evolved from a common ancestral protein that regulates pyrimidine degradation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

84

Unknown

Mediator lipidomics in acute inflammation and resolution (2012)

Arita, M.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Acute inflammation is an indispensable host response to foreign challenges or tissue injury. In healthy conditions, inflammatory processes are self-limiting and self-resolving, suggesting the existence of endogenous mechanisms for the control of inflammation and resolution. A comprehensive understanding of the cellular and molecular events of a well-orchestrated inflammatory response is required, and recent studies have uncovered the roles of endogenous lipid mediators derived from polyunsaturated fatty acids (i.e. lipoxins, resolvins, protectins) in controlling the resolution of inflammation. This review presents recent advances in understanding the formation and action of these mediators, especially focusing on the LC-MS/MS-based lipidomics approach and the emerging roles of eosinophils and eosinophil-derived lipid mediators in controlling acute inflammation and resolution.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

85

Unknown

Matrix control of transforming growth factor-β function (2012)

Horiguchi, M., Ota, M., Rifkin, D. B.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: The cytokine transforming growth factor-beta (TGF-β) has multiple effects in both physiological and pathological conditions. TGF-β is secreted as part of a tripartite complex from which it must be released in order to bind to its receptor. Sequestration of latent TGF-β in the extracellular matrix (ECM) is crucial for proper mobilization of the latent cytokine and its activation. However, contrary to expectation, loss-of-function mutations in genes encoding certain matrix proteins that bind TGF-β yield elevated, rather than decreased, TGF-β levels, posing a ‘TGF-β paradox.’ In this review, we discuss recent findings concerning the relationship of TGF-β, ECM molecules, and latent TGF-β activation and propose a model to resolve the ‘TGF-β paradox.’

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

86

Unknown

Identification of the catalytic residues of sequence-specific and histidine-free ribonuclease colicin E5 (2012)

Inoue-Ito, S., Yajima, S., Fushinobu, S., Nakamura, S., Ogawa, T., Hidaka, M., Masaki, H.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Colicin E5 cleaves tRNAs for Tyr, His, Asn and Asp in their anticodons to abolish protein synthesis in Escherichia coli. We previously showed how its C-terminal RNase domain, E5-CRD, recognizes the anticodon bases, but the catalytic mechanism remained to be elucidated. Although the reaction products with 5'-OH and 2',3'-cyclic phosphate ends suggested a mechanism similar to those of RNases A and T1, E5-CRD does not have the His residues that act as catalysts in usual RNases. To identify residues important for the catalytic reaction, mutants of all residues within 5 Å of the central phosphorus of the scissile phosphodiester bond were prepared. Evaluation of the killing activities of the mutant colicins and the RNase activities of the mutant E5-CRDs suggested direct involvement of Arg33, Lys25, Gln29 and Lys60 in the reaction. In particular, Arg33 plays a critical role and Ile94 provides structural support for Arg33. The crystal structure of the E5-CRD(R33Q)/dGpdUp complex showed that this mutant protein retains its structural and binding integrity, suggesting involvement of Arg33 in the catalytic reaction. The structure of free E5-CRD, which we also determined, showed great flexibility of a flap region, which facilitates the access of Lys60 to the substrate in an induced-fit manner.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

87

Unknown

Sulfatide negatively regulates the fusion process of human parainfluenza virus type 3 (2012)

Takahashi, T., Ito, K., Fukushima, K., Takaguchi, M., Hayakawa, T., Suzuki, Y., Suzuki, T.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Sulfatide (HSO3-3-galactosylceramide), which is enriched in lipid rafts of plasma membranes in various epithelial cell lines, is a critical component of host cells for effective production of influenza A virus. However, the function of sulfatide in other virus infections targeting epithelial cells remains unknown. In this study, the effect of sulfatide on infection of human parainfluenza virus type 3 (hPIV3) was demonstrated by using genetically produced sulfatide-enriched cells and by treatment of hPIV3-infected cells with anti-sulfatide monoclonal antibody (GS-5) as well as by addition of sulfatide to the cells. hPIV3 was found to bind to sulfatide in a virus overlay assay and a solid-phase binding assay. Genetic expression of sulfatide in COS-7 cells defective in sulfatide suppressed initial hPIV3 infection and formation of multinucleate virus-infected cells. Treatment of virus-infected LLC-MK2 cells with GS-5 promoted formation of multinucleate cells. In contrast, exogenous addition of sulfatide to hPIV3-infected COS-7 cells and cells expressing the hPIV3 hemagglutinin-neuraminidase (HN) gene and fusion (F) gene conspicuously reduced the formation of multinucleate cells. The results suggest that sulfatide negatively regulates the fusion process of hPIV3, possibly through interaction with HN or F glycoprotein on the cell surface.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

88

Unknown

Annexin A3 as a negative regulator of adipocyte differentiation (2012)

Watanabe, T., Ito, Y., Sato, A., Hosono, T., Niimi, S., Ariga, T., Seki, T.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Annexin A3 is a protein belonging to the annexin family, and it is mainly present in cellular membranes as a phospholipid-binding protein that binds via the calcium ion. However, its physiological function remains to be clarified. We examined the expression of annexin A3 in mouse tissues and found for the first time that annexin A3 mRNA and its protein were expressed more strongly in adipose tissues than in other tissues. In adipose tissues, annexin A3-expressing cells were present in the stromal vascular fraction and were precisely identical to Pref-1-positive preadipocytes, Pref-1 being an epidermal growth factor repeat-containing transmembrane protein that inhibits adipogenesis. In 3T3-L1 cells, used as a model of adipogenesis, annexin A3 was down-regulated at an early phase of adipocyte differentiation, and this pattern paralleled that of Pref-1. Suppression of annexin A3 in these cells with siRNA caused elevation of the PPARγ2 mRNA level and lipid droplet accumulation. In conclusion, our data suggest that annexin A3 is a negative regulator of adipocyte differentiation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

89

Unknown

PKL01, an Ndr kinase homologue in plant, shows tyrosine kinase activity (2012)

Katayama, S., Sugiyama, Y., Hatano, N., Terachi, T., Sueyoshi, N., Kameshita, I.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Protein phosphorylation by protein tyrosine (Tyr) kinases plays important roles in a variety of signalling pathways in cell growth, differentiation and oncogenesis in animals. Despite the absence of classical Tyr kinases in plants, a similar ratio of phosphotyrosine residues in phosphorylated proteins was found in Arabidopsis thaliana as in human. However, the protein kinases responsible for tyrosine phosphorylation in plants, except for some dedicated dual-specificity kinases, still remain unclear. In this study, we found that PKL01, a nuclear Dbf2-related (Ndr) kinase homologue in Lotus japonicus, was autophosphorylated at a tyrosine residue when it was expressed in Escherichia coli, but a kinase-dead mutant of PKL01 was not. The tyrosine phosphorylation site in PKL01 was identified as Tyr-56 by LC-MS/MS analysis. Recombinant PKL01, which had been dephosphorylated by an alkaline phosphatase, could be phosphorylated again at the Tyr residue when it was incubated with ATP. Furthermore, other Ndr kinases in plants and PKL01 phosphorylated Tyr residues in exogenous substrates such as poly(Glu, Tyr) 4:1 and casein. Therefore, the Ndr kinases in plants, which had been assumed to be protein serine (Ser)/threonine (Thr) kinases, turned out to be dual-specificity kinases responsible for phosphorylation of Tyr residues and Ser/Thr residues in plant proteins.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

90

Unknown

The critical role of amino acid residue at position 117 of mouse UDP-glucuronosyltransferase 1a6a and 1a6b in resveratrol glucuronidation (2012)

Uchihashi, S., Nishikawa, M., Sakaki, T., Ikushiro, S.-i.

Oxford University Press

In: Journal of Biochemistry

Details

Publication Date: 2012-09-29

Description: Mouse UDP-glucuronosyltransferase 1a6 (Ugt1a6) contains two functional copies, 1a6a and 1a6b, that share high sequence homology (98%). Only 10 amino acids located around the substrate recognition region are different out of 531 total residues. Although Ugt1a6 plays important roles in conjugating phenolic compounds, the functional characteristics of these isozymes are unclear. We performed functional analyses of mouse Ugt1a6a and Ugt1a6b using two isomeric polyphenols (trans- and cis-resveratrol). The cDNAs of mouse Ugt1a6a and Ugt1a6b were cloned and expressed as recombinant proteins using a yeast expression system, and kinetic parameters were evaluated. The wild-type Ugt1a6a and Ugt1a6b proteins catalysed trans- and cis-resveratrol 3-O-glucuronidation. Although the Km value for trans-resveratrol was significantly lower for Ugt1a6a compared with Ugt1a6b, the Km values for cis-resveratrol were comparable for the isozymes. Despite high sequence homology, significant kinetic differences were observed between the isozymes. To identify the critical residues for resveratrol glucuronidation, we constructed 10 variants of Ugt1a6a (T81P, N96R, H98Q, L100V, S104P, N115S, I117L, V118T, V119L and D120E). The I117L variant had Ugt1a6b-like enzymatic properties of Km in trans-resveratrol, and Vmax and Ksi in cis-form, suggesting that the residues located at position 117 of Ugt1a6a and Ugt1a6b play an important role in resveratrol glucuronidation.

Print ISSN: 0021-924X

Electronic ISSN: 1756-2651

Topics: Biology , Chemistry and Pharmacology

Published by Oxford University Press on behalf of The Japanese Biochemical Society (JBS).

91

Unknown

MolBioLib: a C++11 framework for rapid development and deployment of bioinformatics tasks (2012)

Ohsumi, T. K., Borowsky, M. L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: We developed MolBioLib to address the need for adaptable next-generation sequencing analysis tools. The result is a compact, portable and extensively tested C++11 software framework and set of applications tailored to the demands of next-generation sequencing data and applicable to many other applications. MolBioLib is designed to work with common file formats and data types used both in genomic analysis and general data analysis. A central relational-database-like Table class is a flexible and powerful object to intuitively represent and work with a wide variety of tabular datasets, ranging from alignment data to annotations. MolBioLib has been used to identify causative single-nucleotide polymorphisms in whole genome sequencing, detect balanced chromosomal rearrangements and compute enrichment of messenger RNAs (mRNAs) on microtubules, typically requiring applications of under 200 lines of code. MolBioLib includes programs to perform a wide variety of analysis tasks, such as computing read coverage, annotating genomic intervals and novel peak calling with a wavelet algorithm. Although MolBioLib was designed primarily for bioinformatics purposes, much of its functionality is applicable to a wide range of problems. Complete documentation and an extensive automated test suite are provided. Availability: MolBioLib is available for download at: http://sourceforge.net/projects/molbiolib Contact: ohsumit@molbio.mgh.harvard.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

92

Unknown

YAHA: fast and flexible long-read alignment with optimal breakpoint detection (2012)

Faust, G. G., Hall, I. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: With improved short-read assembly algorithms and the recent development of long-read sequencers, split mapping will soon be the preferred method for structural variant (SV) detection. Yet, current alignment tools are not well suited for this. Results: We present YAHA, a fast and flexible hash-based aligner. YAHA is as fast and accurate as BWA-SW at finding the single best alignment per query and is dramatically faster and more sensitive than both SSAHA2 and MegaBLAST at finding all possible alignments. Unlike other aligners that report all, or one, alignment per query, or that use simple heuristics to select alignments, YAHA uses a directed acyclic graph to find the optimal set of alignments that cover a query using a biologically relevant breakpoint penalty. YAHA can also report multiple mappings per defined segment of the query. We show that YAHA detects more breakpoints in less time than BWA-SW across all SV classes, and especially excels at complex SVs comprising multiple breakpoints. Availability: YAHA is currently supported on 64-bit Linux systems. Binaries and sample data are freely available for download from http://faculty.virginia.edu/irahall/YAHA . Contact: imh4y@virginia.edu
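The idea of selecting an optimal set of alignments that cover a query under a breakpoint penalty can be illustrated with a small dynamic program over a directed acyclic graph. This is only a sketch under stated assumptions and not YAHA's actual implementation: the `Aln` record, the `best_chain` function, the additive scoring and the rule that chained alignments must not overlap on the query are all hypothetical simplifications chosen for illustration.

```python
from typing import List, NamedTuple, Tuple


class Aln(NamedTuple):
    """Hypothetical alignment record: query interval [q_start, q_end) and a score."""
    q_start: int
    q_end: int
    score: int


def best_chain(alns: List[Aln], breakpoint_penalty: int) -> Tuple[int, List[Aln]]:
    """Return the best-scoring chain of non-overlapping alignments along the query.

    Nodes are alignments; an edge b -> a exists when b ends at or before the
    start of a on the query. Each edge used (each breakpoint) costs
    `breakpoint_penalty`. Sorting by query end gives a topological order.
    """
    if not alns:
        return 0, []
    order = sorted(alns, key=lambda a: (a.q_end, a.q_start))
    best = [a.score for a in order]   # best chain score ending at each node
    prev = [-1] * len(order)          # back-pointers for chain recovery
    for j, a in enumerate(order):
        for i in range(j):
            b = order[i]
            if b.q_end <= a.q_start:  # b can precede a without overlap
                cand = best[i] + a.score - breakpoint_penalty
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i
    end = max(range(len(order)), key=best.__getitem__)
    chain = []
    k = end
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain
```

With a low penalty the sketch prefers two well-scoring pieces joined by a breakpoint; with a high penalty it falls back to a single, weaker alignment spanning the whole query. The quadratic loop is chosen for clarity, not speed.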

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

93

Unknown

MetiTree: a web application to organize and process high-resolution multi-stage mass spectrometry metabolomics data (2012)

Rojas-Cherto, M., van Vliet, M., Peironcely, J. E., van Doorn, R., Kooyman, M., te Beek, T., van Driel, M. A., Hankemeier, T., Reijmers, T.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-10-11

Description: Identification of metabolites using high-resolution multi-stage mass spectrometry (MSn) data is a significant challenge demanding access to all sorts of computational infrastructures. MetiTree is a user-friendly web application dedicated to organizing, processing, sharing, visualizing and comparing MSn data. It integrates several features to export and visualize complex MSn data, facilitating the exploration and interpretation of metabolomics experiments. A dedicated spectral tree viewer allows the simultaneous presentation of three related types of MSn data, namely, the spectral data, the fragmentation tree and the fragmentation reactions. MetiTree stores the data in an internal database to enable searching for similar fragmentation trees and matching against other MSn data. As such, MetiTree contains much functionality that will make the difficult task of identifying unknown metabolites much easier. Availability: MetiTree is accessible at http://www.MetiTree.nl . The source code is available at https://github.com/NetherlandsMetabolomicsCentre/metitree/wiki . Contact: m.rojas@lacdr.leidenuniv.nl or t.reijmers@lacdr.leidenuniv.nl

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

94

Unknown

SimRare: a program to generate and analyze sequence-based data for association studies of quantitative and qualitative traits (2012)

Li, B., Wang, G., Leal, S. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-10-11

Description: Motivation: Currently, there is great interest in detecting complex trait rare variant associations using next-generation sequence data. On a monthly basis, new rare variant association methods are published. It is difficult to evaluate these methods because there is no standard to generate data and often comparisons are biased. In order to fairly compare rare variant association methods, it is necessary to generate data using realistic population demographic and phenotypic models. Result: SimRare is an interactive program that integrates generation of rare variant genotype/phenotype data and evaluation of association methods using a unified platform. Variant data are generated for gene regions using forward-time simulation that incorporates realistic population demographic and evolutionary scenarios. Phenotype data can be obtained for both case–control and quantitative traits. SimRare has a user-friendly interface that allows for easy entry of genetic and phenotypic parameters. Novel rare variant association methods implemented in R can also be imported into SimRare, to evaluate their performance and compare results, e.g. power and Type I error, with other currently available methods both numerically and graphically. Availability: http://code.google.com/p/simrare/ Contact: sleal@bcm.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

95

Unknown

TMBB-DB: a transmembrane β-barrel proteome database (2012)

Freeman, T. C., Wimley, W. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: We previously reported the development of a highly accurate statistical algorithm for identifying β-barrel outer membrane proteins, or transmembrane β-barrels (TMBBs), from genomic sequence data of Gram-negative bacteria (Freeman,T.C. and Wimley,W.C. (2010) Bioinformatics, 26, 1965–1974). We have now applied this identification algorithm to all available Gram-negative bacterial genomes (over 600 chromosomes) and have constructed a publicly available, searchable, up-to-date database of all proteins in these genomes. Results: For each protein in the database, there is information on (i) β-barrel membrane protein probability for identification of β-barrels, (ii) β-strand and β-hairpin propensity for structure and topology prediction, (iii) signal sequence score because most TMBBs are secreted through the inner membrane translocon and, thus, have a signal sequence, and (iv) transmembrane α-helix predictions, for reducing false positive predictions. This information is sufficient for the accurate identification of most β-barrel membrane proteins in these genomes. In the database there are nearly 50 000 predicted TMBBs (out of 1.9 million total putative proteins). Of those, more than 15 000 are ‘hypothetical’ or ‘putative’ proteins, not previously identified as TMBBs. This wealth of genomic information is not available anywhere else. Availability: The TMBB genomic database is available at http://beta-barrel.tulane.edu/ . Contact: wwimley@tulane.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

96

Unknown

iBBiG: iterative binary bi-clustering of gene sets (2012)

Gusenleitner, D., Howe, E. A., Bentink, S., Quackenbush, J., Culhane, A. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics data that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to meta-analysis of breast cancer studies, where iBBiG extracted novel gene set–phenotype associations that predicted tumor metastases within tumor subtypes. Availability: Implemented in the Bioconductor package iBBiG Contact: aedin@jimmy.harvard.edu
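The discretization step mentioned in the abstract, turning enrichment results into binary gene set profiles, might look roughly like the following. This is a minimal sketch and not the iBBiG package's code: the `binarize` function, the matrix layout (rows as gene sets, columns as phenotype comparisons) and the cut-off `alpha = 0.05` are assumptions made for illustration, and the bi-clustering itself is not shown.

```python
from typing import List


def binarize(pvalues: List[List[float]], alpha: float = 0.05) -> List[List[int]]:
    """Convert an enrichment p-value matrix into a binary association matrix.

    Rows are gene sets and columns are phenotype comparisons (assumed layout).
    An entry becomes 1 when the gene set is called enriched (p < alpha), else 0.
    """
    return [[1 if p < alpha else 0 for p in row] for row in pvalues]


# Two gene sets tested in two hypothetical studies.
profile = binarize([[0.01, 0.20],
                    [0.50, 0.04]])
```

A binary matrix of this kind is the sort of input a bi-clustering method would then search for blocks of ones.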

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

97

Unknown

Contact map prediction using a large-scale ensemble of rule sets and the fusion of multiple predicted structural features (2012)

Bacardit, J., Widera, P., Marquez-Chamorro, A., Divina, F., Aguilar-Ruiz, J. S., Krasnogor, N.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: The prediction of a protein’s contact map has become, in recent years, a crucial stepping stone for the prediction of the complete 3D structure of a protein. In this article, we describe a methodology for this problem that was shown to be successful in CASP8 and CASP9. The methodology is based on (i) the fusion of the prediction of a variety of structural aspects of protein residues, (ii) an ensemble strategy used to facilitate the training process and (iii) a rule-based machine learning system from which we can extract human-readable explanations of the predictor and derive useful information about the contact map representation. Results: The main part of the evaluation is the comparison against the sequence-based contact prediction methods from CASP9, where our method achieved the best rank in five out of the six evaluated metrics. We also assess the impact of the size of the ensemble used in our predictor to show the trade-off between performance and training time of our method. Finally, we also study the rule sets generated by our machine learning system. From this analysis, we are able to estimate the contribution of the attributes in our representation and how these interact to derive contact predictions. Availability: http://icos.cs.nott.ac.uk/servers/psp.html . Contact: natalio.krasnogor@nottingham.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

98

Unknown

Bacterial GRAS domain proteins throw new light on gibberellic acid response mechanisms (2012)

Zhang, D., Iyer, L. M., Aravind, L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Gibberellic acids (GAs) are key plant hormones, regulating various aspects of growth and development, which have been at the center of the ‘green revolution’. GRAS family proteins, the primary players in GA signaling pathways, remain poorly understood. Using sequence-profile searches, structural comparisons and phylogenetic analysis, we establish that the GRAS family first emerged in bacteria and belongs to the Rossmann fold methyltransferase superfamily. All bacterial and a subset of plant GRAS proteins are likely to function as small-molecule methylases. The remaining plant versions have lost one or more AdoMet (SAM)-binding residues while preserving their substrate-binding residues. We predict that GRAS proteins might either modify or bind small molecules such as GAs or their derivatives. Contact: aravind@ncbi.nlm.nih.gov Supplementary Information: Supplementary Material for this article is available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

99

Unknown

Nebula--a web-server for advanced ChIP-seq data analysis (2012)

Boeva, V., Lermine, A., Barette, C., Guillouf, C., Barillot, E.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: ChIP-seq consists of chromatin immunoprecipitation and deep sequencing of the extracted DNA fragments. It is the technique of choice for accurate characterization of the binding sites of transcription factors and other DNA-associated proteins. We present a web service, Nebula, which allows inexperienced users to perform a complete bioinformatics analysis of ChIP-seq data. Results: Nebula was designed for both bioinformaticians and biologists. It is based on the Galaxy open source framework. Galaxy already includes a large number of functionalities for mapping reads and peak calling. We added the following to Galaxy: (i) peak calling with FindPeaks and a module for immunoprecipitation quality control, (ii) de novo motif discovery with ChIPMunk, (iii) calculation of the density and the cumulative distribution of peak locations relative to gene transcription start sites, (iv) annotation of peaks with genomic features and (v) annotation of genes with peak information. Nebula generates the graphs and the enrichment statistics at each step of the process. During Steps 3–5, Nebula optionally repeats the analysis on a control dataset and compares these results with those from the main dataset. Nebula can also incorporate gene expression (or gene modulation) data during these steps. In summary, Nebula is an innovative web service that provides an advanced ChIP-seq analysis pipeline providing ready-to-publish results. Availability: Nebula is available at http://nebula.curie.fr/ Supplementary Information: Supplementary data are available at Bioinformatics online.
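Step (iii) of the list above, the cumulative distribution of peak locations relative to transcription start sites, can be sketched in a few lines. This is an illustrative approximation and not Nebula's code: the function names are hypothetical, and the sketch assumes a single chromosome, ignores strand, and uses the absolute distance from each peak position to its nearest TSS.

```python
from bisect import bisect_left
from typing import List, Sequence


def nearest_tss_distance(peak: int, tss_sorted: Sequence[int]) -> int:
    """Absolute distance from a peak position to the closest TSS.

    `tss_sorted` must be sorted and non-empty; only the neighbours on either
    side of the insertion point need to be checked.
    """
    i = bisect_left(tss_sorted, peak)
    candidates = tss_sorted[max(i - 1, 0):i + 1]
    return min(abs(peak - t) for t in candidates)


def cumulative_fraction(peaks: Sequence[int], tss: Sequence[int],
                        thresholds: Sequence[int]) -> List[float]:
    """Fraction of peaks lying within each distance threshold of a TSS.

    Assumes `peaks` and `tss` are non-empty.
    """
    tss_sorted = sorted(tss)
    distances = [nearest_tss_distance(p, tss_sorted) for p in peaks]
    return [sum(d <= t for d in distances) / len(distances) for t in thresholds]


# Four hypothetical peaks and two hypothetical TSS positions (base pairs).
fractions = cumulative_fraction([900, 1200, 4000, 9000], [1000, 5000],
                                [150, 1000, 5000])
```

Running the same function on a control dataset and comparing the two curves mirrors, in spirit, the optional control comparison the abstract describes.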

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

100

Unknown

Identifying multi-layer gene regulatory modules from multi-dimensional genomic data (2012)

Li, W., Zhang, S., Liu, C.-C., Zhou, X. J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2012-09-30

Description: Motivation: Eukaryotic gene expression (GE) is subjected to precisely coordinated multi-layer controls, across the levels of epigenetic, transcriptional and post-transcriptional regulation. Recently, emerging multi-dimensional genomic datasets have provided unprecedented opportunities to study the cross-layer regulatory interplay. In these datasets, the same set of samples is profiled on several layers of genomic activities, e.g. copy number variation (CNV), DNA methylation (DM), GE and microRNA expression (ME). However, suitable analysis methods for such data are currently sparse. Results: In this article, we introduced a sparse Multi-Block Partial Least Squares (sMBPLS) regression method to identify multi-dimensional regulatory modules from this new type of data. A multi-dimensional regulatory module contains sets of regulatory factors from different layers that are likely to jointly contribute to a local ‘gene expression factory’. We demonstrated the performance of our method on simulated data as well as on The Cancer Genome Atlas Ovarian Cancer datasets including the CNV, DM, ME and GE data measured on 230 samples. We showed that the majority of identified modules have significant functional and transcriptional enrichment, higher than that observed in modules identified using only a single type of genomic data. Our network analysis of the modules revealed that the CNV, DM and microRNA can have coupled impact on expression of important oncogenes and tumor suppressor genes. Availability and implementation: The source code implemented in MATLAB is freely available at: http://zhoulab.usc.edu/sMBPLS/ . Contact: xjzhou@usc.edu Supplementary information: Supplementary materials are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press
