ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

AVIA v2.0: annotation, visualization and impact analysis of genomic variants and genes (2015)

Vuong, H., Che, A., Ravichandran, S., Luke, B. T., Collins, J. R., Mudunuri, U. S.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: As sequencing becomes cheaper and more widely available, there is a greater need to quickly and effectively analyze large-scale genomic data. While the functionality of AVIA v1.0, whose implementation was based on ANNOVAR, was comparable with other annotation web servers, AVIA v2.0 represents an enhanced web-based server that extends genomic annotations to cell-specific transcripts and protein-level functional annotations. With AVIA’s improved interface, users can better visualize their data, perform comprehensive searches and categorize both coding and non-coding variants. Availability and implementation: AVIA is freely available through the web at http://avia.abcc.ncifcrf.gov . Contact: Hue.Vuong@fnlcr.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

2

Unknown

MultiMeta: an R package for meta-analyzing multi-phenotype genome-wide association studies (2015)

Vuckovic, D., Gasparini, P., Soranzo, N., Iotchkova, V.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: As new methods for multivariate analysis of genome wide association studies become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance-based method for meta-analysis, generalized to an n-dimensional setting. Availability and implementation: The R package MultiMeta can be downloaded from CRAN. Contact: dragana.vuckovic@burlo.trieste.it; vi1@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
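
Illustrative note: the inverse-variance method named in this abstract can be sketched generically. For cohort i with an n-dimensional effect vector b_i and covariance matrix S_i, a fixed-effect pooled covariance is S = (sum_i S_i^-1)^-1 and the pooled effect is b = S * sum_i (S_i^-1 b_i). The Python sketch below is an assumption-laden illustration written for this record, not MultiMeta's API (MultiMeta is an R package); the names `invert`, `mat_vec` and `pool_effects` are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def invert(m: Matrix) -> Matrix:
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment the matrix with the identity: [m | I].
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


def mat_vec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def pool_effects(betas: List[Vector], covs: List[Matrix]) -> Tuple[Vector, Matrix]:
    """Fixed-effect inverse-variance pooling of n-dimensional effect estimates."""
    n = len(betas[0])
    precision_sum = [[0.0] * n for _ in range(n)]
    weighted_sum = [0.0] * n
    for beta, cov in zip(betas, covs):
        precision = invert(cov)  # weight of this cohort: S_i^-1
        weighted = mat_vec(precision, beta)
        for i in range(n):
            weighted_sum[i] += weighted[i]
            for j in range(n):
                precision_sum[i][j] += precision[i][j]
    pooled_cov = invert(precision_sum)
    pooled_beta = mat_vec(pooled_cov, weighted_sum)
    return pooled_beta, pooled_cov


# Two hypothetical cohorts, two phenotypes each.
pooled_beta, pooled_cov = pool_effects(
    [[1.0, 2.0], [3.0, 4.0]],
    [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]],
)
```

With n = 1 this reduces to the familiar univariate inverse-variance weighted mean.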

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

3

Unknown

Gene selection for the reconstruction of stem cell differentiation trees: a linear programming approach (2015)

Ghadie, M. A., Japkowicz, N., Perkins, T. J.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: Motivation: Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Methods: Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. Results: We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation. Contact: tperkins@ohri.ca Supplementary information: Supplementary data are available at Bioinformatics online.
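
Illustrative note: the property described in this abstract, that the minimum spanning tree under a weighted Euclidean metric equals a given hierarchy, can be checked with a few lines of code. The Python sketch below only verifies a candidate weight vector; it does not implement the authors' linear program, and the function names and toy data are hypothetical.

```python
import math
from typing import Dict, FrozenSet, Sequence, Set


def weighted_distance(x: Sequence[float], y: Sequence[float],
                      w: Sequence[float]) -> float:
    """Weighted Euclidean distance with one non-negative weight per gene."""
    return math.sqrt(sum(wg * (a - b) ** 2 for wg, a, b in zip(w, x, y)))


def minimum_spanning_tree(expr: Dict[str, Sequence[float]],
                          w: Sequence[float]) -> Set[FrozenSet[str]]:
    """Prim's algorithm on the complete graph of cell types."""
    nodes = sorted(expr)
    in_tree = {nodes[0]}
    edges: Set[FrozenSet[str]] = set()
    while len(in_tree) < len(nodes):
        # Cheapest edge leaving the current tree (ties broken by node name).
        _, u, v = min(
            (weighted_distance(expr[u], expr[v], w), u, v)
            for u in in_tree for v in nodes if v not in in_tree
        )
        in_tree.add(v)
        edges.add(frozenset((u, v)))
    return edges


# Toy data: three cell types, two genes (values are made up).
expression = {"A": [0.0, 0.0], "B": [2.0, 1.0], "C": [3.0, 0.0]}
hierarchy = {frozenset(("A", "C")), frozenset(("B", "C"))}

uniform_ok = minimum_spanning_tree(expression, [1.0, 1.0]) == hierarchy
reweighted_ok = minimum_spanning_tree(expression, [0.1, 10.0]) == hierarchy
```

In this toy case the uniform weights give a tree that differs from the hierarchy, while the reweighted metric reproduces it.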

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

4

Unknown

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data (2015)

Sill, M., Saadati, M., Benner, A.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: Motivation: Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso, to perform feature selection. But the lasso is known to lack variable selection consistency in high dimensions and therefore a subsequent interpretation of selected features can give misleading results. Results: We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimating the direction of maximal variability. The performance of the S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that ‘knows’ the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma. Availability and implementation: Software is available at https://github.com/mwsill/s4vdpca . Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

5

Unknown

GS-align for glycan structure alignment and similarity measurement (2015)

Lee, H. S., Jo, S., Mukherjee, S., Park, S.-J., Skolnick, J., Lee, J., Im, W.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: Motivation: Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. Comparative structural studies of biological molecules provide useful insight into their biological relationships. However, most computational tools are designed for protein structure, and despite their importance, there is no currently available tool for comparing glycan structures in a sequence order- and size-independent manner. Results: A novel method, GS-align, is developed for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition. The optimal alignment is then determined by the maximum structural similarity score, GS-score, which is size-independent. Benchmark tests against the Protein Data Bank (PDB) N-linked glycan library and PDB homologous/non-homologous N-glycoprotein sets indicate that GS-align is a robust computational tool to align glycan structures and quantify their structural similarity. GS-align is also applied to template-based glycan structure prediction and monosaccharide substitution matrix generation to illustrate its utility. Availability and implementation: http://www.glycanstructure.org/gsalign . Contact: wonpil@ku.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

6

Unknown

Analysis of impedance-based cellular growth assays (2015)

Witzel, F., Fritsche-Guenther, R., Lehmann, N., Sieber, A., Bluthgen, N.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-08

Description: Motivation: Impedance-based technologies are advancing methods for measuring proliferation of adherent cell cultures non-invasively and in real time. The analysis of the resulting data has so far been hampered by inappropriate computational methods and the lack of systematic data to evaluate the characteristics of the assay. Results: We used a commercially available system for impedance-based growth measurement (xCELLigence) and compared the reported cell index with data from microscopy. We found that the measured signal correlates linearly with the cell number throughout the time of an experiment with sufficient accuracy in subconfluent cell cultures. The resulting growth curves for various colon cancer cells could be well described with the empirical Richards growth model, which allows for extracting quantitative parameters (such as characteristic cycle times). We found that frequently used readouts like the cell index at a specific time or the area under the growth curve cannot be used to faithfully characterize growth inhibition. We propose to calculate the average growth rate of selected time intervals to accurately estimate time-dependent IC50 values of drugs from growth curves. Contact: nils.bluethgen@charite.de Supplementary information: Supplementary data are available at Bioinformatics online.
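
Illustrative note: the two quantities named in this abstract, a Richards growth curve and an average growth rate over a time interval, can be sketched as below. This is a minimal Python sketch under stated assumptions: the Richards model is written in one common parameterisation, and the average growth rate is taken as the mean slope of the log signal over the interval. The paper may define both differently, and all function and parameter names are hypothetical.

```python
import math


def richards(t: float, upper: float, rate: float, shape: float, t_mid: float) -> float:
    """Richards (generalised logistic) curve.

    upper: asymptotic signal, rate: growth rate constant,
    shape: asymmetry parameter (shape = 1 gives the logistic curve),
    t_mid: time offset of the inflection region.
    """
    return upper * (1.0 + shape * math.exp(-rate * (t - t_mid))) ** (-1.0 / shape)


def average_growth_rate(n_start: float, n_end: float,
                        t_start: float, t_end: float) -> float:
    """Mean exponential growth rate between two time points (per unit time)."""
    return (math.log(n_end) - math.log(n_start)) / (t_end - t_start)


# Hypothetical control and treated curves, compared over the 10 h to 30 h interval.
control = [richards(t, 4.0, 0.20, 1.0, 30.0) for t in (10.0, 30.0)]
treated = [richards(t, 4.0, 0.10, 1.0, 30.0) for t in (10.0, 30.0)]
rate_control = average_growth_rate(control[0], control[1], 10.0, 30.0)
rate_treated = average_growth_rate(treated[0], treated[1], 10.0, 30.0)
```

Comparing `rate_treated` with `rate_control` across a dose series is one way such interval rates could feed a dose-response fit; the fitting step itself is not shown here.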

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

7

Unknown

Data-dependent bucketing improves reference-free compression of sequencing reads (2015)

Patro, R., Kingsford, C.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: The storage and transmission of high-throughput sequencing data consumes significant resources. As our capacity to produce such data continues to increase, this burden will only grow. One approach to reduce storage and transmission requirements is to compress this sequencing data. Results: We present a novel technique to boost the compression of sequencing that is based on the concept of bucketing similar reads so that they appear nearby in the file. We demonstrate that, by adopting a data-dependent bucketing scheme and employing a number of encoding ideas, we can achieve substantially better compression ratios than existing de novo sequence compression tools, including other bucketing and reordering schemes. Our method, Mince, achieves up to a 45% reduction in file sizes (28% on average) compared with existing state-of-the-art de novo compression schemes. Availability and implementation: Mince is written in C++11, is open source and has been made available under the GPLv3 license. It is available at http://www.cs.cmu.edu/~ckingsf/software/mince . Contact: carlk@cs.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

8

Unknown

Identification of C2H2-ZF binding preferences from ChIP-seq data using RCADE (2015)

Najafabadi, H. S., Albu, M., Hughes, T. R.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Current methods for motif discovery from chromatin immunoprecipitation followed by sequencing (ChIP-seq) data often identify non-targeted transcription factor (TF) motifs, and are even further limited when peak sequences are similar due to common ancestry rather than common binding factors. The latter aspect particularly affects a large number of proteins from the Cys2His2 zinc finger (C2H2-ZF) class of TFs, as their binding sites are often dominated by endogenous retroelements that have highly similar sequences. Here, we present recognition code-assisted discovery of regulatory elements (RCADE) for motif discovery from C2H2-ZF ChIP-seq data. RCADE combines predictions from a DNA recognition code of C2H2-ZFs with ChIP-seq data to identify models that represent the genuine DNA binding preferences of C2H2-ZF proteins. We show that RCADE is able to identify generalizable binding models even from peaks that are exclusively located within the repeat regions of the genome, where state-of-the-art motif finding approaches largely fail. Availability and implementation: RCADE is available as a webserver and also for download at http://rcade.ccbr.utoronto.ca/ . Supplementary information: Supplementary data are available at Bioinformatics online. Contact: t.hughes@utoronto.ca

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

9

Unknown

Phylesystem: a git-based data store for community-curated phylogenetic estimates (2015)

McTavish, E. J., Hinchliff, C. E., Allman, J. F., Brown, J. W., Cranston, K. A., Holder, M. T., Rees, J. A., Smith, S. A.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al., 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Results: Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub (http://github.com/) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Availability and implementation: Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api . The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem . A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator . Code for that tool is available from https://github.com/OpenTreeOfLife/opentree . Contact: mtholder@gmail.com

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

10

Unknown

Using combined evidence from replicates to evaluate ChIP-seq peaks (2015)

Jalili, V., Matteucci, M., Masseroli, M., Morelli, M. J.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) detects genome-wide DNA–protein interactions and chromatin modifications, returning enriched regions (ERs), usually associated with a significance score. Moderately significant interactions can correspond to true, weak interactions, or to false positives; replicates of a ChIP-seq experiment can provide co-localised evidence to decide between the two cases. We designed a general methodological framework to rigorously combine the evidence of ERs in ChIP-seq replicates, with the option to set a significance threshold on the repeated evidence and a minimum number of samples bearing this evidence. Results: We applied our method to Myc transcription factor ChIP-seq datasets in K562 cells available in the ENCODE project. Using replicates, we could extend up to 3 times the ER number with respect to single-sample analysis with equivalent significance threshold. We validated the ‘rescued’ ERs by checking for the overlap with open chromatin regions and for the enrichment of the motif that Myc binds with strongest affinity; we compared our results with alternative methods (IDR and jMOSAiCS), obtaining more validated peaks than the former and fewer peaks than the latter, but with a better validation. Availability and implementation: An implementation of the proposed method and its source code under GPLv3 license are freely available at http://www.bioinformatics.deib.polimi.it/MSPC/ and http://mspc.codeplex.com/ , respectively. Contact: marco.morelli@iit.it Supplementary information: Supplementary material is available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

11

Unknown

kSNP3.0: SNP detection and phylogenetic analysis of genomes without genome alignment or reference genome (2015)

Gardner, S. N., Slezak, T., Hall, B. G.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: We announce the release of kSNP3.0, a program for SNP identification and phylogenetic analysis without genome alignment or the requirement for reference genomes. kSNP3.0 is a significantly improved version of kSNP v2. Availability and implementation: kSNP3.0 is implemented as a package of stand-alone executables for Linux and Mac OS X under the open-source BSD license. The executable packages, source code and a full User Guide are freely available at https://sourceforge.net/projects/ksnp/files/ . Contact: barryghall@gmail.com

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

12

Unknown

phylogeo: an R package for geographic analysis and visualization of microbiome data (2015)

Charlop-Powers, Z., Brady, S. F.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: We have created an R package named phylogeo that provides a set of geographic utilities for sequencing-based microbial ecology studies. Although the geographic location of samples is an important aspect of environmental microbiology, none of the major software packages used in processing microbiome data include utilities that allow users to map and explore the spatial dimension of their data. phylogeo solves this problem by providing a set of plotting and mapping functions that can be used to visualize the geographic distribution of samples, to look at the relatedness of microbiomes using ecological distance, and to map the geographic distribution of particular sequences. By extending the popular phyloseq package and using the same data structures and command formats, phylogeo allows users to easily map and explore the geographic dimensions of their data from the R programming language. Availability and implementation: phylogeo is documented and freely available at http://zachcp.github.io/phylogeo . Contact: zcharlop@rockefeller.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

13

Unknown

Gener: a minimal programming module for chemical controllers based on DNA strand displacement (2015)

Kahramanogulları, O., Cardelli, L.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Gener is a development module for programming chemical controllers based on DNA strand displacement. Gener is developed with the aim of providing a simple interface that minimizes the opportunities for programming errors: Gener allows the user to test the computations of the DNA programs based on a simple two-domain strand displacement algebra, the minimal available so far. The tool allows the user to perform stepwise computations with respect to the rules of the algebra as well as exhaustive search of the computation space with different options for exploration and visualization. Gener can be used in combination with existing tools, and in particular, its programs can be exported to Microsoft Research’s DSD tool as well as to LaTeX. Availability and implementation: Gener is available for download at the Cosbi website at http://www.cosbi.eu/research/prototypes/gener as a Windows executable that can be run on Mac OS X and Linux by using Mono. Contact: ozan@cosbi.eu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

14

Unknown

MemGen: a general web server for the setup of lipid membrane simulation systems (2015)

Knight, C. J., Hub, J. S.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Molecular dynamics simulations provide atomic insight into the physicochemical characteristics of lipid membranes and hence, a wide range of force field families capable of modelling various lipid types have been developed in recent years. To model membranes in a biologically realistic lipid composition, simulation systems containing multiple different lipids must be assembled. Results: We present a new web service called MemGen that is capable of setting up simulation systems of heterogeneous lipid membranes. MemGen is not restricted to certain lipid force fields or lipid types, but instead builds membranes from uploaded structure files which may contain any kind of amphiphilic molecule. MemGen works with any all-atom or united-atom lipid representation. Availability and implementation: MemGen is freely available without registration at http://memgen.uni-goettingen.de . Contact: jhub@gwdg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

15

Unknown

Inferring data-specific micro-RNA function through the joint ranking of micro-RNA and pathways from matched micro-RNA and gene expression data (2015)

Patrick, E., Buckley, M., Muller, S., Lin, D. M., Yang, J. Y. H.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: In practice, identifying and interpreting the functional impacts of the regulatory relationships between micro-RNA and messenger-RNA is non-trivial. The sheer scale of possible micro-RNA and messenger-RNA interactions can make the interpretation of results difficult. Results: We propose a supervised framework, pMim, built upon concepts of significance combination, for jointly ranking regulatory micro-RNA and their potential functional impacts with respect to a condition of interest. Here, pMim directly tests if a micro-RNA is differentially expressed and if its predicted targets, which lie in a common biological pathway, have changed in the opposite direction. We leverage the information within existing micro-RNA target and pathway databases to stabilize the estimation and annotation of micro-RNA regulation making our approach suitable for datasets with small sample sizes. In addition to outputting meaningful and interpretable results, we demonstrate in a variety of datasets that the micro-RNA identified by pMim, in comparison to simpler existing approaches, are also more concordant with what is described in the literature. Availability and implementation: This framework is implemented as an R function, pMim, in the package sydSeq available from http://www.ellispatrick.com/r-packages . Contact: jean.yang@sydney.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

16

Unknown

INSPEcT: a computational tool to infer mRNA synthesis, processing and degradation dynamics from RNA- and 4sU-seq time course experiments (2015)

de Pretis, S., Kress, T., Morelli, M. J., Melloni, G. E. M., Riva, L., Amati, B., Pelizzola, M.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Cellular mRNA levels originate from the combined action of multiple regulatory processes, which can be recapitulated by the rates of pre-mRNA synthesis, pre-mRNA processing and mRNA degradation. Recent experimental and computational advances set the basis to study these intertwined levels of regulation. Nevertheless, software for the comprehensive quantification of RNA dynamics is still lacking. Results: INSPEcT is an R package for the integrative analysis of RNA- and 4sU-seq data to study the dynamics of transcriptional regulation. INSPEcT provides gene-level quantification of these rates, and a modeling framework to identify which of these regulatory processes are most likely to explain the observed mRNA and pre-mRNA concentrations. Software performance is tested on a synthetic dataset, instrumental to guide the choice of the modeling parameters and the experimental design. Availability and implementation: INSPEcT is submitted to Bioconductor and is currently available as Supplementary Additional File S1. Contact: mattia.pelizzola@iit.it Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

17

Unknown

Addressing false discoveries in network inference (2015)

Petri, T., Altmann, S., Geistlinger, L., Zimmer, R., Kuffner, R.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Experimentally determined gene regulatory networks can be enriched by computational inference from high-throughput expression profiles. However, the prediction of regulatory interactions is severely impaired by indirect and spurious effects, particularly for eukaryotes. Recently, published methods report improved predictions by exploiting the a priori known targets of a regulator (its local topology) in addition to expression profiles. Results: We find that methods exploiting known targets show an unexpectedly high rate of false discoveries. This leads to inflated performance estimates and the prediction of an excessive number of new interactions for regulators with many known targets. These issues are hidden from common evaluation and cross-validation setups, which is due to Simpson’s paradox. We suggest a confidence score recalibration method (CoRe) that reduces the false discovery rate and enables a reliable performance estimation. Conclusions: CoRe considerably improves the results of network inference methods that exploit known targets. Predictions then display the biological process specificity of regulators more correctly and enable the inference of accurate genome-wide regulatory networks in eukaryotes. For yeast, we propose a network with more than 22 000 confident interactions. We point out that machine learning approaches outside of the area of network inference may be affected as well. Availability and implementation: Results, executable code and networks are available via our website http://www.bio.ifi.lmu.de/forschung/CoRe . Contact: robert.kueffner@helmholtz-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

18

Unknown

Genome-scale strain designs based on regulatory minimal cut sets (2015)

Mahadevan, R., von Kamp, A., Klamt, S.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-08-25

Description: Motivation: Stoichiometric and constraint-based methods of computational strain design have become an important tool for rational metabolic engineering. One of those relies on the concept of constrained minimal cut sets (cMCSs). However, like most other techniques, cMCSs may consider only reaction (or gene) knockouts to achieve a desired phenotype. Results: We generalize the cMCSs approach to constrained regulatory MCSs (cRegMCSs), where up/downregulation of reaction rates can be combined with reaction deletions. We show that flux up/downregulations can virtually be treated as cuts, allowing their direct integration into the algorithmic framework of cMCSs. Because of vastly enlarged search spaces in genome-scale networks, we developed strategies to (optionally) preselect suitable candidates for flux regulation and novel algorithmic techniques to further enhance efficiency and speed of cMCSs calculation. We illustrate the cRegMCSs approach with a simple example network and then apply it to identify strain designs for ethanol production in a genome-scale metabolic model of Escherichia coli. The results clearly show that cRegMCSs combining reaction deletions and flux regulations provide a much larger number of suitable strain designs, many of which are significantly smaller relative to cMCSs involving only knockouts. Furthermore, with cRegMCSs, one may also enable the fine tuning of desired behaviours in a narrower range. The new cRegMCSs approach may thus accelerate the implementation of model-based strain designs for the bio-based production of fuels and chemicals. Availability and implementation: MATLAB code and the examples can be downloaded at http://www.mpi-magdeburg.mpg.de/projects/cna/etcdownloads.html . Contact: krishna.mahadevan@utoronto.ca or klamt@mpi-magdeburg.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

19

Unknown

The SwissLipids knowledgebase for lipid biology (2015)

Aimo, L., Liechti, R., Hyka-Nouspikel, N., Niknejad, A., Gleizes, A., Gotz, L., Kuznetsov, D., David, F. P. A., van der Goot, F. G., Riezman, H., Bougueleret, L., Xenarios, I., Bridge, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: Motivation: Lipids are a large and diverse group of biological molecules with roles in membrane formation, energy storage and signaling. Cellular lipidomes may contain tens of thousands of structures, a staggering degree of complexity whose significance is not yet fully understood. High-throughput mass spectrometry-based platforms provide a means to study this complexity, but the interpretation of lipidomic data and its integration with prior knowledge of lipid biology suffers from a lack of appropriate tools to manage the data and extract knowledge from it. Results: To facilitate the description and exploration of lipidomic data and its integration with prior biological knowledge, we have developed a knowledge resource for lipids and their biology—SwissLipids. SwissLipids provides curated knowledge of lipid structures and metabolism which is used to generate an in silico library of feasible lipid structures. These are arranged in a hierarchical classification that links mass spectrometry analytical outputs to all possible lipid structures, metabolic reactions and enzymes. SwissLipids provides a reference namespace for lipidomic data publication, data exploration and hypothesis generation. The current version of SwissLipids includes over 244 000 known and theoretically possible lipid structures, over 800 proteins, and curated links to published knowledge from over 620 peer-reviewed publications. We are continually updating the SwissLipids hierarchy with new lipid categories and new expert curated knowledge. Availability: SwissLipids is freely available at http://www.swisslipids.org/ . Contact: alan.bridge@isb-sib.ch Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

20

Unknown

chipPCR: an R package to pre-process raw data of amplification curves (2015)

Rodiger, S., Burdukiewicz, M., Schierack, P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: Motivation: Both the quantitative real-time polymerase chain reaction (qPCR) and quantitative isothermal amplification (qIA) are standard methods for nucleic acid quantification. Numerous real-time read-out technologies have been developed. Despite the continuous interest in amplification-based techniques, there are only a few tools for pre-processing of amplification data. However, a transparent tool for precise control of raw data is indispensable in several scenarios, for example, during the development of new instruments. Results: chipPCR is an R package for the pre-processing and quality analysis of raw data of amplification curves. The package takes advantage of R's S4 object model and offers an extensible environment. chipPCR contains tools for raw data exploration: normalization, baselining, imputation of missing values, a powerful wrapper for amplification curve smoothing and a function to detect the start and end of an amplification curve. The capabilities of the software are enhanced by the implementation of algorithms unavailable in R, such as a 5-point stencil for derivative interpolation. Simulation tools, statistical tests, plots for data quality management, amplification efficiency/quantification cycle calculation, and datasets from qPCR and qIA experiments are part of the package. Core functionalities are integrated in GUIs (web-based and standalone shiny applications), thus streamlining analysis and report generation. Availability and implementation: http://cran.r-project.org/web/packages/chipPCR . Source code: https://github.com/michbur/chipPCR . Contact: stefan.roediger@b-tu.de Supplementary information: Supplementary data are available at Bioinformatics online.
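
The abstract mentions a 5-point stencil for derivatives of amplification curves. As a minimal sketch of the standard five-point central-difference formula (written in Python for illustration; it is not chipPCR's R implementation, and the function name `five_point_stencil` is hypothetical), the first derivative at an interior sample can be estimated as (-y[i+2] + 8·y[i+1] - 8·y[i-1] + y[i-2]) / (12·h), assuming equally spaced cycles with spacing h.

```python
def five_point_stencil(y, h=1.0):
    """Estimate the first derivative of equally spaced samples.

    Returns one value per interior point (indices 2 .. len(y) - 3),
    because the stencil needs two neighbours on each side.
    """
    if len(y) < 5:
        raise ValueError("need at least five samples")
    return [
        (-y[i + 2] + 8 * y[i + 1] - 8 * y[i - 1] + y[i - 2]) / (12 * h)
        for i in range(2, len(y) - 2)
    ]


if __name__ == "__main__":
    # Toy "curve": y = x**2 sampled at x = 0..6; the true slope is 2x.
    samples = [x ** 2 for x in range(7)]
    print(five_point_stencil(samples))
```

The formula is exact for polynomials up to degree four, which is why the toy quadratic is recovered without error; real fluorescence data would typically be smoothed first.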

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

21

Unknown

iFoldRNA v2: folding RNA with constraints (2015)

Krokhotin, A., Houlihan, K., Dokholyan, N. V.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: A key to understanding RNA function is to uncover its complex 3D structure. Experimental methods used for determining RNA 3D structures are technologically challenging and laborious, which makes the development of computational prediction methods of substantial interest. Previously, we developed the iFoldRNA server that allows accurate prediction of short (<50 nt) tertiary RNA structures starting from primary sequences. Here, we present a new version of the iFoldRNA server that permits the prediction of tertiary structure of RNAs as long as a few hundred nucleotides. This substantial increase in the server capacity is achieved by utilization of experimental information such as base-pairing and hydroxyl-radical probing. We demonstrate a significant benefit provided by integration of experimental data and computational methods. Availability and implementation: http://ifoldrna.dokhlab.org Contact: dokh@unc.eu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

22

Unknown

ms-data-core-api: an open-source, metadata-oriented library for computational proteomics (2015)

Perez-Riverol, Y., Uszkoreit, J., Sanchez, A., Ternent, T., del Toro, N., Hermjakob, H., Vizcaino, J. A., Wang, R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: The ms-data-core-api is a free, open-source library for developing computational proteomics tools and pipelines. The Application Programming Interface, written in Java, enables rapid tool creation by providing a robust, pluggable programming interface and common data model. The data model is based on controlled vocabularies/ontologies and captures the whole range of data types included in common proteomics experimental workflows, going from spectra to peptide/protein identifications to quantitative results. The library contains readers for three of the most used Proteomics Standards Initiative standard file formats: mzML, mzIdentML, and mzTab. In addition to mzML, it also supports other common mass spectra data formats: dta, ms2, mgf, pkl, apl (text-based), mzXML and mzData (XML-based). Also, it can be used to read PRIDE XML, the original format used by the PRIDE database, one of the world-leading proteomics resources. Finally, we present a set of algorithms and tools whose implementation illustrates the simplicity of developing applications using the library. Availability and implementation: The software is freely available at https://github.com/PRIDE-Utilities/ms-data-core-api . Supplementary information: Supplementary data are available at Bioinformatics online. Contact: juan@ebi.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

23

Unknown

'Flatten plus': a recent implementation in WSxM for biological research (2015)

Gimeno, A., Ares, P., Horcas, I., Gil, A., Gomez-Rodriguez, J. M., Colchero, J., Gomez-Herrero, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: Scanning probe microscopy (SPM) is already a relevant tool in biological research at the nanoscale. We present ‘Flatten plus’, a recent and helpful implementation in the well-known WSxM free software package. ‘Flatten plus’ reduces low-frequency noise in SPM images in a semi-automated way, preventing the appearance of typical artifacts associated with such filters. Availability and implementation: WSxM is free software implemented in C++ and supported on MS Windows, but it can also be run under Mac or Linux using emulators such as Wine or Parallels. WSxM can be downloaded from http://www.wsxmsolutions.com/ . Contact: ignacio.horcas@wsxmsolutions.com

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

24

Unknown

GOplot: an R package for visually combining expression data with functional analysis (2015)

Walter, W., Sanchez-Cabo, F., Ricote, M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: Despite the plethora of methods available for the functional analysis of omics data, obtaining a comprehensive yet detailed understanding of the results remains challenging. This is mainly due to the lack of publicly available tools for the visualization of this type of information. Here we present an R package called GOplot, based on ggplot2, for enhanced graphical representation. Our package takes the output of any general enrichment analysis and generates plots at different levels of detail: from a general overview to identify the most enriched categories (bar plot, bubble plot) to a more detailed view displaying different types of information for molecules in a given set of categories (circle plot, chord plot, cluster plot). The package provides a deeper insight into omics data and allows scientists to generate insightful plots with only a few lines of code to easily communicate the findings. Availability and implementation: The R package GOplot is available via CRAN (The Comprehensive R Archive Network): http://cran.r-project.org/web/packages/GOplot . The shiny web application of the Venn diagram can be found at: https://wwalter.shinyapps.io/Venn/ . A detailed manual of the package with sample figures can be found at https://wencke.github.io/ Contact: fscabo@cnic.es or mricote@cnic.es

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

25

Unknown

HTT-DB: Horizontally transferred transposable elements database (2015)

Dotto, B. R., Carvalho, E. L., Silva, A. F., Duarte Silva, L. F., Pinto, P. M., Ortiz, M. F., Wallau, G. L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-25

Description: Motivation: Horizontal transfer of transposable elements (HTT) among eukaryotes was discovered in the mid-1980s. Since then, >300 new cases have been described. New findings about HTT are revealing the evolutionary impact of this phenomenon on host genomes. In order to provide an up-to-date, interactive and expandable database for such events, we developed the HTT-DB database. Results: HTT-DB allows easy access to most of the HTT cases reported, along with rich information about each case. Moreover, it allows the user to generate tables and graphs based on searches using transposable element and/or host species classification and to export them in several formats. Availability and implementation: This database is freely available on the web at http://lpa.saogabriel.unipampa.edu.br:8080/httdatabase . HTT-DB was developed based on Java and MySQL with all major browsers supported. Tools and software packages used are free for personal or non-profit projects. Contact: bdotto82@gmail.com or gabriel.wallau@gmail.com

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

26

Unknown

GREGOR: evaluating global enrichment of trait-associated variants in epigenomic features using a systematic, data-driven approach (2015)

Schmidt, E. M., Zhang, J., Zhou, W., Chen, J., Mohlke, K. L., Chen, Y. E., Willer, C. J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: The majority of variation identified by genome-wide association studies falls in non-coding genomic regions and is hypothesized to impact regulatory elements that modulate gene expression. Here we present a statistically rigorous software tool GREGOR (Genomic Regulatory Elements and Gwas Overlap algoRithm) for evaluating enrichment of any set of genetic variants with any set of regulatory features. Using variants from five phenotypes, we describe a data-driven approach to determine the tissue and cell types most relevant to a trait of interest and to identify the subset of regulatory features likely impacted by these variants. Last, we experimentally evaluate six predicted functional variants at six lipid-associated loci and demonstrate significant evidence for allele-specific impact on expression levels. GREGOR systematically evaluates enrichment of genetic variation with the vast collection of regulatory data available to explore novel biological mechanisms of disease and guide us toward the functional variant at trait-associated loci. Availability and implementation: GREGOR, including source code, documentation, examples, and executables, is available at http://genome.sph.umich.edu/wiki/GREGOR . Contact: cristen@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

27

Unknown

Integration of somatic mutation, expression and functional data reveals potential driver genes predictive of breast cancer survival (2015)

Suo, C., Hrydziuszko, O., Lee, D., Pramana, S., Saputra, D., Joshi, H., Calza, S., Pawitan, Y.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Genome and transcriptome analyses can be used to explore cancers comprehensively, and it is increasingly common to have multiple omics data measured from each individual. Furthermore, there are rich functional data such as predicted impact of mutations on protein coding and gene/protein networks. However, integration of the complex information across the different omics and functional data is still challenging. Clinical validation, particularly based on patient outcomes such as survival, is important for assessing the relevance of the integrated information and for comparing different procedures. Results: An analysis pipeline is built for integrating genomic and transcriptomic alterations from whole-exome and RNA sequence data and functional data from protein function prediction and gene interaction networks. The method accumulates evidence for the functional implications of mutated potential driver genes found within and across patients. A driver-gene score (DGscore) is developed to capture the cumulative effect of such genes. To contribute to the score, a gene has to be frequently mutated, with high or moderate mutational impact at protein level, exhibiting an extreme expression and functionally linked to many differentially expressed neighbors in the functional gene network. The pipeline is applied to 60 matched tumor and normal samples of the same patient from The Cancer Genome Atlas breast-cancer project. In clinical validation, patients with high DGscores have worse survival than those with low scores (P = 0.001). Furthermore, the DGscore outperforms the established expression-based signatures MammaPrint and PAM50 in predicting patient survival. In conclusion, integration of mutation, expression and functional data allows identification of clinically relevant potential driver genes in cancer. Availability and implementation: The documented pipeline including annotated sample scripts can be found at http://fafner.meb.ki.se/biostatwiki/driver-genes/ . Contact: yudi.pawitan@ki.se Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

28

Unknown

EBSeq-HMM: a Bayesian approach for identifying gene-expression changes in ordered RNA-seq experiments (2015)

Leng, N., Li, Y., McIntosh, B. E., Nguyen, B. K., Duffin, B., Tian, S., Thomson, J. A., Dewey, C. N., Stewart, R., Kendziorski, C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: With improvements in next-generation sequencing technologies and reductions in price, ordered RNA-seq experiments are becoming common. Of primary interest in these experiments is identifying genes that are changing over time or space, for example, and then characterizing the specific expression changes. A number of robust statistical methods are available to identify genes showing differential expression among multiple conditions, but most assume conditions are exchangeable and thereby sacrifice power and precision when applied to ordered data. Results: We propose an empirical Bayes mixture modeling approach called EBSeq-HMM. In EBSeq-HMM, an auto-regressive hidden Markov model is implemented to accommodate dependence in gene expression across ordered conditions. As demonstrated in simulation and case studies, the output proves useful in identifying differentially expressed genes and in specifying gene-specific expression paths. EBSeq-HMM may also be used for inference regarding isoform expression. Availability and implementation: An R package containing examples and sample datasets is available at Bioconductor. Contact: kendzior@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

29

Unknown

FastMotif: spectral sequence motif discovery (2015)

Colombo, N., Vlassis, N.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, most of the existing motif finding algorithms are computationally demanding, and they may not be able to support the increasingly large datasets produced by modern high-throughput sequencing technologies. Results: We present FastMotif, a new motif discovery algorithm that is built on a recent machine learning technique referred to as Method of Moments. Based on spectral decompositions, our method is robust to model misspecifications and is not prone to locally optimal solutions. We obtain an algorithm that is extremely fast and designed for the analysis of big sequencing data. On HT-Selex data, FastMotif extracts motif profiles that match those computed by various state-of-the-art algorithms, but one order of magnitude faster. We provide a theoretical and numerical analysis of the algorithm’s robustness and discuss its sensitivity with respect to the free parameters. Availability and implementation: The Matlab code of FastMotif is available from http://lcsb-portal.uni.lu/bioinformatics . Contact: vlassis@adobe.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

30

Unknown

ScaffMatch: scaffolding algorithm based on maximum weight matching (2015)

Mandric, I., Zelikovsky, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Next-generation high-throughput sequencing has become a state-of-the-art technique in genome assembly. Scaffolding is one of the main stages of the assembly pipeline. During this stage, contigs assembled from the paired-end reads are merged into bigger chains called scaffolds. Because of a high level of statistical noise, chimeric reads, and genome repeats, scaffolding is a challenging task. Current scaffolding software packages vary widely in their quality and are highly dependent on the read data quality and genome complexity. There are no clear winners and multiple opportunities for further improvements of the tools still exist. Results: This article presents an efficient scaffolding algorithm, ScaffMatch, that is able to handle reads with both short (<600 bp) and long (>35 000 bp) insert sizes, producing high-quality scaffolds. We evaluate our scaffolding tool with the F score and other metrics (N50, corrected N50) on eight datasets, comparing it with most available packages. Our experiments show that ScaffMatch is the tool of preference for most datasets. Availability and implementation: The source code is available at http://alan.cs.gsu.edu/NGS/?q=content/scaffmatch . Contact: mandric@cs.gsu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
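
The abstract evaluates scaffolds with N50 among other metrics. As a minimal sketch of the usual N50 definition (this is not ScaffMatch code, and the function name `n50` is hypothetical): sort the scaffold lengths from longest to shortest and report the length at which the running total first reaches at least half of the summed length. The corrected N50 and F score mentioned in the abstract need a reference genome and are not covered here.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Toy lengths, not real assembly data.
    print(n50([2, 2, 2, 3, 3, 4, 8, 8]))
```

Comparing `2 * running` with `total` avoids a fractional half when the summed length is odd.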

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

31

Unknown

MultiP-SChlo: multi-label protein subchloroplast localization prediction with Chou's pseudo amino acid composition and a novel multi-label classifier (2015)

Wang, X., Zhang, W., Zhang, Q., Li, G.-Z.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Identifying protein subchloroplast localization in the chloroplast organelle is very helpful for understanding the function of chloroplast proteins. A few computational prediction methods exist for protein subchloroplast localization. However, these existing works have ignored proteins with multiple subchloroplast locations when constructing prediction models, so they can predict only one of the subchloroplast locations of such multilabel proteins. Results: To address this problem, by utilizing label-specific features and label correlations simultaneously, a novel multilabel classifier was developed for predicting protein subchloroplast location(s) with both single and multiple location sites. As an initial study, the overall accuracy of our proposed algorithm reaches 55.52%, which is high enough to make it a promising tool for further studies. Availability and implementation: An online web server for our proposed algorithm named MultiP-SChlo was developed, which is freely accessible at http://biomed.zzuli.edu.cn/bioinfo/multip-schlo/ . Contact: pandaxiaoxi@gmail.com or gzli@tongji.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

32

Unknown

Conformational sampling and structure prediction of multiple interacting loops in soluble and β-barrel membrane proteins using multi-loop distance-guided chain-growth Monte Carlo method (2015)

Tang, K., Wong, S. W. K., Liu, J. S., Zhang, J., Liang, J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Loops in proteins are often involved in biochemical functions. Their irregularity and flexibility make experimental structure determination and computational modeling challenging. Most current loop modeling methods focus on modeling single loops. In protein structure prediction, multiple loops often need to be modeled simultaneously. As interactions among loops in spatial proximity can be rather complex, sampling the conformations of multiple interacting loops is a challenging task. Results: In this study, we report a new method called multi-loop Distance-guided Sequential chain-Growth Monte Carlo (M-DiSGro) for prediction of the conformations of multiple interacting loops in proteins. Our method achieves an average RMSD of 1.93 Å for lowest energy conformations of 36 pairs of interacting protein loops with the total length ranging from 12 to 24 residues. We further constructed a data set containing proteins with 2, 3 and 4 interacting loops. For the most challenging target proteins with four loops, the average RMSD of the lowest energy conformations is 2.35 Å. Our method is also tested for predicting multiple loops in β-barrel membrane proteins. For outer-membrane protein G, the lowest energy conformation has an RMSD of 2.62 Å for the three extracellular interacting loops with a total length of 34 residues (12, 12 and 10 residues in each loop). Availability and implementation: The software is freely available at: tanto.bioe.uic.edu/m-DiSGro. Contact: jinfeng@stat.fsu.edu or jliang@uic.edu Supplementary information: Supplementary data are available at Bioinformatics online.
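
The accuracy figures in this abstract are reported as RMSD in Å. As a minimal sketch of how a root-mean-square deviation between two matched sets of 3D coordinates can be computed (this is not M-DiSGro code; the function name `rmsd` is hypothetical, and the abstract does not state which atoms are compared or how structures are superimposed, so the sketch assumes the two sets are already in the same frame):

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) tuples, assumed to be matched atom by atom and already
    superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: the second set is the first shifted by 1 Å along z.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(predicted, reference))
```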

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

33

Unknown

Proper evaluation of alignment-free network comparison methods (2015)

Yaveroğlu, O. N., Milenković, T., Pržulj, N.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Network comparison is a computationally intractable problem with important applications in systems biology and other domains. A key challenge is to properly quantify similarity between wiring patterns of two networks in an alignment-free fashion. Also, alignment-based methods exist that aim to identify an actual node mapping between networks and as such serve a different purpose. Various alignment-free methods that use different global network properties (e.g. degree distribution) have been proposed. Methods based on small local subgraphs called graphlets perform the best in the alignment-free network comparison task, due to the high level of topological detail that graphlets can capture. Among different graphlet-based methods, Graphlet Correlation Distance (GCD) was shown to be the most accurate for comparing networks. Recently, a new graphlet-based method called NetDis was proposed, which was claimed to be superior. We argue against this, as the performance of NetDis was not properly evaluated to position it correctly among the other alignment-free methods. Results: We evaluate the performance of available alignment-free network comparison methods, including GCD and NetDis. We do this by measuring accuracy of each method (in a systematic precision-recall framework) in terms of how well the method can group (cluster) topologically similar networks. By testing this on both synthetic and real-world networks from different domains, we show that GCD remains the most accurate, noise-tolerant and computationally efficient alignment-free method. That is, we show that NetDis does not outperform the other methods, as originally claimed, while it is also computationally more expensive. Furthermore, since NetDis is dependent on the choice of a network null model (unlike the other graphlet-based methods), we show that its performance is highly sensitive to the choice of this parameter. Finally, we find that its performance is not independent of network sizes and densities, as originally claimed. Contact: natasha@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

34

Unknown

Protein homology reveals new targets for bioactive small molecules (2015)

Gfeller, D., Zoete, V.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: The functional impact of small molecules is increasingly being assessed in different eukaryotic species through large-scale phenotypic screening initiatives. Identifying the targets of these molecules is crucial to mechanistically understand their function and uncover new therapeutically relevant modes of action. However, despite extensive work carried out in model organisms and human, it is still unclear to what extent one can use information obtained in one species to make predictions in other species. Results: Here, for the first time, we explore and validate at a large scale the use of protein homology relationships to predict the targets of small molecules across different species. Our results show that exploiting target homology can significantly improve the predictions, especially for molecules experimentally tested in other species. Interestingly, when considering separately orthology and paralogy relationships, we observe that mapping small molecule interactions among orthologs improves prediction accuracy, while including paralogs does not improve and even sometimes worsens the prediction accuracy. Overall, our results provide a novel approach to integrate chemical screening results across multiple species and highlight the promises and remaining challenges of using protein homology for small molecule target identification. Availability and implementation: Homology-based predictions can be tested on our website http://www.swisstargetprediction.ch . Contact: david.gfeller@unil.ch or vincent.zoete@isb-sib.ch . Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

35

Unknown

ESPRESSO: taking into account assessment errors on outcome and exposures in power analysis for association studies (2015)

Gaye, A., Burton, T. W. Y., Burton, P. R.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Very large studies are required to provide sufficiently big sample sizes for adequately powered association analyses. This can be an expensive undertaking and it is important that an accurate sample size is identified. For more realistic sample size calculation and power analysis, the impact of unmeasured aetiological determinants and the quality of measurement of both outcome and explanatory variables should be taken into account. Conventional methods to analyse power use closed-form solutions that are not flexible enough to cater for all of these elements easily. They often result in a potentially substantial overestimation of the actual power. Results: In this article, we describe the Estimating Sample-size and Power in R by Exploring Simulated Study Outcomes tool that allows assessment errors in power calculation under various biomedical scenarios to be incorporated. We also report a real world analysis where we used this tool to answer an important strategic question for an existing cohort. Availability and implementation: The software is available for online calculation and downloads at http://espresso-research.org . The code is freely available at https://github.com/ESPRESSO-research . Contact: louqman@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
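
The simulation-based idea described in this abstract can be illustrated with a minimal Python sketch. It is not the ESPRESSO tool (which is an R package) and does not reproduce its models; the function names, the linear outcome model, the Fisher z-test and all parameter values are assumptions chosen for illustration. The sketch simulates a study many times, adds measurement error to the exposure, and reports the fraction of simulated studies in which the association is detected.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulated_power(n, beta, exposure_error_sd, n_sims=500, seed=1):
    """Estimate power by simulation (hypothetical helper, not ESPRESSO's API).

    Each simulated study draws a true exposure, an outcome that depends on it
    linearly, and an observed exposure contaminated by measurement error.
    """
    rng = random.Random(seed)
    z_crit = 1.959963984540054  # two-sided 5% critical value, standard normal
    detected = 0
    for _ in range(n_sims):
        true_x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [beta * x + rng.gauss(0.0, 1.0) for x in true_x]
        # The analyst only sees the error-prone exposure measurement.
        obs_x = [x + rng.gauss(0.0, exposure_error_sd) for x in true_x]
        r = pearson(obs_x, y)
        z = math.atanh(r) * math.sqrt(n - 3)  # Fisher z-test of r = 0
        if abs(z) > z_crit:
            detected += 1
    return detected / n_sims


power_clean = simulated_power(n=200, beta=0.2, exposure_error_sd=0.0)
power_noisy = simulated_power(n=200, beta=0.2, exposure_error_sd=2.0)
print(power_clean, power_noisy)
```

Under these assumed settings the error-prone exposure yields clearly lower estimated power than the error-free one, which is the kind of overestimation by idealised calculations that the abstract refers to.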

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

36

Unknown

Accurate prediction of RNA nucleotide interactions with backbone k-tree model (2015)

Ding, L., Xue, X., LaMarca, S., Mohebbi, M., Samad, A., Malmberg, R. L., Cai, L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: Given the importance of non-coding RNAs to cellular regulatory functions, it would be highly desirable to have accurate computational prediction of RNA 3D structure, a task which remains challenging. Even for a short RNA sequence, the space of tertiary conformations is immense; existing methods to identify native-like conformations mostly resort to random sampling of conformations to achieve computational feasibility. However, native conformations may not be examined and prediction accuracy may be compromised due to sampling. State-of-the-art methods have yet to deliver satisfactory predictions for RNAs of length beyond 50 nucleotides. Results: This paper presents a method to tackle a key step in the RNA 3D structure prediction problem, the prediction of the nucleotide interactions that constitute the desired 3D structure. The research is based on a novel graph model, called a backbone k-tree , to tightly constrain the nucleotide interaction relationships considered for RNA 3D structures. It is shown that the new model makes it possible to efficiently predict the optimal set of nucleotide interactions (including the non-canonical interactions in all recently revealed families) from the query sequence along with known or predicted canonical basepairs. The preliminary results indicate that in most cases the new method can predict with a high accuracy the nucleotide interactions that constitute the 3D structure of the query sequence. It thus provides a useful tool for the accurate prediction of RNA 3D structure. Availability and Implementation: The source package for BkTree is available at http://rna-informatics.uga.edu/index.php?f=software&p=BkTree . Contact: lding@uga.edu or cai@cs.uga.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

37

Unknown

StructureFold: genome-wide RNA secondary structure mapping and reconstruction in vivo (2015)

Tang, Y., Bouvier, E., Kwok, C. K., Ding, Y., Nekrutenko, A., Bevilacqua, P. C., Assmann, S. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: RNAs fold into complex structures that are integral to the diverse mechanisms underlying RNA regulation of gene expression. Recent development of transcriptome-wide RNA structure profiling through the application of structure-probing enzymes or chemicals combined with high-throughput sequencing has opened a new field that greatly expands the amount of in vitro and in vivo RNA structural information available. The resultant datasets provide the opportunity to investigate RNA structural information on a global scale. However, the analysis of high-throughput RNA structure profiling data requires considerable computational effort and expertise. Results: We present a new platform, StructureFold, that provides an integrated computational solution designed specifically for large-scale RNA structure mapping and reconstruction across any transcriptome. StructureFold automates the processing and analysis of raw high-throughput RNA structure profiling data, allowing the seamless incorporation of wet-bench structural information from chemical probes and/or ribonucleases to restrain RNA secondary structure prediction via the RNAstructure and ViennaRNA package algorithms. StructureFold performs reads mapping and alignment, normalization and reactivity derivation, and RNA structure prediction in a single user-friendly web interface or via local installation. The variation in transcript abundance and length that prevails in living cells and consequently causes variation in the counts of structure-probing events between transcripts is accounted for. Accordingly, StructureFold is applicable to RNA structural profiling data obtained in vivo as well as to in vitro or in silico datasets. StructureFold is deployed via the Galaxy platform. Availability and Implementation: StructureFold is freely available as a component of Galaxy available at: https://usegalaxy.org/ . 
Contact: yxt148@psu.edu or sma3@psu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

38

Unknown

MetaSV: an accurate and integrative structural-variant caller for next generation sequencing (2015)

Mohiyuddin, M., Mu, J. C., Li, J., Bani Asadi, N., Gerstein, M. B., Abyzov, A., Wong, W. H., Lam, H. Y. K.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Structural variations (SVs) are large genomic rearrangements that vary significantly in size, making them challenging to detect with the relatively short reads from next-generation sequencing (NGS). Different SV detection methods have been developed; however, each is limited to specific kinds of SVs with varying accuracy and resolution. Previous works have attempted to combine different methods, but they still suffer from poor accuracy particularly for insertions. We propose MetaSV, an integrated SV caller which leverages multiple orthogonal SV signals for high accuracy and resolution. MetaSV proceeds by merging SVs from multiple tools for all types of SVs. It also analyzes soft-clipped reads from alignment to detect insertions accurately since existing tools underestimate insertion SVs. Local assembly in combination with dynamic programming is used to improve breakpoint resolution. Paired-end and coverage information is used to predict SV genotypes. Using simulation and experimental data, we demonstrate the effectiveness of MetaSV across various SV types and sizes. Availability and implementation: Code in Python is at http://bioinform.github.io/metasv/ . Contact: rd@bina.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

39

Unknown

PROVEAN web server: a tool to predict the functional effect of amino acid substitutions and indels (2015)

Choi, Y., Chan, A. P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: We present a web server to predict the functional effect of single or multiple amino acid substitutions, insertions and deletions using the prediction tool PROVEAN. The server provides rapid analysis of protein variants from any organism, and also supports high-throughput analysis for human and mouse variants at both the genomic and protein levels. Availability and implementation: The web server is freely available and open to all users with no login requirements at http://provean.jcvi.org . Contact: achan@jcvi.org Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

40

Unknown

GeneTIER: prioritization of candidate disease genes using tissue-specific gene expression profiles (2015)

Antanaviciute, A., Daly, C., Crinnion, L. A., Markham, A. F., Watson, C. M., Bonthron, D. T., Carr, I. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: In attempts to determine the genetic causes of human disease, researchers are often faced with a large number of candidate genes. Linkage studies can point to a genomic region containing hundreds of genes, while the high-throughput sequencing approach will often identify a great number of non-synonymous genetic variants. Since systematic experimental verification of each such candidate gene is not feasible, a method is needed to decide which genes are worth investigating further. Computational gene prioritization presents itself as a solution to this problem, systematically analyzing and sorting each gene from the most to least likely to be the disease-causing gene, in a fraction of the time it would take a researcher to perform such queries manually. Results: Here, we present Gene TIssue Expression Ranker (GeneTIER), a new web-based application for candidate gene prioritization. GeneTIER replaces knowledge-based inference traditionally used in candidate disease gene prioritization applications with experimental data from tissue-specific gene expression datasets and thus largely overcomes the bias toward the better characterized genes/diseases that commonly afflict other methods. We show that our approach is capable of accurate candidate gene prioritization and illustrate its strengths and weaknesses using case study examples. Availability and Implementation: Freely available on the web at http://dna.leeds.ac.uk/GeneTIER/. Contact: umaan@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

41

Unknown

Stratifying tumour subtypes based on copy number alteration profiles using next-generation sequence data (2015)

Gusnanto, A., Tcherveniakov, P., Shuweihdi, F., Samman, M., Rabbitts, P., Wood, H. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: The role of personalized medicine and targeted treatment in the clinical management of cancer patients has become increasingly important in recent years. This has made the task of precise histological substratification of cancers crucial. Increasingly, genomic data are being seen as a valuable classifier. Specifically, copy number alteration (CNA) profiles generated by next-generation sequencing (NGS) can become a determinant for tumour subtyping. The principal purpose of this study is to devise a model with good prediction capability for the tumour histological subtypes as a function of both the patients' covariates and their genome-wide CNA profiles from NGS data. Results: We investigate a logistic regression for modelling tumour histological subtypes as a function of the patients’ covariates and their CNA profiles, in a mixed model framework. The covariates, such as age and gender, are considered as fixed predictors and the genome-wide CNA profiles are considered as random predictors. We illustrate the application of this model in lung and oral cancer datasets, and the results indicate that the tumour histological subtypes can be modelled with a good fit. Our cross-validation indicates that the logistic regression exhibits the best prediction relative to other classification methods we considered in this study. The model also exhibits the best agreement in the prediction between smooth-segmented and circular binary-segmented CNA profiles. Availability and implementation: An R package to run a logistic regression is available at http://www1.maths.leeds.ac.uk/~arief/R/CNALR/ . Contact: a.gusnanto@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

42

Unknown

MetaMapR: pathway independent metabolomic network analysis incorporating unknowns (2015)

Grapov, D., Wanichthanarak, K., Fiehn, O.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Metabolic network mapping is a widely used approach for integration of metabolomic experimental results with biological domain knowledge. However, current approaches can be limited by biochemical domain or pathway knowledge which results in sparse disconnected graphs for real world metabolomic experiments. MetaMapR integrates enzymatic transformations with metabolite structural similarity, mass spectral similarity and empirical associations to generate richly connected metabolic networks. This open source, web-based or desktop software, written in the R programming language, leverages KEGG and PubChem databases to derive associations between metabolites even in cases where biochemical domain or molecular annotations are unknown. Network calculation is enhanced through an interface to the Chemical Translation System, which allows metabolite identifier translation between >200 common biochemical databases. Analysis results are presented as interactive visualizations or can be exported as high-quality graphics and numerical tables which can be imported into common network analysis and visualization tools. Availability and Implementation: Freely available at http://dgrapov.github.io/MetaMapR/ . Requires R and a modern web browser. Installation instructions, tutorials and application examples are available at http://dgrapov.github.io/MetaMapR/ . Contact: ofiehn@ucdavis.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

43

Unknown

Cellular phenotype database: a repository for systems microscopy data (2015)

Kirsanova, C., Brazma, A., Rustici, G., Sarkans, U.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Motivation: The Cellular Phenotype Database (CPD) is a repository for data derived from high-throughput systems microscopy studies. The aims of this resource are: (i) to provide easy access to cellular phenotype and molecular localization data for the broader research community; (ii) to facilitate integration of independent phenotypic studies by means of data aggregation techniques, including use of an ontology and (iii) to facilitate development of analytical methods in this field. Results: In this article we present CPD, its data structure and user interface, propose a minimal set of information describing RNA interference experiments, and suggest a generic schema for management and aggregation of outputs from phenotypic or molecular localization experiments. The database has a flexible structure for management of data from heterogeneous sources of systems microscopy experimental outputs generated by a variety of protocols and technologies and can be queried by gene, reagent, gene attribute, study keywords, phenotype or ontology terms. Availability and implementation: CPD is developed as part of the Systems Microscopy Network of Excellence and is accessible at http://www.ebi.ac.uk/fg/sym . Contact: jes@ebi.ac.uk or ugis@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

44

Unknown

PDIviz: analysis and visualization of protein-DNA binding interfaces (2015)

Ribeiro, J., Melo, F., Schuller, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-08-08

Description: Specific recognition of DNA by proteins is a crucial step of many biological processes. PDIviz is a plugin for the PyMOL molecular visualization system that analyzes protein–DNA binding interfaces by comparing the solvent accessible surface area of the complex against the free protein and free DNA. The plugin provides three distinct three-dimensional visualization modes to highlight interactions with DNA bases and backbone, major and minor groove, and with atoms of different pharmacophoric type (hydrogen bond donors/acceptors, hydrophobic and thymine methyl). Each mode comes in three styles to focus the visual analysis on the protein or DNA side of the interface, or on the nucleotide sequence. PDIviz allows for the generation of publication quality images, all calculated data can be written to disk, and a command line interface is provided for automating tasks. The plugin may be helpful for the detailed identification of regions involved in DNA base and shape readout, and can be particularly useful in rapidly pinpointing the overall mode of interaction. Availability and implementation: Freely available at http://melolab.org/pdiviz/ as a PyMOL plugin. Tested with incentive, educational, and open source versions of PyMOL on Windows, Mac and Linux systems. Contact: aschueller@bio.puc.cl Supplementary Information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

45

Unknown

Likelihood-based complex trait association testing for arbitrary depth sequencing data (2015)

Yan, S., Yuan, S., Xu, Z., Zhang, B., Zhang, B., Kang, G., Byrnes, A., Li, Y.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: In next generation sequencing (NGS)-based genetic studies, researchers typically perform genotype calling first and then apply standard genotype-based methods for association testing. However, such a two-step approach ignores genotype calling uncertainty in the association testing step and may incur power loss and/or inflated type-I error. In the recent literature, a few robust and efficient likelihood based methods including both likelihood ratio test (LRT) and score test have been proposed to carry out association testing without intermediate genotype calling. These methods take genotype calling uncertainty into account by directly incorporating genotype likelihood function (GLF) of NGS data into association analysis. However, existing LRT methods are computationally demanding or do not allow covariate adjustment; while existing score tests are not applicable to markers with low minor allele frequency (MAF). We provide an LRT allowing flexible covariate adjustment, develop a statistically more powerful score test and propose a combination strategy (UNC combo) to leverage the advantages of both tests. We have carried out extensive simulations to evaluate the performance of our proposed LRT and score test. Simulations and real data analysis demonstrate the advantages of our proposed combination strategy: it offers a satisfactory trade-off in terms of computational efficiency, applicability (accommodating both common variants and variants with low MAF) and statistical power, particularly for the analysis of quantitative trait where the power gain can be up to ~60% when the causal variant is of low frequency (MAF < 0.01). Availability and implementation: UNC combo and the associated R files, including documentation, examples, are available at http://www.unc.edu/~yunmli/UNCcombo/ Contact: yunli@med.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

46

Unknown

IMSEQ--a fast and error aware approach to immunogenetic sequence analysis (2015)

Kuchenbecker, L., Nienen, M., Hecht, J., Neumann, A. U., Babel, N., Reinert, K., Robinson, P. N.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation : Recombined T- and B-cell receptor repertoires are increasingly being studied using next generation sequencing (NGS) in order to interrogate the repertoire composition as well as changes in the distribution of receptor clones under different physiological and disease states. This type of analysis requires efficient and unambiguous clonotype assignment to a large number of NGS read sequences, including the identification of the incorporated V and J gene segments and the CDR3 sequence. Current tools have deficits with respect to performance, accuracy and documentation of their underlying algorithms and usage. Results : We present IMSEQ, a method to derive clonotype repertoires from NGS data with sophisticated routines for handling errors stemming from PCR and sequencing artefacts. The application can handle different kinds of input data originating from single- or paired-end sequencing in different configurations and is generic regarding the species and gene of interest. We have carefully evaluated our method with simulated and real world data and show that IMSEQ is superior to other tools with respect to its clonotyping as well as standalone error correction and runtime performance. Availability and implementation: IMSEQ was implemented in C++ using the SeqAn library for efficient sequence analysis. It is freely available under the GPLv2 open source license and can be downloaded at www.imtools.org . Supplementary information : Supplementary data are available at Bioinformatics online. Contact: lkuchenb@inf.fu-berlin.de or peter.robinson@charite.de

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

47

Unknown

SpeeDB: fast structural protein searches (2015)

Robillard, D. E., Mpangase, P. T., Hazelhurst, S., Dehne, F.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Interactions between amino acids are important determinants of the structure, stability and function of proteins. Several tools have been developed for the identification and analysis of such interactions in proteins based on the extensive studies carried out on high-resolution structures from Protein Data Bank (PDB). Although these tools allow users to identify and analyze interactions, analysis can only be performed on one structure at a time. This makes it difficult and time consuming to study the significance of these interactions on a large scale. Results: SpeeDB is a web-based tool for the identification of protein structures based on structural properties. SpeeDB queries are executed on all structures in the PDB at once, quickly enough for interactive use. SpeeDB includes standard queries based on published criteria for identifying various structures: disulphide bonds, catalytic triads and aromatic–aromatic, sulphur–aromatic, cation–π and ionic interactions. Users can also construct custom queries in the user interface without any programming. Results can be downloaded in a Comma Separated Value (CSV) format for further analysis with other tools. Case studies presented in this article demonstrate how SpeeDB can be used to answer various biological questions. Analysis of human proteases revealed that disulphide bonds are the predominant type of interaction and are located close to the active site, where they promote substrate specificity. When comparing the two homologous G protein-coupled receptors and the two protein kinase paralogs analyzed, the differences in the types of interactions responsible for stability account for the differences in specificity and functionality of the structures. Availability and implementation: SpeeDB is available at http://www.parallelcomputing.ca as a web service. Contact: d@drobilla.net Supplementary Information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

48

Unknown

Seq2pathway: an R/Bioconductor package for pathway analysis of next-generation sequencing data (2015)

Wang, B., Cunningham, J. M., (Holly) Yang, X.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Seq2pathway is an R/Python wrapper for pathway (or functional gene-set) analysis of genomic loci, adapted for advances in genome research. Seq2pathway associates the biological significance of genomic loci with their target transcripts and then summarizes the quantified values on the gene-level into pathway scores. It is designed to isolate systematic disturbances and common biological underpinnings from next-generation sequencing (NGS) data. Seq2pathway offers Bioconductor users enhanced capability in discovering collective pathway effects caused by both coding genes and cis-regulation of non-coding elements. Availability and implementation: The package is freely available at http://www.bioconductor.org/packages/release/bioc/html/seq2pathway.html . Contact: xyang2@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

49

Unknown

agplus: a rapid and flexible tool for aggregation plots (2015)

Maehara, K., Ohkawa, Y.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Aggregation plots are frequently used to evaluate signal distributions at user-interested points in ChIP-Seq data analysis. agplus, a new and simple command-line tool, enables rapid and flexible generation of text tables tailored for aggregation plots from which users can easily design multiple groups based on user-definitions such as regulatory regions or transcription initiation sites. Availability and Implementation: This software is implemented in Ruby, supported on Linux and Mac OSX, and freely available at http://github.com/kazumits/agplus Contact: yohkawa@epigenetics.med.kyushu-u.ac.jp
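
As an illustration of what an aggregation plot summarises, the following Python sketch averages a per-base signal over fixed windows centred on a set of anchor positions. It is not the agplus implementation (which is written in Ruby) and says nothing about its command-line options or file formats; the function name, the list-based signal and the rule of skipping anchors too close to the ends are assumptions made for the example.

```python
def aggregate_signal(signal, anchors, flank):
    """Mean signal at each offset in [-flank, +flank] around the anchors.

    signal  : list of per-base signal values (hypothetical input format)
    anchors : positions of interest, e.g. transcription initiation sites
    flank   : number of bases kept on each side of an anchor
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos in anchors:
        # Skip anchors whose window would run off either end of the signal.
        if pos - flank < 0 or pos + flank >= len(signal):
            continue
        for i in range(width):
            totals[i] += signal[pos - flank + i]
        used += 1
    if used == 0:
        return []
    return [t / used for t in totals]


signal = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
profile = aggregate_signal(signal, anchors=[0, 2, 7], flank=1)
print(profile)  # [1.5, 5.5, 1.5]; the anchor at position 0 is skipped
```

Each row of the resulting profile (offset, mean signal) corresponds to one point of the plotted curve; separate anchor groups would simply be aggregated separately.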

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

50

Unknown

CAPRI: efficient inference of cancer progression models from cross-sectional data (2015)

Ramazzotti, D., Caravagna, G., Olde Loohuis, L., Graudenzi, A., Korsunsky, I., Mauri, G., Antoniotti, M., Mishra, B.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems. Motivation: Several cancer-related genomic datasets have become available (e.g. The Cancer Genome Atlas, TCGA), typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion providing all measurements at the time of diagnosis. Our goal is to infer cancer ‘progression’ models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of ‘selectivity’ relations, where a mutation in a gene A ‘selects’ for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy and has good complexity, also in terms of convergence properties. CAPRI performs especially well in the presence of noise in the data, and with limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered non-trivial selectivity relations and exclusivity patterns among key genomic events.
Availability and implementation: CAPRI is part of the TRanslational ONCOlogy R package and is freely available on the web at: http://bimib.disco.unimib.it/index.php/Tronco Contact: daniele.ramazzotti@disco.unimib.it Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

51

Unknown

Trans-species learning of cellular signaling systems with bimodal deep belief networks (2015)

Chen, L., Cai, C., Chen, V., Lu, X.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Model organisms play critical roles in biomedical research of human diseases and drug development. An imperative task is to translate information/knowledge acquired from model organisms to humans. In this study, we address a trans-species learning problem: predicting human cell responses to diverse stimuli, based on the responses of rat cells treated with the same stimuli. Results: We hypothesized that rat and human cells share a common signal-encoding mechanism but employ different proteins to transmit signals, and we developed a bimodal deep belief network and a semi-restricted bimodal deep belief network to represent the common encoding mechanism and perform trans-species learning. These ‘deep learning’ models include hierarchically organized latent variables capable of capturing the statistical structures in the observed proteomic data in a distributed fashion. The results show that the models significantly outperform two current state-of-the-art classification algorithms. Our study demonstrated the potential of using deep hierarchical models to simulate cellular signaling systems. Availability and implementation: The software is available at the following URL: http://pubreview.dbmi.pitt.edu/TransSpeciesDeepLearning/ . The data are available through SBV IMPROVER website, https://www.sbvimprover.com/challenge-2/overview , upon publication of the report by the organizers. Contact : xinghua@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

52

Unknown

GEO2Enrichr: browser extension and server app to extract gene sets from GEO and analyze them for biological functions (2015)

Gundersen, G. W., Jones, M. R., Rouillard, A. D., Kou, Y., Monteiro, C. D., Feldmann, A. S., Hu, K. S., Ma'ayan, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Identification of differentially expressed genes is an important step in extracting knowledge from gene expression profiling studies. The raw expression data from microarray and other high-throughput technologies is deposited into the Gene Expression Omnibus (GEO) and served as Simple Omnibus Format in Text (SOFT) files. However, to extract and analyze differentially expressed genes from GEO requires significant computational skills. Results: Here we introduce GEO2Enrichr, a browser extension for extracting differentially expressed gene sets from GEO and analyzing those sets with Enrichr, an independent gene set enrichment analysis tool containing over 70 000 annotated gene sets organized into 75 gene-set libraries. GEO2Enrichr adds JavaScript code to GEO web-pages; this code scrapes user-selected accession numbers and metadata, and then, with one click, users can submit this information to a web-server application that downloads the SOFT files, parses, cleans and normalizes the data, identifies the differentially expressed genes, and then pipes the resulting gene lists to Enrichr for downstream functional analysis. GEO2Enrichr opens a new avenue for adding functionality to major bioinformatics resources such as GEO by integrating tools and resources without the need for a plug-in architecture. Importantly, GEO2Enrichr helps researchers to quickly explore hypotheses with little technical overhead, lowering the barrier of entry for biologists by automating data processing steps needed for knowledge extraction from the major repository GEO. Availability and implementation: GEO2Enrichr is an open source tool, freely available for installation as browser extensions at the Chrome Web Store and Firefox Add-ons. Documentation and a browser independent web application can be found at http://amp.pharm.mssm.edu/g2e/ . Contact: avi.maayan@mssm.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

53

Unknown

DSigDB: drug signatures database for gene set analysis (2015)

Yoo, M., Shin, J., Kim, J., Ryall, K. A., Lee, K., Lee, S., Jeon, M., Kang, J., Tan, A. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: We report the creation of Drug Signatures Database (DSigDB), a new gene set resource that relates drugs/compounds and their target genes, for gene set enrichment analysis (GSEA). DSigDB currently holds 22 527 gene sets and consists of 17 389 unique compounds covering 19 531 genes. We also developed an online DSigDB resource that allows users to search, view and download drugs/compounds and gene sets. DSigDB gene sets provide seamless integration to GSEA software for linking gene expressions with drugs/compounds for drug repurposing and translational research. Availability and implementation: DSigDB is freely available for non-commercial use at http://tanlab.ucdenver.edu/DSigDB . Supplementary information: Supplementary data are available at Bioinformatics online. Contact: aikchoon.tan@ucdenver.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

54

Unknown

tmle.npvi: targeted, integrative search of associations between DNA copy number and gene expression, accounting for DNA methylation (2015)

Chambaz, A., Neuvial, P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: We describe the implementation of the method introduced by Chambaz et al. in 2012. We also demonstrate its genome-wide application to the integrative search of new regions with strong association between DNA copy number and gene expression accounting for DNA methylation in breast cancers. Availability and implementation: An open-source R package tmle.npvi is available from CRAN ( http://cran.r-project.org/ ). Contact: pierre.neuvial@genopole.cnrs.fr

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

55

Unknown

A novel essential domain perspective for exploring gene essentiality (2015)

Lu, Y., Lu, Y., Deng, J., Peng, H., Lu, H., Lu, L. J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Genes with indispensable functions are identified as essential; however, the traditional gene-level studies of essentiality have several limitations. In this study, we characterized gene essentiality from a new perspective of protein domains, the independent structural or functional units of a polypeptide chain. Results: To identify such essential domains, we have developed an Expectation–Maximization (EM) algorithm-based Essential Domain Prediction (EDP) Model. With simulated datasets, the model provided convergent results given different initial values and offered accurate predictions even with noise. We then applied the EDP model to six microbial species and predicted 1879 domains to be essential in at least one species, ranging from 10 to 23% in each species. The predicted essential domains were more conserved than either non-essential domains or essential genes. Comparing essential domains in prokaryotes and eukaryotes revealed an evolutionary distance consistent with that inferred from ribosomal RNA. When utilizing these essential domains to reproduce the annotation of essential genes, we obtained accurate results that suggest protein domains are more basic units for the essentiality of genes. Furthermore, we presented several examples to illustrate how the combination of essential and non-essential domains can lead to genes with divergent essentiality. In summary, we have described the first systematic analysis of gene essentiality at the level of domains. Contact: huilu.bioinfo@gmail.com or Long.Lu@cchmc.org Supplementary Information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

56

Unknown

Epigenomic k-mer dictionaries: shedding light on how sequence composition influences in vivo nucleosome positioning (2015)

Giancarlo, R., Rombo, S. E., Utro, F.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Information-theoretic and compositional analysis of biological sequences, in terms of k-mer dictionaries, has a well established role in genomic and proteomic studies. This is much less so in epigenomics, although the role of k-mers in chromatin organization and nucleosome positioning is particularly relevant. Fundamental questions concerning the informational content and compositional structure of nucleosome favoring and disfavoring sequences with respect to their basic building blocks still remain open. Results: We present the first analysis on the role of k-mers in the composition of nucleosome enriched and depleted genomic regions (NER and NDR for short) that is: (i) exhaustive and within the bounds dictated by the information-theoretic content of the sample sets we use and (ii) informative for comparative epigenomics. We analyze four different organisms and we propose a paradigmatic formalization of k-mer dictionaries, providing two different and complementary views of the k-mers involved in NER and NDR. The first extends well known studies in this area, its comparative nature being its major merit. The second, very novel, brings to light the rich variety of k-mers involved in influencing nucleosome positioning, for which an initial classification in terms of clusters is also provided. Although such a classification offers many insights, the following deserves to be singled out: short poly(dA:dT) tracts are reported in the literature as fundamental for nucleosome depletion; however, a global quantitative look reveals that their role is much less prominent than one would expect based on previous studies. Availability and implementation: Dictionaries, clusters and Supplementary Material are available online at http://math.unipa.it/rombo/epigenomics/ . Contact: simona.rombo@unipa.it Supplementary information: Supplementary data are available at Bioinformatics online.
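
For readers unfamiliar with the term, a k-mer dictionary can be pictured as a table mapping every length-k substring of a sequence to its number of occurrences. The sketch below is only a minimal illustration of that idea; it is not the formalization proposed in the paper, and the function name and the toy sequence are hypothetical.

```python
from collections import Counter


def kmer_dictionary(sequence, k):
    """Count every overlapping substring of length k in `sequence`."""
    sequence = sequence.upper()
    # range() is empty when k exceeds the sequence length, giving an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


# Toy sequence containing short poly(dA:dT)-like tracts.
counts = kmer_dictionary("AAATTTAAA", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Comparing such tables between two sets of regions (for example, nucleosome enriched versus depleted ones) is one simple way to see which k-mers are over-represented in either set.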

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

57

Unknown

Repeat- and error-aware comparison of deletions (2015)

Wittler, R., Marschall, T., Schonhuth, A., Makinen, V.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: The number of reported genetic variants is rapidly growing, empowered by ever faster accumulation of next-generation sequencing data. A major issue is comparability. Standards that address the combined problem of inaccurately predicted breakpoints and repeat-induced ambiguities are missing. This decisively lowers the quality of ‘consensus’ callsets and hampers the removal of duplicate entries in variant databases, which can have deleterious effects in downstream analyses. Results: We introduce a sound framework for comparison of deletions that captures both tool-induced inaccuracies and repeat-induced ambiguities. We present a maximum matching algorithm that outputs virtual duplicates among two sets of predictions/annotations. We demonstrate that our approach is clearly superior to ad hoc criteria, like overlap, and that it can reduce the redundancy among callsets substantially. We also identify a large number of duplicate entries in the Database of Genomic Variants, which points out the immediate relevance of our approach. Availability and implementation: Implementation is open source and available from https://bitbucket.org/readdi/readdi Contact: roland.wittler@uni-bielefeld.de or t.marschall@mpi-inf.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

58

Unknown

Bayesian mixture analysis for metagenomic community profiling (2015)

Morfopoulou, S., Plagnol, V.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: Deep sequencing of clinical samples is now an established tool for the detection of infectious pathogens, with direct medical applications. The large amount of data generated produces an opportunity to detect species even at very low levels, provided that computational tools can effectively profile the relevant metagenomic communities. Data interpretation is complicated by the fact that short sequencing reads can match multiple organisms and by the lack of completeness of existing databases, in particular for viral pathogens. Here we present metaMix, a Bayesian mixture model framework for resolving complex metagenomic mixtures. We show that the use of parallel Monte Carlo Markov chains for the exploration of the species space enables the identification of the set of species most likely to contribute to the mixture. Results: We demonstrate the greater accuracy of metaMix compared with relevant methods, particularly for profiling complex communities consisting of several related species. We designed metaMix specifically for the analysis of deep transcriptome sequencing datasets, with a focus on viral pathogen detection; however, the principles are generally applicable to all types of metagenomic mixtures. Availability and implementation: metaMix is implemented as a user-friendly R package, freely available on CRAN: http://cran.r-project.org/web/packages/metaMix Contact: sofia.morfopoulou.10@ucl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

59

Unknown

Computer vision-based automated peak picking applied to protein NMR spectra (2015)

Klukowski, P., Walczak, M. J., Gonczarek, A., Boudet, J., Wider, G.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: Motivation: A detailed analysis of multidimensional NMR spectra of macromolecules requires the identification of individual resonances (peaks). This task can be tedious and time-consuming and often requires support by experienced users. Automated peak picking algorithms were introduced more than 25 years ago, but there are still major deficiencies/flaws that often prevent complete and error-free peak picking of biological macromolecule spectra. The major challenges of automated peak picking algorithms are both the distinction of artifacts from real peaks, particularly from those with irregular shapes, and also picking peaks in spectral regions with overlapping resonances, which are very hard to resolve by existing computer algorithms. In both of these cases a visual inspection approach could be more effective than a ‘blind’ algorithm. Results: We present a novel approach using computer vision (CV) methodology which could be better adapted to the problem of peak recognition. After suitable ‘training’ we successfully applied the CV algorithm to spectra of medium-sized soluble proteins up to molecular weights of 26 kDa and to a 130 kDa complex of a tetrameric membrane protein in detergent micelles. Our CV approach outperforms commonly used programs. With suitable training datasets the application of the presented method can be extended to automated peak picking in multidimensional spectra of nucleic acids or carbohydrates and adapted to solid-state NMR spectra. Availability and implementation: CV-Peak Picker is available upon request from the authors. Contact: gsw@mol.biol.ethz.ch ; michal.walczak@mol.biol.ethz.ch ; adam.gonczarek@pwr.edu.pl Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

60

Unknown

PsyGeNET: a knowledge platform on psychiatric disorders and their genes (2015)

Gutierrez-Sacristan, A., Grosdidier, S., Valverde, O., Torrens, M., Bravo, A., Pinero, J., Sanz, F., Furlong, L. I.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-11

Description: PsyGeNET (Psychiatric disorders and Genes association NETwork) is a knowledge platform for the exploratory analysis of psychiatric diseases and their associated genes. PsyGeNET is composed of a database and a web interface supporting data search, visualization, filtering and sharing. PsyGeNET integrates information from DisGeNET and data extracted from the literature by text mining, which has been curated by domain experts. It currently contains 2642 associations between 1271 genes and 37 psychiatric disease concepts. In its first release, PsyGeNET is focused on three psychiatric disorders: major depression, alcohol and cocaine use disorders. PsyGeNET represents a comprehensive, open access resource for the analysis of the molecular mechanisms underpinning psychiatric disorders and their comorbidities. Availability and implementation: The PsyGeNET platform is freely available at http://www.psygenet.org/ . The PsyGeNET database is made available under the Open Database License ( http://opendatacommons.org/licenses/odbl/1.0/ ). Contact: lfurlong@imim.es Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

61

Unknown

Measures for the degree of overlap of gene signatures and applications to TCGA (2015)

Shi, X., Yi, H., Ma, S.

Oxford University Press

In: Briefings in Bioinformatics

Details

Publication Date: 2015-09-16

Description: For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tunings, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures, which describe the overlap from different perspectives, are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
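
The simple counting approach that the abstract criticizes can be sketched in a few lines, which makes it easier to see what the proposed measures are meant to improve on. The sketch below is an assumption-laden illustration: the function name and the example gene lists are hypothetical, and the Jaccard index is added here only as a familiar set-overlap summary; it is not one of the measures introduced in the article.

```python
def naive_overlap(signature_a, signature_b):
    """Return (shared gene count, % of the smaller signature shared, Jaccard index)."""
    a, b = set(signature_a), set(signature_b)
    shared = a & b
    union = a | b
    smaller = min(len(a), len(b))
    percent = 100.0 * len(shared) / smaller if smaller else 0.0
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), percent, jaccard


# Hypothetical signatures from two studies of the same cancer type.
study_1 = ["TP53", "BRCA1", "EGFR", "MYC"]
study_2 = ["EGFR", "MYC", "KRAS"]
count, percent, jaccard = naive_overlap(study_1, study_2)
print(count, round(percent, 1), round(jaccard, 2))
```

Counts of this kind treat every gene in a signature as equally important and ignore the estimated effect sizes, which is one reason measures based on the model estimates themselves may be more informative.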

Print ISSN: 1467-5463

Electronic ISSN: 1477-4054

Topics: Biology , Computer Science

Published by Oxford University Press

62

Unknown

Mango: a bias-correcting ChIA-PET analysis pipeline (2015)

Phanstiel, D. H., Boyle, A. P., Heidari, N., Snyder, M. P.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-22

Description: Motivation: Chromatin Interaction Analysis by Paired-End Tag sequencing (ChIA-PET) is an established method for detecting genome-wide looping interactions at high resolution. Current ChIA-PET analysis software packages either fail to correct for non-specific interactions due to genomic proximity or only address a fraction of the steps required for data processing. We present Mango, a complete ChIA-PET data analysis pipeline that provides statistical confidence estimates for interactions and corrects for major sources of bias including differential peak enrichment and genomic proximity. Results: Comparison to the existing software packages, ChIA-PET Tool and ChiaSig, revealed that Mango interactions exhibit much better agreement with high-resolution Hi-C data. Importantly, Mango executes all steps required for processing ChIA-PET datasets, whereas ChiaSig only completes 20% of the required steps. Application of Mango to multiple available ChIA-PET datasets permitted the independent rediscovery of known trends in chromatin loops including enrichment of CTCF, RAD21, SMC3 and ZNF143 at the anchor regions of interactions and strong bias for convergent CTCF motifs. Availability and implementation: Mango is open source and distributed through github at https://github.com/dphansti/mango . Contact: mpsnyder@standford.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

63

Unknown

DISTMIX: direct imputation of summary statistics for unmeasured SNPs from mixed ethnicity cohorts (2015)

Lee, D., Bigdeli, T. B., Williamson, V. S., Vladimirov, V. I., Riley, B. P., Fanous, A. H., Bacanu, S.-A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-22

Description: Motivation: To increase the signal resolution for large-scale meta-analyses of genome-wide association studies, genotypes at unmeasured single nucleotide polymorphisms (SNPs) are commonly imputed using large multi-ethnic reference panels. However, the ever-increasing size and ethnic diversity of both reference panels and cohorts make genotype imputation computationally challenging for moderately sized computer clusters. Moreover, genotype imputation requires subject-level genetic data, which, unlike summary statistics provided by virtually all studies, is not publicly available. While there are much less demanding methods which avoid the genotype imputation step by directly imputing SNP statistics, e.g. Directly Imputing summary STatistics (DIST) proposed by our group, their implicit assumptions make them applicable only to ethnically homogeneous cohorts. Results: To decrease computational and access requirements for the analysis of cosmopolitan cohorts, we propose DISTMIX, which extends DIST capabilities to the analysis of mixed ethnicity cohorts. The method uses a relevant reference panel to directly impute unmeasured SNP statistics based only on statistics at measured SNPs and estimated/user-specified ethnic proportions. Simulations show that the proposed method adequately controls the Type I error rates. The 1000 Genomes panel imputation of summary statistics from the ethnically diverse Psychiatric Genetic Consortium Schizophrenia Phase 2 suggests that, when compared to genotype imputation methods, DISTMIX offers comparable imputation accuracy for only a fraction of computational resources. Availability and implementation: DISTMIX software, its reference population data, and usage examples are publicly available at http://code.google.com/p/distmix . Contact: dlee4@vcu.edu Supplementary information: Supplementary Data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

64

Unknown

SoloDel: a probabilistic model for detecting low-frequent somatic deletions from unmatched sequencing data (2015)

Kim, J., Kim, S., Nam, H., Kim, S., Lee, D.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-22

Description: Motivation: Finding somatic mutations from massively parallel sequencing data is becoming a standard process in genome-based biomedical studies. There are a number of robust methods developed for detecting somatic single nucleotide variations. However, detection of somatic copy number alteration has been substantially less explored and remains vulnerable to frequently raised sampling issues: low frequency in cell population and absence of the matched control samples. Results: We developed a novel computational method, SoloDel, that accurately classifies low-frequent somatic deletions from germline ones with or without matched control samples. We first constructed a probabilistic, somatic mutation progression model that describes the occurrence and propagation of the event in the cellular lineage of the sample. We then built a Gaussian mixture model to represent the mixed population of somatic and germline deletions. Parameters of the mixture model could be estimated using the expectation-maximization algorithm with the observed distribution of read-depth ratios at the points of discordant-read based initial deletion calls. Combined with a conventional structural variation caller, SoloDel greatly increased the accuracy in classifying somatic mutations. Even without control, SoloDel maintained a comparable performance in a wide range of mutated subpopulation size (10–70%). SoloDel could also successfully recall experimentally validated somatic deletions from previously reported neuropsychiatric whole-genome sequencing data. Availability and implementation: Java-based implementation of the method is available at http://sourceforge.net/projects/solodel/ Contact: swkim@yuhs.ac or dhlee@biosoft.kaist.ac.kr Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

65

Unknown

QVZ: lossy compression of quality values (2015)

Malysa, G., Hernaez, M., Ochoa, I., Rao, M., Ganesan, K., Weissman, T.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-09-22

Description: Motivation: Recent advancements in sequencing technology have led to a drastic reduction in the cost of sequencing a genome. This has generated an unprecedented amount of genomic data that must be stored, processed and transmitted. To facilitate this effort, we propose a new lossy compressor for the quality values presented in genomic data files (e.g. FASTQ and SAM files), which comprise roughly half of the storage space (in the uncompressed domain). Lossy compression allows for compression of data beyond its lossless limit. Results: The proposed algorithm QVZ exhibits better rate-distortion performance than the previously proposed algorithms, for several distortion metrics and for the lossless case. Moreover, it allows the user to define any quasi-convex distortion function to be minimized, a feature not supported by the previous algorithms. Finally, we show that QVZ-compressed data exhibit better performance in the genotyping than data compressed with previously proposed algorithms, in the sense that for a similar rate, a genotyping closer to that achieved with the original quality values is obtained. Availability and implementation: QVZ is written in C and can be downloaded from https://github.com/mikelhernaez/qvz . Contact: mhernaez@stanford.edu or gmalysa@stanford.edu or iochoa@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

66

Unknown

A parallel and sensitive software tool for methylation analysis on multicore platforms (2015)

Tarraga, J., Perez, M., Orduna, J. M., Duato, J., Medina, I., Dopazo, J.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Motivation: DNA methylation analysis suffers from very long processing time, as the advent of Next-Generation Sequencers has shifted the bottleneck of genomic studies from the sequencers that obtain the DNA samples to the software that performs the analysis of these samples. The existing software for methylation analysis does not seem to scale efficiently either with the size of the dataset or with the length of the reads to be analyzed. As it is expected that the sequencers will provide longer and longer reads in the near future, efficient and scalable methylation software should be developed. Results: We present a new software tool, called HPG-Methyl, which efficiently maps bisulphite sequencing reads on DNA, analyzing DNA methylation. The strategy used by this software consists of leveraging the speed of the Burrows–Wheeler Transform to map a large number of DNA fragments (reads) rapidly, as well as the accuracy of the Smith–Waterman algorithm, which is exclusively employed to deal with the most ambiguous and shortest reads. Experimental results on platforms with Intel multicore processors show that HPG-Methyl significantly outperforms state-of-the-art software such as Bismark, BS-Seeker or BSMAP in both execution time and sensitivity, particularly for long bisulphite reads. Availability and implementation: Software in the form of C libraries and functions, together with instructions to compile and execute this software. Available by sftp to anonymous@clariano.uv.es (password ‘anonymous’). Contact: juan.orduna@uv.es or jdopazo@cipf.es

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

67

Unknown

UniAlign: protein structure alignment meets evolution (2015)

Zhao, C., Sacan, A.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Motivation: During evolution, functional sites on the surface of the protein as well as the hydrophobic core maintaining the structural integrity are well-conserved. However, available protein structure alignment methods align protein structures based solely on the 3D geometric similarity, limiting their ability to detect functionally relevant correspondences between the residues of the proteins, especially for distantly related homologous proteins. Results: In this article, we propose a new protein pairwise structure alignment algorithm (UniAlign) that incorporates additional evolutionary information captured in the form of sequence similarity, sequence profiles and residue conservation. We define a per-residue score (UniScore) as a weighted sum of these and other features and develop an iterative optimization procedure to search for an alignment with the best overall UniScore. Our extensive experiments on CDD, HOMSTRAD and BAliBASE benchmark datasets show that UniAlign outperforms commonly used structure alignment methods. We further demonstrate UniAlign's ability to develop family-specific models to drastically improve the quality of the alignments. Availability and implementation: UniAlign is available as a web service at: http://sacan.biomed.drexel.edu/unialign Contact: ahmet.sacan@drexel.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

68

Unknown

FourCSeq: analysis of 4C sequencing data (2015)

Klein, F. A., Pakozdi, T., Anders, S., Ghavi-Helm, Y., Furlong, E. E. M., Huber, W.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Motivation: Circularized Chromosome Conformation Capture (4C) is a powerful technique for studying the spatial interactions of a specific genomic region called the ‘viewpoint’ with the rest of the genome, either in a single condition or when comparing different experimental conditions or cell types. Observed ligation frequencies typically show a strong, regular dependence on genomic distance from the viewpoint, on top of which specific interaction peaks are superimposed. Here, we address the computational task of finding these specific peaks and detecting changes between different biological conditions. Results: We model the overall trend of decreasing interaction frequency with genomic distance by fitting a smooth monotonically decreasing function to suitably transformed count data. Based on the fit, z-scores are calculated from the residuals, and high z-scores are interpreted as peaks providing evidence for specific interactions. To compare different conditions, we normalize fragment counts between samples, and call differential contact frequencies using the statistical method DESeq2 adapted from RNA-Seq analysis. Availability and implementation: A full end-to-end analysis pipeline is implemented in the R package FourCSeq available at www.bioconductor.org. Contact: felix.klein@embl.de or whuber@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

69

Unknown

Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis (2015)

Gibb, S., Strimmer, K.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Motivation: Proteomic mass spectrometry analysis is becoming routine in clinical diagnostics, for example to monitor cancer biomarkers using blood samples. However, differential proteomics and identification of peaks relevant for class separation remain challenging. Results: Here, we introduce a simple yet effective approach for identifying differentially expressed proteins using binary discriminant analysis. This approach works by data-adaptive thresholding of protein expression values and subsequent ranking of the dichotomized features using a relative entropy measure. Our framework may be viewed as a generalization of the ‘peak probability contrast’ approach of Tibshirani et al. (2004) and can be applied both in the two-group and the multi-group setting. Our approach is computationally inexpensive and, in the analysis of a large-scale drug discovery test dataset, achieves prediction accuracy equivalent to that of a random forest. Furthermore, in the analysis of mass spectrometry data from a pancreas cancer study, we were able to identify biologically relevant and statistically predictive marker peaks unrecognized in the original study. Availability and implementation: The methodology for binary discriminant analysis is implemented in the R package binda, which is freely available under the GNU General Public License (version 3 or later) from CRAN at URL http://cran.r-project.org/web/packages/binda/ . R scripts reproducing all described analyses are available from the web page http://strimmerlab.org/software/binda/ . Contact: k.strimmer@imperial.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

70

Unknown

The pervasiveness and plasticity of circadian oscillations: the coupled circadian-oscillators framework (2015)

Patel, V. R., Ceglia, N., Zeller, M., Eckel-Mahan, K., Sassone-Corsi, P., Baldi, P.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Motivation: Circadian oscillations have been observed in animals, plants, fungi and cyanobacteria and play a fundamental role in coordinating the homeostasis and behavior of biological systems. Genetically encoded molecular clocks found in nearly every cell, based on negative transcription/translation feedback loops and involving only a dozen genes, play a central role in maintaining these oscillations. However, high-throughput gene expression experiments reveal that in a typical tissue, a much larger fraction (~10%) of all transcripts oscillate with the day–night cycle and the oscillating species vary with tissue type, suggesting that perhaps a much larger fraction of all transcripts, and perhaps also other molecular species, may bear the potential for circadian oscillations. Results: To better quantify the pervasiveness and plasticity of circadian oscillations, we conduct the first large-scale analysis aggregating the results of 18 circadian transcriptomic studies and 10 circadian metabolomic studies conducted in mice using different tissues and under different conditions. We find that over half of protein coding genes in the cell can produce transcripts that are circadian in at least one set of conditions and similarly for measured metabolites. Genetic or environmental perturbations can disrupt existing oscillations by changing their amplitudes and phases, suppressing them or giving rise to novel circadian oscillations. The oscillating species and their oscillations provide a characteristic signature of the physiological state of the corresponding cell/tissue. Molecular networks comprise many oscillator loops that have been sculpted by evolution over two trillion day–night cycles to have intrinsic circadian frequency. These oscillating loops are coupled by shared nodes in a large network of coupled circadian oscillators where the clock genes form a major hub. Cells can program and re-program their circadian repertoire through epigenetic and other mechanisms. Availability and implementation: High-resolution and tissue/condition specific circadian data and networks available at http://circadiomics.igb.uci.edu . Contact: pfbaldi@ics.uci.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

71

Unknown

A Generic Method for the Analysis of a Class of Cache Attacks: A Case Study for AES (2015)

Savaş, E., Yılmaz, C.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: In this paper, we present a methodology to evaluate the feasibility, effectiveness and complexity of a class of cache-based side-channel attacks. The methodology provides estimates on the lower bound of the required number of observations on the side channel and the number of trials for a successful attack. As a case study, a weak implementation of the Advanced Encryption Standard algorithm is selected to apply the proposed methodology to three different categories of cache-based attacks; namely, access-driven, trace-driven and time-driven attacks. The approach, however, is generic in the sense that it can be utilized in other algorithms that are subject to the micro-architectural side-channel attacks. The adopted approach bases its analysis method partially on the conditional entropy of secret keys given the observations of the intermediate variables in software implementations of cryptographic algorithms via the side channel and explores the extent to which the observations can be exploited in a successful attack. Provided that the intermediate variables are relatively simple functions of the key material and the known inputs or outputs of cryptographic algorithms, a successful attack is theoretically feasible. Our methodology emphasizes the need for an analysis of this leakage through such intermediate variables and demonstrates a systematic way to measure it. The method allows us to explore every attack possibility, estimate the feasibility of an attack, and compare the efficiency and the costs of different attack strategies to determine an optimal level of effective countermeasures.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

72

Unknown

Public-Key Encryption Schemes with Bounded CCA Security and Optimal Ciphertext Length Based on the CDH and HDH Assumptions (2015)

Pereira, M., Dowsley, R., Nascimento, A. C. A., Hanaoka, G.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Cramer et al. (2007, Bounded CCA2-Secure Encryption. In Kurosawa, K. (ed.), Advances in Cryptology – ASIACRYPT 2007, Kuching, Malaysia, December 2–6, Lecture Notes in Computer Science, Vol. 4833, pp. 502–518. Springer, Berlin, Germany) proposed a public-key encryption scheme, based on the decisional Diffie–Hellman problem, that is secure against adversaries with a bounded number of decryption queries. In this paper, we show that the same result can be obtained based on weaker computational assumptions, namely the computational Diffie–Hellman and the hashed Diffie–Hellman assumptions.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

73

Unknown

Generic Construction of Certificate-Based Encryption from Certificateless Encryption Revisited (2015)

Gao, W., Wang, G., Wang, X., Chen, K.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Certificateless public key encryption (CLE) and certificate-based encryption (CBE) are motivated to simultaneously solve the heavy certificate management problem inherent in the traditional public key encryption (PKE) and the key escrow problem inherent in the identity-based encryption (IBE). Al-Riyami and Paterson proposed a general conversion from CLE to CBE, which is neat and natural. Kang and Park pointed out a flaw in their security proof. Wu et al. proposed another generic conversion from CLE to CBE which additionally involves collision resistant hash functions. It remains an open problem whether the generic conversion due to Al-Riyami and Paterson is provably secure or not. We are motivated to solve this open problem. Our basic idea is to enhance Type II adversary's power a little by allowing it to conditionally replace a user's public key. We first formalize a new security model of CLE in this way. Then, we succeed in proving that the Al-Riyami–Paterson generic conversion from CLE to CBE is secure, if the CLE scheme is secure in our new security model. Finally, a concrete provably secure CBE scheme is presented to demonstrate the applicability of our result.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

74

Unknown

Model-based Alignment of Heartbeat Morphology for Enhancing Human Recognition Capability (2015)

Islam, M. S., Alajlan, N.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Human recognition with heartbeat signals is useful for different applications such as information security, user identification and remote patient monitoring. In this paper, we propose a model-based method for the alignment of heartbeat morphology to enhance the recognition capability. The scale change of different heartbeats of the same individual due to heart rate variability is estimated and inverted to yield better alignment. Recognition capabilities of different alignment methods are analyzed and measured by intra-individual and inter-individual distances of aligned heartbeats. A framework for heartbeat recognition incorporating the model-based alignment method is also presented. We tested the recognition capability of heartbeat morphology by using two different databases. It was found that the model-based alignment method was useful to boost the recognition capability of heartbeat morphology. A statistical t-test revealed that the improvement was significant with respect to recognition capabilities of other existing alignment methods. We also used the aligned morphology as a feature, tested the recognition accuracy on both databases and compared the recognition performance to those of four other state-of-the-art features. A large increase in recognition accuracy was obtained, especially for a multisession database of heartbeat signals captured from fingers using a handheld ECG device.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

75

Unknown

Toward a Taxonomy of Malware Behaviors (2015)

Gregio, A. R. A., Afonso, V. M., Filho, D. S. F., Geus, P. L. d., Jino, M.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Malicious code attacks pose a serious threat to the security of information systems, as malware has evolved from innocuous conceptual software to advanced and destructive cyber weapons. However, there is still no comprehensive and useful taxonomy to classify malware according to its behavior, since commonly used names are obsolete and unable to handle the complex, multipurpose samples currently observed. In this article, we present a brief survey of available malware taxonomies, discuss issues with existing naming schemes and introduce an extensible taxonomy consisting of an initial set of behaviors usually exhibited by malware during an infection. The main goal of our proposed taxonomy is to address the menace of potentially malicious programs based on their observed behaviors, thus aiding in incident response procedures. Finally, we present a case study to evaluate our behavior-centric taxonomy, in which we apply identification patterns extracted from the proposed taxonomy to over 12 thousand known malware samples. The results show that it is possible to screen malicious programs that exhibit suspicious behaviors, even when they remain undetected by antivirus tools.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

76

Unknown

Annotated Control Flow Graph for Metamorphic Malware Detection (2015)

Alam, S., Traore, I., Sogukpinar, I.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Metamorphism is a technique that mutates the binary code using different obfuscations and never keeps the same sequence of opcodes in the memory. This stealth technique enables malware to evade detection by anti-malware programs that rely on simple signatures (such as instruction sequences, byte sequences and string signatures). In this paper, we present a new scheme named Annotated Control Flow Graph (ACFG) to efficiently detect such kinds of malware. ACFG is built by annotating the CFG of a binary program and is used for graph and pattern matching to analyse and detect metamorphic malware. We also optimize the runtime of malware detection through parallelization and ACFG reduction, maintaining the same detection accuracy as without ACFG reduction. ACFG proposed in this paper: (i) captures the control flow semantics of a program; (ii) provides a faster matching of ACFGs and can handle malware with smaller CFGs, compared with other such techniques, without compromising the accuracy; (iii) contains more information and hence provides more accuracy than a CFG. Experimental evaluation of the proposed scheme using an existing dataset yields a malware detection rate of 98.9% and a false positive rate of 4.5%.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

77

Unknown

On Designing Resilient Location-Privacy Obfuscators (2015)

Perazzo, P., Skvortsov, P., Dini, G.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: The success of location-based services is growing together with the diffusion of GPS-equipped smart devices. As a consequence, privacy concerns are rising year by year. Location privacy is becoming a major interest in research and industry, and many solutions have been proposed for it. One of the simplest and most flexible approaches is obfuscation, in which the precision of location data is artificially degraded before disclosing it. In this paper, we present an obfuscation approach capable of dealing with measurement imprecision, multiple levels of privacy, untrusted servers and adversarial knowledge of the map. We estimate its resistance against statistical-based deobfuscation attacks, and we improve it by means of three techniques, namely extreme vectors, enlarge-and-scale and hybrid vectors.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

78

Unknown

Password Management: Distribution, Review and Revocation (2015)

Lopriore, L.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: We consider the problem of access privilege management in a classical protection environment featuring subjects attempting to access the protected objects. We express an access privilege in terms of an access right and a privilege level. The privilege level and a protection diagram associated with each given object determine whether a nominal access privilege for this object corresponds to an effective, possibly weaker access privilege, or is revoked. We associate a password system with each object; the password system takes the form of a hierarchical bidimensional one-way chain. A subject possesses a nominal access privilege for a given object if it holds a key that matches one of the passwords in the password system of this object; the protection diagram determines the extent of the corresponding effective access privilege. The resulting protection environment has several interesting properties. A key reduction mechanism allows a subject that holds a key for a given object to distribute keys for weaker access rights at lower privilege levels. A subject that owns a given object can review or revoke the passwords for this object by simply modifying the protection diagram. The memory requirements to represent a protection diagram are negligible; as far as password storage is concerned, space–time trade-offs are possible.
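
The key-reduction idea in the abstract above can be illustrated with a short sketch. It is a deliberately simplified, one-dimensional one-way chain over privilege levels only; the abstract describes a hierarchical bidimensional chain whose construction is not given here, so this is not the paper's scheme. SHA-256 as the one-way function and the names weaken, build_passwords and nominal_level are assumptions made for illustration.

```python
import hashlib
from typing import List, Optional


def weaken(key: bytes, steps: int = 1) -> bytes:
    """Hash the key `steps` times; each step moves one privilege level down.

    The hash is one-way, so a holder of a weaker key cannot recover a
    stronger one (assumption: SHA-256 stands in for the one-way function).
    """
    if steps < 0:
        raise ValueError("a stronger key cannot be derived from a weaker one")
    for _ in range(steps):
        key = hashlib.sha256(key).digest()
    return key


def build_passwords(master: bytes, levels: int) -> List[bytes]:
    """Password system of one object: passwords[i] is the master hashed i times.

    Index 0 is the strongest privilege level, higher indices are weaker.
    """
    return [weaken(master, i) for i in range(levels)]


def nominal_level(key: bytes, passwords: List[bytes]) -> Optional[int]:
    """Return the privilege level whose password matches the key, else None."""
    for level, password in enumerate(passwords):
        if password == key:
            return level
    return None


if __name__ == "__main__":
    passwords = build_passwords(b"hypothetical-master-secret", levels=4)
    holder_key = passwords[1]                 # subject holds a level-1 key
    delegated = weaken(holder_key, steps=2)   # hands out a weaker level-3 key
    print(nominal_level(delegated, passwords))
```

In this simplified picture a subject can derive and distribute weaker keys on its own, while review and revocation would be handled separately by the protection diagram, which the sketch does not model.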

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

79

Unknown

Efficient and Fully CCA Secure Conditional Proxy Re-Encryption from Hierarchical Identity-Based Encryption (2015)

Liang, K., Susilo, W., Liu, J. K., Wong, D. S.

Oxford University Press

In: Computer Journal

Publication Date: 2015-09-29

Description: Proxy re-encryption (PRE) allows a data owner to delegate the decryption rights of some encrypted data stored on the cloud without revealing the data to an honest-but-curious cloud service provider (i.e. the PRE proxy). Furthermore, the data owner can offload most of the computational operations to the cloud service provider and hence, using PRE for encrypted cloud data sharing can be very effective even for data owners using limited resource devices (e.g. mobile devices). However, PRE schemes only enable data owners to delegate the decryption rights of all their encrypted data. A more practical notion is conditional PRE (CPRE), which allows us to specify under what condition the decryption of encrypted data can be delegated, for example, only sharing the encrypted files under a directory called ‘public’. In this paper, we provide an affirmative result on the long-standing question of building a fully chosen-ciphertext attack (CCA)-secure CPRE system in the standard model and, for the first time, we show that a class of Hierarchical Identity-Based Encryption (HIBE) schemes can be transformed into a CCA-secure CPRE in the standard model. We also list some concrete HIBE schemes which fall into this class, e.g., Lewko-Waters HIBE. All existing CCA-secure PRE schemes in the standard model are not conditional, while all existing CPRE schemes are either not CCA secure or not in the standard model. By instantiating our generic HIBE-based transformation, we show that an efficient and concrete CPRE scheme which is both CCA secure in the standard model and conditional can be built.

Print ISSN: 0010-4620

Electronic ISSN: 1460-2067

Topics: Computer Science

Published by Oxford University Press

80

Unknown

Polynomially parsable unification grammars (2015)

Peled, H., Wintner, S.

Oxford University Press

In: Journal of Logic and Computation

Publication Date: 2015-09-29

Description: Unification grammars (UGs) are a grammatical formalism that underlies several contemporary linguistic theories, including lexical-functional grammar and head-driven phrase-structure grammar. UG is an especially attractive formalism because of its expressivity, which facilitates the expression of complex linguistic structures and relations. Formally, UG is Turing-complete, generating the entire class of recursively enumerable languages. This expressivity, however, comes at a price: the universal recognition problem is undecidable for arbitrary unification grammars. We define a constrained version of UG that is equivalent to range concatenation grammar, a formalism that generates exactly the class of languages recognizable in deterministic polynomial time. We thus obtain a constrained unification grammar formalism that guarantees efficient processing.

Print ISSN: 0955-792X

Electronic ISSN: 1465-363X

Topics: Computer Science , Mathematics

Published by Oxford University Press

81

Unknown

The syntactic concept lattice: Another algebraic theory of the context-free languages? (2015)

Clark, A.

Oxford University Press

In: Journal of Logic and Computation

Publication Date: 2015-09-29

Description: The syntactic concept lattice is a residuated lattice associated with a given formal language; it arises naturally as a generalization of the syntactic monoid in the analysis of the distributional structure of the language. In this article we define the syntactic concept lattice and present its basic properties, and its relationship to the universal automaton and the syntactic congruence; we consider several different equivalent definitions, as Galois connections, as maximal factorizations and finally using universal algebra to define it as an object that has a certain universal (terminal) property in the category of complete idempotent semirings that recognize a given language, applying techniques from automata theory to the theory of context-free grammars (CFGs). We conclude that grammars that are minimal, in a certain weak sense, will always have non-terminals that correspond to elements of this lattice, and claim that the syntactic concept lattice provides a natural system of categories for representing the denotations of CFGs.

Print ISSN: 0955-792X

Electronic ISSN: 1465-363X

Topics: Computer Science , Mathematics

Published by Oxford University Press

82

Unknown

A formalization of argumentation schemes for legal case-based reasoning in ASPIC+ (2015)

Prakken, H., Wyner, A., Bench-Capon, T., Atkinson, K.

Oxford University Press

In: Journal of Logic and Computation

Publication Date: 2015-09-29

Description: In this article we offer a formal account of reasoning with legal cases in terms of argumentation schemes. These schemes, and undercutting attacks associated with them, are formalized as defeasible rules of inference within the ASPIC+ framework. We begin by modelling the style of reasoning with cases developed by Aleven and Ashley in the CATO project, which describes cases using factors, and then extend the account to accommodate the dimensions used in Rissland and Ashley's earlier HYPO project. Some additional scope for argumentation is then identified and formalized.

Print ISSN: 0955-792X

Electronic ISSN: 1465-363X

Topics: Computer Science , Mathematics

Published by Oxford University Press

83

Unknown

On deontic action logics based on Boolean algebra (2015)

Trypuz, R., Kulicki, P.

Oxford University Press

In: Journal of Logic and Computation

Publication Date: 2015-09-29

Description: The aim of this article is to provide a metalogical systematization in the area of deontic action logic based on Boolean algebra. Differences among the systems involve two aspects: the level of closedness of a deontic action logic and the possibility of performing no action at all. It is also shown that the existing definitions of obligation in these systems are unacceptable due to their non-intuitive interpretation or paradoxical consequences. As a solution we propose a minimal axiomatic characterization of obligation with an adequate class of models. This article also describes how deontic action logic can be used to answer the questions from the Polish driving license test.

Print ISSN: 0955-792X

Electronic ISSN: 1465-363X

Topics: Computer Science , Mathematics

Published by Oxford University Press

84

Unknown

Blending margins: the modal logic K has nullary unification type (2015)

Jeřabek, E.

Oxford University Press

In: Journal of Logic and Computation

Details

Publication Date: 2015-09-29

Description: We investigate properties of the formula p → □p in the basic modal logic K. We show that K satisfies an infinitary weaker variant of the rule of margins φ → □φ / φ, ¬φ, and as a consequence, we obtain various negative results about admissibility and unification in K. We describe a complete set of unifiers (i.e. substitutions making the formula provable) of p → □p, and use it to establish that K has the worst possible unification type: nullary. In well-behaved transitive modal logics, admissibility and unification can be analysed in terms of projective formulas, introduced by Ghilardi; in particular, projective formulas coincide for these logics with formulas that are admissibly saturated (i.e. derive all their multiple-conclusion admissible consequences) or exact (i.e. axiomatize a theory of a substitution). In contrast, we show that in K, the formula p → □p is admissibly saturated, but neither projective nor exact. All our results for K also apply to the basic description logic $\mathcal{ALC}$.

Print ISSN: 0955-792X

Electronic ISSN: 1465-363X

Topics: Computer Science , Mathematics

Published by Oxford University Press

85

Unknown

Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins (2015)

Yang, J., He, B.-J., Jang, R., Zhang, Y., Shen, H.-B.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

86

Unknown

LoopIng: a template-based tool for predicting the structure of protein loops (2015)

Messih, M. A., Lepore, R., Tramontano, A.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. Results: We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4–10 residues) and significant enhancements for long loops (11–20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability and implementation: www.biocomputing.it/looping Contact: anna.tramontano@uniroma1.it Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

87

Unknown

Improving protein fold recognition with hybrid profiles combining sequence and structure evolution (2015)

Ghouzam, Y., Postic, G., de Brevern, A. G., Gelly, J.-C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology. Results: Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ~10% more templates at fold and superfamily SCOP levels. Availability: Software freely available for download at http://www.dsimb.inserm.fr/orion/ . Contact: jean-christophe.gelly@univ-paris-diderot.fr Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

88

Unknown

ARResT/AssignSubsets: a novel application for robust subclassification of chronic lymphocytic leukemia based on B cell receptor IG stereotypy (2015)

Bystry, V., Agathangelidis, A., Bikos, V., Sutton, L. A., Baliakas, P., Hadzidimitriou, A., Stamatopoulos, K., Darzentas, N., also on behalf of ERIC, the European Research Initiative on CLL

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ~30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. Results: We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. Availability and implementation: ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/ Contact: nikos.darzentas@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

89

Unknown

stringgaussnet: from differentially expressed genes to semantic and Gaussian networks generation (2015)

Chaplais, E., Garchon, H.-J.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Knowledge-based and co-expression networks are two kinds of gene networks that can be currently implemented by sophisticated but distinct tools. We developed stringgaussnet, an R package that integrates both approaches, starting from a list of differentially expressed genes. Contact: henri-jean.garchon@inserm.fr Availability and implementation: Freely available on the web at http://cran.r-project.org/web/packages/stringgaussnet .

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

90

Unknown

TIPR: transcription initiation pattern recognition on a genome scale (2015)

Morton, T., Wong, W.-K., Megraw, M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: The computational identification of gene transcription start sites (TSSs) can provide insights into the regulation and function of genes without performing expensive experiments, particularly in organisms with incomplete annotations. High-resolution general-purpose TSS prediction remains a challenging problem, with little recent progress on the identification and differentiation of TSSs which are arranged in different spatial patterns along the chromosome. Results: In this work, we present the Transcription Initiation Pattern Recognizer (TIPR), a sequence-based machine learning model that identifies TSSs with high accuracy and resolution for multiple spatial distribution patterns along the genome, including broadly distributed TSS patterns that have previously been difficult to characterize. TIPR predicts not only the locations of TSSs but also the expected spatial initiation pattern each TSS will form along the chromosome—a novel capability for TSS prediction algorithms. As spatial initiation patterns are associated with spatiotemporal expression patterns and gene function, this capability has the potential to improve gene annotations and our understanding of the regulation of transcription initiation. The high nucleotide resolution of this model locates TSSs within 10 nucleotides or less on average. Availability and implementation: Model source code is made available online at http://megraw.cgrb.oregonstate.edu/software/TIPR/ . Contact: megrawm@science.oregonstate.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

91

Unknown

Determining conserved metabolic biomarkers from a million database queries (2015)

Kurczy, M. E., Ivanisevic, J., Johnson, C. H., Uritboonthai, W., Hoang, L., Fang, M., Hicks, M., Aldebot, A., Rinehart, D., Mellander, L. J., Tautenhahn, R., Patti, G. J., Spilker, M. E., Benton, H. P., Siuzdak, G.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Metabolite databases provide a unique window into metabolome research, allowing the most commonly searched biomarkers to be catalogued. Omic scale metabolite profiling, or metabolomics, is finding increased utility in biomarker discovery, largely driven by improvements in analytical technologies and the concurrent developments in bioinformatics. However, the successful translation of biomarkers into clinical or biologically relevant indicators is limited. Results: With the aim of improving the discovery of translatable metabolite biomarkers, we present search analytics for over one million METLIN metabolite database queries. The most common metabolites found in METLIN were cross-correlated against XCMS Online, the widely used cloud-based data processing and pathway analysis platform. Analysis of the METLIN and XCMS common metabolite data has two primary implications: these metabolites might indicate a conserved metabolic response to stressors, and these data may be used to gauge the relative uniqueness of potential biomarkers. Availability and implementation: METLIN can be accessed by logging on to: https://metlin.scripps.edu Contact: siuzdak@scripps.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

92

Unknown

Optimization of miRNA-seq data preprocessing (2015)

Tam, S., Tsao, M.-S., McPherson, J. D.

Oxford University Press

In: Briefings in Bioinformatics

Details

Publication Date: 2015-11-20

Description: The past two decades of microRNA (miRNA) research has solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. The concurrent development in high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign), and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) with respect to the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
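
Of the normalization methods listed in this abstract, counts-per-million is the simplest to state: each raw count is divided by the sample's total read count and multiplied by one million. The sketch below is only an illustration of that arithmetic, not the authors' pipeline; the function name and the miRNA counts are hypothetical.

```python
def counts_per_million(counts):
    """Scale one sample's raw read counts so that they sum to one million.

    `counts` maps a feature name (e.g. a miRNA) to its raw read count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {name: c * 1_000_000 / total for name, c in counts.items()}


# Hypothetical sample with 1000 reads in total.
sample = {"miR-21": 500, "miR-155": 300, "let-7a": 200}
cpm = counts_per_million(sample)
print(cpm["miR-21"])  # 500000.0
```

Because the scaling uses only the sample total, a few very abundant miRNAs can dominate it, which is one reason such comparisons also include alternatives like upper quartile scaling or Trimmed Mean of M.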

Print ISSN: 1467-5463

Electronic ISSN: 1477-4054

Topics: Biology , Computer Science

Published by Oxford University Press

93

Unknown

Parameter estimation methods for gene circuit modeling from time-series mRNA data: a comparative study (2015)

Fan, M., Kuwahara, H., Wang, X., Wang, S., Gao, X.

Oxford University Press

In: Briefings in Bioinformatics

Details

Publication Date: 2015-11-20

Description: Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to the level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large.

Print ISSN: 1467-5463

Electronic ISSN: 1477-4054

Topics: Biology , Computer Science

Published by Oxford University Press

94

Unknown

Comparing the evolutionary conservation between human essential genes, human orthologs of mouse essential genes and human housekeeping genes (2015)

Lv, W., Zheng, J., Luan, M., Shi, M., Zhu, H., Zhang, M., Lv, H., Shang, Z., Duan, L., Zhang, R., Jiang, Y.

Oxford University Press

In: Briefings in Bioinformatics

Details

Publication Date: 2015-11-20

Description: Human housekeeping genes are often confused with essential human genes, and several studies regard both types of genes as having the same level of evolutionary conservation. However, this is not necessarily the case. To clarify this, we compared the differences between human housekeeping genes and essential human genes with respect to four aspects: the evolutionary rate (dN/dS), protein sequence identity, single-nucleotide polymorphism (SNP) density and level of linkage disequilibrium (LD). The results showed that housekeeping genes had lower evolutionary rates, higher sequence identities, lower SNP densities and higher levels of LD compared with essential genes. Together, these findings indicate that housekeeping and essential genes are two distinct types of genes, and that housekeeping genes have a higher level of evolutionary conservation. Therefore, we suggest that researchers should pay careful attention to the distinctions between housekeeping genes and essential genes. Moreover, it is still controversial whether we should substitute human orthologs of mouse essential genes for human essential genes. Therefore, we compared the evolutionary features between human orthologs of mouse essential genes and human housekeeping genes and we got inconsistent results in long-term and short-term evolutionary characteristics implying the irrationality of simply replacing human essential genes with human orthologs of mouse essential genes.

Print ISSN: 1467-5463

Electronic ISSN: 1477-4054

Topics: Biology , Computer Science

Published by Oxford University Press

95

Unknown

PBAP: a pipeline for file processing and quality control of pedigree data with dense genetic markers (2015)

Nato, A. Q., Chapman, N. H., Sohi, H. K., Nguyen, H. D., Brkanac, Z., Wijsman, E. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Huge genetic datasets with dense marker panels are now common. With the availability of sequence data and recognition of importance of rare variants, smaller studies based on pedigrees are again also common. Pedigree-based samples often start with a dense marker panel, a subset of which may be used for linkage analysis to reduce computational burden and to limit linkage disequilibrium between single-nucleotide polymorphisms (SNPs). Programs attempting to select markers for linkage panels exist but lack flexibility. Results: We developed a pedigree-based analysis pipeline (PBAP) suite of programs geared towards SNPs and sequence data. PBAP performs quality control, marker selection and file preparation. PBAP sets up files for MORGAN, which can handle analyses for small and large pedigrees, typically human, and results can be used with other programs and for downstream analyses. We evaluate and illustrate its features with two real datasets. Availability and implementation: PBAP scripts may be downloaded from http://faculty.washington.edu/wijsman/software.shtml . Contact: wijsman@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

96

Unknown

Identifying kinase dependency in cancer cells by integrating high-throughput drug screening and kinase inhibition data (2015)

Ryall, K. A., Shin, J., Yoo, M., Hinz, T. K., Kim, J., Kang, J., Heasley, L. E., Tan, A. C.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Targeted kinase inhibitors have dramatically improved cancer treatment, but kinase dependency for an individual patient or cancer cell can be challenging to predict. Kinase dependency does not always correspond with gene expression and mutation status. High-throughput drug screens are powerful tools for determining kinase dependency, but drug polypharmacology can make results difficult to interpret. Results: We developed Kinase Addiction Ranker (KAR), an algorithm that integrates high-throughput drug screening data, comprehensive kinase inhibition data and gene expression profiles to identify kinase dependency in cancer cells. We applied KAR to predict kinase dependency of 21 lung cancer cell lines and 151 leukemia patient samples using published datasets. We experimentally validated KAR predictions of FGFR and MTOR dependence in lung cancer cell line H1581, showing synergistic reduction in proliferation after combining ponatinib and AZD8055. Availability and implementation: KAR can be downloaded as a Python function or a MATLAB script along with example inputs and outputs at: http://tanlab.ucdenver.edu/KAR/ . Contact: aikchoon.tan@ucdenver.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

97

Unknown

OVA: integrating molecular and physical phenotype data from multiple biomedical domain ontologies with variant filtering for enhanced variant prioritization (2015)

Antanaviciute, A., Watson, C. M., Harrison, S. M., Lascelles, C., Crinnion, L., Markham, A. F., Bonthron, D. T., Carr, I. M.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Exome sequencing has become a de facto standard method for Mendelian disease gene discovery in recent years, yet identifying disease-causing mutations among thousands of candidate variants remains a non-trivial task. Results: Here we describe a new variant prioritization tool, OVA (ontology variant analysis), in which user-provided phenotypic information is exploited to infer deeper biological context. OVA combines a knowledge-based approach with a variant-filtering framework. It reduces the number of candidate variants by considering genotype and predicted effect on protein sequence, and scores the remainder on biological relevance to the query phenotype. We take advantage of several ontologies in order to bridge knowledge across multiple biomedical domains and facilitate computational analysis of annotations pertaining to genes, diseases, phenotypes, tissues and pathways. In this way, OVA combines information regarding molecular and physical phenotypes and integrates both human and model organism data to effectively prioritize variants. By assessing performance on both known and novel disease mutations, we show that OVA performs biologically meaningful candidate variant prioritization and can be more accurate than another recently published candidate variant prioritization tool. Availability and implementation: OVA is freely accessible at http://dna2.leeds.ac.uk:8080/OVA/index.jsp Supplementary information: Supplementary data are available at Bioinformatics online. Contact: umaan@leeds.ac.uk

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

98

Unknown

Impact of normalization methods on high-throughput screening data with high hit rates and drug testing with dose-response data (2015)

Mpindi, J.-P., Swapnil, P., Dmitrii, B., Jani, S., Saeed, K., Wennerberg, K., Aittokallio, T., Östling, P., Kallioniemi, O.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Most data analysis tools for high-throughput screening (HTS) seek to uncover interesting hits for further analysis. They typically assume a low hit rate per plate. Hit rates can be dramatically higher in secondary screening, RNAi screening and in drug sensitivity testing using biologically active drugs. In particular, drug sensitivity testing on primary cells is often based on dose–response experiments, which pose a more stringent requirement for data quality and for intra- and inter-plate variation. Here, we compared common plate normalization and noise-reduction methods, including the B-score and Loess, a local polynomial fit method, under high hit-rate scenarios of drug sensitivity testing. We generated simulated 384-well plate HTS datasets, each with 71 plates having a range of 20 (5%) to 160 (42%) hits per plate, with controls placed either at the edge of the plates or in a scattered configuration. Results: We identified 20% (77/384) as the critical hit-rate after which the normalizations started to perform poorly. Results from real drug testing experiments supported this estimation. In particular, the B-score resulted in incorrect normalization of high hit-rate plates, leading to poor data quality, which could be attributed to its dependency on the median polish algorithm. We conclude that a combination of a scattered layout of controls per plate and normalization using a polynomial least squares fit method, such as Loess, helps to reduce column, row and edge effects in HTS experiments with high hit-rates and is optimal for generating accurate dose–response curves. Contact: john.mpindi@helsinki.fi Availability and implementation, Supplementary information: R code and Supplementary data are available at Bioinformatics online.
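
The abstract attributes the B-score's failure on high hit-rate plates to its dependency on median polish. As a rough illustration of why, the sketch below implements a plain two-way median polish and a B-score built on it. It is not the authors' R code: the function names, the fixed iteration count and the 1.4826 MAD scaling (a common convention) are assumptions made for this example.

```python
from statistics import median


def median_polish(plate, n_iter=10):
    """Return residuals after repeatedly subtracting row and column medians.

    `plate` is a list of rows of well measurements; it is not modified.
    """
    resid = [row[:] for row in plate]
    for _ in range(n_iter):
        # Remove the row effect.
        for row in resid:
            m = median(row)
            for j in range(len(row)):
                row[j] -= m
        # Remove the column effect.
        for j in range(len(resid[0])):
            m = median(row[j] for row in resid)
            for row in resid:
                row[j] -= m
    return resid


def b_score(plate):
    """Divide median-polish residuals by the scaled plate MAD (assumed convention)."""
    resid = median_polish(plate)
    flat = [v for row in resid for v in row]
    centre = median(flat)
    mad = median(abs(v - centre) for v in flat)
    if mad == 0:
        raise ValueError("MAD is zero; B-score is undefined for this plate")
    return [[v / (1.4826 * mad) for v in row] for row in resid]
```

Median polish assumes that the median of each row and column reflects inactive wells. When a large share of wells are hits, the medians themselves shift toward the hit values, so part of the true signal is subtracted as if it were a positional effect.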

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

99

Unknown

ERC analysis: web-based inference of gene function via evolutionary rate covariation (2015)

Wolfe, N. W., Clark, N. L.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: The recent explosion of comparative genomics data presents an unprecedented opportunity to construct gene networks via the evolutionary rate covariation (ERC) signature. ERC is used to identify genes that experienced similar evolutionary histories, and thereby draws functional associations between them. The ERC Analysis website allows researchers to exploit genome-wide datasets to infer novel genes in any biological function and to explore deep evolutionary connections between distinct pathways and complexes. The website provides five analytical methods, graphical output, statistical support and access to an increasing number of taxonomic groups. Availability and implementation: Analyses and data at http://csb.pitt.edu/erc_analysis/ Contact: nclark@pitt.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

100

Unknown

Correcting systematic bias and instrument measurement drift with mzRefinery (2015)

Gibbons, B. C., Chambers, M. C., Monroe, M. E., Tabb, D. L., Payne, S. H.

Oxford University Press

In: Bioinformatics

Details

Publication Date: 2015-11-21

Description: Motivation: Systematic bias in mass measurement adversely affects data quality and negates the advantages of high precision instruments. Results: We introduce the mzRefinery tool for calibration of mass spectrometry data files. Using confident peptide spectrum matches, three different calibration methods are explored and the optimal transform function is chosen. After calibration, systematic bias is removed and the mass measurement errors are centered at 0 ppm. Because it is part of the ProteoWizard package, mzRefinery can read and write a wide variety of file formats. Availability and implementation: The mzRefinery tool is part of msConvert, available with the ProteoWizard open source package at http://proteowizard.sourceforge.net/ Contact: samuel.payne@pnnl.gov Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press
