ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

AVIA v2.0: annotation, visualization and impact analysis of genomic variants and genes (2015)

Vuong, H., Che, A., Ravichandran, S., Luke, B. T., Collins, J. R., Mudunuri, U. S.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: : As sequencing becomes cheaper and more widely available, there is a greater need to quickly and effectively analyze large-scale genomic data. While the functionality of AVIA v1.0, whose implementation was based on ANNOVAR, was comparable with other annotation web servers, AVIA v2.0 represents an enhanced web-based server that extends genomic annotations to cell-specific transcripts and protein-level functional annotations. With AVIA’s improved interface, users can better visualize their data, perform comprehensive searches and categorize both coding and non-coding variants. Availability and implementation : AVIA is freely available through the web at http://avia.abcc.ncifcrf.gov . Contact : Hue.Vuong@fnlcr.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

MultiMeta: an R package for meta-analyzing multi-phenotype genome-wide association studies (2015)

Vuckovic, D., Gasparini, P., Soranzo, N., Iotchkova, V.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: : As new methods for multivariate analysis of genome wide association studies become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance-based method for meta-analysis, generalized to an n -dimensional setting. Availability and implementation: The R package MultiMeta can be downloaded from CRAN. Contact: dragana.vuckovic@burlo.trieste.it ; vi1@sanger.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

Gene selection for the reconstruction of stem cell differentiation trees: a linear programming approach (2015)

Ghadie, M. A., Japkowicz, N., Perkins, T. J.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: Motivation: Stem cell differentiation is largely guided by master transcriptional regulators, but it also depends on the expression of other types of genes, such as cell cycle genes, signaling genes, metabolic genes, trafficking genes, etc. Traditional approaches to understanding gene expression patterns across multiple conditions, such as principal components analysis or K-means clustering, can group cell types based on gene expression, but they do so without knowledge of the differentiation hierarchy. Hierarchical clustering can organize cell types into a tree, but in general this tree is different from the differentiation hierarchy itself. Methods: Given the differentiation hierarchy and gene expression data at each node, we construct a weighted Euclidean distance metric such that the minimum spanning tree with respect to that metric is precisely the given differentiation hierarchy. We provide a set of linear constraints that are provably sufficient for the desired construction and a linear programming approach to identify sparse sets of weights, effectively identifying genes that are most relevant for discriminating different parts of the tree. Results: We apply our method to microarray gene expression data describing 38 cell types in the hematopoiesis hierarchy, constructing a weighted Euclidean metric that uses just 175 genes. However, we find that there are many alternative sets of weights that satisfy the linear constraints. Thus, in the style of random-forest training, we also construct metrics based on random subsets of the genes and compare them to the metric of 175 genes. We then report on the selected genes and their biological functions. Our approach offers a new way to identify genes that may have important roles in stem cell differentiation. Contact: tperkins@ohri.ca Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

Applying stability selection to consistently estimate sparse principal components in high-dimensional molecular data (2015)

Sill, M., Saadati, M., Benner, A.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: Motivation: Principal component analysis (PCA) is a basic tool often used in bioinformatics for visualization and dimension reduction. However, it is known that PCA may not consistently estimate the true direction of maximal variability in high-dimensional, low sample size settings, which are typical for molecular data. Assuming that the underlying signal is sparse, i.e. that only a fraction of features contribute to a principal component (PC), this estimation consistency can be retained. Most existing sparse PCA methods use L1-penalization, i.e. the lasso , to perform feature selection. But, the lasso is known to lack variable selection consistency in high dimensions and therefore a subsequent interpretation of selected features can give misleading results. Results: We present S4VDPCA, a sparse PCA method that incorporates a subsampling approach, namely stability selection. S4VDPCA can consistently select the truly relevant variables contributing to a sparse PC while also consistently estimate the direction of maximal variability. The performance of the S4VDPCA is assessed in a simulation study and compared to other PCA approaches, as well as to a hypothetical oracle PCA that ‘knows’ the truly relevant features in advance and thus finds optimal, unbiased sparse PCs. S4VDPCA is computationally efficient and performs best in simulations regarding parameter estimation consistency and feature selection consistency. Furthermore, S4VDPCA is applied to a publicly available gene expression data set of medulloblastoma brain tumors. Features contributing to the first two estimated sparse PCs represent genes significantly over-represented in pathways typically deregulated between molecular subgroups of medulloblastoma. Availability and implementation: Software is available at https://github.com/mwsill/s4vdpca . Contact: m.sill@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

GS-align for glycan structure alignment and similarity measurement (2015)

Lee, H. S., Jo, S., Mukherjee, S., Park, S.-J., Skolnick, J., Lee, J., Im, W.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: Motivation: Glycans play critical roles in many biological processes, and their structural diversity is key for specific protein-glycan recognition. Comparative structural studies of biological molecules provide useful insight into their biological relationships. However, most computational tools are designed for protein structure, and despite their importance, there is no currently available tool for comparing glycan structures in a sequence order- and size-independent manner. Results: A novel method, GS-align, is developed for glycan structure alignment and similarity measurement. GS-align generates possible alignments between two glycan structures through iterative maximum clique search and fragment superposition. The optimal alignment is then determined by the maximum structural similarity score, GS-score, which is size-independent. Benchmark tests against the Protein Data Bank (PDB) N -linked glycan library and PDB homologous/non-homologous N -glycoprotein sets indicate that GS-align is a robust computational tool to align glycan structures and quantify their structural similarity. GS-align is also applied to template-based glycan structure prediction and monosaccharide substitution matrix generation to illustrate its utility. Availability and implementation: http://www.glycanstructure.org/gsalign . Contact: wonpil@ku.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

Analysis of impedance-based cellular growth assays (2015)

Witzel, F., Fritsche-Guenther, R., Lehmann, N., Sieber, A., Bluthgen, N.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-08

Description: Motivation: Impedance-based technologies are advancing methods for measuring proliferation of adherent cell cultures non-invasively and in real time. The analysis of the resulting data has so far been hampered by inappropriate computational methods and the lack of systematic data to evaluate the characteristics of the assay. Results: We used a commercially available system for impedance-based growth measurement (xCELLigence) and compared the reported cell index with data from microscopy. We found that the measured signal correlates linearly with the cell number throughout the time of an experiment with sufficient accuracy in subconfluent cell cultures. The resulting growth curves for various colon cancer cells could be well described with the empirical Richards growth model, which allows for extracting quantitative parameters (such as characteristic cycle times). We found that frequently used readouts like the cell index at a specific time or the area under the growth curve cannot be used to faithfully characterize growth inhibition. We propose to calculate the average growth rate of selected time intervals to accurately estimate time-dependent IC50 values of drugs from growth curves. Contact: nils.bluethgen@charite.de Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

Data-dependent bucketing improves reference-free compression of sequencing reads (2015)

Patro, R., Kingsford, C.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-25

Description: Motivation: The storage and transmission of high-throughput sequencing data consumes significant resources. As our capacity to produce such data continues to increase, this burden will only grow. One approach to reduce storage and transmission requirements is to compress this sequencing data. Results: We present a novel technique to boost the compression of sequencing that is based on the concept of bucketing similar reads so that they appear nearby in the file. We demonstrate that, by adopting a data-dependent bucketing scheme and employing a number of encoding ideas, we can achieve substantially better compression ratios than existing de novo sequence compression tools, including other bucketing and reordering schemes. Our method, Mince, achieves up to a 45% reduction in file sizes (28% on average) compared with existing state-of-the-art de novo compression schemes. Availability and implementation : Mince is written in C++11, is open source and has been made available under the GPLv3 license. It is available at http://www.cs.cmu.edu/~ckingsf/software/mince . Contact: carlk@cs.cmu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

Identification of C2H2-ZF binding preferences from ChIP-seq data using RCADE (2015)

Najafabadi, H. S., Albu, M., Hughes, T. R.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-25

Description: : Current methods for motif discovery from chromatin immunoprecipitation followed by sequencing (ChIP-seq) data often identify non-targeted transcription factor (TF) motifs, and are even further limited when peak sequences are similar due to common ancestry rather than common binding factors. The latter aspect particularly affects a large number of proteins from the Cys 2 His 2 zinc finger (C2H2-ZF) class of TFs, as their binding sites are often dominated by endogenous retroelements that have highly similar sequences. Here, we present recognition code-assisted discovery of regulatory elements (RCADE) for motif discovery from C2H2-ZF ChIP-seq data. RCADE combines predictions from a DNA recognition code of C2H2-ZFs with ChIP-seq data to identify models that represent the genuine DNA binding preferences of C2H2-ZF proteins. We show that RCADE is able to identify generalizable binding models even from peaks that are exclusively located within the repeat regions of the genome, where state-of-the-art motif finding approaches largely fail. Availability and implementation: RCADE is available as a webserver and also for download at http://rcade.ccbr.utoronto.ca/ . Supplementary information: Supplementary data are available at Bioinformatics online. Contact: t.hughes@utoronto.ca

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

Phylesystem: a git-based data store for community-curated phylogenetic estimates (2015)

Mc; Tavish, E. J., Hinchliff, C. E., Allman, J. F., Brown, J. W., Cranston, K. A., Holder, M. T., Rees, J. A., Smith, S. A.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-25

Description: Motivation: Phylogenetic estimates from published studies can be archived using general platforms like Dryad (Vision, 2010) or TreeBASE (Sanderson et al. , 1994). Such services fulfill a crucial role in ensuring transparency and reproducibility in phylogenetic research. However, digital tree data files often require some editing (e.g. rerooting) to improve the accuracy and reusability of the phylogenetic statements. Furthermore, establishing the mapping between tip labels used in a tree and taxa in a single common taxonomy dramatically improves the ability of other researchers to reuse phylogenetic estimates. As the process of curating a published phylogenetic estimate is not error-free, retaining a full record of the provenance of edits to a tree is crucial for openness, allowing editors to receive credit for their work and making errors introduced during curation easier to correct. Results : Here, we report the development of software infrastructure to support the open curation of phylogenetic data by the community of biologists. The backend of the system provides an interface for the standard database operations of creating, reading, updating and deleting records by making commits to a git repository. The record of the history of edits to a tree is preserved by git’s version control features. Hosting this data store on GitHub ( http://github.com/ ) provides open access to the data store using tools familiar to many developers. We have deployed a server running the ‘phylesystem-api’, which wraps the interactions with git and GitHub. The Open Tree of Life project has also developed and deployed a JavaScript application that uses the phylesystem-api and other web services to enable input and curation of published phylogenetic statements. Availability and implementation : Source code for the web service layer is available at https://github.com/OpenTreeOfLife/phylesystem-api . The data store can be cloned from: https://github.com/OpenTreeOfLife/phylesystem . A web application that uses the phylesystem web services is deployed at http://tree.opentreeoflife.org/curator . Code for that tool is available from https://github.com/OpenTreeOfLife/opentree . Contact : mtholder@gmail.com

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

Using combined evidence from replicates to evaluate ChIP-seq peaks (2015)

Jalili, V., Matteucci, M., Masseroli, M., Morelli, M. J.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-08-25

Description: Motivation: Chromatin Immunoprecipitation followed by sequencing (ChIP-seq) detects genome-wide DNA–protein interactions and chromatin modifications, returning enriched regions (ERs), usually associated with a significance score. Moderately significant interactions can correspond to true, weak interactions, or to false positives; replicates of a ChIP-seq experiment can provide co-localised evidence to decide between the two cases. We designed a general methodological framework to rigorously combine the evidence of ERs in ChIP-seq replicates, with the option to set a significance threshold on the repeated evidence and a minimum number of samples bearing this evidence. Results : We applied our method to Myc transcription factor ChIP-seq datasets in K562 cells available in the ENCODE project. Using replicates, we could extend up to 3 times the ER number with respect to single-sample analysis with equivalent significance threshold. We validated the ‘rescued’ ERs by checking for the overlap with open chromatin regions and for the enrichment of the motif that Myc binds with strongest affinity; we compared our results with alternative methods (IDR and jMOSAiCS), obtaining more validated peaks than the former and less peaks than latter, but with a better validation. Availability and implementation : An implementation of the proposed method and its source code under GPLv3 license are freely available at http://www.bioinformatics.deib.polimi.it/MSPC/ and http://mspc.codeplex.com/ , respectively. Contact : marco.morelli@iit.it Supplementary information: Supplementary Material are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext