ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2016-07-30
    Description: Motivation: Random sampling of the solution space has emerged as a popular tool to explore and infer properties of large metabolic networks. However, conventional sampling approaches commonly used do not eliminate thermodynamically unfeasible loops. Results: In order to overcome this limitation, we developed an efficient sampling algorithm called loopless Artificially Centered Hit-and-Run on a Box (ll-ACHRB). This algorithm is inspired by the Hit-and-Run on a Box algorithm for uniform sampling from general regions, but employs the directions of choice approach of Artificially Centered Hit-and-Run. A novel strategy for generating feasible warmup points improved both sampling efficiency and mixing. ll-ACHRB shows overall better performance than current strategies to generate feasible flux samples across several models. Furthermore, we demonstrate that a failure to eliminate unfeasible loops greatly affects sample statistics, in particular the correlation structure. Finally, we discuss recommendations for the interpretation of sampling results and possible algorithmic improvements. Availability and implementation: Source code for MATLAB and OCTAVE, including examples, is freely available for download at http://www.aibn.uq.edu.au/cssb-resources under Software. Optimization runs can use Gurobi Optimizer (by default if available) or GLPK (included with the algorithm). Contact: lars.nielsen@uq.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
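The plain hit-and-run step that the abstract above builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' ll-ACHRB algorithm: it samples an axis-aligned box only, performs no loop elimination and no artificial centering, and the names `hit_and_run_box`, `lower` and `upper` are hypothetical.

```python
import math
import random


def hit_and_run_box(lower, upper, n_samples, seed=0):
    """Plain hit-and-run sampling inside an axis-aligned box (illustrative only)."""
    rng = random.Random(seed)
    # Start from the centre of the box, which is always feasible.
    x = [(lo + hi) / 2.0 for lo, hi in zip(lower, upper)]
    samples = []
    while len(samples) < n_samples:
        # Draw a random direction uniformly on the unit sphere.
        d = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(v * v for v in d))
        if norm == 0.0:
            continue  # degenerate direction, draw again
        d = [v / norm for v in d]
        # Find the interval [t_min, t_max] of step sizes that stay in the box.
        t_min, t_max = -math.inf, math.inf
        for xi, di, lo, hi in zip(x, d, lower, upper):
            if di > 0:
                t_min = max(t_min, (lo - xi) / di)
                t_max = min(t_max, (hi - xi) / di)
            elif di < 0:
                t_min = max(t_min, (hi - xi) / di)
                t_max = min(t_max, (lo - xi) / di)
        # Pick a uniform point on the chord through the current point.
        t = rng.uniform(t_min, t_max)
        # Clamp each coordinate to guard against floating-point rounding.
        x = [min(max(xi + t * di, lo), hi)
             for xi, di, lo, hi in zip(x, d, lower, upper)]
        samples.append(list(x))
    return samples
```

In a metabolic model the box bounds would correspond to flux bounds, and the chord would additionally have to respect the steady-state constraints, which this sketch leaves out.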
  • 2
    Oxford University Press
    Publication Date: 2016-07-30
    Description: Results: Here, we present a comprehensive analysis on the reproducibility of computational characterization of genomic variants using high throughput sequencing data. We reanalyzed the same datasets twice, using the same tools with the same parameters, where we only altered the order of reads in the input (i.e. FASTQ file). Reshuffling caused the reads from repetitive regions to be mapped to different locations in the second alignment, and we observed similar results when we only applied a scatter/gather approach for read mapping, without prior shuffling. Our results show that some of the most common variation discovery algorithms do not handle the ambiguous read mappings accurately when random locations are selected. In addition, we also observed that even when the exact same alignment is used, the GATK HaplotypeCaller generates slightly different call sets, which we pinpoint to the variant filtration step. We conclude that algorithms at each step of genomic variation discovery and characterization need to treat ambiguous mappings in a deterministic fashion to ensure full replication of results. Availability and Implementation: Code, scripts and the generated VCF files are available at DOI:10.5281/zenodo.32611. Contact: calkan@cs.bilkent.edu.tr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
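One way to treat ambiguous mappings deterministically, as the abstract above recommends, is to derive the choice from the read itself instead of from input order or a random draw. The sketch below is an assumption for illustration only; it is not taken from the paper or from any aligner, and `pick_location` and `place_reads` are hypothetical names.

```python
import hashlib


def pick_location(read_name, candidates):
    """Choose one of several equally good mapping locations deterministically."""
    # Sort so the result does not depend on the order candidates were found in.
    ordered = sorted(candidates)
    # Hash the read name; the digest is stable across runs and input orders.
    digest = hashlib.sha256(read_name.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(ordered)
    return ordered[index]


def place_reads(reads):
    """Map read name -> chosen location for an iterable of (name, candidates)."""
    return {name: pick_location(name, cands) for name, cands in reads}
```

Because the choice depends only on the read name and the set of candidate locations, shuffling the input or splitting it across workers yields the same placement.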
  • 3
    Publication Date: 2016-07-30
    Description: Motivation: Animals from worms and insects to birds and mammals show distinct body plans; however, the embryonic development of diverse body plans with tissues and organs within is controlled by surprisingly few signaling pathways. It is well recognized that combinatorial use of and dynamic interactions among signaling pathways follow specific logic to control complex and accurate developmental signaling and patterning, but it remains elusive what such logic is, or even, what it looks like. Results: We have developed a computational model for Drosophila eye development with innovative methods to reveal how interactions among multiple pathways control the dynamically generated hexagonal array of R8 cells. We obtained two novel findings. First, the coupling between the long-range inductive signals produced by the proneural Hh signaling and the short-range restrictive signals produced by the antineural Notch and EGFR signaling is essential for generating accurately spaced R8s. Second, the spatiotemporal orders of key signaling events reveal a robust pattern of lateral inhibition conducted by Ato-coordinated Notch and EGFR signaling to collectively determine R8 patterning. This pattern, stipulating the orders of signaling and comparable to the protocols of communication, may help decipher the well-appreciated but poorly defined logic of developmental signaling. Availability and implementation: The model is available upon request. Contact: hao.zhu@ymail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 4
    Publication Date: 2016-07-30
    Description: Motivation: The growing amount of regulatory data from the ENCODE, Roadmap Epigenomics and other consortia provides a wealth of opportunities to investigate the functional impact of single nucleotide polymorphisms (SNPs). Yet, given the large number of regulatory datasets, researchers are faced with the challenge of how to efficiently utilize them to interpret the functional impact of SNP sets. Results: We developed the GenomeRunner web server to automate systematic statistical analysis of SNP sets within a regulatory context. Besides defining the functional impact of SNP sets, GenomeRunner implements novel regulatory similarity/differential analyses, and cell type-specific regulatory enrichment analysis. Validated against literature- and disease ontology-based approaches, analysis of 39 disease/trait-associated SNP sets demonstrated that the functional impact of SNP sets corresponds to known disease relationships. We identified a group of autoimmune diseases with SNPs distinctly enriched in the enhancers of T helper cell subpopulations, and demonstrated relevant cell type-specificity of the functional impact of other SNP sets. In summary, we show how systematic analysis of genomic data within a regulatory context can help interpret the functional impact of SNP sets. Availability and Implementation: GenomeRunner web server is freely available at http://www.integrativegenomics.org/. Contact: mikhail.dozmorov@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 5
    Publication Date: 2016-07-30
    Description: Motivation: Adverse drug reactions (ADRs) are a central consideration during drug development. Here we present a machine learning classifier to prioritize ADRs for approved drugs and pre-clinical small-molecule compounds by combining chemical structure (CS) and gene expression (GE) features. The GE data is from the Library of Integrated Network-based Cellular Signatures (LINCS) L1000 dataset that measured changes in GE before and after treatment of human cells with over 20 000 small-molecule compounds including most of the FDA-approved drugs. Using various benchmarking methods, we show that the integration of GE data with the CS of the drugs can significantly improve the predictability of ADRs. Moreover, transforming GE features to enrichment vectors of biological terms further improves the predictive capability of the classifiers. The most predictive biological-term features can assist in understanding the drug mechanisms of action. Finally, we applied the classifier to all >20 000 small-molecules profiled, and developed a web portal for browsing and searching predictive small-molecule/ADR connections. Availability and Implementation: The interface for the adverse event predictions for the >20 000 LINCS compounds is available at http://maayanlab.net/SEP-L1000/. Contact: avi.maayan@mssm.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 6
    Publication Date: 2016-07-30
    Description: Motivation: Environmental dissemination of antibiotic resistance genes (ARGs) has become an increasing concern for public health. Metagenomics approaches can effectively detect broad profiles of ARGs in environmental samples; however, the detection and subsequent classification of ARG-like sequences are time consuming and have been severe obstacles in employing metagenomic methods. We sought to accelerate quantification of ARGs in metagenomic data from environmental samples. Results: A Structured ARG reference database (SARG) was constructed by integrating ARDB and CARD, the two most commonly used databases. SARG was curated to remove redundant sequences and optimized to facilitate query sequence identification by similarity. A database with a hierarchical structure (type-subtype-reference sequence) was then constructed to facilitate classification (assigning ARG-like sequence to type, subtype and reference sequence) of sequences identified through similarity search. Utilizing SARG and a previously proposed hybrid functional gene annotation pipeline, we developed an online pipeline called ARGs-OAP for fast annotation and classification of ARG-like sequences from metagenomic data. We also evaluated and proposed a set of criteria important for efficiently conducting metagenomic analysis of ARGs using ARGs-OAP. Availability and Implementation: Perl script for ARGs-OAP can be downloaded from https://github.com/biofuture/Ublastx_stageone . ARGs-OAP can be accessed through http://smile.hku.hk/SARGs . Contact: zhangt@hku.hk or tiedjej@msu.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 7
    Publication Date: 2016-07-30
    Description: Visualizing genomic data in chromosomal context can help detect errors in data processing and may suggest new hypotheses to be tested. Here, we report a new tool for displaying large and diverse genomic data along chromosomes. The software is implemented in R so that visualization can be easily integrated with its numerous packages for processing genomic data. It supports simultaneous visualization of multiple tracks of data. Large genomic regions such as QTLs or synteny tracts may be shown along histograms of number of genes, genetic variants, or any other type of genomic element. Tracks can also contain values for continuous or categorical variables and the user can choose among points, connected lines, colored segments, or histograms for representing data. chromPlot takes data from tables in data.frame or GRanges formats. The information necessary to draw chromosomes for mouse and human is included with the package. For other organisms, chromPlot can read Gap and cytoBandIdeo tables from the UCSC Genome Browser. We present common use cases here, and a full tutorial is included as the package’s vignette. Availability and Implementation: chromPlot is distributed under a GPL-2 licence at http://www.bioconductor.org. Contact: raverdugo@u.uchile.cl Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 8
    Publication Date: 2016-07-30
    Description: The most important features of error correction tools for sequencing data are accuracy, memory efficiency and fast runtime. The previous version of BLESS was highly memory-efficient and accurate, but it was too slow to handle reads from large genomes. We have developed a new version of BLESS to improve runtime and accuracy while maintaining a small memory usage. The new version, called BLESS 2, has an error correction algorithm that is more accurate than BLESS, and the algorithm has been parallelized using hybrid MPI and OpenMP programming. BLESS 2 was compared with five top-performing tools, and it was found to be the fastest when it was executed on two computing nodes using MPI, with each node containing twelve cores. Also, BLESS 2 showed at least 11% higher gain while retaining the memory efficiency of the previous version for large genomes. Availability and implementation: Freely available at https://sourceforge.net/projects/bless-ec Contact: dchen@illinois.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 9
    Publication Date: 2016-07-30
    Description: Motivation: Accurate segmentation of brain electron microscopy (EM) images is a critical step in dense circuit reconstruction. Although deep neural networks (DNNs) have been widely used in a number of applications in computer vision, most of these models that proved to be effective on image classification tasks cannot be applied directly to EM image segmentation, due to the different objectives of these tasks. As a result, it is desirable to develop an optimized architecture that uses the full power of DNNs and is tailored specifically for EM image segmentation. Results: In this work, we proposed a novel design of DNNs for this task. We trained a pixel classifier that operates on raw pixel intensities with no preprocessing to generate probability values for each pixel being a membrane or not. Although the use of neural networks in image segmentation is not completely new, we developed novel insights and model architectures that allow us to achieve superior performance on EM image segmentation tasks. Our submission based on these insights to the 2D EM Image Segmentation Challenge achieved the best performance consistently across all three evaluation metrics. This challenge is still ongoing and the results in this paper are as of June 5, 2015. Availability and Implementation: https://github.com/ahmed-fakhry/dive Contact: sji@eecs.wsu.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 10
    Publication Date: 2016-07-30
    Description: Hilbert curves enable high-resolution visualization of genomic data on a chromosome- or genome-wide scale. Here we present the HilbertCurve package that provides an easy-to-use interface for mapping genomic data to Hilbert curves. The package transforms the curve as a virtual axis, thereby hiding the details of the curve construction from the user. HilbertCurve supports multiple-layer overlay that makes it a powerful tool to correlate the spatial distribution of multiple feature types. Availability and implementation: The HilbertCurve package and documentation are freely available from the Bioconductor project: http://www.bioconductor.org/packages/devel/bioc/html/HilbertCurve.html Contact: m.schlesner@dkfz.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
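The mapping from a one-dimensional position to a point on a Hilbert curve, which the abstract above relies on, can be sketched with the standard iterative index-to-coordinate conversion. This is a generic illustration in Python, not the HilbertCurve R package's own code or API; `hilbert_d2xy` and its parameters are hypothetical names.

```python
def hilbert_d2xy(side, d):
    """Convert position d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        # Shift into the quadrant selected by (rx, ry).
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Example: walk an 8 x 8 curve; positions that are adjacent in 1D stay adjacent in 2D.
path = [hilbert_d2xy(8, d) for d in range(64)]
```

A genomic coordinate would first be scaled to an integer in the range 0 to side * side - 1 before the conversion; the locality of the curve is what keeps neighbouring bases close together in the resulting image.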
  • 11
    Publication Date: 2016-07-30
    Description: Motivation: The sequences among subgenomes in a polyploid species have high similarity, making it difficult to design genome-specific primers for sequence analysis. Results: We present GSP, a web-based platform to design genome-specific primers that distinguish subgenome sequences in a polyploid genome. GSP uses BLAST to extract homeologous sequences of the subgenomes in existing databases, performs a multiple sequence alignment, and designs primers based on sequence variants in the alignment. An interactive primers diagram, a sequence alignment viewer and a virtual electrophoresis are displayed as parts of the primer design result. GSP also designs specific primers from multiple sequences uploaded by users. Availability and implementation: GSP is a user-friendly and efficient web platform freely accessible at http://probes.pw.usda.gov/GSP. Source code and command-line application are available at https://github.com/bioinfogenome/GSP. Contacts: yong.gu@ars.usda.gov or devin.coleman-derr@ars.usda.gov Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 12
    Publication Date: 2016-07-30
    Description: The prediction of protein–protein complexes from the structures of unbound components is a challenging and powerful strategy to decipher the mechanism of many essential biological processes. We present a user-friendly protein–protein docking server based on an improved version of FRODOCK that includes a complementary knowledge-based potential. The web interface provides a very effective tool to explore and select protein–protein models and interactively screen them against experimental distance constraints. The competitive success rates and efficiency achieved allow the retrieval of reliable potential protein–protein binding conformations that can be further refined with more computationally demanding strategies. Availability and Implementation: The server is free and open to all users with no login requirement at http://frodock.chaconlab.org Contact: pablo@chaconlab.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 13
    Publication Date: 2016-07-30
    Description: XIBD performs pairwise relatedness mapping on the X chromosome using dense single nucleotide polymorphism (SNP) data from either SNP chips or next generation sequencing data. It correctly accounts for the difference in chromosomal numbers between males and females and estimates global relatedness as well as regions of the genome that are identical by descent (IBD). XIBD also generates novel graphical summaries of all pairwise IBD tracts for a cohort making it very useful for disease locus mapping. Availability and implementation: XIBD is written in R/Rcpp and executed from shell scripts that are freely available from http://bioinf.wehi.edu.au/software/XIBD along with accompanying reference datasets. Contact: henden.l@wehi.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 14
    Publication Date: 2016-07-30
    Description: Several approaches to the region-based association analysis of quantitative traits have recently been developed and successfully applied. However, no software package has been developed that implements all of these approaches for either independent or structured samples. Here we introduce FREGAT (Family REGional Association Tests), an R package that can handle family and population samples and implements a wide range of region-based association methods including burden tests, functional linear models, and kernel machine-based regression. FREGAT can be used in genome/exome-wide region-based association studies of quantitative traits and candidate gene analysis. FREGAT offers many useful options to empower its users and increase the effectiveness and applicability of region-based association analysis. Availability and Implementation: https://cran.r-project.org/web/packages/FREGAT/index.html Supplementary Information: Supplementary data are available at Bioinformatics online. Contact: belon@bionet.nsc.ru
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 15
    Publication Date: 2016-07-30
    Description: Motivation: There is a growing need in bioinformatics for easy-to-use software implementations of algorithms that are usable across platforms. At the same time, reproducibility of computational results is critical and often a challenge due to source code changes over time and dependencies. Results: The approach introduced in this paper addresses both of these needs with AlgoRun, a dedicated packaging system for implemented algorithms, using Docker technology. Implemented algorithms, packaged with AlgoRun, can be executed through a user-friendly interface directly from a web browser or via a standardized RESTful web API to allow easy integration into more complex workflows. The packaged algorithm includes the entire software execution environment, thereby eliminating the common problem of software dependencies and the irreproducibility of computations over time. AlgoRun-packaged algorithms can be published on http://algorun.org , a centralized searchable directory to find existing AlgoRun-packaged algorithms. Availability and implementation: AlgoRun is available at http://algorun.org and the source code under GPL license is available at https://github.com/algorun Contact: laubenbacher@uchc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 16
    Publication Date: 2016-07-09
    Description: Motivation: Genome browsers that support fast navigation through vast datasets and provide interactive visual analytics functions can help scientists achieve deeper insight into biological systems. Toward this end, we developed Integrated Genome Browser (IGB), a highly configurable, interactive and fast open source desktop genome browser. Results: Here we describe multiple updates to IGB, including all-new capabilities to display and interact with data from high-throughput sequencing experiments. To demonstrate, we describe example visualizations and analyses of datasets from RNA-Seq, ChIP-Seq and bisulfite sequencing experiments. Understanding results from genome-scale experiments requires viewing the data in the context of reference genome annotations and other related datasets. To facilitate this, we enhanced IGB’s ability to consume data from diverse sources, including Galaxy, Distributed Annotation and IGB-specific Quickload servers. To support future visualization needs as new genome-scale assays enter wide use, we transformed the IGB codebase into a modular, extensible platform for developers to create and deploy all-new visualizations of genomic data. Availability and implementation: IGB is open source and is freely available from http://bioviz.org/igb . Contact: aloraine@uncc.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 17
    Publication Date: 2016-07-09
    Description: Motivation: Single Molecule Real-Time (SMRT) sequencing technology and Oxford Nanopore technologies (ONT) produce reads over 10 kb in length, which have enabled high-quality genome assembly at an affordable cost. However, at present, long reads have an error rate as high as 10–15%. Complex and computationally intensive pipelines are required to assemble such reads. Results: We present a new mapper, minimap and a de novo assembler, miniasm, for efficiently mapping and assembling SMRT and ONT reads without an error correction stage. They can often assemble a sequencing run of bacterial data into a single contig in a few minutes, and assemble 45-fold Caenorhabditis elegans data in 9 min, orders of magnitude faster than the existing pipelines, though the consensus sequence error rate is as high as raw reads. We also introduce a pairwise read mapping format and a graphical fragment assembly format, and demonstrate the interoperability between ours and current tools. Availability and implementation: https://github.com/lh3/minimap and https://github.com/lh3/miniasm Contact: hengli@broadinstitute.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 18
    Publication Date: 2016-07-09
    Description: Motivation: Alternative splicing represents a prime mechanism of post-transcriptional gene regulation whose misregulation is associated with a broad range of human diseases. Despite the vast availability of transcriptome data from different cell types and diseases, bioinformatics-based surveys of alternative splicing patterns remain a major challenge due to limited availability of analytical tools that combine high accuracy and rapidity. Results: We describe here a novel junction-centric method, jSplice, that enables de novo extraction of alternative splicing events from RNA-sequencing data with high accuracy, reliability and speed. Application to clear cell renal carcinoma (ccRCC) cell lines and 65 ccRCC patients revealed experimentally validatable alternative splicing changes and signatures able to prognosticate ccRCC outcome. In the aggregate, our results propose jSplice as a key analytic tool for the derivation of cell context-dependent alternative splicing patterns from large-scale RNA-sequencing datasets. Availability and implementation: jSplice is a standalone Python application freely available at http://www.mhs.biol.ethz.ch/research/krek/jsplice . Contact: wilhelm.krek@biol.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 19
    Publication Date: 2016-07-09
    Description: Motivation: Single-cell RNA-sequencing technology allows detection of gene expression at the single-cell level. One typical feature of the data is a bimodality in the cellular distribution even for highly expressed genes, primarily caused by a proportion of non-expressing cells. The standard and the over-dispersed gamma-Poisson models that are commonly used in bulk-cell RNA-sequencing are not able to capture this property. Results: We introduce a beta-Poisson mixture model that can capture the bimodality of the single-cell gene expression distribution. We further integrate the model into the generalized linear model framework in order to perform differential expression analyses. The whole analytical procedure is called BPSC. The results from several real single-cell RNA-seq datasets indicate that ~90% of the transcripts are well characterized by the beta-Poisson model; the model-fit from BPSC is better than the fit of the standard gamma-Poisson model in >80% of the transcripts. Moreover, in differential expression analyses of simulated and real datasets, BPSC performs well against edgeR, a conventional method widely used in bulk-cell RNA-sequencing data, and against scde and MAST, two recent methods specifically designed for single-cell RNA-seq data. Availability and Implementation: An R package BPSC for model fitting and differential expression analyses of single-cell RNA-seq data is available under GPL-3 license at https://github.com/nghiavtr/BPSC. Contact: yudi.pawitan@ki.se or mattias.rantalainen@ki.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
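The bimodality that a beta-Poisson mixture can capture, as described in the abstract above, can be seen by simulation. The sketch below assumes a simple three-parameter form (a Poisson count whose rate is `lam` times a Beta(`alpha`, `beta`) draw); the exact parameterization used by BPSC may differ, and all function names here are hypothetical.

```python
import math
import random


def sample_poisson(rate, rng):
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-rate)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_beta_poisson(alpha, beta, lam, n, seed=0):
    """Draw n counts: per-cell rate is lam * Beta(alpha, beta), count is Poisson."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        p = rng.betavariate(alpha, beta)  # fraction of maximal expression in this cell
        counts.append(sample_poisson(lam * p, rng))
    return counts


# With alpha, beta < 1 the Beta draw piles up near 0 and 1, so the counts split into
# a low-expression group and a high-expression group.
counts = sample_beta_poisson(0.5, 0.5, 50.0, 5000)
mean = sum(counts) / len(counts)
variance = sum((c - mean) ** 2 for c in counts) / len(counts)
```

Under these assumptions the expected count is `lam * alpha / (alpha + beta)`, and the variance exceeds the mean, which is the overdispersion a plain Poisson model cannot represent.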
  • 20
    Publication Date: 2016-07-09
    Description: Motivation: Biological network querying is a problem requiring a considerable computational effort to be solved. Given a target and a query network, it aims to find occurrences of the query in the target by considering topological and node similarities (i.e. mismatches between nodes, edges, or node labels). Querying tools that deal with similarities are crucial in biological network analysis because they provide meaningful results even with noisy data. In addition, as the size of available networks increases steadily, existing algorithms and tools are becoming unsuitable. This raises new challenges for the design of more efficient and accurate solutions. Results: This paper presents APPAGATO, a stochastic and parallel algorithm to find approximate occurrences of a query network in biological networks. APPAGATO handles node, edge and node label mismatches. Thanks to its randomized and parallel nature, it applies to large networks and, compared with existing tools, it provides higher performance as well as statistically significantly more accurate results. Tests have been performed on protein–protein interaction networks annotated with synthetic and real gene ontology terms. Case studies have been done by querying protein complexes among different species and tissues. Availability and implementation: APPAGATO has been developed on top of the CUDA-C++ Toolkit 7.0 framework. The software is available online at http://profs.sci.univr.it/~bombieri/APPAGATO. Contact: rosalba.giugno@univr.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 21
    Oxford University Press
    Publication Date: 2016-07-09
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 22
    Publication Date: 2016-07-09
    Description: We introduce SharpViSu, an interactive open-source software with a graphical user interface, which allows performing processing steps for localization data in an integrated manner. This includes common features and new tools such as correction of chromatic aberrations, drift correction based on iterative cross-correlation calculations, selection of localization events, reconstruction of 2D and 3D datasets in different representations, estimation of resolution by Fourier ring correlation, clustering analysis based on Voronoi diagrams and Ripley’s functions. SharpViSu is optimized to work with eventlist tables exported from most popular localization software. We show applications of these on single and double-labelled super-resolution data. Availability and implementation: SharpViSu is available as open source code and as compiled stand-alone application under https://github.com/andronovl/SharpViSu. Contact: klaholz@igbmc.fr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 23
    Publication Date: 2016-06-25
    Description: A gene tree-species tree reconciliation explains the evolution of a gene tree within the species tree given a model of gene-family evolution. We describe ecceTERA, a program that implements a generic parsimony reconciliation algorithm, which accounts for gene duplication, loss and transfer (DTL) as well as speciation, involving sampled and unsampled lineages, within undated, fully dated or partially dated species trees. The ecceTERA reconciliation model and algorithm generalize or improve upon most published DTL parsimony algorithms for binary species trees and binary gene trees. Moreover, ecceTERA can estimate accurate species-tree aware gene trees using amalgamation. Availability and implementation: ecceTERA is freely available under http://mbb.univ-montp2.fr/MBB/download_sources/16__ecceTERA and can be run online at http://mbb.univ-montp2.fr/MBB/subsection/softExec.php?soft=eccetera . Contact: celine.scornavacca@umontpellier.fr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 24
    Publication Date: 2016-06-25
    Description: The popularity of NMR spectroscopy in metabolomics and natural products has driven the development of an array of NMR spectral analysis tools and databases. In particular, web applications have recently become widely used because they are platform-independent and easy to extend through reusable web components. Currently available web applications provide analysis of NMR spectra. However, they still lack the necessary processing and interactive visualization functionalities. To overcome these limitations, we present NMRPro, a web component that can be easily incorporated into current web applications, enabling easy-to-use online interactive processing and visualization. NMRPro integrates server-side processing with client-side interactive visualization through three parts: a Python package to efficiently process large NMR datasets on the server side, a Django app managing server-client interaction, and SpecdrawJS for client-side interactive visualization. Availability and implementation: Demo and installation instructions are available at http://mamitsukalab.org/tools/nmrpro/ Contact: mohamed@kuicr.kyoto-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 25
    Publication Date: 2016-07-30
    Description: Motivation: DNA methylation is an important epigenetic modification related to a variety of diseases including cancers. We focus on the methylation data from Illumina’s Infinium HumanMethylation450 BeadChip. One of the key issues of methylation analysis is to detect the differential methylation sites between case and control groups. Previous approaches describe data with simple summary statistics or kernel functions, and then use statistical tests to determine the difference. However, a summary statistics-based approach cannot capture complicated underlying structure, and a kernel function-based approach lacks interpretability of results. Results: We propose a novel method, D3M, for detection of differential distribution of methylation, based on distribution-valued data. Our method can detect differences in high-order moments, such as shapes of underlying distributions in methylation profiles, based on the Wasserstein metric. We test the significance of the difference between case and control groups and provide an interpretable summary of the results. The simulation results show that the proposed method achieves promising accuracy and compares favorably with previous methods. Glioblastoma multiforme and lower grade glioma data from The Cancer Genome Atlas show that our method supports recent biological advances and suggests new insights. Availability and Implementation: R implemented code is freely available from https://github.com/ymatts/D3M/ . Contact: ymatsui@med.nagoya-u.ac.jp or shimamura@med.nagoya-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
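The D3M abstract above compares methylation profiles as whole distributions using the Wasserstein metric. As a minimal sketch of that idea only (this is not the authors' R implementation; the function name `wasserstein_1d` and the restriction to equal-size samples are assumptions made here for illustration), the p-Wasserstein distance between two one-dimensional empirical samples can be computed by pairing their sorted values:

```python
def wasserstein_1d(xs, ys, p=2):
    """p-Wasserstein distance between two equal-size 1D empirical samples.

    Hypothetical helper for illustration. For equal-size samples in one
    dimension, the optimal transport plan pairs the i-th smallest value
    of one sample with the i-th smallest value of the other.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    a = sorted(xs)
    b = sorted(ys)
    # Average transport cost over the sorted (quantile-matched) pairs.
    mean_cost = sum(abs(u - v) ** p for u, v in zip(a, b)) / len(a)
    return mean_cost ** (1.0 / p)


if __name__ == "__main__":
    # Same mean (0.5), different shape: bimodal versus concentrated.
    case = [0.0, 0.0, 1.0, 1.0]
    control = [0.5, 0.5, 0.5, 0.5]
    print(wasserstein_1d(case, control))
```

In this toy example both samples have mean 0.5, so a comparison of means sees no difference, while the distance between the distributions is nonzero. That is the kind of shape difference the abstract refers to.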
  • 26
    Publication Date: 2016-07-30
    Description: Motivation: Similarity-based methods have been widely used to infer the properties of genes and gene products containing little or no experimental annotation. New approaches that overcome the limitations of methods that rely solely upon sequence similarity are attracting increased attention. One of these novel approaches is to use the organization of the structural domains in proteins. Results: We propose a method for the automatic annotation of protein sequences in the UniProt Knowledgebase (UniProtKB) by comparing their domain architectures, classifying proteins based on the similarities and propagating functional annotation. The performance of this method was measured through a cross-validation analysis using the Gene Ontology (GO) annotation of a sub-set of UniProtKB/Swiss-Prot. The results demonstrate the effectiveness of this approach in detecting functional similarity, with an average F-score of 0.85. Applying the method to nearly 55.3 million uncharacterized proteins in UniProtKB/TrEMBL resulted in 44 818 178 GO term predictions for 12 172 114 proteins. Of these predictions, 22% were for 2 812 016 previously non-annotated protein entries, indicating the significance of the value added by this approach. Availability and implementation: The results of the method are available at: ftp://ftp.ebi.ac.uk/pub/contrib/martin/DAAC/ . Contact: tdogan@ebi.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 27
    Publication Date: 2016-07-30
    Description: Motivation: Species identification and quantification are common tasks in metagenomics and pathogen detection studies. The most recent techniques are built on mapping the sequenced reads against a reference database (e.g. whole genomes, marker genes, proteins) followed by application-dependent analysis steps. Although these methods have proven useful in many scenarios, there is still room for improvement in species and strain level detection, mainly for low-abundance organisms. Results: We propose a new method: DUDes, a reference-based taxonomic profiler that introduces a novel top-down approach to analyze metagenomic next-generation sequencing (NGS) samples. Rather than predicting an organism's presence in the sample based only on relative abundances, DUDes first identifies possible candidates by comparing the strength of the read mapping in each node of the taxonomic tree in an iterative manner. Instead of using the lowest common ancestor we propose a new approach: the deepest uncommon descendent. We showed in experiments that DUDes works for single and multiple organisms and can identify low-abundance taxonomic groups with high precision. Availability and Implementation: DUDes is open source and is available at http://sf.net/p/dudes Supplementary information: Supplementary data are available at Bioinformatics online. Contact: renardB@rki.de
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 28
    Publication Date: 2016-07-30
    Description: Motivation: Moonlighting proteins (MPs) show multiple cellular functions within a single polypeptide chain. To understand the overall landscape of their functional diversity, it is important to establish a computational method that can identify MPs on a genome scale. Previously, we have systematically characterized MPs using functional and omics-scale information. In this work, we develop a computational prediction model for automatic identification of MPs using a diverse range of protein association information. Results: We incorporated a diverse range of protein association information to extract characteristic features of MPs, which range from gene ontology (GO), protein–protein interactions, gene expression, phylogenetic profiles, genetic interactions and network-based graph properties to protein structural properties, i.e. intrinsically disordered regions in the protein chain. Then, we trained machine learning classifiers on this broad feature space to predict MPs. Because many known MPs lack some proteomic features, we developed an imputation technique to fill in such missing features. Results on the control dataset show that MPs can be predicted with over 98% accuracy when GO terms are available. Furthermore, using only the omics-based features the method can still identify MPs with over 75% accuracy. Last, we applied the method to three genomes, Saccharomyces cerevisiae, Caenorhabditis elegans and Homo sapiens, and found that about 2–10% of proteins in the genomes are potential MPs. Availability and Implementation: Code available at http://kiharalab.org/MPprediction Contact: dkihara@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 29
    Publication Date: 2016-07-30
    Description: Motivation: Design of protein–protein interaction (PPI) inhibitors is a major challenge in Structural Bioinformatics. Peptides, especially short ones (5–15 amino acids long), are natural candidates for inhibition of protein–protein complexes due to several attractive features such as high structural compatibility with the protein binding site (mimicking the surface of one of the proteins), small size and the ability to form strong hotspot binding connections with the protein surface. Efficient rational peptide design is still a major challenge in computer aided drug design, due to the huge space of possible sequences, which is exponential in the length of the peptide, and the high flexibility of peptide conformations. Results: In this article we present PinaColada, a novel computational method for the design of peptide inhibitors for protein–protein interactions. We employ a version of the ant colony optimization heuristic, which is used to explore the exponential space (20^n) of length-n peptide sequences, in combination with our fast robotics-motivated PepCrawler algorithm, which explores the conformational space for each candidate sequence. PinaColada runs in parallel on a DELL PowerEdge 2.8 GHz computer with 20 cores and 256 GB memory, and takes up to 24 h to design a peptide of 5–15 amino acids in length. Availability and implementation: An online server is available at: http://bioinfo3d.cs.tau.ac.il/PinaColada/. Contact: danielza@post.tau.ac.il ; wolfson@tau.ac.il
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
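To give a sense of the search space quoted in the PinaColada abstract above, the number of length-n peptide sequences over the 20 standard amino acids is 20^n. The short calculation below is illustrative only; `sequence_space_size` is a hypothetical helper name and is not part of PinaColada.

```python
def sequence_space_size(length, alphabet_size=20):
    """Number of distinct sequences of the given length.

    Hypothetical helper: with 20 standard amino acids there are
    20**length possible peptides of that length.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    return alphabet_size ** length


if __name__ == "__main__":
    # The abstract considers peptides of 5 to 15 amino acids.
    for n in (5, 10, 15):
        print(f"n={n:2d}: {sequence_space_size(n):.3e} sequences")
```

Even at the short end of the range there are millions of candidates, and each added residue multiplies the count by 20, which is why an exhaustive enumeration is impractical and a heuristic search is used instead.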
  • 30
    Publication Date: 2016-07-30
    Description: Motivation: Whole-genome low-coverage sequencing has been combined with linkage-disequilibrium (LD)-based genotype refinement to accurately and cost-effectively infer genotypes in large cohorts of individuals. Most genotype refinement methods are based on hidden Markov models, which are accurate but computationally expensive. We introduce an algorithm that models LD using a simple multivariate Gaussian distribution. The key feature of our algorithm is its speed. Results: Our method is hundreds of times faster than other methods on the same data set and its scaling behaviour is linear in the number of samples. We demonstrate the performance of the method on both low- and high-coverage samples. Availability and implementation: The source code is available at https://github.com/illumina/marvin Contact: rarthur@illumina.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 31
    Publication Date: 2016-07-30
    Description: Motivation: T-cell epitopes serve as molecular keys to initiate adaptive immune responses. Identification of T-cell epitopes is also a key step in rational vaccine design. Most available methods are driven by informatics and are critically dependent on experimentally obtained training data. Analysis of a training set from the Immune Epitope Database (IEDB) for several alleles indicates that the sampling of the peptide space is extremely sparse, covering a tiny fraction of the possible nonamer space, and also heavily skewed, thus restricting the range of epitope prediction. Results: We present a new epitope prediction method that has four distinct computational modules: (i) structural modelling, estimating statistical pair-potentials and constraint derivation, (ii) implicit modelling and interaction profiling, (iii) feature representation and binding affinity prediction and (iv) use of graphical models to extract peptide sequence signatures to predict epitopes for HLA class I alleles. Conclusions: HLaffy is a novel and efficient epitope prediction method that predicts epitopes for any HLA class I allele by estimating the binding strengths of peptide-HLA complexes, which is achieved through learning pair-potentials important for peptide binding. It relies on the strength of the mechanistic understanding of peptide-HLA recognition and provides an estimate of the total ligand space for each allele. The performance of HLaffy is seen to be superior to the currently available methods. Availability and implementation: The method is made accessible through a webserver http://proline.biochem.iisc.ernet.in/HLaffy . Contact: nchandra@biochem.iisc.ernet.in Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 32
    Publication Date: 2016-07-30
    Description: Motivation: In vitro and in vivo cell proliferation is often studied using the dye carboxyfluorescein succinimidyl ester (CFSE). The CFSE time-series data provide information about the proliferation history of populations of cells. While the experimental procedures are well established and widely used, the analysis of CFSE time-series data is still challenging. Many available analysis tools do not account for cell age and employ optimization methods that are inefficient (or even unreliable). Results: We present a new model-based analysis method for CFSE time-series data. This method uses a flexible description of proliferating cell populations, namely, a division-, age- and label-structured population model. Efficient maximum likelihood and Bayesian estimation algorithms are introduced to infer the model parameters and their uncertainties. These methods exploit the forward sensitivity equations of the underlying partial differential equation model for efficient and accurate gradient calculation, thereby improving computational efficiency and reliability compared with alternative approaches and accelerating uncertainty analysis. The performance of the method is assessed by studying a dataset for immune cell proliferation. This revealed the importance of different factors for the proliferation rates of individual cells. Among other findings, a predominant effect of cell age on the division rate is found, which was not revealed by available computational methods. Availability and implementation: The MATLAB source code implementing the models and algorithms is available from http://janhasenauer.github.io/ShAPE-DALSP/ . Contact: jan.hasenauer@helmholtz-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 33
    Publication Date: 2016-07-30
    Description: Motivation: The challenges of successfully applying causal inference methods include: (i) satisfying underlying assumptions, (ii) limitations in data/models accommodated by the software and (iii) low power of common multiple testing approaches. Results: The causal inference test (CIT) is based on hypothesis testing rather than estimation, allowing the testable assumptions to be evaluated in the determination of statistical significance. A user-friendly software package provides P-values and optionally permutation-based FDR estimates (q-values) for potential mediators. It can handle single and multiple binary and continuous instrumental variables, binary or continuous outcome variables and adjustment covariates. Also, the permutation-based FDR option provides a non-parametric implementation. Conclusion: Simulation studies demonstrate the validity of the cit package and show a substantial advantage of permutation-based FDR over other common multiple testing strategies. Availability and implementation: The cit open-source R package is freely available from the CRAN website (https://cran.r-project.org/web/packages/cit/index.html) with embedded C++ code that utilizes the GNU Scientific Library, also freely available (http://www.gnu.org/software/gsl/). Contact: joshua.millstein@usc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 34
    Publication Date: 2016-07-30
    Description: Motivation: The vast majority of the many thousands of disease-associated single nucleotide polymorphisms (SNPs) lie in the non-coding part of the genome. They are likely to affect regulatory elements, such as enhancers and promoters, rather than the function of a protein. To understand the molecular mechanisms underlying genetic diseases, it is therefore increasingly important to study the effect of a SNP on nearby molecular traits such as chromatin or transcription factor binding. Results: We developed SNPhood, a user-friendly Bioconductor R package to investigate, quantify and visualise the local epigenetic neighbourhood of a set of SNPs in terms of chromatin marks or TF binding sites using data from NGS experiments. Availability and implementation: SNPhood is publicly available and maintained as an R Bioconductor package at http://bioconductor.org/packages/SNPhood/ . Contact: judith.zaugg@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 35
    Publication Date: 2016-07-30
    Description: Motivation: Versatile and efficient variant calling tools are needed to analyze large scale sequencing datasets. In particular, identification of copy number changes remains a challenging task due to their complexity, susceptibility to sequencing biases, variation in coverage data and dependence on genome-wide sample properties, such as tumor polyploidy or polyclonality in cancer samples. Results: We have developed a new tool, Canvas, for identification of copy number changes from diverse sequencing experiments including whole-genome matched tumor-normal and single-sample normal re-sequencing, as well as whole-exome matched and unmatched tumor-normal studies. In addition to variant calling, Canvas infers genome-wide parameters such as cancer ploidy, purity and heterogeneity. It provides fast and easy-to-run workflows that can scale to thousands of samples and can be easily incorporated into variant calling pipelines. Availability and Implementation: Canvas is distributed under an open source license and can be downloaded from https://github.com/Illumina/canvas . Contact: eroller@illumina.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 36
    Publication Date: 2016-07-30
    Description: pileup.js is a new browser-based genome viewer. It is designed to facilitate the investigation of evidence for genomic variants within larger web applications. It takes advantage of recent developments in the JavaScript ecosystem to provide a modular, reliable and easily embedded library. Availability and implementation: The code and documentation for pileup.js are publicly available at https://github.com/hammerlab/pileup.js under the Apache 2.0 license. Contact: correspondence@hammerlab.org
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 37
    Publication Date: 2016-07-30
    Description: Motivation: We present an update to the pathway enrichment analysis tool ‘Pathway Analysis by Randomization Incorporating Structure (PARIS)’ that determines aggregated association signals generated from genome-wide association study results. Pathway-based analyses highlight biological pathways associated with phenotypes. PARIS uses a unique permutation strategy to evaluate the genomic structure of interrogated pathways, through permutation testing of genomic features, thus eliminating many of the over-testing concerns arising with other pathway analysis approaches. Results: We have updated PARIS to incorporate expanded pathway definitions through the incorporation of new expert knowledge from multiple database sources, through customized user provided pathways, and other improvements in user flexibility and functionality. Availability and implementation: PARIS is freely available to all users at https://ritchielab.psu.edu/software/paris-download . Contact: jnc43@case.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 38
    Publication Date: 2016-07-30
    Description: Nucleotide Similarity Scanner (NSimScan) is specialized for searching massive DNA databases for distant similarities. Its targeted applications include phylogenomics, comparative and functional studies of non-coding sequences, contamination detection, etc. NSimScan outperforms industry standard tools in combined sensitivity, accuracy and speed, operating at sensitivity similar to BLAST, accuracy of ssearch and speed of MegaBLAST. Availability and implementation: NSimScan is available at https://github.com/abadona/qsimscan as a part of the QSimScan package. It is implemented in C++, distributed under the MIT license and supported on Linux, OS X and Windows (with cygwin). Contact: dkaznadzey@yahoo.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 39
    Publication Date: 2016-07-30
    Description: We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to the N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence. Availability and implementation: http://TreeDom.sbc.su.se Contact: Erik.Sonnhammer@scilifelab.se
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 40
    Publication Date: 2016-07-30
    Description: The NCI-60 human tumor cell line panel is an invaluable resource for cancer researchers, providing drug sensitivity, molecular and phenotypic data for a range of cancer types. CellMiner is a web resource that provides tools for the acquisition and analysis of quality-controlled NCI-60 data. CellMiner supports queries of up to 150 drugs or genes, but the output is an Excel file for each drug or gene. This output format makes it difficult for researchers to explore the data from large queries. CellMiner Companion is a web application that facilitates the exploration and visualization of output from CellMiner, further increasing the accessibility of NCI-60 data. Availability and Implementation: The web application is freely accessible at https://pul-bioinformatics.shinyapps.io/CellMinerCompanion . The R source code can be downloaded at https://github.com/pepascuzzi/CellMinerCompanion.git . Contact: ppascuzz@purdue.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 41
    Publication Date: 2016-07-30
    Description: SMeagol is a software tool to simulate highly realistic microscopy data based on spatial systems biology models, in order to facilitate development, validation and optimization of advanced analysis methods for live cell single molecule microscopy data. Availability and implementation: SMeagol runs on Matlab R2014 and later, and uses compiled binaries in C for reaction–diffusion simulations. Documentation, source code and binaries for Mac OS, Windows and Ubuntu Linux can be downloaded from http://smeagol.sourceforge.net . Contact: johan.elf@icm.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 42
    Publication Date: 2016-07-30
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 43
    Publication Date: 2016-03-26
    Description: Motivation: Photoactivatable ribonucleoside-enhanced cross-linking and immunoprecipitation (PAR-CLIP) is an experimental method based on next-generation sequencing for identifying the RNA interaction sites of a given protein. The method deliberately inserts T-to-C substitutions at the RNA-protein interaction sites, which provides a second layer of evidence compared with other CLIP methods. However, the experiment includes several sources of noise which cause both low-frequency errors and spurious high-frequency alterations. Therefore, rigorous statistical analysis is required in order to separate true T-to-C base changes, following cross-linking, from noise. So far, most of the existing PAR-CLIP data analysis methods focus on discarding the low-frequency errors and rely on high-frequency substitutions to report binding sites, not taking into account the possibility of high-frequency false positive substitutions. Results: Here, we introduce BMix, a new probabilistic method which explicitly accounts for the sources of noise in PAR-CLIP data and distinguishes cross-link induced T-to-C substitutions from low- and high-frequency erroneous alterations. We demonstrate the superior speed and accuracy of our method compared with existing approaches on both simulated and real, publicly available human datasets. Availability and implementation: The model is freely accessible within the BMix toolbox at www.cbg.bsse.ethz.ch/software/BMix , available for Matlab and R. Supplementary information: Supplementary data is available at Bioinformatics online. Contact: niko.beerenwinkel@bsse.ethz.ch
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 44
    Publication Date: 2016-03-26
    Description: Motivation: Gene networks have become a central tool in the analysis of genomic data but are widely regarded as hard to interpret. This has motivated a great deal of comparative evaluation and research into best practices. We explore the possibility that this may lead to overfitting in the field as a whole. Results: We construct a model of ‘research communities’ sampling from real gene network data and machine learning methods to characterize performance trends. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly causes ‘easy’ or uninformative replication to dominate analyses. We find that when sampling across network data and algorithms with similar variability, the relationship between replicability and accuracy is positive (Spearman’s correlation, r_s ~ 0.33) but where no such constraint is imposed, the relationship becomes negative for a given gene function (r_s ~ –0.13). We predict factors driving replicability in some prior analyses of gene networks and show that they are unconnected with the correctness of the original result, instead reflecting replicable biases. Without these biases, the original results also vanish replicably. We show these effects can occur quite far upstream in network data and that there is a strong tendency within protein–protein interaction data for highly replicable interactions to be associated with poor quality control. Availability and implementation: Algorithms, network data and a guide to the code available at: https://github.com/wimverleyen/AggregateGeneFunctionPrediction . Contact: jgillis@cshl.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 45
    Publication Date: 2016-03-26
    Description: Motivation: Recent advances in single molecule real-time (SMRT) and nanopore sequencing technologies have enabled high-quality assemblies from long and inaccurate reads. However, these approaches require high coverage by long reads and remain expensive. On the other hand, the inexpensive short-read technologies produce accurate but fragmented assemblies. Thus, a hybrid approach that assembles long reads (with low coverage) and short reads has the potential to generate high-quality assemblies at reduced cost. Results: We describe the hybridSPAdes algorithm for assembling short and long reads and benchmark it on a variety of bacterial assembly projects. Our results demonstrate that hybridSPAdes generates accurate assemblies (even in projects with relatively low coverage by long reads), thus reducing the overall cost of genome sequencing. We further present the first complete assembly of a genome from single cells using SMRT reads. Availability and implementation: hybridSPAdes is implemented in C++ as a part of the SPAdes genome assembler and is publicly available at http://bioinf.spbau.ru/en/spades Contact: d.antipov@spbu.ru Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 46
    Publication Date: 2016-03-26
    Description: Motivation: There are numerous examples of RNA–RNA complexes, including microRNA–mRNA and small RNA–mRNA duplexes for regulation of translation, guide RNA interactions with target RNA for post-transcriptional modification and small nuclear RNA duplexes for splicing. Predicting the base pairs formed between two interacting sequences remains difficult, at least in part because of the competition between unimolecular and bimolecular structure. Results: Two algorithms were developed for improved prediction of bimolecular RNA structure that consider the competition between self-structure and bimolecular structure. These algorithms utilize two novel approaches to evaluate accessibility: free energy density minimization and pseudo-energy minimization. Free energy density minimization minimizes the folding free energy change per nucleotide involved in an intermolecular secondary structure. Pseudo-energy minimization (called AccessFold) minimizes the sum of free energy change and a pseudo-free energy penalty for bimolecular pairing of nucleotides that are unlikely to be accessible for bimolecular structure. The pseudo-free energy, derived from unimolecular pairing probabilities, is applied per nucleotide in bimolecular pairs, and this approach is able to predict binding sites that are split by unimolecular structures. A benchmark set of 17 bimolecular RNA structures was assembled to assess structure prediction. Pseudo-energy minimization provides a statistically significant improvement in sensitivity over the method that was found in a benchmark to be the most accurate previously available method, with an improvement from 36.8% to 57.8% in mean sensitivity for base pair prediction. Availability and implementation: Pseudo-energy minimization is available for download as AccessFold, under an open-source license and as part of the RNAstructure package, at: http://rna.urmc.rochester.edu/RNAstructure.html. Contact: david_mathews@urmc.rochester.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 47
    Publication Date: 2016-03-26
    Description: Motivation: Simulating complex evolution scenarios of multiple populations is an important task for answering many basic questions relating to population genomics. Apart from the population samples, the underlying Ancestral Recombination Graph (ARG) is an additional important means in hypothesis checking and reconstruction studies. Furthermore, complex simulations require a plethora of interdependent parameters, making even the scenario specification highly non-trivial. Results: We present an algorithm, SimRA, that simulates a generic multiple-population evolution model with admixture. It is based on random graphs that improve dramatically on the time and space requirements of the classical algorithm for single populations. Using the underlying random graphs model, we also derive closed forms of expected values of the ARG characteristics, i.e. height of the graph, number of recombinations, number of mutations and population diversity, in terms of its defining parameters. This is crucial in aiding the user to specify meaningful parameters for the complex scenario simulations, not through trial-and-error based on raw compute power but through intelligent parameter estimation. To the best of our knowledge this is the first time closed form expressions have been computed for the ARG properties. We show that the expected values closely match the empirical values through simulations. Finally, we demonstrate that SimRA produces the ARG in compact forms without compromising any accuracy. We demonstrate the compactness and accuracy through extensive experiments. Availability and implementation: SimRA (Simulation based on Random graph Algorithms) source, executable, user manual and sample input-output sets are available for downloading at: https://github.com/ComputationalGenomics/SimRA Contact: parida@us.ibm.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 48
    Publication Date: 2016-03-26
    Description: Motivation: The Optical Mapping System discovers structural variants and potentiates sequence assembly of genomes via scaffolding and comparisons that globally validate or correct sequence assemblies. Despite its utility, there are few publicly available tools for aligning optical mapping datasets. Results: Here we present software, named ‘Maligner’, for the alignment of both single molecule restriction maps (Rmaps) and in silico restriction maps of sequence contigs to a reference. Maligner provides two modes of alignment: an efficient, sensitive dynamic programming implementation that scales to large eukaryotic genomes, and a faster index-based implementation for finding alignments with unmatched sites in the reference but not the query. We compare our software to other publicly available tools on Rmap datasets and show that Maligner finds more correct alignments in comparable runtime. Lastly, we introduce the M-Score statistic for normalizing alignment scores across restriction maps and demonstrate its utility for selecting high quality alignments. Availability and implementation: The Maligner software is written in C++ and is available at https://github.com/LeeMendelowitz/maligner under the GNU General Public License. Contact: mpop@umiacs.umd.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 49
    Publication Date: 2016-03-26
    Description: Motivation: The detection of genomic structural variation (SV) has advanced tremendously in recent years due to progress in high-throughput sequencing technologies. Novel sequence insertions, insertions without similarity to a human reference genome, have received less attention than other types of SVs due to the computational challenges in their detection from short read sequencing data, which inherently involves de novo assembly. De novo assembly is not only computationally challenging, but also requires high-quality data. Although the reads from a single individual may not always meet this requirement, using reads from multiple individuals can increase power to detect novel insertions. Results: We have developed the program PopIns, which can discover and characterize non-reference insertions of 100 bp or longer on a population scale. In this article, we describe the approach we implemented in PopIns. It takes as input a reads-to-reference alignment, assembles unaligned reads using a standard assembly tool, merges the contigs of different individuals into high-confidence sequences, anchors the merged sequences into the reference genome, and finally genotypes all individuals for the discovered insertions. Our tests on simulated data indicate that the merging step greatly improves the quality and reliability of predicted insertions and that PopIns shows significantly better recall and precision than the recent tool MindTheGap. Preliminary results on a dataset of 305 Icelanders demonstrate the practicality of the new approach. Availability and implementation: The source code of PopIns is available from http://github.com/bkehr/popins. Contact: birte.kehr@decode.is Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 50
    Publication Date: 2016-03-26
    Description: Motivation: High-throughput sequencing technologies provide access to an increasing number of bacterial genomes. Today, many analyses involve the comparison of biological properties among many strains of a given species, or among species of a particular genus. Tools that can help the microbiologist with these tasks become increasingly important. Results: Insyght is a comparative visualization tool whose core features combine a synchronized navigation across genomic data of multiple organisms with a versatile interoperability between complementary views. In this work, we have greatly increased the scope of the Insyght public dataset by including 2688 complete bacterial genomes available in Ensembl thus vastly improving its phylogenetic coverage. We also report the development of a virtual machine that allows users to easily set up and customize their own local Insyght server. Availability and implementation: http://genome.jouy.inra.fr/Insyght Contact: Thomas.Lacroix@jouy.inra.fr
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 51
    Publication Date: 2016-03-26
    Description: Breast cancer is one of the most frequent cancers among women. Extensive studies into the molecular heterogeneity of breast cancer have produced a plethora of molecular subtype classification and prognosis prediction algorithms, as well as numerous gene expression signatures. However, reimplementation of these algorithms is a tedious but important task to enable comparison of existing signatures and classification models between each other and with new models. Here, we present the genefu R/Bioconductor package, a multi-tiered compendium of bioinformatics algorithms and gene signatures for molecular subtyping and prognostication in breast cancer. Availability and implementation: The genefu package is available from Bioconductor: http://www.bioconductor.org/packages/devel/bioc/html/genefu.html. Source code is also available on GitHub: https://github.com/bhklab/genefu. Contact: bhaibeka@uhnresearch.ca Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 52
    Publication Date: 2016-04-08
    Description: Precise regulatory control of genes, particularly in eukaryotes, frequently requires the joint action of multiple sequence-specific transcription factors. A cis-regulatory module (CRM) is a genomic locus that is responsible for gene regulation and that contains multiple transcription factor binding sites in close proximity. Given a collection of known transcription factor binding motifs, many bioinformatics methods have been proposed over the past 15 years for identifying within a genomic sequence candidate CRMs consisting of clusters of those motifs. Results: The MCAST algorithm uses a hidden Markov model with a P-value-based scoring scheme to identify candidate CRMs. Here, we introduce a new version of MCAST that offers improved graphical output, a dynamic background model, statistical confidence estimates based on false discovery rate estimation and, most significantly, the ability to predict CRMs while taking into account epigenomic data such as DNase I sensitivity or histone modification data. We demonstrate the validity of MCAST’s statistical confidence estimates and the utility of epigenomic priors in identifying CRMs. Availability and implementation: MCAST is part of the MEME Suite software toolkit. A web server and source code are available at http://meme-suite.org and http://alternate.meme-suite.org. Contact: t.bailey@imb.uq.edu.au or william-noble@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 53
    Publication Date: 2016-04-08
    Description: Pharmacogenomics holds great promise for the development of biomarkers of drug response and the design of new therapeutic options, which are key challenges in precision medicine. However, such data are scattered and lack standards for efficient access and analysis, consequently preventing the realization of the full potential of pharmacogenomics. To address these issues, we implemented PharmacoGx, an easy-to-use, open source package for integrative analysis of multiple pharmacogenomic datasets. We demonstrate the utility of our package in comparing large drug sensitivity datasets, such as the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia. Moreover, we show how to use our package to easily perform Connectivity Map analysis. With increasing availability of drug-related data, our package will open new avenues of research for meta-analysis of pharmacogenomic data. Availability and implementation: PharmacoGx is implemented in R and can be easily installed on any system. The package is available from CRAN and its source code is available from GitHub. Contact: bhaibeka@uhnresearch.ca or benjamin.haibe.kains@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 54
    Oxford University Press
    Publication Date: 2016-04-08
    Description: In population genetics and phylogeography, haplotype genealogy graphs are important tools for the visualization of population structure based on sequence data. In this type of graph, node sizes are often drawn in proportion to haplotype frequencies and edge lengths represent the minimum number of mutations separating adjacent nodes. I here present Fitchi, a new program that produces publication-ready haplotype genealogy graphs based on the Fitch algorithm. Availability and implementation: http://www.evoinformatics.eu/fitchi.htm Contact: michaelmatschiner@mac.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
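    Illustrative note on the record above: the abstract names the Fitch algorithm as the basis of Fitchi. The following is a minimal sketch of the classical bottom-up pass of Fitch parsimony for a single site on a rooted binary tree; it is not Fitchi's own code, and the tree encoding, the function name `fitch` and the haplotype labels are hypothetical choices made for this example.

```python
def fitch(tree, leaf_states):
    """Return (candidate root states, minimum number of changes) for one site.

    tree: nested 2-tuples of leaf names, e.g. (("hap1", "hap2"), ("hap3", "hap4")).
    leaf_states: dict mapping each leaf name to its observed character state.
    """
    if isinstance(tree, str):
        # A leaf contributes its observed state and no changes.
        return {leaf_states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, leaf_states)
    right_set, right_cost = fitch(right, leaf_states)
    common = left_set & right_set
    if common:
        # The children share a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: keep the union and count one additional change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical four-haplotype example for a single nucleotide site.
tree = (("hap1", "hap2"), ("hap3", "hap4"))
states = {"hap1": "A", "hap2": "A", "hap3": "G", "hap4": "A"}
root_set, changes = fitch(tree, states)
print(root_set, changes)
```

    The change count returned by this pass is the minimum number of mutations needed to explain the site on the given tree, which corresponds to the edge-length notion mentioned in the abstract.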
  • 55
    Publication Date: 2016-04-08
    Description: Genome-wide association studies (GWASs) have successfully identified many sequence variants that are significantly associated with common diseases and traits. Tens of thousands of such trait-associated SNPs have already been cataloged, which we believe form a great resource for genomic research. Recent studies have demonstrated that the collection of trait-associated SNPs can be exploited to indicate whether a given genomic interval or intervals are likely to be functionally connected with certain phenotypes or diseases. Despite this importance, there is currently no ready-to-use computational tool able to connect genomic intervals to phenotypes. Here, we present traseR, an easy-to-use R Bioconductor package that performs enrichment analyses of trait-associated SNPs in arbitrary genomic intervals with flexible options, including testing method, type of background and inclusion of SNPs in LD. Availability and implementation: The traseR R package, preloaded with an up-to-date collection of trait-associated SNPs, is freely available in Bioconductor. Contact: zhaohui.qin@emory.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 56
    Publication Date: 2016-04-08
    Description: Data-dependent acquisition (DDA) is the most common method used to control the acquisition process of shotgun proteomics experiments. While novel DDA approaches have been proposed, their evaluation is made difficult by the need for programmatic control of a mass spectrometer. An alternative is in silico analysis, for which suitable software has been unavailable. To meet this need, we have developed MSAcquisitionSimulator, a collection of C++ programs for simulating ground truth LC-MS data and the subsequent application of custom DDA algorithms. It provides an opportunity for researchers to test, refine and evaluate novel DDA algorithms prior to implementation on a mass spectrometer. Availability and implementation: The software is freely available from its GitHub repository http://www.github.com/DennisGoldfarb/MSAcquisitionSimulator/ which contains further documentation and usage instructions. Contact: weiwang@cs.ucla.edu or ben_major@med.unc.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 57
    Publication Date: 2016-04-08
    Description: Coarse grain (CG) models allow long-scale simulations with a much lower computational cost than that of all-atom simulations. However, the absence of atomistic detail impedes the analysis of specific atomic interactions that are determinant in most interesting biomolecular processes. In order to study these phenomena, it is necessary to reconstruct the atomistic structure from the CG representation. This structure can be analyzed by itself or be used as a starting point for atomistic molecular dynamics simulations. In this work, we present a computer program that accurately reconstructs the atomistic structure from a CG model for proteins, using a simple geometrical algorithm. Availability and implementation: The software is free and available online at http://www.ic.fcen.uba.ar/cg2aa/cg2aa.py Supplementary information: Supplementary data are available at Bioinformatics online. Contact: lula@qi.fcen.uba.ar
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 58
    Publication Date: 2016-04-08
    Description: The Sun Grid Engine (SGE) high-performance computing batch queueing system is commonly used in bioinformatics analysis. Creating re-usable scripts for the SGE is a common challenge. The qsubsec template language and interpreter described here allow researchers to easily create generic template definitions that encapsulate a particular computational job, effectively separating the process logic from the specific run details. At submission time, the generic template is filled in with specific values. This system provides an intermediate level between simple scripting and complete workflow management tools. Availability and implementation: Qsubsec is open-source and is available at https://github.com/alastair-droop/qsubsec. Contact: a.p.droop@leeds.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
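    Illustrative note on the record above: the abstract describes generic job templates that are filled in with specific values at submission time. The sketch below shows that general idea with Python's standard `string.Template`; it does not use or reproduce the qsubsec template language, and the template text, the `render_job` helper and the file names are hypothetical.

```python
from string import Template

# Generic job definition. "$$" is an escaped literal "$", so the SGE
# directive prefix "#$" survives substitution unchanged.
JOB_TEMPLATE = Template(
    "#!/bin/bash\n"
    "#$$ -N ${job_name}\n"
    "#$$ -cwd\n"
    "wc -l ${input_file} > ${job_name}.count\n"
)


def render_job(**values):
    """Fill the template; substitute() raises KeyError if a value is missing."""
    return JOB_TEMPLATE.substitute(**values)


# Run-specific details are supplied only when the job is rendered.
script = render_job(job_name="count_sampleA", input_file="sampleA.fastq")
print(script)
```

    Using `substitute` rather than `safe_substitute` means an incomplete set of run details fails early instead of producing a script with unfilled placeholders.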
  • 59
    Publication Date: 2016-04-08
    Description: The lack of visualization frameworks to guide interpretation and facilitate discovery is a potential bottleneck for precision medicine, systems genetics and other studies. To address this we have developed an interactive, reproducible, web-based prioritization approach that builds on our earlier work. HitWalker2 is highly flexible and can utilize many data types and prioritization methods based upon available data and desired questions, allowing it to be utilized in a diverse range of studies such as cancer, infectious disease and psychiatric disorders. Availability and implementation: Source code is freely available at https://github.com/biodev/HitWalker2 and implemented using Python/Django, Neo4j and JavaScript (D3.js and jQuery). We support major open source browsers (e.g. Firefox and Chromium/Chrome). Contact: wilmotb@ohsu.edu Supplementary information: Supplementary data are available at Bioinformatics online. Additional information/instructions are available at https://github.com/biodev/HitWalker2/wiki
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 60
    Publication Date: 2016-04-08
    Description: Motivation: Whole genome sequencing (WGS) of parent-offspring trios is a powerful approach for identifying disease-associated genes via detecting copy number variations (CNVs). Existing approaches, which detect CNVs for each individual in a trio independently, usually yield low detection accuracy. Joint modeling approaches leveraging Mendelian transmission within the parent-offspring trio can be an efficient strategy to improve CNV detection accuracy. Results: In this study, we developed TrioCNV, a novel approach for jointly detecting CNVs in parent-offspring trios from WGS data. Using negative binomial regression, we modeled the read depth signal while considering both GC content bias and mappability bias. Moreover, we incorporated the family relationship and used a hidden Markov model to jointly infer CNVs for the three samples of a parent-offspring trio. Through application to both simulated data and a trio from the 1000 Genomes Project, we showed that TrioCNV achieved better performance than existing approaches. Availability and implementation: The software TrioCNV, implemented using a combination of Java and R, is freely available from the website at https://github.com/yongzhuang/TrioCNV. Contact: ydwang@hit.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 61
    Publication Date: 2016-04-08
    Description: Motivation: Recent advancements in molecular methods have made it possible to capture physical contacts between multiple chromatin fragments. The resulting association matrices provide a noisy estimate for average spatial proximity that can be used to gain insights into the genome organization inside the nucleus. However, extracting topological information from these data is challenging and their integration across resolutions is still poorly addressed. Recent findings suggest that a hierarchical approach could be advantageous for addressing these challenges. Results: We present an algorithmic framework, which is based on hierarchical block matrices (HBMs), for topological analysis and integration of chromosome conformation capture (3C) data. We first describe chromoHBM, an algorithm that compresses high-throughput 3C (HiT-3C) data into topological features that are efficiently summarized with an HBM representation. We suggest that instead of directly combining HiT-3C datasets across resolutions, which is a difficult task, we can integrate their HBM representations, and describe chromoHBM-3C, an algorithm which merges HBMs. Since three-dimensional (3D) reconstruction can also benefit from topological information, we further present chromoHBM-3D, an algorithm which exploits the HBM representation in order to gradually introduce topological constraints to the reconstruction process. We evaluate our approach in light of previous image microscopy findings and epigenetic data, and show that it can relate multiple spatial scales and provide a more complete view of the 3D genome architecture. Availability and implementation: The presented algorithms are available from: https://github.com/yolish/hbm . Contact: ys388@cam.ac.uk or pl219@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 62
    Publication Date: 2016-04-08
    Description: Motivation: There are various reasons for rerunning bioinformatics tools and pipelines on sequencing data, including reproducing a past result, validation of a new tool or workflow using a known dataset, or tracking the impact of database changes. For identical results to be achieved, regularly updated reference sequence databases must be versioned and archived. Database administrators have tried to fill the requirements by supplying users with one-off versions of databases, but these are time consuming to set up and are inconsistent across resources. Disk storage and data backup performance have also discouraged maintaining multiple versions of databases since databases such as NCBI nr can consume 50 Gb or more disk space per version, with growth rates that parallel Moore's law. Results: Our end-to-end solution combines our own Kipper software package (a simple key-value large file versioning system) with BioMAJ (software for downloading sequence databases), and Galaxy (a web-based bioinformatics data processing platform). Available versions of databases can be recalled and used by command-line and Galaxy users. The Kipper data store format makes publishing curated FASTA databases convenient since in most cases it can store a range of versions into a file marginally larger than the size of the latest version. Availability and implementation: Kipper v1.0.0 and the Galaxy Versioned Data tool are written in Python and released as free and open source software available at https://github.com/Public-Health-Bioinformatics/kipper and https://github.com/Public-Health-Bioinformatics/versioned_data, respectively; detailed setup instructions can be found at https://github.com/Public-Health-Bioinformatics/versioned_data/blob/master/doc/setup.md Contact: Damion.Dooley@Bccdc.Ca or William.Hsiao@Bccdc.Ca Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
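    Illustrative note on the record above: the abstract describes a key-value versioning store that holds many versions in a file only marginally larger than the latest version. One way such a store could work is to tag each key-value row with the version in which it appeared and the version in which it was removed, so unchanged rows are stored once. This is an assumption made for illustration, not a description of Kipper's actual file format or API; the class and method names are hypothetical.

```python
class VersionedStore:
    """Toy key-value store that keeps every version without duplicating rows."""

    def __init__(self):
        self.rows = []      # each row: [created_version, deleted_version, key, value]
        self.version = 0

    def commit(self, snapshot):
        """Record a new version from a full key -> value snapshot."""
        self.version += 1
        live = {row[2]: row for row in self.rows if row[1] is None}
        # Close rows whose key was removed or whose value changed.
        for key, row in live.items():
            if key not in snapshot or snapshot[key] != row[3]:
                row[1] = self.version
        # Open rows for keys that are new or changed in this version.
        for key, value in snapshot.items():
            if key not in live or live[key][3] != value:
                self.rows.append([self.version, None, key, value])
        return self.version

    def checkout(self, version):
        """Rebuild the key -> value mapping as it was at the given version."""
        return {
            key: value
            for created, deleted, key, value in self.rows
            if created <= version and (deleted is None or deleted > version)
        }


# Hypothetical FASTA-like records keyed by sequence identifier.
store = VersionedStore()
v1 = store.commit({"seq1": "ACGT", "seq2": "GGCC"})
v2 = store.commit({"seq1": "ACGT", "seq2": "GGCA", "seq3": "TTTT"})
print(store.checkout(v1))
print(len(store.rows))
```

    In this example the unchanged record `seq1` is stored once for both versions, which is why the combined store grows only with the amount of change between versions.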
  • 63
    Publication Date: 2016-01-10
    Description: Motivation: Rapid advances in genotyping and genome-wide association studies have enabled the discovery of many new genotype–phenotype associations at the resolution of individual markers. However, these associations explain only a small proportion of theoretically estimated heritability of most diseases. In this work, we propose an integrative mixture model called JBASE: joint Bayesian analysis of subphenotypes and epistasis. JBASE explores two major reasons for missing heritability: interactions between genetic variants, a phenomenon known as epistasis, and phenotypic heterogeneity, addressed via subphenotyping. Results: Our extensive simulations in a wide range of scenarios repeatedly demonstrate that JBASE can identify true underlying subphenotypes, including their associated variants and their interactions, with high precision. In the presence of phenotypic heterogeneity, JBASE has higher Power and lower Type 1 Error than five state-of-the-art approaches. We applied our method to a sample of individuals from Mexico with Type 2 diabetes and discovered two novel epistatic modules, including two loci each, that define two subphenotypes characterized by differences in body mass index and waist-to-hip ratio. We successfully replicated these subphenotypes and epistatic modules in an independent dataset from Mexico genotyped with a different platform. Availability and implementation: JBASE is implemented in C++, supported on Linux and is available at http://www.cs.toronto.edu/~goldenberg/JBASE/jbase.tar.gz. The genotype data underlying this study are available upon approval by the ethics review board of the Medical Centre Siglo XXI. Please contact Dr Miguel Cruz at mcruzl@yahoo.com for assistance with the application. Contact: anna.goldenberg@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 64
    Publication Date: 2016-01-10
    Description: Motivation: Protein phosphorylation is a post-translational modification that underlies various aspects of cellular signaling. A key step to reconstructing signaling networks involves identification of the set of all kinases and their substrates. Experimental characterization of kinase substrates is both expensive and time-consuming. To expedite the discovery of novel substrates, computational approaches based on kinase recognition sequences (motifs) from known substrates, protein structure, interaction and co-localization have been proposed. However, rarely do these methods take into account the dynamic responses of signaling cascades measured from in vivo cellular systems. Given that recent advances in mass spectrometry-based technologies make it possible to quantify phosphorylation on a proteome-wide scale, computational approaches that can integrate static features with dynamic phosphoproteome data would greatly facilitate the prediction of biologically relevant kinase-specific substrates. Results: Here, we propose a positive-unlabeled ensemble learning approach that integrates dynamic phosphoproteomics data with static kinase recognition motifs to predict novel substrates for kinases of interest. We extended a positive-unlabeled learning technique for an ensemble model, which significantly improves prediction sensitivity on novel substrates of kinases while retaining high specificity. We evaluated the performance of the proposed model using simulation studies and subsequently applied it to predict novel substrates of key kinases relevant to insulin signaling. Our analyses show that static sequence motifs and dynamic phosphoproteomics data are complementary and that the proposed integrated model performs better than methods relying only on static information for accurate prediction of kinase-specific substrates. Availability and implementation: Executable GUI tool, source code and documentation are freely available at https://github.com/PengyiYang/KSP-PUEL. Contact: pengyi.yang@nih.gov or jothi@mail.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 65
    Publication Date: 2016-01-10
    Description: Motivation: Statistically assessing the relation between a set of genomic regions and other genomic features is a common challenging task in genomic and epigenomic analyses. Randomization based approaches implicitly take into account the complexity of the genome without the need of assuming an underlying statistical model. Summary: regioneR is an R package that implements a permutation test framework specifically designed to work with genomic regions. In addition to the predefined randomization and evaluation strategies, regioneR is fully customizable allowing the use of custom strategies to adapt it to specific questions. Finally, it also implements a novel function to evaluate the local specificity of the detected association. Availability and implementation: regioneR is an R package released under Artistic-2.0 License. The source code and documents are freely available through Bioconductor ( http://www.bioconductor.org/packages/regioneR ). Contact: rmalinverni@carrerasresearch.org
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 66
    Publication Date: 2016-01-10
    Description: We present a method to identify approximately independent blocks of linkage disequilibrium in the human genome. These blocks enable automated analysis of multiple genome-wide association studies. Availability and implementation: code: http://bitbucket.org/nygcresearch/ldetect ; data: http://bitbucket.org/nygcresearch/ldetect-data . Contact: tberisa@nygenome.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 67
    Publication Date: 2016-01-10
    Description: DNA methylation is one of the most commonly studied epigenetic modifications due to its role in both disease and development. The Illumina HumanMethylation450 BeadChip is a cost-effective way to profile >450 000 CpGs across the human genome, making it a popular platform for profiling DNA methylation. Here we introduce missMethyl, an R package with a suite of tools for performing normalization, removal of unwanted variation in differential methylation analysis, differential variability testing and gene set analysis for the 450K array. Availability and implementation: missMethyl is an R package available from the Bioconductor project at www.bioconductor.org. Contact: alicia.oshlack@mcri.edu.au Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 68
    Publication Date: 2016-01-10
    Description: Motivation: Antibody amino-acid sequences can be numbered to identify equivalent positions. Such annotations are valuable for antibody sequence comparison, protein structure modelling and engineering. Multiple numbering schemes exist; they vary in the nomenclature they use to annotate residue positions, their definitions of position equivalence and their popularity within different scientific disciplines. However, no publicly available software currently exists that can apply all of the most widely used schemes or for which an executable can be obtained under an open license. Results: ANARCI is a tool to classify and number antibody and T-cell receptor amino-acid variable domain sequences. It can annotate sequences with the five most popular numbering schemes: Kabat, Chothia, Enhanced Chothia, IMGT and AHo. Availability and implementation: ANARCI is available for download under GPLv3 license at opig.stats.ox.ac.uk/webapps/anarci. A web-interface to the program is available at the same address. Contact: deane@stats.ox.ac.uk
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 69
    Publication Date: 2016-01-10
    Description: Contributions: We developed a new lossless compression method for WIG data, named smallWig, offering the best known compression rates for RNA-seq data and featuring random access functionalities that enable visualization, summary statistics analysis and fast queries from the compressed files. Our approach results in order of magnitude improvements compared with bigWig and ensures compression rates only a fraction of those produced by cWig. The key features of the smallWig algorithm are statistical data analysis and a combination of source coding methods that ensure high flexibility and make the algorithm suitable for different applications. Furthermore, for general-purpose file compression, the compression rate of smallWig approaches the empirical entropy of the tested WIG data. For compression with random query features, smallWig uses a simple block-based compression scheme that introduces only a minor overhead in the compression rate. For archival or storage space-sensitive applications, the method relies on context mixing techniques that lead to further improvements of the compression rate. Implementations of smallWig can be executed in parallel on different sets of chromosomes using multiple processors, thereby enabling desirable scaling for future transcriptome Big Data platforms. Motivation: The development of next-generation sequencing technologies has led to a dramatic decrease in the cost of DNA/RNA sequencing and expression profiling. RNA-seq has emerged as an important and inexpensive technology that provides information about whole transcriptomes of various species and organisms, as well as different organs and cellular communities. The vast volume of data generated by RNA-seq experiments has significantly increased data storage costs and communication bandwidth requirements. 
Current compression tools for RNA-seq data such as bigWig and cWig either use general-purpose compressors (gzip) or suboptimal compression schemes that leave significant room for improvement. To substantiate this claim, we performed a statistical analysis of expression data in different transform domains and developed accompanying entropy coding methods that bridge the gap between theoretical and practical WIG file compression rates. Results: We tested different variants of the smallWig compression algorithm on a number of integer-and real- (floating point) valued RNA-seq WIG files generated by the ENCODE project. The results reveal that, on average, smallWig offers 18-fold compression rate improvements, up to 2.5-fold compression time improvements, and 1.5-fold decompression time improvements when compared with bigWig. On the tested files, the memory usage of the algorithm never exceeded 90 KB. When more elaborate context mixing compressors were used within smallWig, the obtained compression rates were as much as 23 times better than those of bigWig. For smallWig used in the random query mode, which also supports retrieval of the summary statistics, an overhead in the compression rate of roughly 3–17% was introduced depending on the chosen system parameters. An increase in encoding and decoding time of 30% and 55% represents an additional performance loss caused by enabling random data access. We also implemented smallWig using multi-processor programming. This parallelization feature decreases the encoding delay 2–3.4 times compared with that of a single-processor implementation, with the number of processors used ranging from 2 to 8; in the same parameter regime, the decoding delay decreased 2–5.2 times. Availability and implementation: The smallWig software can be downloaded from: http://stanford.edu/~zhiyingw/smallWig/smallwig.html , http://publish.illinois.edu/milenkovic/ , http://web.stanford.edu/~tsachy/ . 
Contact: zhiyingw@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 70
    Publication Date: 2016-01-10
    Description: S -sulfenylation ( S -sulphenylation, or sulfenic acid), the covalent attachment of S -hydroxyl (–SOH) to cysteine thiol, plays a significant role in redox regulation of protein functions. Although sulfenic acid is transient and labile, most of its physiological activities occur under control of S -hydroxylation. Therefore, discriminating the substrate site of S -sulfenylated proteins is an essential task in computational biology for the furtherance of protein structures and functions. Research into S -sulfenylated protein is currently very limited, and no dedicated tools are available for the computational identification of SOH sites. Given a total of 1096 experimentally verified S -sulfenylated proteins from humans, this study carries out a bioinformatics investigation on SOH sites based on amino acid composition and solvent-accessible surface area. A TwoSampleLogo indicates that the positively and negatively charged amino acids flanking the SOH sites may impact the formulation of S -sulfenylation in closed three-dimensional environments. In addition, the substrate motifs of SOH sites are studied using the maximal dependence decomposition (MDD). Based on the concept of binary classification between SOH and non-SOH sites, Support vector machine (SVM) is applied to learn the predictive model from MDD-identified substrate motifs. According to the evaluation results of 5-fold cross-validation, the integrated SVM model learned from substrate motifs yields an average accuracy of 0.87, significantly improving the prediction of SOH sites. Furthermore, the integrated SVM model also effectively improves the predictive performance in an independent testing set. Finally, the integrated SVM model is applied to implement an effective web resource, named MDD-SOH, to identify SOH sites with their corresponding substrate motifs. Availability and implementation: The MDD-SOH is now freely available to all interested users at http://csb.cse.yzu.edu.tw/MDDSOH/ . 
All of the data sets used in this work are also available for download from the website. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: francis@saturn.yzu.edu.tw
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 71
    Publication Date: 2016-01-10
    Description: Motivation: Aptamers are synthetic nucleic acid molecules that can bind biological targets by virtue of both their sequence and three-dimensional structure. Aptamers are selected using SELEX, Systematic Evolution of Ligands by EXponential enrichment, a technique that exploits aptamer-target binding affinity. The SELEX procedure, coupled with high-throughput sequencing (HT-SELEX), creates billions of random sequences capable of binding different epitopes on specific targets. Since this technique produces enormous amounts of data, computational analysis represents a critical step to screen and select the most biologically relevant sequences. Results: Here, we present APTANI, a computational tool to identify target-specific aptamers from HT-SELEX data and secondary structure information. APTANI builds on the AptaMotif algorithm, originally implemented to analyze SELEX data; it extends the applicability of AptaMotif to HT-SELEX data and introduces new functionalities, such as the possibility to identify binding motifs, to cluster aptamer families or to compare output results from different HT-SELEX cycles. Tabular and graphical representations facilitate the downstream biological interpretation of results. Availability and implementation: APTANI is available at http://aptani.unimore.it . Contact: silvio.bicciato@unimore.it Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 72
    Publication Date: 2016-01-10
    Description: Motivation: High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour that account for a ‘large P, small n’ setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, which requires further development cycles and leads to a lack of standardization between analyses. Results: We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs. Availability and implementation: The methods are implemented in the R baySeq (v2) package, available on Bioconductor http://www.bioconductor.org/packages/release/bioc/html/baySeq.html . Contact: tjh48@cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 73
    Publication Date: 2016-01-10
    Description: Motivation: Current informatic techniques for processing raw chromatography/mass spectrometry data break down under several common, non-ideal conditions. Importantly, hydrophilic interaction liquid chromatography (a key separation technology for metabolomics) produces data which are especially challenging to process. We identify three critical points of failure in current informatic workflows: compound specific drift, integration region variance, and naive missing value imputation. We implement the Warpgroup algorithm to address these challenges. Results: Warpgroup adds peak subregion detection, consensus integration bound detection, and intelligent missing value imputation steps to the conventional informatic workflow. When compared with the conventional workflow, Warpgroup made major improvements to the processed data. The coefficient of variation for peaks detected in replicate injections of a complex Escherichia coli extract was halved (a reduction of 19%). Integration regions across samples were much more robust. Additionally, many signals lost by the conventional workflow were ‘rescued’ by the Warpgroup refinement, thereby resulting in greater analyte coverage in the processed data. Availability and implementation: Warpgroup is an open source R package available on GitHub at github.com/nathaniel-mahieu/warpgroup. The package includes example data and XCMS compatibility wrappers for ease of use. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: nathaniel.mahieu@wustl.edu or gjpattij@wustl.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 74
    Publication Date: 2016-01-10
    Description: Motivation: The recognition and normalization of cell line names in text is an important task in biomedical text mining research, facilitating for instance the identification of synthetically lethal genes from the literature. While several tools have previously been developed to address cell line recognition, it is unclear whether available systems can perform sufficiently well in realistic and broad-coverage applications such as extracting synthetically lethal genes from the cancer literature. In this study, we revisit the cell line name recognition task, evaluating both available systems and newly introduced methods on various resources to obtain a reliable tagger not tied to any specific subdomain. In support of this task, we introduce two text collections manually annotated for cell line names: the broad-coverage corpus Gellus and CLL, a focused target domain corpus. Results: We find that the best performance is achieved using NERsuite, a machine learning system based on Conditional Random Fields, trained on the Gellus corpus and supported with a dictionary of cell line names. The system achieves an F-score of 88.46% on the test set of Gellus and 85.98% on the independently annotated CLL corpus. It was further applied at large scale to 24 302 102 unannotated articles, resulting in the identification of 5 181 342 cell line mentions, normalized to 11 755 unique cell line database identifiers. Availability and implementation: The manually annotated datasets, the cell line dictionary, derived corpora, NERsuite models and the results of the large-scale run on unannotated texts are available under open licenses at http://turkunlp.github.io/Cell-line-recognition/ . Contact: sukaew@utu.fi
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 75
    Publication Date: 2016-01-10
    Description: Motivation: Chemical cross-linking with mass spectrometry (XL-MS) provides structural information for proteins and protein complexes in the form of crosslinked residue proximity and distance constraints between reactive residues. Utilizing spatial information derived from cross-linked residues can therefore assist with structural modeling of proteins. Selection of computationally derived model structures of proteins remains a major challenge in structural biology. The comparison of site interactions resulting from XL-MS with protein structure contact maps can assist the selection of structural models. Availability and implementation: XLmap was implemented in R and is freely available at: http://brucelab.gs.washington.edu/software.php . Contact: jimbruce@uw.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 76
    Publication Date: 2016-01-10
    Description: The Insight Toolkit offers plenty of features for multidimensional image analysis. Current implementations, however, often suffer either from a lack of flexibility due to hard-coded C++ pipelines for a certain task or from slow execution times, e.g. caused by inefficient implementations or multiple read/write operations for separate filter execution. We present an XML-based wrapper application for the Insight Toolkit that combines the performance of a pure C++ implementation with an easy-to-use graphical setup of dynamic image analysis pipelines. Created XML pipelines can be interpreted and executed by XPIWIT in console mode either locally or on large clusters. We successfully applied the software tool for the automated analysis of terabyte-scale, time-resolved 3D image data of zebrafish embryos. Availability and implementation: XPIWIT is implemented in C++ using the Insight Toolkit and the Qt SDK. It has been successfully compiled and tested under Windows and Unix-based systems. Software and documentation are distributed under Apache 2.0 license and are publicly available for download at https://bitbucket.org/jstegmaier/xpiwit/downloads/ . Contact: johannes.stegmaier@kit.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 77
    Publication Date: 2016-01-10
    Description: PathwaysWeb is a resource-based, well-documented web system that provides publicly available information on genes, biological pathways, Gene Ontology (GO) terms, gene–gene interaction networks (importantly, with the directionality of interactions) and links to key-related PubMed documents. The PathwaysWeb API simplifies the construction of applications that need to retrieve and interrelate information across multiple, pathway-related data types from a variety of original data sources. PathwaysBrowser is a companion website that enables users to explore the same integrated pathway data. The PathwaysWeb system facilitates reproducible analyses by providing access to all versions of the integrated datasets. Although its GO subsystem includes data for mouse, PathwaysWeb currently focuses on human data. However, pathways for mouse and many other species can be inferred with a high success rate from human pathways. Availability and implementation: PathwaysWeb can be accessed via the Internet at http://bioinformatics.mdanderson.org/main/PathwaysWeb:Overview . Contact: jmmelott@mdanderson.org Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 78
    Publication Date: 2016-01-10
    Description: Motivation: To increase detection power, gene level analysis methods are used to aggregate weak signals. To greatly increase computational efficiency, most methods use as input summary statistics from genome-wide association studies (GWAS). Subsequently, gene statistics are constructed using linkage disequilibrium (LD) patterns from a relevant reference panel. However, all methods, including our own Joint Effect on Phenotype of eQTL/functional single nucleotide polymorphisms (SNPs) associated with a Gene (JEPEG), assume homogeneous panels, e.g. European. This assumption renders these tools unsuitable for the analysis of large cosmopolitan cohorts. Results: We propose a JEPEG extension, JEPEGMIX, which, similar to one of our software tools, Direct Imputation of summary STatistics of unmeasured SNPs from MIXed ethnicity cohorts, is capable of estimating accurate LD patterns for cosmopolitan cohorts. JEPEGMIX uses these accurate LD estimates to (i) impute the summary statistics at unmeasured functional variants and (ii) test for the joint effect of all measured and imputed functional variants which are associated with a gene. We illustrate the performance of our tool by analyzing the GWAS meta-analysis summary statistics from the multi-ethnic Psychiatric Genomics Consortium Schizophrenia stage 2 cohort. This practical application supports the immune system being one of the main drivers of the process leading to schizophrenia. Availability and implementation: Software, annotation database and examples are available at http://dleelab.github.io/jepegmix/ . Contact: donghyung.lee@vcuhealth.org Supplementary information: Supplementary material is available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 79
    Publication Date: 2016-01-10
    Description: SIMToolbox is an open-source, modular set of functions for MATLAB equipped with a user-friendly graphical interface and designed for processing two-dimensional and three-dimensional data acquired by structured illumination microscopy (SIM). Both optical sectioning and super-resolution applications are supported. The software is also capable of maximum a posteriori probability image estimation (MAP-SIM), an alternative method for reconstruction of structured illumination images. MAP-SIM can potentially reduce reconstruction artifacts, which commonly occur due to refractive index mismatch within the sample and to imperfections in the illumination. Availability and implementation: SIMToolbox, example data and the online documentation are freely accessible at http://mmtg.fel.cvut.cz/SIMToolbox . Contact: ghagen@uccs.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 80
    Publication Date: 2016-03-26
    Description: The Illumina Infinium HumanMethylation450 BeadChip (450k) is widely used for the evaluation of DNA methylation levels in large-scale datasets, particularly in cancer. The 450k design allows copy number variant (CNV) calling using existing bioinformatics tools. However, in cancer samples, numerous large-scale aberrations cause shifting in the probe intensities and thereby may result in erroneous CNV calling. Therefore, a baseline correction process is needed. We suggest the maximum peak of probe segment density to correct the shift in the intensities in cancer samples. Availability and implementation: CopyNumber450kCancer is implemented as an R package. The package with examples can be downloaded at http://cran.r-project.org . Contact: nour.marzouka@medsci.uu.se Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 81
    Publication Date: 2016-03-26
    Description: In recent years we have witnessed the rapid development of new metagenome assembly methods. Although there are many benchmark utilities designed for single-genome assemblies, there is no well-recognized evaluation and comparison tool for metagenomic-specific analogues. In this article, we present MetaQUAST, a modification of QUAST, the state-of-the-art tool for genome assembly evaluation based on alignment of contigs to a reference. MetaQUAST addresses such features of metagenome datasets as (i) unknown species content by detecting and downloading reference sequences, (ii) huge diversity by giving comprehensive reports for multiple genomes and (iii) presence of highly related species by detecting chimeric contigs. We demonstrate MetaQUAST performance by comparing several leading assemblers on one simulated and two real datasets. Availability and implementation: http://bioinf.spbau.ru/metaquast . Contact: aleksey.gurevich@spbu.ru Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 82
    Publication Date: 2016-03-26
    Description: Motivation: Identifying drug–target protein interaction is a crucial step in the process of drug research and development. Wet-lab experiments are laborious, time-consuming and expensive. Hence, there is a strong demand for the development of a novel theoretical method to identify potential interactions between drugs and target proteins. Results: We use all known proteins and drugs to construct a node- and edge-weighted, biologically relevant interactome network. On the basis of the ‘guilt-by-association’ principle, novel network topology features are proposed to characterize interaction pairs, and a random forest algorithm is employed to identify potential drug–protein interactions. The accuracy of 92.53% derived from 10-fold cross-validation is about 10% higher than that of the existing method. We identify 2272 potential drug–target interactions, some of which are associated with diseases, such as Torg-Winchester syndrome and rhabdomyosarcoma. The proposed method can not only accurately predict the interaction between drug molecule and target protein, but also help disease treatment and drug discovery. Contacts: zhanchao8052@gmail.com or ceszxy@mail.sysu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 83
    Publication Date: 2016-03-26
    Description: A successful approach for predicting functional associations between non-homologous genes is to compare their phylogenetic distributions. We have devised a phylogenetic profiling algorithm, SVD-Phy, which uses truncated singular value decomposition to address the problem of uninformative profiles giving rise to false positive predictions. Benchmarking the algorithm against the KEGG pathway database, we found that it has substantially improved performance over existing phylogenetic profiling methods. Availability and implementation: The software is available under the open-source BSD license at https://bitbucket.org/andrea/svd-phy . Contact: lars.juhl.jensen@cpr.ku.dk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 84
    Publication Date: 2016-03-26
    Description: Tests for differential gene expression with RNA-seq data have a tendency to identify certain types of transcripts as significant, e.g. longer and highly-expressed transcripts. This tendency has been shown to bias gene set enrichment (GSE) testing, which is used to find over- or under-represented biological functions in the data. Yet, there remains a surprising lack of tools for GSE testing specific for RNA-seq. We present a new GSE method for RNA-seq data, RNA-Enrich, that accounts for the above tendency empirically by adjusting for average read count per gene. RNA-Enrich is a quick, flexible method and web-based tool, with 16 available gene annotation databases. It does not require a P-value cut-off to define differential expression, and works well even with small sample-sized experiments. We show that adjusting for read counts per gene improves both the type I error rate and detection power of the test. Availability and implementation: RNA-Enrich is available at http://lrpath.ncibi.org or from supplemental material as R code. Contact: sartorma@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 85
    Publication Date: 2016-03-26
    Description: Supervised classification based on support vector machines (SVMs) has successfully been used for the prediction of cis-regulatory modules (CRMs). However, no integrated tool using such heterogeneous data as position-specific scoring matrices, ChIP-seq data or conservation scores is currently available. Here, we present LedPred, a flexible SVM workflow that predicts new regulatory sequences based on the annotation of known CRMs, which are associated with a large variety of feature types. LedPred is provided as an R/Bioconductor package connected to an online server to avoid installation of non-R software. Due to the heterogeneous CRM feature integration, LedPred excels at the prediction of regulatory sequences in Drosophila and mouse datasets compared with similar SVM-based software. Availability and implementation: LedPred is available on GitHub: https://github.com/aitgon/LedPred and Bioconductor: http://bioconductor.org/packages/release/bioc/html/LedPred.html under the MIT license. Contact: aitor.gonzalez@univ-amu.fr Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 86
    Publication Date: 2016-03-26
    Description: Motivation: Data compression is crucial in effective handling of genomic data. Among several recently published algorithms, ERGC seems to be surprisingly good, easily beating all of the competitors. Results: We evaluated ERGC and the previously proposed algorithms GDC and iDoComp, which are the ones used in the original paper for comparison, on a wide data set including 12 assemblies of human genome (instead of only four of them in the original paper). ERGC wins only when one of the genomes (referential or target) contains mixed-cased letters (which is the case for only the two Korean genomes). In all other cases ERGC is on average an order of magnitude worse than GDC and iDoComp. Contact: sebastian.deorowicz@polsl.pl , iochoa@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 87
    Publication Date: 2016-03-26
    Description: Motivation: Large-scale genotype datasets can help track the dispersal patterns of epidemiological outbreaks and predict the geographic origins of individuals. Such genetically-based geographic assignments also show a range of possible applications in forensics for profiling both victims and criminals, and in wildlife management, where poaching hotspot areas can be located. They, however, require fast and accurate statistical methods to handle the growing amount of genetic information made available from genotype arrays and next-generation sequencing technologies. Results: We introduce a novel statistical method for geopositioning individuals of unknown origin from genotypes. Our method is based on a geostatistical model trained with a dataset of georeferenced genotypes. Statistical inference under this model can be implemented within the theoretical framework of Integrated Nested Laplace Approximation, which represents one of the major recent breakthroughs in statistics, as it does not require Monte Carlo simulations. We compare the performance of our method and an alternative method for geospatial inference, SPA, in a simulation framework. We highlight the accuracy and limits of continuous spatial assignment methods at various scales by analyzing genotype datasets from a diversity of species, including Florida Scrub-jay birds Aphelocoma coerulescens, Arabidopsis thaliana and humans, representing 41–197,146 SNPs. Our method appears to be best suited for the analysis of medium-sized datasets (a few tens of thousands of loci), such as reduced-representation sequencing data that become increasingly available in ecology. Availability and implementation: http://www2.imm.dtu.dk/~gigu/Spasiba/ Contact: gilles.b.guillot@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 88
    Publication Date: 2016-04-08
    Description: Motivation: Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic mainly due to the high variability of the clusters’ content and lack of other distinguishing sequence features. Results: We present Cluster Assignment by Islands of Sites (CASSIS), a method for SM cluster prediction in eukaryotic genomes, and Secondary Metabolites by InterProScan (SMIPS), a tool for genome-wide detection of SM key enzymes (‘anchor’ genes): polyketide synthases, non-ribosomal peptide synthetases and dimethylallyl tryptophan synthases. Unlike other tools based on protein similarity, CASSIS exploits the idea of co-regulation of the cluster genes, which assumes the existence of common regulatory patterns in the cluster promoters. The method searches for ‘islands’ of enriched cluster-specific motifs in the vicinity of anchor genes. It was validated in a series of cross-validation experiments and showed high sensitivity and specificity. Availability and implementation: CASSIS and SMIPS are freely available at https://sbi.hki-jena.de/cassis . Contact: thomas.wolf@leibniz-hki.de or ekaterina.shelest@leibniz-hki.de Supplementary information : Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 89
    Publication Date: 2016-04-08
    Description: Motivation: In the systems biology era, high-throughput omics technologies have enabled the unraveling of the interplay of some biological entities on a large scale (e.g. genes, proteins, metabolites or RNAs). Huge biological networks have emerged, where nodes correspond to these entities and edges between them model their relations. Protein–protein interaction networks, for instance, show the physical interactions of proteins in an organism. The comparison of such networks promises additional insights into protein and cell function as well as knowledge-transfer across species. Several computational approaches have been developed previously to solve the network alignment (NA) problem, but only a few concentrate on the usability of the implemented tools for the evaluation of protein–protein interactions by the end users (biologists and medical researchers). Results: We have created CytoGEDEVO, a Cytoscape app for visual and user-assisted NA. It extends the previous GEDEVO methodology for global pairwise NAs with new graphical and functional features. Our main focus was on the usability, even by non-programmers and the interpretability of the NA results with Cytoscape. Availability and implementation: CytoGEDEVO is publicly available from the Cytoscape app store at http://apps.cytoscape.org/apps/cytogedevo . In addition, we provide stand-alone command line executables, source code, documentation and step-by-step user instructions at http://cytogedevo.compbio.sdu.dk . Contact: malek@tugraz.at Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 90
    Publication Date: 2016-04-08
    Description: INSECT is a user-friendly web server to predict the occurrence of Cis-Regulatory Modules (CRMs), which control gene expression. Here, we present a new release of INSECT which includes several new features, such as whole genome analysis, nucleosome occupancy predictions, and which provides additional links to third-party functional tools that complement user capabilities, CRM analysis and hypothesis construction. Improvements in the core implementation have led to a faster and more efficient tool. In addition, this new release introduces a new interface designed for a more integrative and dynamic user experience. Availability and implementation: http://bioinformatics.ibioba-mpsp-conicet.gov.ar/INSECT2 Contact: pyankilevich@ibioba-mpsp-conicet.gov.ar
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 91
    Publication Date: 2016-04-08
    Description: Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu . Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 92
    Publication Date: 2016-04-08
    Description: Motivation: To gain a deeper understanding of biological processes and their relevance in disease, mathematical models are built upon experimental data. Uncertainty in the data leads to uncertainties of the model’s parameters and in turn to uncertainties of predictions. Mechanistic dynamic models of biochemical networks are frequently based on nonlinear differential equation systems and feature a large number of parameters, sparse observations of the model components and lack of information in the available data. Due to the curse of dimensionality, classical and sampling approaches propagating parameter uncertainties to predictions are hardly feasible and insufficient. However, for experimental design and to discriminate between competing models, prediction and confidence bands are essential. To circumvent the hurdles of the former methods, an approach to calculate a profile likelihood on arbitrary observations for a specific time point has been introduced, which provides accurate confidence and prediction intervals for nonlinear models and is computationally feasible for high-dimensional models. Results: In this article, reliable and smooth point-wise prediction and confidence bands to assess the model’s uncertainty on the whole time-course are achieved via explicit integration with elaborate correction mechanisms. The corresponding system of ordinary differential equations is derived and tested on three established models for cellular signalling. An efficiency analysis is performed to illustrate the computational benefit compared with repeated profile likelihood calculations at multiple time points. Availability and implementation: The integration framework and the examples used in this article are provided with the software package Data2Dynamics, which is based on MATLAB and freely available at http://www.data2dynamics.org . Contact: helge.hass@fdm.uni-freiburg.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 93
    Publication Date: 2016-04-08
    Description: Motivation: Discovering patterns in networks of protein–protein interactions (PPIs) is a central problem in systems biology. Alignments between these networks aid functional understanding as they uncover important information, such as evolutionary conserved pathways, protein complexes and functional orthologs. However, the complexity of the multiple network alignment problem grows exponentially with the number of networks being aligned and designing a multiple network aligner that is both scalable and that produces biologically relevant alignments is a challenging task that has not been fully addressed. The objective of multiple network alignment is to create clusters of nodes that are evolutionarily and functionally conserved across all networks. Unfortunately, the alignment methods proposed thus far do not meet this objective as they are guided by pairwise scores that do not utilize the entire functional and evolutionary information across all networks. Results: To overcome this weakness, we propose Fuse, a new multiple network alignment algorithm that works in two steps. First, it computes our novel protein functional similarity scores by fusing information from wiring patterns of all aligned PPI networks and sequence similarities between their proteins. This is in contrast with the previous tools that are all based on protein similarities in pairs of networks being aligned. Our comprehensive new protein similarity scores are computed by the Non-negative Matrix Tri-Factorization (NMTF) method, which predicts associations between proteins whose homology (from sequences) and functioning similarity (from wiring patterns) are supported by all networks. Using the five largest and most complete PPI networks from BioGRID, we show that NMTF predicts a large number of protein pairs that are biologically consistent. Second, to identify clusters of aligned proteins over all networks, Fuse uses our novel maximum weight k-partite matching approximation algorithm. We compare Fuse with the state-of-the-art multiple network aligners and show that (i) by using only sequence alignment scores, Fuse already outperforms other aligners and produces a larger number of biologically consistent clusters that cover all aligned PPI networks and (ii) using both sequence alignments and topological NMTF-predicted scores leads to the best multiple network alignments thus far. Availability and implementation: Our dataset and software are freely available from the web site: http://bio-nets.doc.ic.ac.uk/Fuse/ . Contact: natasha@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 94
    Publication Date: 2016-04-08
    Description: The nucleotide binding site leucine-rich repeats (NBSLRRs) belong to one of the largest known families of disease resistance genes that encode resistance proteins (R-protein) against the pathogens of plants. Various defence mechanisms have explained the regulation of plant immunity, but still, we have limited understanding about plant defence against different pathogens. Identification of R-proteins and proteins having R-protein-like features across the genome, transcriptome and proteome would be highly useful to develop the global understanding of plant defence mechanisms, but it is a laborious and time-consuming task. Therefore, we have developed a support vector machine-based high-throughput pipeline called NBSPred to differentiate NBSLRR and NBSLRR-like proteins from Non-NBSLRR proteins from genome, transcriptome and protein sequences. The pipeline was tested and validated with input sequences from three dicot and two monocot plants including Arabidopsis thaliana, Boechera stricta, Brachypodium distachyon, Solanum lycopersicum and Zea mays. Availability and implementation: The NBSPred pipeline is available at http://soilecology.biol.lu.se/nbs/ . Supplementary information: Supplementary data are available at Bioinformatics online. Contact: sandeep.kushwaha@biol.lu.se
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 95
    Publication Date: 2016-04-08
    Description: Motivation: RNA interference (RNAi) technology is being developed as a weapon for pest insect control. To maximize the specificity that such an approach affords we have developed a bioinformatic web tool that searches the ever-growing arthropod transcriptome databases so that pest-specific RNAi sequences can be identified. This will help technology developers finesse the design of RNAi sequences and suggests which non-target species should be assessed in the risk assessment process. Availability and implementation: http://rnai.specifly.org . Contact: crobin@unimelb.edu.au
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 96
    Publication Date: 2016-04-08
    Description: Diffusion maps are a spectral method for non-linear dimension reduction and have recently been adapted for the visualization of single-cell expression data. Here we present destiny, an efficient R implementation of the diffusion map algorithm. Our package includes a single-cell specific noise model allowing for missing and censored values. In contrast to previous implementations, we further present an efficient nearest-neighbour approximation that allows for the processing of hundreds of thousands of cells and a functionality for projecting new data on existing diffusion maps. We exemplarily apply destiny to a recent time-resolved mass cytometry dataset of cellular reprogramming. Availability and implementation: destiny is an open-source R/Bioconductor package (bioconductor.org/packages/destiny), also available at www.helmholtz-muenchen.de/icb/destiny . A detailed vignette describing functions and workflows is provided with the package. Contact: carsten.marr@helmholtz-muenchen.de or f.buettner@helmholtz-muenchen.de Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 97
    Publication Date: 2016-04-08
    Description: Motivation: Gene ontology (GO) is a widely used resource to describe the attributes for gene products. However, automatic GO maintenance remains difficult because of the complex logical reasoning and the need for biological knowledge that is not explicitly represented in the GO. The existing studies either construct the whole GO based on network data or only infer the relations between existing GO terms. None is designed to add new terms automatically to the existing GO. Results: We proposed a new algorithm ‘GOExtender’ to efficiently identify all the connected gene pairs labeled by the same parent GO terms. GOExtender is used to predict new GO terms with biological network data, and connect them to the existing GO. Evaluation tests on biological process and cellular component categories of different GO releases showed that GOExtender can extend new GO terms automatically based on the biological network. Furthermore, we applied GOExtender to the recent release of GO and discovered new GO terms with strong support from literature. Availability and implementation: Software and supplementary document are available at www.msu.edu/%7Ejinchen/GOExtender Contact: jinchen@msu.edu or ydwang@hit.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 98
    Publication Date: 2016-06-25
    Description: Motivation: The concept of Minimal Cut Sets (MCSs) is used in metabolic network modeling to describe minimal groups of reactions or genes whose simultaneous deletion eliminates the capability of the network to perform a specific task. Previous work showed that MCSs were closely related to Elementary Flux Modes (EFMs) in a particular dual problem, opening up the possibility to use the tools developed for computing EFMs to compute MCSs. Until recently, however, there existed no method to compute an EFM with some specific characteristic, meaning that, in the case of MCSs, the only strategy to obtain them was to enumerate them using, for example, the standard K-shortest EFMs algorithm. Results: In this work, we adapt the recently developed theory to compute EFMs satisfying several constraints to the calculation of MCSs involving a specific reaction knock-out. Importantly, we emphasize that not all the EFMs in the dual problem correspond to real MCSs, and propose a new formulation capable of correctly identifying the MCS wanted. Furthermore, this formulation brings interesting insights about the relationship between the primal and the dual problem of the MCS computation. Availability and implementation: A Matlab-Cplex implementation of the proposed algorithm is available as supplementary material. Contact: fplanes@ceit.es Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 99
    Publication Date: 2016-06-25
    Description: Transcript Structure and Domain Display (TSDD) is a publicly available, web-based program that provides publication quality images of transcript structures and domains. TSDD is capable of producing transcript structures from GFF/GFF3 and BED files. Alternatively, the GFF files of several model organisms have been pre-loaded so that users only need to enter the locus IDs of the transcripts to be displayed. Visualization of transcripts provides many benefits to researchers, ranging from evolutionary analysis of DNA-binding domains to predictive function modeling. Availability and implementation: TSDD is freely available for non-commercial users at http://shenlab.sols.unlv.edu/shenlab/software/TSD/transcript_display.html . Contact: jeffery.shen@unlv.nevada.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 100
    Publication Date: 2016-06-25
    Description: PhamDB is a web application which creates databases of bacteriophage genes, grouped by gene similarity. It is backwards compatible with the existing Phamerator desktop software while providing an improved database creation workflow. Key features include a graphical user interface, validation of uploaded GenBank files, and abilities to import phages from existing databases, modify existing databases and queue multiple jobs. Availability and implementation: Source code and installation instructions for Linux, Windows and Mac OSX are freely available at https://github.com/jglamine/phage . PhamDB is also distributed as a docker image which can be managed via Kitematic. This docker image contains the application and all third party software dependencies as a pre-configured system, and is freely available via the installation instructions provided. Contact: snelesen@calvin.edu
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine