ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
  • 1
    Publication Date: 2015-09-19
    Description: Sequence alignment is a long standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST—the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST and its computational speed is much faster than MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    Publication Date: 2015-05-29
    Description: Model evaluation is a necessary step for better prediction and design of 3D RNA structures. For proteins, this has been widely studied and the knowledge-based statistical potential has been proved to be one of effective ways to solve this problem. Currently, a few knowledge-based statistical potentials have also been proposed to evaluate predicted models of RNA tertiary structures. The benchmark tests showed that they can identify the native structures effectively but further improvements are needed to identify near-native structures and those with non-canonical base pairs. Here, we present a novel knowledge-based potential, 3dRNAscore, which combines distance-dependent and dihedral-dependent energies. The benchmarks on different testing datasets all show that 3dRNAscore are more efficient than existing evaluation methods in recognizing native state from a pool of near-native states of RNAs as well as in ranking near-native states of RNA models.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Publication Date: 2016-06-21
    Description: The goal of pathway analysis is to identify the pathways that are significantly impacted when a biological system is perturbed, e.g. by a disease or drug. Current methods treat pathways as independent entities. However, many signals are constantly sent from one pathway to another, essentially linking all pathways into a global, system-wide complex. In this work, we propose a set of three pathway analysis methods based on the impact analysis, that performs a system-level analysis by considering all signals between pathways, as well as their overlaps. Briefly, the global system is modeled in two ways: (i) considering the inter-pathway interaction exchange for each individual pathways, and (ii) combining all individual pathways to form a global, system-wide graph. The third analysis method is a hybrid of these two models. The new methods were compared with DAVID, GSEA, GSA, PathNet, Crosstalk and SPIA on 23 GEO data sets involving 19 tissues investigated in 12 conditions. The results show that both the ranking and the P -values of the target pathways are substantially improved when the analysis considers the system-wide dependencies and interactions between pathways.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2016-07-09
    Description: Bioinformatic analysis often produces large sets of genomic ranges that can be difficult to interpret in the absence of genomic context. Goldmine annotates genomic ranges from any source with gene model and feature contexts to facilitate global descriptions and candidate loci discovery. We demonstrate the value of genomic context by using Goldmine to elucidate context dynamics in transcription factor binding and to reveal differentially methylated regions (DMRs) with context-specific functional correlations. The open source R package and documentation for Goldmine are available at http://jeffbhasin.github.io/goldmine .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2016-07-28
    Description: CCCTC-binding factor (CTCF) is a multi-functional protein that is assigned various, even contradictory roles in the genome. High-throughput sequencing-based technologies such as ChIP-seq and Hi-C provided us the opportunity to assess the multivalent functions of CTCF in the human genome. The location of CTCF-binding sites with respect to genomic features provides insights into the possible roles of this protein. Here we present the first genome-wide survey and characterization of three important functions of CTCF: enhancer insulator, chromatin barrier and enhancer linker. We developed a novel computational framework to discover the multivalent functions of CTCF based on chromatin state and three-dimensional chromatin architecture. We applied our method to five human cell lines and identified ~46 000 non-redundant CTCF sites related to the three functions. Disparate effects of these functions on gene expression were found and distinct genomic features of these CTCF sites were characterized in GM12878 cells. Finally, we investigated the cell-type specificities of CTCF sites related to these functions across five cell types. Our study provides new insights into the multivalent functions of CTCF in the human genome.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2015-05-03
    Description: MicroRNAs (miRNAs) regulate gene expression by binding to partially complementary sequences on target mRNA transcripts, thereby causing their degradation, deadenylation, or inhibiting their translation. Genomic variants can alter miRNA regulation by modifying miRNA target sites, and multiple human disease phenotypes have been linked to such miRNA target site variants (miR-TSVs). However, systematic genome-wide identification of functional miR-TSVs is difficult due to high false positive rates; functional miRNA recognition sequences can be as short as six nucleotides, with the human genome encoding thousands of miRNAs. Furthermore, while large-scale clinical genomic data sets are becoming increasingly commonplace, existing miR-TSV prediction methods are not designed to analyze these data. Here, we present an open-source tool called SubmiRine that is designed to perform efficient miR-TSV prediction systematically on variants identified in novel clinical genomic data sets. Most importantly, SubmiRine allows for the prioritization of predicted miR-TSVs according to their relative probability of being functional. We present the results of SubmiRine using integrated clinical genomic data from a large-scale cohort study on chronic obstructive pulmonary disease (COPD), making a number of high-scoring, novel miR-TSV predictions. We also demonstrate SubmiRine's ability to predict and prioritize known miR-TSVs that have undergone experimental validation in previous studies.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2015-01-24
    Description: Of the ~1.3 million Alu elements in the human genome, only a tiny number are estimated to be active in transcription by RNA polymerase (Pol) III. Tracing the individual loci from which Alu transcripts originate is complicated by their highly repetitive nature. By exploiting RNA-Seq data sets and unique Alu DNA sequences, we devised a bioinformatic pipeline allowing us to identify Pol III-dependent transcripts of individual Alu elements. When applied to ENCODE transcriptomes of seven human cell lines, this search strategy identified ~1300 Alu loci corresponding to detectable transcripts, with ~120 of them expressed in at least three cell lines. In vitro transcription of selected Alu s did not reflect their in vivo expression properties, and required the native 5'-flanking region in addition to internal promoter. We also identified a cluster of expressed Alu Ya5-derived transcription units, juxtaposed to snaR genes on chromosome 19, formed by a promoter-containing left monomer fused to an Alu -unrelated downstream moiety. Autonomous Pol III transcription was also revealed for Alu s nested within Pol II-transcribed genes. The ability to investigate Alu transcriptomes at single-locus resolution will facilitate both the identification of novel biologically relevant Alu RNAs and the assessment of Alu expression alteration under pathological conditions.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2016-04-08
    Description: Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2015-10-15
    Description: Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in-vitro , are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report first-of-its-kind computational method DisoRDPbind for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physiochemical properties of amino acids, sequence complexity, putative secondary structure and disorder and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein–protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    Publication Date: 2016-06-03
    Description: The sequential chain of interactions altering the binary state of a biomolecule represents the ‘information flow’ within a cellular network that determines phenotypic properties. Given the lack of computational tools to dissect context-dependent networks and gene activities, we developed NetDecoder, a network biology platform that models context-dependent information flows using pairwise phenotypic comparative analyses of protein–protein interactions. Using breast cancer, dyslipidemia and Alzheimer's disease as case studies, we demonstrate NetDecoder dissects subnetworks to identify key players significantly impacting cell behaviour specific to a given disease context. We further show genes residing in disease-specific subnetworks are enriched in disease-related signalling pathways and information flow profiles, which drive the resulting disease phenotypes. We also devise a novel scoring scheme to quantify key genes—network routers, which influence many genes, key targets, which are influenced by many genes, and high impact genes, which experience a significant change in regulation. We show the robustness of our results against parameter changes. Our network biology platform includes freely available source code ( http://www.NetDecoder.org ) for researchers to explore genome-wide context-dependent information flow profiles and key genes, given a set of genes of particular interest and transcriptome data. More importantly, NetDecoder will enable researchers to uncover context-dependent drug targets.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 11
    Publication Date: 2015-04-21
    Description: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 12
    Publication Date: 2015-02-18
    Description: Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 13
    Publication Date: 2015-02-18
    Description: Genetic screens of an unprecedented scale have recently been made possible by the availability of high-complexity libraries of synthetic oligonucleotides designed to mediate either gene knockdown or gene knockout, coupled with next-generation sequencing. However, several sources of random noise and statistical biases complicate the interpretation of the resulting high-throughput data. We developed HiTSelect, a comprehensive analysis pipeline for rigorously selecting screen hits and identifying functionally relevant genes and pathways by addressing off-target effects, controlling for variance in both gene silencing efficiency and sequencing depth of coverage and integrating relevant metadata. We document the superior performance of HiTSelect using data from both genome-wide RNAi and CRISPR/Cas9 screens. HiTSelect is implemented as an open-source package, with a user-friendly interface for data visualization and pathway exploration. Binary executables are available at http://sourceforge.net/projects/hitselect/ , and the source code is available at https://github.com/diazlab/HiTSelect .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 14
    Publication Date: 2015-01-24
    Description: Integrative analyses of epigenetic data promise a deeper understanding of the epigenome. Epidaurus is a bioinformatics tool used to effectively reveal inter-dataset relevance and differences through data aggregation, integration and visualization. In this study, we demonstrated the utility of Epidaurus in validating hypotheses and generating novel biological insights. In particular, we described the use of Epidaurus to (i) integrate epigenetic data from prostate cancer cell lines to validate the activation function of EZH2 in castration-resistant prostate cancer and to (ii) study the mechanism of androgen receptor ( AR ) binding deregulation induced by the knockdown of FOXA1 . We found that EZH2 's noncanonical activation function was reaffirmed by its association with active histone markers and the lack of association with repressive markers. More importantly, we revealed that the binding of AR was selectively reprogramed to promoter regions, leading to the up-regulation of hundreds of cancer-associated genes including EGFR . The prebuilt epigenetic dataset from commonly used cell lines (LNCaP, VCaP, LNCaP-Abl, MCF7, GM12878, K562, HeLa-S3, A549, HePG2) makes Epidaurus a useful online resource for epigenetic research. As standalone software, Epidaurus is specifically designed to process user customized datasets with both efficiency and convenience.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 15
    Publication Date: 2015-02-18
    Description: RNA-protein complexes are essential in mediating important fundamental cellular processes, such as transport and localization. In particular, ncRNA-protein interactions play an important role in post-transcriptional gene regulation like mRNA localization, mRNA stabilization, poly-adenylation, splicing and translation. The experimental methods to solve RNA-protein interaction prediction problem remain expensive and time-consuming. Here, we present the RPI-Pred (RNA-protein interaction predictor), a new support-vector machine-based method, to predict protein-RNA interaction pairs, based on both the sequences and structures. The results show that RPI-Pred can correctly predict RNA-protein interaction pairs with ~94% prediction accuracy when using sequence and experimentally determined protein and RNA structures, and with ~83% when using sequences and predicted protein and RNA structures. Further, our proposed method RPI-Pred was superior to other existing ones by predicting more experimentally validated ncRNA-protein interaction pairs from different organisms. Motivated by the improved performance of RPI-Pred, we further applied our method for reliable construction of ncRNA-protein interaction networks. The RPI-Pred is publicly available at: http://ctsb.is.wfubmc.edu/projects/rpi-pred .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 16
    Publication Date: 2015-02-18
    Description: Identifying conserved and divergent response patterns in gene networks is becoming increasingly important. A common approach is integrating expression information with gene association networks in order to find groups of connected genes that are activated or repressed. In many cases, researchers are also interested in comparisons across species (or conditions). Finding an active sub-network is a hard problem and applying it across species requires further considerations (e.g. orthology information, expression data and networks from different sources). To address these challenges we devised ModuleBlast, which uses both expression and network topology to search for highly relevant sub-networks. We have applied ModuleBlast to expression and interaction data from mouse, macaque and human to study immune response and aging. The immune response analysis identified several relevant modules, consistent with recent findings on apoptosis and NFB activation following infection. Temporal analysis of these data revealed cascades of modules that are dynamically activated within and across species. We have experimentally validated some of the novel hypotheses resulting from the analysis of the ModuleBlast results leading to new insights into the mechanisms used by a key mammalian aging protein.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 17
    Publication Date: 2015-02-18
    Description: Here we used discriminative training methods to uncover the chromatin, transcription factor (TF) binding and sequence features of enhancers underlying gene expression in individual cardiac cells. We used machine learning with TF motifs and ChIP data for a core set of cardiogenic TFs and histone modifications to classify Drosophila cell-type-specific cardiac enhancer activity. We show that the classifier models can be used to predict cardiac cell subtype cis -regulatory activities. Associating the predicted enhancers with an expression atlas of cardiac genes further uncovered clusters of genes with transcription and function limited to individual cardiac cell subtypes. Further, the cell-specific enhancer models revealed chromatin, TF binding and sequence features that distinguish enhancer activities in distinct subsets of heart cells. Collectively, our results show that computational modeling combined with empirical testing provides a powerful platform to uncover the enhancers, TF motifs and gene expression profiles which characterize individual cardiac cell fates.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 18
    Publication Date: 2015-09-30
    Description: A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as web server and stand-alone software via http://rth.dk/resources/rnacop .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 19
    Publication Date: 2016-03-01
    Description: It is being increasingly realized that nucleosome organization on DNA crucially regulates DNA–protein interactions and the resulting gene expression. While the spatial character of the nucleosome positioning on DNA has been experimentally and theoretically studied extensively, the temporal character is poorly understood. Accounting for ATPase activity and DNA-sequence effects on nucleosome kinetics, we develop a theoretical method to estimate the time of continuous exposure of binding sites of non-histone proteins (e.g. transcription factors and TATA binding proteins) along any genome. Applying the method to Saccharomyces cerevisiae , we show that the exposure timescales are determined by cooperative dynamics of multiple nucleosomes, and their behavior is often different from expectations based on static nucleosome occupancy. Examining exposure times in the promoters of GAL1 and PHO5, we show that our theoretical predictions are consistent with known experiments. We apply our method genome-wide and discover huge gene-to-gene variability of mean exposure times of TATA boxes and patches adjacent to TSS (+1 nucleosome region); the resulting timescale distributions have non-exponential tails.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 20
    Publication Date: 2015-10-31
    Description: Systems biologists aim to decipher the structure and dynamics of signaling and regulatory networks underpinning cellular responses; synthetic biologists can use this insight to alter existing networks or engineer de novo ones. Both tasks will benefit from an understanding of which structural and dynamic features of networks can emerge from evolutionary processes, through which intermediary steps these arise, and whether they embody general design principles. As natural evolution at the level of network dynamics is difficult to study, in silico evolution of network models can provide important insights. However, current tools used for in silico evolution of network dynamics are limited to ad hoc computer simulations and models. Here we introduce BioJazz, an extendable, user-friendly tool for simulating the evolution of dynamic biochemical networks. Unlike previous tools for in silico evolution, BioJazz allows for the evolution of cellular networks with unbounded complexity by combining rule-based modeling with an encoding of networks that is akin to a genome. We show that BioJazz can be used to implement biologically realistic selective pressures and allows exploration of the space of network architectures and dynamics that implement prescribed physiological functions. BioJazz is provided as an open-source tool to facilitate its further development and use. Source code and user manuals are available at: http://oss-lab.github.io/biojazz and http://osslab.lifesci.warwick.ac.uk/BioJazz.aspx .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 21
    Publication Date: 2015-12-02
    Description: Despite the biological importance of non-coding RNA, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 22
    Publication Date: 2015-12-02
    Description: Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    Publication Date: 2015-03-14
    Description: The rapid discovery of potential driver mutations through large-scale mutational analyses of human cancers generates a need to characterize their cellular phenotypes. Among the techniques for genome editing, recombinant adeno-associated virus (rAAV)-mediated gene targeting is suited for knock-in of single nucleotide substitutions and to a lesser degree for gene knock-outs. However, the generation of gene targeting constructs and the targeting process is time-consuming and labor-intense. To facilitate rAAV-mediated gene targeting, we developed the first software and complementary automation-friendly vector tools to generate optimized targeting constructs for editing human protein encoding genes. By computational approaches, rAAV constructs for editing ~71% of bases in protein-coding exons were designed. Similarly, ~81% of genes were predicted to be targetable by rAAV-mediated knock-out. A Gateway-based cloning system for facile generation of rAAV constructs suitable for robotic automation was developed and used in successful generation of targeting constructs. Together, these tools enable automated rAAV targeting construct design, generation as well as enrichment and expansion of targeted cells with desired integrations.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
    Publication Date: 2015-03-14
    Description: Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error prone and time consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 25
    Publication Date: 2015-01-10
    Description: Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located far away from the gene they affect. Identification of distal regulatory elements is a challenge for the bioinformatics research. Although existing methodologies increased the number of computationally predicted enhancers, performance inconsistency of computational models across different cell-lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning, are some key questions that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer's properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo -derived enhancer data from VISTA database. DEEP-VISTA, when tested on an independent test set, achieved GM of 80.1% and accuracy of 89.64%. DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/ .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    Publication Date: 2016-03-19
    Description: The yeast mutant collections are a fundamental tool in deciphering genomic organization and function. Over the last decade, they have been used for the systematic exploration of ~6 000 000 double gene mutants, identifying and cataloging genetic interactions among them. Here we studied the extent to which these data are prone to neighboring gene effects (NGEs), a phenomenon by which the deletion of a gene affects the expression of adjacent genes along the genome. Analyzing ~90,000 negative genetic interactions observed to date, we found that more than 10% of them are incorrectly annotated due to NGEs. We developed a novel algorithm, GINGER, to identify and correct erroneous interaction annotations. We validated the algorithm using a comparative analysis of interactions from Schizosaccharomyces pombe . We further showed that our predictions are significantly more concordant with diverse biological data compared to their mis-annotated counterparts. Our work uncovered about 9500 new genetic interactions in yeast.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    Publication Date: 2016-03-19
    Description: Transfer RNAs (tRNAs) are essential for encoding the transcribed genetic information from DNA into proteins. Variations in the human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations using a classification based on evidence from several sources and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity using evidence of segregation, biochemistry and histochemistry to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. The variations in the loops are more often tolerated compared to the variations in stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at http://structure.bmc.lu.se/PON-mt-tRNA/ .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    Publication Date: 2016-05-06
    Description: Sequence Logos and its variants are the most commonly used method for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles , a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version ( http://www.bitbucket.org/rfs/alvis ) and its Sequence Bundles visualization module is further available as a web application ( http://science-practice.com/projects/sequence-bundles ).
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 29
    Publication Date: 2016-10-14
    Description: Functional RNA regions are often related to recurrent secondary structure patterns (or motifs), which can exert their role in several different ways, particularly in dictating the interaction with RNA-binding proteins, and acting in the regulation of a large number of cellular processes. Among the available motif-finding tools, the majority focuses on sequence patterns, sometimes including secondary structure as additional constraints to improve their performance. Nonetheless, secondary structures motifs may be concurrent to their sequence counterparts or even encode a stronger functional signal. Current methods for searching structural motifs generally require long pipelines and/or high computational efforts or previously aligned sequences. Here, we present BEAM (BEAr Motif finder), a novel method for structural motif discovery from a set of unaligned RNAs, taking advantage of a recently developed encoding for RNA secondary structure named BEAR (Brand nEw Alphabet for RNAs) and of evolutionary substitution rates of secondary structure elements. Tested in a varied set of scenarios, from small- to large-scale, BEAM is successful in retrieving structural motifs even in highly noisy data sets, such as those that can arise in CLIP-Seq or other high-throughput experiments.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    Publication Date: 2016-12-01
    Description: Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false-discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish ( Danio rerio ), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet ( www.inetbio.org/danionet ), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of the function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that an integrated genomics big data for a model animal of diseases can expand our opportunity for harnessing WES data in disease gene discovery.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    Publication Date: 2016-12-04
    Description: Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 32
    Publication Date: 2016-12-17
    Description: A complex disease generally results not from malfunction of individual molecules but from dysfunction of the relevant system or network, which dynamically changes with time and conditions. Thus, estimating a condition-specific network from a single sample is crucial to elucidating the molecular mechanisms of complex diseases at the system level. However, there is currently no effective way to construct such an individual-specific network by expression profiling of a single sample because of the requirement of multiple samples for computing correlations. We developed here with a statistical method, i.e. a sample-specific network (SSN) method, which allows us to construct individual-specific networks based on molecular expressions of a single sample. Using this method, we can characterize various human diseases at a network level. In particular, such SSNs can lead to the identification of individual-specific disease modules as well as driver genes, even without gene sequencing information. Extensive analysis by using the Cancer Genome Atlas data not only demonstrated the effectiveness of the method, but also found new individual-specific driver genes and network patterns for various types of cancer. Biological experiments on drug resistance further validated one important advantage of our method over the traditional methods, i.e. we can even identify such drug resistance genes that actually have no clear differential expression between samples with and without the resistance, due to the additional network information.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    Publication Date: 2016-12-17
    Description: Motivation: Many biological processes, such as cell cycle, circadian clock, menstrual cycles, are governed by oscillatory systems consisting of numerous components that exhibit rhythmic patterns over time. It is not always easy to identify such rhythmic components. For example, it is a challenging problem to identify circadian genes in a given tissue using time-course gene expression data. There is a great potential for misclassifying non-rhythmic as rhythmic genes and vice versa. This has been a problem of considerable interest in recent years. In this article we develop a constrained inference based methodology called Order Restricted Inference for Oscillatory Systems (ORIOS) to detect rhythmic signals. Instead of using mathematical functions (e.g. sinusoidal) to describe shape of rhythmic signals, ORIOS uses mathematical inequalities. Consequently, it is robust and not limited by the biologist's choice of the mathematical model. We studied the performance of ORIOS using simulated as well as real data obtained from mouse liver, pituitary gland and data from NIH3T3, U2OS cell lines. Our results suggest that, for a broad collection of patterns of gene expression, ORIOS has substantially higher power to detect true rhythmic genes in comparison to some popular methods, while also declaring substantially fewer non-rhythmic genes as rhythmic. Availability and Implementation: A user friendly code implemented in R language can be downloaded from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm . Contact: peddada@niehs.nih.gov
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 34
    Publication Date: 2015-11-17
    Description: Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/ .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    Publication Date: 2015-07-25
    Description: The cMonkey integrated biclustering algorithm identifies conditionally co-regulated modules of genes (biclusters). cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation, and optimizes biclusters to be supported simultaneously by one or more of these prior constraints. The algorithm served as the cornerstone for constructing the first global, predictive Environmental Gene Regulatory Influence Network (EGRIN) model for a free-living cell, and has now been applied to many more organisms. However, due to its computational inefficiencies, long run-time and complexity of various input data types, cMonkey was not readily usable by the wider community. To address these primary concerns, we have significantly updated the cMonkey algorithm and refactored its implementation, improving its usability and extendibility. These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation. We show, via three separate analyses of data for E. coli, M. tuberculosis and H. sapiens , that the updated algorithm and inclusion of novel scoring functions for new data types (e.g. ChIP-seq and transcription factor over-expression [TFOE]) improve discovery of biologically informative co-regulated modules. The complete cMonkey 2 software package, including source code, is available at https://github.com/baliga-lab/cmonkey2 .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    Publication Date: 2017-01-10
    Description: RNA molecules are attractive therapeutic targets because non-coding RNA molecules have increasingly been found to play key regulatory roles in the cell. Comparing and classifying RNA 3D structures yields unique insights into RNA evolution and function. With the rapid increase in the number of atomic-resolution RNA structures, it is crucial to have effective tools to classify RNA structures and to investigate them for structural similarities at different resolutions. We previously developed the algorithm CLICK to superimpose a pair of protein 3D structures by clique matching and 3D least squares fitting. In this study, we extend and optimize the CLICK algorithm to superimpose pairs of RNA 3D structures and RNA–protein complexes, independent of the associated topologies. Benchmarking Rclick on four different datasets showed that it is either comparable to or better than other structural alignment methods in terms of the extent of structural overlaps. Rclick also recognizes conformational changes between RNA structures and produces complementary alignments to maximize the extent of detectable similarity. Applying Rclick to study Ribonuclease III protein correctly aligned the RNA binding sites of RNAse III with its substrate. Rclick can be further extended to identify ligand-binding pockets in RNA. A web server is developed at http://mspc.bii.a-star.edu.sg/minhn/rclick.html .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...