ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Books
  • Articles  (101)
  • Computational Methods  (62)
  • Chromatin and Epigenetics  (39)
  • Oxford University Press  (101)
  • Cell Press
  • 1
    Publication Date: 2015-09-19
    Description: Sequence alignment is a long standing problem in bioinformatics. The Basic Local Alignment Search Tool (BLAST) is one of the most popular and fundamental alignment tools. The explosive growth of biological sequences calls for speedup of sequence alignment tools such as BLAST. To this end, we develop high speed BLASTN (HS-BLASTN), a parallel and fast nucleotide database search tool that accelerates MegaBLAST—the default module of NCBI-BLASTN. HS-BLASTN builds a new lookup table using the FMD-index of the database and employs an accurate and effective seeding method to find short stretches of identities (called seeds) between the query and the database. HS-BLASTN produces the same alignment results as MegaBLAST and its computational speed is much faster than MegaBLAST. Specifically, our experiments conducted on a 12-core server show that HS-BLASTN can be 22 times faster than MegaBLAST and exhibits better parallel performance than MegaBLAST. HS-BLASTN is written in C++ and the related source code is available at https://github.com/chenying2016/queries under the GPLv3 license.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 2
    Publication Date: 2015-05-29
    Description: Model evaluation is a necessary step for better prediction and design of 3D RNA structures. For proteins, this has been widely studied and the knowledge-based statistical potential has proved to be one of the effective ways to solve this problem. Currently, a few knowledge-based statistical potentials have also been proposed to evaluate predicted models of RNA tertiary structures. The benchmark tests showed that they can identify the native structures effectively but further improvements are needed to identify near-native structures and those with non-canonical base pairs. Here, we present a novel knowledge-based potential, 3dRNAscore, which combines distance-dependent and dihedral-dependent energies. The benchmarks on different testing datasets all show that 3dRNAscore is more efficient than existing evaluation methods in recognizing the native state from a pool of near-native states of RNAs as well as in ranking near-native states of RNA models.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 3
    Publication Date: 2016-06-21
    Description: Defining chromatin interaction frequencies and topological domains is a great challenge for the annotations of genome structures. Although the chromosome conformation capture (3C) and its derivative methods have been developed for exploring the global interactome, they are limited by high experimental complexity and costs. Here we describe a novel computational method, called CITD, for de novo prediction of the chromatin interaction map by integrating histone modification data. We used the public epigenomic data from human fibroblast IMR90 cell and embryonic stem cell (H1) to develop and test CITD, which can not only successfully reconstruct the chromatin interaction frequencies discovered by the Hi-C technology, but also provide additional novel details of chromosomal organizations. We predicted the chromatin interaction frequencies, topological domains and their states (e.g. active or repressive) for 98 additional cell types from Roadmap Epigenomics and ENCODE projects. A total of 131 protein-coding genes located near 78 preserved boundaries among 100 cell types are found to be significantly enriched in functional categories of the nucleosome organization and chromatin assembly. CITD and its predicted results can be used for complementing the topological domains derived from limited Hi-C data and facilitating the understanding of spatial principles underlying the chromosomal organization.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 4
    Publication Date: 2016-06-21
    Description: The goal of pathway analysis is to identify the pathways that are significantly impacted when a biological system is perturbed, e.g. by a disease or drug. Current methods treat pathways as independent entities. However, many signals are constantly sent from one pathway to another, essentially linking all pathways into a global, system-wide complex. In this work, we propose a set of three pathway analysis methods based on the impact analysis that perform a system-level analysis by considering all signals between pathways, as well as their overlaps. Briefly, the global system is modeled in two ways: (i) considering the inter-pathway interaction exchange for each individual pathway, and (ii) combining all individual pathways to form a global, system-wide graph. The third analysis method is a hybrid of these two models. The new methods were compared with DAVID, GSEA, GSA, PathNet, Crosstalk and SPIA on 23 GEO data sets involving 19 tissues investigated in 12 conditions. The results show that both the ranking and the P-values of the target pathways are substantially improved when the analysis considers the system-wide dependencies and interactions between pathways.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 5
    Publication Date: 2016-07-09
    Description: Bioinformatic analysis often produces large sets of genomic ranges that can be difficult to interpret in the absence of genomic context. Goldmine annotates genomic ranges from any source with gene model and feature contexts to facilitate global descriptions and candidate loci discovery. We demonstrate the value of genomic context by using Goldmine to elucidate context dynamics in transcription factor binding and to reveal differentially methylated regions (DMRs) with context-specific functional correlations. The open source R package and documentation for Goldmine are available at http://jeffbhasin.github.io/goldmine.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 6
    Publication Date: 2016-07-09
    Description: Dam identification (DamID) is a powerful technique to generate genome-wide maps of chromatin protein binding. Due to its high sensitivity, it is particularly suited to study the genome interactions of chromatin proteins in small tissue samples in model organisms such as Drosophila. Here, we report an intein-based approach to tune the expression level of Dam and Dam-fusion proteins in Drosophila by addition of a ligand to fly food. This helps to suppress possible toxic effects of Dam. In addition, we describe a strategy for genetically controlled expression of Dam in a specific cell type in complex tissues. We demonstrate the utility of the latter by generating a glia-specific map of Polycomb in small samples of brain tissue. These new DamID tools will be valuable for the mapping of binding patterns of chromatin proteins in Drosophila tissues and especially in cell lineages.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 7
    Publication Date: 2016-07-28
    Description: CCCTC-binding factor (CTCF) is a multi-functional protein that is assigned various, even contradictory roles in the genome. High-throughput sequencing-based technologies such as ChIP-seq and Hi-C provided us the opportunity to assess the multivalent functions of CTCF in the human genome. The location of CTCF-binding sites with respect to genomic features provides insights into the possible roles of this protein. Here we present the first genome-wide survey and characterization of three important functions of CTCF: enhancer insulator, chromatin barrier and enhancer linker. We developed a novel computational framework to discover the multivalent functions of CTCF based on chromatin state and three-dimensional chromatin architecture. We applied our method to five human cell lines and identified ~46 000 non-redundant CTCF sites related to the three functions. Disparate effects of these functions on gene expression were found and distinct genomic features of these CTCF sites were characterized in GM12878 cells. Finally, we investigated the cell-type specificities of CTCF sites related to these functions across five cell types. Our study provides new insights into the multivalent functions of CTCF in the human genome.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 8
    Publication Date: 2013-09-26
    Description: Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to Exome-Seq experiment of a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 9
    Publication Date: 2013-04-02
    Description: MicroRNAs (miRNAs) constitute an important class of small regulatory RNAs that are derived from distinct hairpin precursors (pre-miRNAs). In contrast to mature miRNAs, which have been characterized in numerous genome-wide studies of different organisms, research on global profiling of pre-miRNAs is limited. Here, using massive parallel sequencing, we have performed global characterization of both mouse mature and precursor miRNAs. In total, 87 369 704 and 252 003 sequencing reads derived from 887 mature and 281 precursor miRNAs were obtained, respectively. Our analysis revealed new aspects of miRNA/pre-miRNA processing and modification, including eight Ago2-cleaved pre-miRNAs, eight new instances of miRNA editing and exclusively 5' tailed mirtrons. Furthermore, based on the sequences of both mature and precursor miRNAs, we developed a miRNA discovery pipeline, miRGrep, which does not rely on the availability of genome reference sequences. In addition to 239 known mouse pre-miRNAs, miRGrep predicted 41 novel ones with high confidence. Similar to the known ones, the mature miRNAs derived from most of these novel loci showed both reduced abundance following Dicer knockdown and binding with Argonaute2. Evaluation on data sets obtained from Caenorhabditis elegans and Caenorhabditis sp.11 demonstrated that miRGrep could be widely used for miRNA discovery in metazoans, especially in those without genome reference sequences.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 10
    Publication Date: 2013-04-02
    Description: DNA methylation is one of the most important epigenetic alterations involved in the control of gene expression. Bisulfite sequencing of genomic DNA is currently the only method to study DNA methylation patterns at single-nucleotide resolution. Hence, next-generation sequencing of bisulfite-converted DNA is the method of choice to investigate DNA methylation profiles at the genome-wide scale. Nevertheless, whole genome sequencing for analysis of human methylomes is expensive, and a method for targeted gene analysis would provide a good alternative in many cases where the primary interest is restricted to a set of genes. Here, we report the successful use of a custom Agilent SureSelect Target Enrichment system for the hybrid capture of bisulfite-converted DNA. We prepared bisulfite-converted next-generation sequencing libraries, which are enriched for the coding and regulatory regions of 174 ADME genes (i.e. genes involved in the metabolism and distribution of drugs). Sequencing of these libraries on Illumina’s HiSeq2000 revealed that the method allows a reliable quantification of methylation levels of CpG sites in the selected genes, and validation of the method using pyrosequencing and the Illumina 450K methylation BeadChips revealed good concordance.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 11
    Publication Date: 2013-09-26
    Description: It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense–antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 12
    Publication Date: 2015-05-03
    Description: MicroRNAs (miRNAs) regulate gene expression by binding to partially complementary sequences on target mRNA transcripts, thereby causing their degradation, deadenylation, or inhibiting their translation. Genomic variants can alter miRNA regulation by modifying miRNA target sites, and multiple human disease phenotypes have been linked to such miRNA target site variants (miR-TSVs). However, systematic genome-wide identification of functional miR-TSVs is difficult due to high false positive rates; functional miRNA recognition sequences can be as short as six nucleotides, with the human genome encoding thousands of miRNAs. Furthermore, while large-scale clinical genomic data sets are becoming increasingly commonplace, existing miR-TSV prediction methods are not designed to analyze these data. Here, we present an open-source tool called SubmiRine that is designed to perform efficient miR-TSV prediction systematically on variants identified in novel clinical genomic data sets. Most importantly, SubmiRine allows for the prioritization of predicted miR-TSVs according to their relative probability of being functional. We present the results of SubmiRine using integrated clinical genomic data from a large-scale cohort study on chronic obstructive pulmonary disease (COPD), making a number of high-scoring, novel miR-TSV predictions. We also demonstrate SubmiRine's ability to predict and prioritize known miR-TSVs that have undergone experimental validation in previous studies.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 13
    Publication Date: 2014-12-17
    Description: Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including to what extent constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the ‘enhanceosome’ versus the ‘TF collective’ model of combinatorial TF binding, a comprehensive analysis of TF binding site sequences in large scale datasets is necessary. We developed a motif-pair discovery pipeline to identify motif co-occurrences with preferential distance(s) between motifs in TF-bound regions. Utilizing a compendium of 289 mouse haematopoietic TF ChIP-seq datasets, we demonstrate that haematopoietic-related motif-pairs commonly occur with highly conserved constrained spacing and orientation between motifs. Furthermore, motif clustering revealed specific associations for both heterotypic and homotypic motif-pairs with particular haematopoietic cell types. We also showed that disrupting the spacing between motif-pairs significantly affects transcriptional activity in a well-known motif-pair—E-box and GATA, and in two previously unknown motif-pairs with constrained spacing—Ets and Homeobox as well as Ets and E-box. In this study, we provide evidence for widespread sequence-specific TF pair interaction with DNA that conforms to the ‘enhanceosome’ model, and furthermore identify associations between specific haematopoietic cell-types and motif-pairs.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 14
    Publication Date: 2015-01-24
    Description: Of the ~1.3 million Alu elements in the human genome, only a tiny number are estimated to be active in transcription by RNA polymerase (Pol) III. Tracing the individual loci from which Alu transcripts originate is complicated by their highly repetitive nature. By exploiting RNA-Seq data sets and unique Alu DNA sequences, we devised a bioinformatic pipeline allowing us to identify Pol III-dependent transcripts of individual Alu elements. When applied to ENCODE transcriptomes of seven human cell lines, this search strategy identified ~1300 Alu loci corresponding to detectable transcripts, with ~120 of them expressed in at least three cell lines. In vitro transcription of selected Alus did not reflect their in vivo expression properties, and required the native 5'-flanking region in addition to the internal promoter. We also identified a cluster of expressed AluYa5-derived transcription units, juxtaposed to snaR genes on chromosome 19, formed by a promoter-containing left monomer fused to an Alu-unrelated downstream moiety. Autonomous Pol III transcription was also revealed for Alus nested within Pol II-transcribed genes. The ability to investigate Alu transcriptomes at single-locus resolution will facilitate both the identification of novel biologically relevant Alu RNAs and the assessment of Alu expression alteration under pathological conditions.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 15
    Publication Date: 2014-12-17
    Description: Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 16
    Publication Date: 2016-04-08
    Description: Existing methods for interpreting protein variation focus on annotating mutation pathogenicity rather than detailed interpretation of variant deleteriousness and frequently use only sequence-based or structure-based information. We present VIPUR, a computational framework that seamlessly integrates sequence analysis and structural modelling (using the Rosetta protein modelling suite) to identify and interpret deleterious protein variants. To train VIPUR, we collected 9477 protein variants with known effects on protein function from multiple organisms and curated structural models for each variant from crystal structures and homology models. VIPUR can be applied to mutations in any organism's proteome with improved generalized accuracy (AUROC .83) and interpretability (AUPR .87) compared to other methods. We demonstrate that VIPUR's predictions of deleteriousness match the biological phenotypes in ClinVar and provide a clear ranking of prediction confidence. We use VIPUR to interpret known mutations associated with inflammation and diabetes, demonstrating the structural diversity of disrupted functional sites and improved interpretation of mutations associated with human diseases. Lastly, we demonstrate VIPUR's ability to highlight candidate variants associated with human diseases by applying VIPUR to de novo variants associated with autism spectrum disorders.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 17
    Publication Date: 2015-10-15
    Description: Intrinsically disordered proteins and regions (IDPs and IDRs) lack stable 3D structure under physiological conditions in vitro, are common in eukaryotes, and facilitate interactions with RNA, DNA and proteins. Current methods for prediction of IDPs and IDRs do not provide insights into their functions, except for a handful of methods that address predictions of protein-binding regions. We report a first-of-its-kind computational method, DisoRDPbind, for high-throughput prediction of RNA, DNA and protein binding residues located in IDRs from protein sequences. DisoRDPbind is implemented using a runtime-efficient multi-layered design that utilizes information extracted from physicochemical properties of amino acids, sequence complexity, putative secondary structure and disorder, and sequence alignment. Empirical tests demonstrate that it provides accurate predictions that are competitive with other predictors of disorder-mediated protein binding regions and complementary to the methods that predict RNA- and DNA-binding residues annotated based on crystal structures. Application in Homo sapiens, Mus musculus, Caenorhabditis elegans and Drosophila melanogaster proteomes reveals that RNA- and DNA-binding proteins predicted by DisoRDPbind complement and overlap with the corresponding known binding proteins collected from several sources. Also, the number of the putative protein-binding regions predicted with DisoRDPbind correlates with the promiscuity of proteins in the corresponding protein–protein interaction networks. Webserver: http://biomine.ece.ualberta.ca/DisoRDPbind/
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 18
    Publication Date: 2015-12-16
    Description: Many cancers comprise heterogeneous populations of cells at primary and metastatic sites throughout the body. The presence or emergence of distinct subclones with drug-resistant genetic and epigenetic phenotypes within these populations can greatly complicate therapeutic intervention. Liquid biopsies of peripheral blood from cancer patients have been suggested as an ideal means of sampling intratumor genetic and epigenetic heterogeneity for diagnostics, monitoring and therapeutic guidance. However, current molecular diagnostic and sequencing methods are not well suited to the routine assessment of epigenetic heterogeneity in difficult samples such as liquid biopsies that contain intrinsically low fractional concentrations of circulating tumor DNA (ctDNA) and rare epigenetic subclonal populations. Here we report an alternative approach, deemed DREAMing (Discrimination of Rare EpiAlleles by Melt), which uses semi-limiting dilution and precise melt curve analysis to distinguish and enumerate individual copies of epiallelic species at single-CpG-site resolution in fractions as low as 0.005%, providing facile and inexpensive ultrasensitive assessment of locus-specific epigenetic heterogeneity directly from liquid biopsies. The technique is demonstrated here for the evaluation of epigenetic heterogeneity at p14 ARF and BRCA1 gene-promoter loci in liquid biopsies obtained from patients in association with non-small cell lung cancer (NSCLC) and myelodysplastic/myeloproliferative neoplasms (MDS/MPN), respectively.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 19
    Publication Date: 2015-12-16
    Description: Bisulfite sequencing is a key methodology in epigenetics. However, the standard workflow of bisulfite sequencing involves heat and strongly basic conditions to convert the intermediary product 5,6-dihydrouridine-6-sulfonate (dhU6S) (generated by reaction of bisulfite with deoxycytidine (dC)) to uracil (dU). These harsh conditions generally lead to sample loss and DNA damage while milder conditions may result in incomplete conversion of intermediates to uracil. Both can lead to poor recovery of bisulfite-treated DNA by the polymerase chain reaction (PCR) as either damaged DNA and/or intermediates of bisulfite treatment are poor substrate for standard DNA polymerases. Here we describe an engineered DNA polymerase (5D4) with an enhanced ability to replicate and PCR amplify bisulfite-treated DNA due to an ability to bypass both DNA lesions and bisulfite intermediates, allowing significantly milder conversion conditions and increased sensitivity in the PCR amplification of bisulfite-treated DNA. Incorporation of the 5D4 DNA polymerase into the bisulfite sequencing workflow thus promises significant sensitivity and efficiency gains.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 20
    Publication Date: 2016-06-03
    Description: The sequential chain of interactions altering the binary state of a biomolecule represents the ‘information flow’ within a cellular network that determines phenotypic properties. Given the lack of computational tools to dissect context-dependent networks and gene activities, we developed NetDecoder, a network biology platform that models context-dependent information flows using pairwise phenotypic comparative analyses of protein–protein interactions. Using breast cancer, dyslipidemia and Alzheimer's disease as case studies, we demonstrate that NetDecoder dissects subnetworks to identify key players significantly impacting cell behaviour specific to a given disease context. We further show that genes residing in disease-specific subnetworks are enriched in disease-related signalling pathways and information flow profiles, which drive the resulting disease phenotypes. We also devise a novel scoring scheme to quantify key genes: network routers, which influence many genes; key targets, which are influenced by many genes; and high impact genes, which experience a significant change in regulation. We show the robustness of our results against parameter changes. Our network biology platform includes freely available source code (http://www.NetDecoder.org) for researchers to explore genome-wide context-dependent information flow profiles and key genes, given a set of genes of particular interest and transcriptome data. More importantly, NetDecoder will enable researchers to uncover context-dependent drug targets.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 21
    Publication Date: 2016-09-03
    Description: Nucleosomes, the fundamental subunits of eukaryotic chromatin, are organized with respect to transcriptional start sites. A major challenge to the persistence of this organization is the disassembly of nucleosomes during DNA replication. Here, we use complementary approaches to map the locations of nucleosomes on recently replicated DNA. We find that nucleosomes are substantially realigned with promoters during the minutes following DNA replication. As a result, the nucleosomal landscape is largely re-established before newly replicated chromosomes are partitioned into daughter cells and can serve as a platform for the re-establishment of gene expression programmes. When the supply of histones is disrupted through mutation of the chaperone Caf1, a promoter-based architecture is generated, but with increased inter-nucleosomal spacing. This indicates that the chromatin remodelling enzymes responsible for spacing nucleosomes are capable of organizing nucleosomes with a range of different linker DNA lengths.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 22
    Publication Date: 2016-09-20
    Description: DNA methylation plays an important role in many biological processes. Existing epigenome-wide association studies (EWAS) have successfully identified aberrantly methylated genes in many diseases and disorders with most studies focusing on analysing methylation sites one at a time. Incorporating prior biological information such as biological networks has been proven to be powerful in identifying disease-associated genes in both gene expression studies and genome-wide association studies (GWAS) but has been understudied in EWAS. Although recent studies have noticed that there are differences in methylation variation in different groups, only a few existing methods consider variance signals in DNA methylation studies. Here, we present a network-assisted algorithm, NEpiC, that combines both mean and variance signals in searching for differentially methylated sub-networks using the protein–protein interaction (PPI) network. In simulation studies, we demonstrate the power gain from using both the prior biological information and variance signals compared to using only one of the two or neither. Applications to several DNA methylation datasets from the Cancer Genome Atlas (TCGA) project and DNA methylation data on hepatocellular carcinoma (HCC) from the Columbia University Medical Center (CUMC) suggest that the proposed NEpiC algorithm identifies more cancer-related genes and generates better replication results.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 23
    Publication Date: 2016-08-20
    Description: To improve the epigenomic analysis of tissues rich in 5-hydroxymethylcytosine (hmC), we developed a novel protocol called TAB-Methyl-SEQ, which allows for single base resolution profiling of both hmC and 5-methylcytosine by targeted next-generation sequencing. TAB-Methyl-SEQ data were extensively validated by a set of five methodologically different protocols. Importantly, these extensive cross-comparisons revealed that protocols based on Tet1-assisted bisulfite conversion provided more precise hmC values than TrueMethyl-based methods. A total of 109 454 CpG sites were analyzed by TAB-Methyl-SEQ for mC and hmC in 188 genes from 20 different adult human livers. We describe three types of variability of hepatic hmC profiles: (i) sample-specific variability at 40.8% of CpG sites analyzed, where the local hmC values correlate to the global hmC content of livers (measured by LC-MS), (ii) gene-specific variability, where hmC levels in the coding regions positively correlate to expression of the respective gene and (iii) site-specific variability, where prominent hmC peaks span only 1 to 3 neighboring CpG sites. Our data suggest that both the gene- and site-specific components of hmC variability might contribute to the epigenetic control of hepatic genes. The protocol described here should be useful for targeted DNA analysis in a variety of applications.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 24
    Publication Date: 2015-04-21
    Description: limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 25
    Publication Date: 2015-02-18
    Description: Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of metagenomic protein predictions adversely impacts homology detection and, therefore, the quality of the overall annotation of the dataset. Here we present a novel algorithm called GRASP that accurately identifies the homologs of a given reference protein sequence from a database consisting of partial-length metagenomic proteins. Our homology detection strategy is guided by the reference sequence, and involves the simultaneous search and assembly of overlapping database sequences. GRASP was compared to three commonly used protein sequence search programs (BLASTP, PSI-BLAST and FASTM). Our evaluations using several simulated and real datasets show that GRASP has a significantly higher sensitivity than these programs while maintaining a very high specificity. GRASP can be a very useful program for detecting and quantifying taxonomic and protein family abundances in metagenomic datasets. GRASP is implemented in GNU C++, and is freely available at http://sourceforge.net/projects/grasp-release .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 26
    Publication Date: 2015-02-18
    Description: Genetic screens of an unprecedented scale have recently been made possible by the availability of high-complexity libraries of synthetic oligonucleotides designed to mediate either gene knockdown or gene knockout, coupled with next-generation sequencing. However, several sources of random noise and statistical biases complicate the interpretation of the resulting high-throughput data. We developed HiTSelect, a comprehensive analysis pipeline for rigorously selecting screen hits and identifying functionally relevant genes and pathways by addressing off-target effects, controlling for variance in both gene silencing efficiency and sequencing depth of coverage and integrating relevant metadata. We document the superior performance of HiTSelect using data from both genome-wide RNAi and CRISPR/Cas9 screens. HiTSelect is implemented as an open-source package, with a user-friendly interface for data visualization and pathway exploration. Binary executables are available at http://sourceforge.net/projects/hitselect/ , and the source code is available at https://github.com/diazlab/HiTSelect .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 27
    Publication Date: 2015-01-24
    Description: Integrative analyses of epigenetic data promise a deeper understanding of the epigenome. Epidaurus is a bioinformatics tool used to effectively reveal inter-dataset relevance and differences through data aggregation, integration and visualization. In this study, we demonstrated the utility of Epidaurus in validating hypotheses and generating novel biological insights. In particular, we described the use of Epidaurus to (i) integrate epigenetic data from prostate cancer cell lines to validate the activation function of EZH2 in castration-resistant prostate cancer and to (ii) study the mechanism of androgen receptor (AR) binding deregulation induced by the knockdown of FOXA1. We found that EZH2's noncanonical activation function was reaffirmed by its association with active histone markers and the lack of association with repressive markers. More importantly, we revealed that the binding of AR was selectively reprogramed to promoter regions, leading to the up-regulation of hundreds of cancer-associated genes including EGFR. The prebuilt epigenetic dataset from commonly used cell lines (LNCaP, VCaP, LNCaP-Abl, MCF7, GM12878, K562, HeLa-S3, A549, HePG2) makes Epidaurus a useful online resource for epigenetic research. As standalone software, Epidaurus is specifically designed to process user customized datasets with both efficiency and convenience.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 28
    Publication Date: 2015-02-18
    Description: The large number of chemical modifications that are found on the histone proteins of eukaryotic cells form multiple complex combinations, which can act as recognition signals for reader proteins. We have used peptide capture in conjunction with super-SILAC quantification to carry out an unbiased high-throughput analysis of the composition of protein complexes that bind to histone H3K9/S10 and H3K27/S28 methyl-phospho modifications. The accurate quantification allowed us to perform weighted correlation network analysis (WGCNA) to obtain a systems-level view of the histone H3 tail interactome. The analysis reveals the underlying modularity of the histone reader network with members of nuclear complexes exhibiting very similar binding signatures, which suggests that many proteins bind to histones as part of pre-organized complexes. Our results identify a novel complex that binds to the double H3K9me3/S10ph modification, which includes Atrx, Daxx and members of the FACT complex. The super-SILAC approach allows comparison of binding to multiple peptides with different combinations of modifications and the resolution of the WGCNA is enhanced by maximizing the number of combinations that are compared. This makes it a useful approach for assessing the effects of changes in histone modification combinations on the composition and function of bound complexes.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 29
    Publication Date: 2015-02-18
    Description: RNA-protein complexes are essential in mediating important fundamental cellular processes, such as transport and localization. In particular, ncRNA-protein interactions play an important role in post-transcriptional gene regulation such as mRNA localization, mRNA stabilization, poly-adenylation, splicing and translation. Experimental methods for determining RNA-protein interactions remain expensive and time-consuming. Here, we present RPI-Pred (RNA-protein interaction predictor), a new support-vector machine-based method to predict protein-RNA interaction pairs based on both sequences and structures. The results show that RPI-Pred can correctly predict RNA-protein interaction pairs with ~94% prediction accuracy when using sequence and experimentally determined protein and RNA structures, and with ~83% when using sequences and predicted protein and RNA structures. Further, RPI-Pred was superior to other existing methods, predicting more experimentally validated ncRNA-protein interaction pairs from different organisms. Motivated by the improved performance of RPI-Pred, we further applied our method for reliable construction of ncRNA-protein interaction networks. RPI-Pred is publicly available at: http://ctsb.is.wfubmc.edu/projects/rpi-pred .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 30
    Publication Date: 2015-02-18
    Description: Identifying conserved and divergent response patterns in gene networks is becoming increasingly important. A common approach is integrating expression information with gene association networks in order to find groups of connected genes that are activated or repressed. In many cases, researchers are also interested in comparisons across species (or conditions). Finding an active sub-network is a hard problem and applying it across species requires further considerations (e.g. orthology information, expression data and networks from different sources). To address these challenges we devised ModuleBlast, which uses both expression and network topology to search for highly relevant sub-networks. We have applied ModuleBlast to expression and interaction data from mouse, macaque and human to study immune response and aging. The immune response analysis identified several relevant modules, consistent with recent findings on apoptosis and NFκB activation following infection. Temporal analysis of these data revealed cascades of modules that are dynamically activated within and across species. We have experimentally validated some of the novel hypotheses resulting from the analysis of the ModuleBlast results leading to new insights into the mechanisms used by a key mammalian aging protein.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 31
    Publication Date: 2015-02-18
    Description: Here we used discriminative training methods to uncover the chromatin, transcription factor (TF) binding and sequence features of enhancers underlying gene expression in individual cardiac cells. We used machine learning with TF motifs and ChIP data for a core set of cardiogenic TFs and histone modifications to classify Drosophila cell-type-specific cardiac enhancer activity. We show that the classifier models can be used to predict cardiac cell subtype cis-regulatory activities. Associating the predicted enhancers with an expression atlas of cardiac genes further uncovered clusters of genes with transcription and function limited to individual cardiac cell subtypes. Further, the cell-specific enhancer models revealed chromatin, TF binding and sequence features that distinguish enhancer activities in distinct subsets of heart cells. Collectively, our results show that computational modeling combined with empirical testing provides a powerful platform to uncover the enhancers, TF motifs and gene expression profiles which characterize individual cardiac cell fates.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 32
    Publication Date: 2014-11-28
    Description: Genome-wide assessment of protein–DNA interaction by chromatin immunoprecipitation followed by massive parallel sequencing (ChIP-seq) is a key technology for studying transcription factor (TF) localization and regulation of gene expression. Signal-to-noise-ratio and signal specificity in ChIP-seq studies depend on many variables, including antibody affinity and specificity. Thus far, efforts to improve antibody reagents for ChIP-seq experiments have focused mainly on generating higher quality antibodies. Here we introduce KOIN (knockout implemented normalization) as a novel strategy to increase signal specificity and reduce noise by using TF knockout mice as a critical control for ChIP-seq data experiments. Additionally, KOIN can identify ‘hyper ChIPable regions’ as another source of false-positive signals. As the use of the KOIN algorithm reduces false-positive results and thereby prevents misinterpretation of ChIP-seq data, it should be considered as the gold standard for future ChIP-seq analyses, particularly when developing ChIP-assays with novel antibody reagents.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 33
    Publication Date: 2015-07-12
    Description: We present a capture-based approach for bisulfite-converted DNA that allows interrogation of pre-defined genomic locations, allowing quantitative and qualitative assessments of 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC) at CG dinucleotides and in non-CG contexts (CHG, CHH) in mammalian and plant genomes. We show the technique works robustly and reproducibly using as little as 500 ng of starting DNA, with results correlating well with whole genome bisulfite sequencing data, and demonstrate that human DNA can be tested in samples contaminated with microbial DNA. This targeting approach will allow cell type-specific designs to maximize the value of 5mC and 5hmC sequencing.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 34
    Publication Date: 2015-07-12
    Description: Androgen receptor (AR) variants (AR-Vs) expressed in prostate cancer (PCa) lack the AR ligand binding domain (LBD) and function as constitutively active transcription factors. AR-V expression in patient tissues or circulating tumor cells is associated with resistance to AR-targeting endocrine therapies and poor outcomes. Here, we investigated the mechanisms governing chromatin binding of AR-Vs with the goal of identifying therapeutic vulnerabilities. By chromatin immunoprecipitation and sequencing (ChIP-seq) and complementary biochemical experiments, we show that AR-Vs display a binding preference for the same canonical high-affinity androgen response elements (AREs) that are preferentially engaged by AR, albeit with lower affinity. Dimerization was an absolute requirement for constitutive AR-V DNA binding and transcriptional activation. Treatment with the bromodomain and extraterminal (BET) inhibitor JQ1 resulted in inhibition of AR-V chromatin binding and impaired AR-V driven PCa cell growth in vitro and in vivo. Importantly, this was associated with a novel JQ1 action of down-regulating AR-V transcript and protein expression. Overall, this study demonstrates that AR-Vs broadly restore AR chromatin binding events that are otherwise suppressed during endocrine therapy, and provides pre-clinical rationale for BET inhibition as a strategy for inhibiting expression and chromatin binding of AR-Vs in PCa.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 35
    Publication Date: 2015-09-30
    Description: A key aspect of RNA secondary structure prediction is the identification of novel functional elements. This is a challenging task because these elements typically are embedded in longer transcripts where the borders between the element and flanking regions have to be defined. The flanking sequences impact the folding of the functional elements both at the level of computational analyses and when the element is extracted as a transcript for experimental analysis. Here, we analyze how different flanking region lengths impact folding into a constrained structure by computing probabilities of folding for different sizes of flanking regions. Our method, RNAcop (RNA context optimization by probability), is tested on known and de novo predicted structures. In vitro experiments support the computational analysis and suggest that for a number of structures, choosing proper lengths of flanking regions is critical. RNAcop is available as a web server and as stand-alone software via http://rth.dk/resources/rnacop .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 36
    Publication Date: 2016-02-20
    Description: Nucleosomal DNA is thought to be generally inaccessible to DNA-binding factors, such as micrococcal nuclease (MNase). Here, we digest Drosophila chromatin with high and low concentrations of MNase to reveal two distinct nucleosome types: MNase-sensitive and MNase-resistant. MNase-resistant nucleosomes assemble on sequences depleted of A/T and enriched in G/C-containing dinucleotides, whereas MNase-sensitive nucleosomes form on A/T-rich sequences found at transcription start and termination sites, enhancers and DNase I hypersensitive sites. Estimates of nucleosome formation energies indicate that MNase-sensitive nucleosomes tend to be less stable than MNase-resistant ones. Strikingly, a decrease in cell growth temperature of about 10°C makes MNase-sensitive nucleosomes less accessible, suggesting that observed variations in MNase sensitivity are related to either thermal fluctuations of chromatin fibers or the activity of enzymatic machinery. In the vicinity of active genes and DNase I hypersensitive sites nucleosomes are organized into periodic arrays, likely due to ‘phasing’ off potential barriers formed by DNA-bound factors or by nucleosomes anchored to their positions through external interactions. The latter idea is substantiated by our biophysical model of nucleosome positioning and energetics, which predicts that nucleosomes immediately downstream of transcription start sites are anchored and recapitulates nucleosome phasing at active genes significantly better than sequence-dependent models.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 37
    Publication Date: 2016-03-01
    Description: It is being increasingly realized that nucleosome organization on DNA crucially regulates DNA–protein interactions and the resulting gene expression. While the spatial character of the nucleosome positioning on DNA has been experimentally and theoretically studied extensively, the temporal character is poorly understood. Accounting for ATPase activity and DNA-sequence effects on nucleosome kinetics, we develop a theoretical method to estimate the time of continuous exposure of binding sites of non-histone proteins (e.g. transcription factors and TATA binding proteins) along any genome. Applying the method to Saccharomyces cerevisiae, we show that the exposure timescales are determined by cooperative dynamics of multiple nucleosomes, and their behavior is often different from expectations based on static nucleosome occupancy. Examining exposure times in the promoters of GAL1 and PHO5, we show that our theoretical predictions are consistent with known experiments. We apply our method genome-wide and discover huge gene-to-gene variability of mean exposure times of TATA boxes and patches adjacent to TSS (+1 nucleosome region); the resulting timescale distributions have non-exponential tails.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 38
    Publication Date: 2016-02-20
    Description: The Illumina HumanMethylation450 BeadChip is increasingly utilized in epigenome-wide association studies; however, this array-based measurement of DNA methylation is subject to measurement variation. Appropriate data preprocessing to remove background noise is important for detecting the small changes that may be associated with disease. We developed a novel background correction method, ENmix, that uses a mixture of exponential and truncated normal distributions to flexibly model signal intensity and uses a truncated normal distribution to model background noise. Depending on data availability, we employ three approaches to estimate background normal distribution parameters using (i) internal chip negative controls, (ii) out-of-band Infinium I probe intensities or (iii) combined methylated and unmethylated intensities. We evaluate ENmix against other available methods for both reproducibility among duplicate samples and accuracy of methylation measurement among laboratory control samples. ENmix out-performed other background correction methods for both these measures and substantially reduced the probe-design type bias between Infinium I and II probes. In reanalysis of existing EWAS data we show that ENmix can identify additional CpGs, and results in smaller P-value estimates for previously-validated CpGs. We incorporated the method into the R package ENmix, which is freely available from the Bioconductor website.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 39
    Publication Date: 2015-10-31
    Description: Systems biologists aim to decipher the structure and dynamics of signaling and regulatory networks underpinning cellular responses; synthetic biologists can use this insight to alter existing networks or engineer de novo ones. Both tasks will benefit from an understanding of which structural and dynamic features of networks can emerge from evolutionary processes, through which intermediary steps these arise, and whether they embody general design principles. As natural evolution at the level of network dynamics is difficult to study, in silico evolution of network models can provide important insights. However, current tools used for in silico evolution of network dynamics are limited to ad hoc computer simulations and models. Here we introduce BioJazz, an extendable, user-friendly tool for simulating the evolution of dynamic biochemical networks. Unlike previous tools for in silico evolution, BioJazz allows for the evolution of cellular networks with unbounded complexity by combining rule-based modeling with an encoding of networks that is akin to a genome. We show that BioJazz can be used to implement biologically realistic selective pressures and allows exploration of the space of network architectures and dynamics that implement prescribed physiological functions. BioJazz is provided as an open-source tool to facilitate its further development and use. Source code and user manuals are available at: http://oss-lab.github.io/biojazz and http://osslab.lifesci.warwick.ac.uk/BioJazz.aspx .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 40
    Publication Date: 2015-12-02
    Description: Despite the biological importance of non-coding RNAs, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 41
    Publication Date: 2015-12-02
    Description: DNA methylation is an important epigenetic modification involved in many biological processes and diseases. Recent developments in whole genome bisulfite sequencing (WGBS) technology have enabled genome-wide measurements of DNA methylation at single base pair resolution. Many experiments have been conducted to compare DNA methylation profiles under different biological contexts, with the goal of identifying differentially methylated regions (DMRs). Due to the high cost of WGBS experiments, many studies are still conducted without biological replicates. Methods and tools available for analyzing such data are very limited. We develop a statistical method, DSS-single, for detecting DMRs from WGBS data without replicates. We characterize the count data using a rigorous model that accounts for the spatial correlation of methylation levels, sequence depth and biological variation. We demonstrate that using information from neighboring CG sites, biological variation can be estimated accurately even without replicates. DMR detection is then carried out via a Wald test procedure. Simulations demonstrate that DSS-single has greater sensitivity and accuracy than existing methods, and an analysis of H1 versus IMR90 cell lines suggests that it also yields the most biologically meaningful results. DSS-single is implemented in the Bioconductor package DSS.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 42
    Publication Date: 2015-12-02
    Description: Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 43
    Publication Date: 2012-07-22
    Description: Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs—two half sites with a flexible length gap in between—and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 44
    Publication Date: 2012-07-22
    Description: Cataloging the association of transcripts to genetic variants in recent years holds the promise for functional dissection of regulatory structure of human transcription. Here, we present a novel approach, which aims at elucidating the joint relationships between transcripts and single-nucleotide polymorphisms (SNPs). This entails detection and analysis of modules of transcripts, each weakly associated to a single genetic variant, together exposing a high-confidence association signal between the module and this ‘main’ SNP. To explore how transcripts in a module are related to causative loci for that module, we represent such dependencies by a graphical model. We applied our method to the existing data on genetics of gene expression in the liver. The modules are significantly more, larger and denser than found in permuted data. Quantification of the confidence in a module as a likelihood score, allows us to detect transcripts that do not reach genome-wide significance level. Topological analysis of each module identifies novel insights regarding the flow of causality between the main SNP and transcripts. We observe similar annotations of modules from two sources of information: the enrichment of a module in gene subsets and locus annotation of the genetic variants. This and further phenotypic analysis provide a validation for our methodology.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 45
    Publication Date: 2012-07-22
    Description: Phase variation of surface structures occurs in diverse bacterial species due to stochastic, high frequency, reversible mutations. Multiple genes of Campylobacter jejuni are subject to phase variable gene expression due to mutations in polyC/G tracts. A modal length of nine repeats was detected for polyC/G tracts within C. jejuni genomes. Switching rates for these tracts were measured using chromosomally-located reporter constructs and high rates were observed for cj1139 (G8) and cj0031 (G9). Alteration of the cj1139 tract from G8 to G11 increased mutability 10-fold and changed the mutational pattern from predominantly insertions to mainly deletions. Using a multiplex PCR, major changes were detected in ‘on/off’ status for some phase variable genes during passage of C. jejuni in chickens. Utilization of observed switching rates in a stochastic, theoretical model of phase variation demonstrated links between mutability and genetic diversity but could not replicate observed population diversity. We propose that modal repeat numbers have evolved in C. jejuni genomes due to molecular drivers associated with the mutational patterns of these polyC/G repeats, rather than by selection for particular switching rates, and that factors other than mutational drift are responsible for generating genetic diversity during host colonization by this bacterial pathogen.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 46
    Publication Date: 2012-09-13
    Description: Control of translation in eukaryotes is complex, depending on the binding of various factors to mRNAs. Available data for subsets of mRNAs that are translationally up- and down-regulated in yeast eIF4E-binding protein (4E-BP) deletion mutants are coupled with reported mRNA secondary structure measurements to investigate whether 5'-UTR secondary structure varies between the subsets. Genes with up-regulated translational efficiencies in the caf20 mutant have relatively high averaged 5'-UTR secondary structure. There is no apparent wide-scale correlation of RNA-binding protein preferences with the increased 5'-UTR secondary structure, leading us to speculate that the secondary structure itself may play a role in differential partitioning of mRNAs between eIF4E/4E-BP repression and eIF4E/eIF4G translation initiation. Both Caf20p and Eap1p contain stretches of positive charge in regions of predicted disorder. Such regions are also present in eIF4G and have been reported to associate with mRNA binding. The pattern of these segments, around the canonical eIF4E-binding motif, varies between each 4E-BP and eIF4G. Analysis of gene ontology shows that yeast proteins containing predicted disordered segments, with positive charge runs, are enriched for nucleic acid binding. We propose that the 4E-BPs act, in part, as differential, flexible, polyelectrostatic scaffolds for mRNAs.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 47
    Publication Date: 2012-05-23
    Description: The activation of cryptic 5' splice sites (5' SSs) is often related to human hereditary diseases. The DNA-based mutation screening strategies are commonly used to recognize the cryptic 5' SSs, because features of the local DNA sequence can influence the choice of cryptic 5' SSs. To improve the identification of the cryptic 5' SSs, we developed a structure-based method, named SPO (structure profiles and odds measure), which combines two parameters, the structural feature derived from hydroxyl radical cleavage pattern and odds measure, to assess the likelihood of a cryptic 5' SS activation in competing with its paired authentic 5' SS. Compared to the current tools for identifying activated cryptic 5' SSs, the SPO algorithm achieves higher prediction accuracy than the other methods, including MaxEnt, MDD, Markov model, weight matrix model, Shapiro and Senapathy matrix, Ri and ΔG. In addition, the predicted SPO scores from the SPO algorithm exhibited a greater degree of correlation with the strength of cryptic 5' SS activation than that measured from the other seven methods. In conclusion, the SPO algorithm provides an optimal identification of cryptic 5' SSs, can be applied in designing mutagenesis experiments for various splicing events and may be helpful to investigate the relationship between structural variants and human hereditary diseases.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 48
    Publication Date: 2013-12-07
    Description: The epigenetic modification of 5-hydroxymethylcytosine (5hmC) is receiving great attention due to its potential role in DNA methylation reprogramming and as a cell state identifier. Given this interest, it is important to identify reliable and cost-effective methods for the enrichment of 5hmC marked DNA for downstream analysis. We tested three commonly used affinity-based enrichment techniques; (i) antibody, (ii) chemical capture and (iii) protein affinity enrichment and assessed their ability to accurately and reproducibly report 5hmC profiles in mouse tissues containing high (brain) and lower (liver) levels of 5hmC. The protein-affinity technique is a poor reporter of 5hmC profiles, delivering 5hmC patterns that are incompatible with other methods. Both antibody and chemical capture-based techniques generate highly similar genome-wide patterns for 5hmC, which are independently validated by standard quantitative PCR (qPCR) and glucosyl-sensitive restriction enzyme digestion (gRES-qPCR). Both antibody and chemical capture generated profiles reproducibly link to unique chromatin modification profiles associated with 5hmC. However, there appears to be a slight bias of the antibody to bind to regions of DNA rich in simple repeats. Ultimately, the increased specificity observed with chemical capture-based approaches makes this an attractive method for the analysis of locus-specific or genome-wide patterns of 5hmC.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 49
    Publication Date: 2013-10-19
    Description: Methylation-specific fluorescence in situ hybridization (MeFISH) was developed for microscopic visualization of DNA methylation status at specific repeat sequences in individual cells. MeFISH is based on the differential reactivity of 5-methylcytosine and cytosine in target DNA for interstrand complex formation with osmium and bipyridine-containing nucleic acids (ICON). Cell nuclei and chromosomes hybridized with fluorescence-labeled ICON probes for mouse major and minor satellite repeats were treated with osmium for crosslinking. After denaturation, fluorescent signals were retained specifically at satellite repeats in wild-type, but not in DNA methyltransferase triple-knockout (negative control) mouse embryonic stem cells. Moreover, using MeFISH, we successfully detected hypomethylated satellite repeats in cells from patients with immunodeficiency, centromeric instability and facial anomalies syndrome and 5-hydroxymethylated satellite repeats in male germ cells, the latter of which had been considered to be unmethylated based on anti-5-methylcytosine antibody staining. MeFISH will be suitable for a wide range of applications in epigenetics research and medical diagnosis.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 50
    Publication Date: 2014-05-01
    Description: DNA methylation is an important epigenetic modification that has essential roles in cellular processes including gene regulation, development and disease and is widely dysregulated in most types of cancer. Recent advances in sequencing technology have enabled the measurement of DNA methylation at single nucleotide resolution through methods such as whole-genome bisulfite sequencing and reduced representation bisulfite sequencing. In DNA methylation studies, a key task is to identify differences under distinct biological contexts, for example, between tumor and normal tissue. A challenge in sequencing studies is that the number of biological replicates is often limited by the costs of sequencing. The small number of replicates leads to unstable variance estimation, which can reduce accuracy to detect differentially methylated loci (DML). Here we propose a novel statistical method to detect DML when comparing two treatment groups. The sequencing counts are described by a lognormal-beta-binomial hierarchical model, which provides a basis for information sharing across different CpG sites. A Wald test is developed for hypothesis testing at each CpG site. Simulation results show that the proposed method yields improved DML detection compared to existing methods, particularly when the number of replicates is low. The proposed method is implemented in the Bioconductor package DSS.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 51
    Publication Date: 2014-05-01
    Description: Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ~10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 52
    Publication Date: 2014-02-11
    Description: Increasing numbers of protein structures are solved each year, but many of these structures belong to proteins whose sequences are homologous to sequences in the Protein Data Bank. Nevertheless, the structures of homologous proteins belonging to the same family contain useful information because functionally important residues are expected to preserve physico-chemical, structural and energetic features. This information forms the basis of our method, which detects RNA-binding residues of a given RNA-binding protein as those residues that preserve physico-chemical, structural and energetic features in its homologs. Tests on 81 RNA-bound and 35 RNA-free protein structures showed that our method yields a higher fraction of true RNA-binding residues (higher precision) than two structure-based and two sequence-based machine-learning methods. Because the method requires no training data set and has no parameters, its precision does not degrade when applied to ‘novel’ protein sequences unlike methods that are parameterized for a given training data set. It was used to predict the ‘unknown’ RNA-binding residues in the C-terminal RNA-binding domain of human CPEB3. The two predicted residues, F430 and F474, were experimentally verified to bind RNA, in particular F430, whose mutation to alanine or asparagine nearly abolished RNA binding. The method has been implemented in a webserver called DR_bind1, which is freely available with no login requirement at http://drbind.limlab.ibms.sinica.edu.tw.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 53
    Publication Date: 2014-04-03
    Description: Epigenetic regulation of gene expression involves, besides DNA and histone modifications, the relative positioning of DNA sequences within the nucleus. To trace specific DNA sequences in living cells, we used programmable sequence-specific DNA binding of designer transcription activator-like effectors (dTALEs). We designed a recombinant dTALE (msTALE) with variable repeat domains to specifically bind a 19-bp target sequence of major satellite DNA. The msTALE was fused with green fluorescent protein (GFP) and stably expressed in mouse embryonic stem cells. Hybridization with a major satellite probe (3D-fluorescent in situ hybridization) and co-staining for known cellular structures confirmed in vivo binding of the GFP-msTALE to major satellite DNA present at nuclear chromocenters. Dual tracing of major satellite DNA and the replication machinery throughout S-phase showed co-localization during mid to late S-phase, directly demonstrating the late replication timing of major satellite DNA. Fluorescence bleaching experiments indicated a relatively stable but still dynamic binding, with mean residence times in the range of minutes. Fluorescently labeled dTALEs open new perspectives to target and trace DNA sequences and to monitor dynamic changes in subnuclear positioning as well as interactions with functional nuclear structures during cell cycle progression and cellular differentiation.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 54
    Publication Date: 2014-04-03
    Description: Coupling bisulfite conversion with next-generation sequencing (Bisulfite-seq) enables genome-wide measurement of DNA methylation, but poses unique challenges for mapping. However, despite a proliferation of Bisulfite-seq mapping tools, no systematic comparison of their genomic coverage and quantitative accuracy has been reported. We sequenced bisulfite-converted DNA from two tissues from each of two healthy human adults and systematically compared five widely used Bisulfite-seq mapping algorithms: Bismark, BSMAP, Pash, BatMeth and BS Seeker. We evaluated their computational speed and genomic coverage and verified their percentage methylation estimates. With the exception of BatMeth, all mappers covered >70% of CpG sites genome-wide and yielded highly concordant estimates of percentage methylation (r² ≥ 0.95). Fourfold variation in mapping time was found between BSMAP (fastest) and Pash (slowest). In each library, 8–12% of genomic regions covered by Bismark and Pash were not covered by BSMAP. An experiment using simulated reads confirmed that Pash has an exceptional ability to uniquely map reads in genomic regions of structural variation. Independent verification by bisulfite pyrosequencing generally confirmed the percentage methylation estimates by the mappers. Of these algorithms, Bismark provides an attractive combination of processing speed, genomic coverage and quantitative accuracy, whereas Pash offers considerably higher genomic coverage.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 55
    Publication Date: 2014-09-17
    Description: Three-dimensional organization of chromatin is fundamental for transcriptional regulation. Tissue-specific transcriptional programs are orchestrated by transcription factors and epigenetic regulators. The RUNX2 transcription factor is required for differentiation of precursor cells into mature osteoblasts. Although organization and control of the bone-specific Runx2-P1 promoter have been studied extensively, long-range regulation has not been explored. In this study, we investigated higher-order organization of the Runx2-P1 promoter during osteoblast differentiation. Mining the ENCODE database revealed interactions between Runx2-P1 and Supt3h promoters in several non-mesenchymal human cell lines. Supt3h is a ubiquitously expressed gene located within the first intron of Runx2. These two genes show shared synteny across species from humans to sponges. Chromosome conformation capture analysis in the murine pre-osteoblastic MC3T3-E1 cell line revealed increased contact frequency between Runx2-P1 and Supt3h promoters during differentiation. This increase was accompanied by enhanced DNaseI hypersensitivity along with RUNX2 and CTCF binding at the Supt3h promoter. Furthermore, interplasmid-3C and luciferase reporter assays showed that the Supt3h promoter can modulate Runx2-P1 activity via direct association. Taken together, our data demonstrate physical proximity between Runx2-P1 and Supt3h promoters, consistent with their syntenic nature. Importantly, we identify the Supt3h promoter as a potential regulator of the bone-specific Runx2-P1 promoter.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 56
    Publication Date: 2014-10-10
    Description: Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 57
    Publication Date: 2014-09-27
    Description: While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content; suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an ‘accessory’ during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 58
    Publication Date: 2014-12-17
    Description: The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum is similar to its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich source to the structural biology community.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 59
    Publication Date: 2015-03-14
    Description: The rapid discovery of potential driver mutations through large-scale mutational analyses of human cancers generates a need to characterize their cellular phenotypes. Among the techniques for genome editing, recombinant adeno-associated virus (rAAV)-mediated gene targeting is suited for knock-in of single nucleotide substitutions and to a lesser degree for gene knock-outs. However, the generation of gene targeting constructs and the targeting process is time-consuming and labor-intense. To facilitate rAAV-mediated gene targeting, we developed the first software and complementary automation-friendly vector tools to generate optimized targeting constructs for editing human protein encoding genes. By computational approaches, rAAV constructs for editing ~71% of bases in protein-coding exons were designed. Similarly, ~81% of genes were predicted to be targetable by rAAV-mediated knock-out. A Gateway-based cloning system for facile generation of rAAV constructs suitable for robotic automation was developed and used in successful generation of targeting constructs. Together, these tools enable automated rAAV targeting construct design, generation as well as enrichment and expansion of targeted cells with desired integrations.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 60
    Publication Date: 2015-03-14
    Description: Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error prone and time consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
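The trade-off this abstract describes, between the amino acids a degenerate codon covers and the number of nucleotide sequences it adds to the library, can be illustrated with a short Python sketch. It only expands one degenerate codon using the standard IUPAC nucleotide codes and the standard genetic code; it is not the dynamic programming algorithm from the paper, and the names `expand` and `encoded_amino_acids` are hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand(degenerate_codon):
    """Return every concrete codon matched by a 3-letter degenerate codon."""
    choices = (IUPAC[ch] for ch in degenerate_codon.upper())
    return ["".join(p) for p in product(*choices)]


def encoded_amino_acids(degenerate_codon):
    """Return the set of amino acids (and "*" for stop) the codon can encode."""
    return {CODON_TABLE[codon] for codon in expand(degenerate_codon)}


codons = expand("NNK")
residues = encoded_amino_acids("NNK")
print(len(codons), len(residues))
```

For a library with several variable positions, the nucleotide-level size is the product of the per-position codon counts, which is why choosing codons that encode few unwanted residues matters for staying under an experimental size limit.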
  • 61
    Publication Date: 2015-01-10
    Description: Transcription regulation in multicellular eukaryotes is orchestrated by a number of DNA functional elements located at gene regulatory regions. Some regulatory regions (e.g. enhancers) are located far away from the gene they affect. Identification of distal regulatory elements is a challenge for the bioinformatics research. Although existing methodologies increased the number of computationally predicted enhancers, performance inconsistency of computational models across different cell-lines, class imbalance within the learning sets and ad hoc rules for selecting enhancer candidates for supervised learning, are some key questions that require further examination. In this study we developed DEEP, a novel ensemble prediction framework. DEEP integrates three components with diverse characteristics that streamline the analysis of enhancer's properties in a great variety of cellular conditions. In our method we train many individual classification models that we combine to classify DNA regions as enhancers or non-enhancers. DEEP uses features derived from histone modification marks or attributes coming from sequence characteristics. Experimental results indicate that DEEP performs better than four state-of-the-art methods on the ENCODE data. We report the first computational enhancer prediction results on FANTOM5 data where DEEP achieves 90.2% accuracy and 90% geometric mean (GM) of specificity and sensitivity across 36 different tissues. We further present results derived using in vivo-derived enhancer data from VISTA database. DEEP-VISTA, when tested on an independent test set, achieved GM of 80.1% and accuracy of 89.64%. DEEP framework is publicly available at http://cbrc.kaust.edu.sa/deep/.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
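The two evaluation metrics quoted in this abstract, accuracy and the geometric mean (GM) of specificity and sensitivity, can be computed from a confusion matrix as in the Python sketch below. The function name `classification_scores` and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def classification_scores(tp, fn, tn, fp):
    """Return (accuracy, GM) for a binary classifier's confusion-matrix counts.

    tp/fn: enhancers predicted correctly / missed.
    tn/fp: non-enhancers predicted correctly / wrongly called enhancers.
    """
    sensitivity = tp / (tp + fn)   # fraction of true enhancers recovered
    specificity = tn / (tn + fp)   # fraction of non-enhancers rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    gm = math.sqrt(sensitivity * specificity)
    return accuracy, gm


# Made-up, imbalanced counts: 100 enhancers versus 900 non-enhancers.
accuracy, gm = classification_scores(tp=50, fn=50, tn=890, fp=10)
print(f"accuracy={accuracy:.3f} GM={gm:.3f}")
```

With imbalanced classes, accuracy can stay high while sensitivity is poor; GM drops in that case because it multiplies the two per-class rates, which is presumably why the abstract reports both figures.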
  • 62
    Publication Date: 2012-10-10
    Description: The Joint BioEnergy Institute Inventory of Composable Elements (JBEI-ICEs) is an open source registry platform for managing information about biological parts. It is capable of recording information about ‘legacy’ parts, such as plasmids, microbial host strains and Arabidopsis seeds, as well as DNA parts in various assembly standards. ICE is built on the idea of a web of registries and thus provides strong support for distributed interconnected use. The information deposited in an ICE installation instance is accessible both via a web browser and through the web application programming interfaces, which allows automated access to parts via third-party programs. JBEI-ICE includes several useful web browser-based graphical applications for sequence annotation, manipulation and analysis that are also open source. As with open source software, users are encouraged to install, use and customize JBEI-ICE and its components for their particular purposes. As a web application programming interface, ICE provides well-developed parts storage functionality for other synthetic biology software projects. A public instance is available at public-registry.jbei.org, where users can try out features, upload parts or simply use it for their projects. The ICE software suite is available via Google Code, a hosting site for community-driven open source projects.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 63
    Publication Date: 2014-05-01
    Description: Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and that 10% of selected GSMs in a database should be detected for confident positive calls. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our results agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
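    Example: the core idea in the abstract above, k-mers that occur in exactly one genome, can be sketched as follows. This is a deliberately simplified illustration and not the GSMer pipeline: it ignores reverse complements and any additional filtering, uses toy sequences with a small k instead of 50-mers, and all function names are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def genome_specific_kmers(genomes, k):
    """Map each genome name to the k-mers found in that genome only.

    genomes: dict of name -> sequence string.
    """
    owners = defaultdict(set)                 # k-mer -> genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            owners[kmer].add(name)

    markers = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:                   # present in exactly one genome
            markers[next(iter(names))].add(kmer)
    return markers


# Toy input: two short "genomes" sharing most of their sequence.
toy = {"strain_A": "ACGTAC", "strain_B": "CGTACG"}
toy_markers = genome_specific_kmers(toy, k=4)
```

    Detection in a metagenome would then amount to counting how many of a genome's marker k-mers appear in the reads; the abstract's 10% detection threshold applies to that count.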
  • 64
    Publication Date: 2014-11-12
    Description: Understanding the role of a given transcription factor (TF) in regulating gene expression requires precise mapping of its binding sites in the genome. Chromatin immunoprecipitation-exo, an emerging technique using exonuclease to digest TF unbound DNA after ChIP, is designed to reveal transcription factor binding site (TFBS) boundaries with near-single nucleotide resolution. Although ChIP-exo promises deeper insights into transcription regulation, no dedicated bioinformatics tool exists to leverage its advantages. Most ChIP-seq and ChIP-chip analytic methods are not tailored for ChIP-exo, and thus cannot take full advantage of high-resolution ChIP-exo data. Here we describe a novel analysis framework, termed MACE (model-based analysis of ChIP-exo) dedicated to ChIP-exo data analysis. The MACE workflow consists of four steps: (i) sequencing data normalization and bias correction; (ii) signal consolidation and noise reduction; (iii) single-nucleotide resolution border peak detection using the Chebyshev Inequality and (iv) border matching using the Gale-Shapley stable matching algorithm. When applied to published human CTCF, yeast Reb1 and our own mouse ONECUT1/HNF6 ChIP-exo data, MACE is able to define TFBSs with high sensitivity, specificity and spatial resolution, as evidenced by multiple criteria including motif enrichment, sequence conservation, direct sequence pileup, nucleosome positioning and open chromatin states. In addition, we show that the fundamental advance of MACE is the identification of two boundaries of a TFBS with high resolution, whereas other methods only report a single location of the same event. The two boundaries help elucidate the in vivo binding structure of a given TF, e.g. whether the TF may bind as dimers or in a complex with other co-factors.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
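    Example: step (iv) of the workflow above pairs borders with the Gale-Shapley stable matching algorithm. The sketch below is a generic Gale-Shapley implementation applied to toy border coordinates. Ranking partners by genomic distance is an assumption made for illustration; the abstract does not say how MACE builds its preference lists, and all names here are hypothetical.

```python
def prefs_by_distance(own, others):
    """Preference lists: each position ranks the others, nearest first."""
    return {o: sorted(others, key=lambda x: abs(x - o)) for o in own}


def gale_shapley(proposer_prefs, acceptor_prefs):
    """Return a stable matching as a dict proposer -> acceptor."""
    # rank[a][p] = how acceptor a ranks proposer p (lower is better)
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}   # next acceptor to try
    engaged = {}                                   # acceptor -> proposer
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue                               # p has no one left to ask
        a = prefs[next_choice[p]]
        next_choice[p] += 1
        current = engaged.get(a)
        if current is None:
            engaged[a] = p                         # a was unmatched
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p                         # a prefers the newcomer
            free.append(current)
        else:
            free.append(p)                         # rejected, try next choice
    return {p: a for a, p in engaged.items()}


# Toy border positions (hypothetical coordinates on one chromosome).
left_borders = [100, 110]
right_borders = [115, 300]
pairs = gale_shapley(prefs_by_distance(left_borders, right_borders),
                     prefs_by_distance(right_borders, left_borders))
```

    In this toy case both left borders prefer the right border at 115, which in turn prefers the closer left border at 110, so the border at 100 falls back to its second choice.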
  • 65
    Publication Date: 2014-09-02
    Description: Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem–loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs with broad implications for RNA engineering and therapeutic intervention.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 66
    Publication Date: 2014-09-02
    Description: Binding of transcription factors to their binding sites in promoter regions is the fundamental event in transcriptional gene regulation. When a transcription factor binding site is located within a nucleosome, the DNA has to partially unwrap from the nucleosome to allow transcription factor binding. This reduces the rate of transcription factor binding and is a known mechanism for regulation of gene expression via chromatin structure. Recently a second mechanism has been reported in which transcription factor off-rates are dramatically increased when binding to target sites within the nucleosome. There are two possible explanations for such an increase in off-rate short of an active role of the nucleosome in pushing the transcription factor off the DNA: (i) for dimeric transcription factors the nucleosome can change the equilibrium between monomeric and dimeric binding or (ii) the nucleosome can change the equilibrium between specific and non-specific binding to the DNA. We explicitly model both scenarios and find that dimeric binding can explain a large increase in off-rate while the non-specific binding model cannot be reconciled with the large, experimentally observed increase. Our results suggest a general mechanism by which nucleosomes increase transcription factor dissociation to promote exchange of transcription factors and regulate gene expression.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 67
    Publication Date: 2014-08-15
    Description: High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 68
    Publication Date: 2016-05-20
    Description: Epigenetic modifications of histone tails play an essential role in the regulation of eukaryotic transcription. Writer and eraser enzymes establish and maintain the epigenetic code by creating or removing posttranslational marks. Specific binding proteins, called readers, recognize the modifications and mediate epigenetic signalling. Here, we present a versatile assay platform for the investigation of the interaction between methyl lysine readers and their ligands. This can be utilized for the screening of small-molecule inhibitors of such protein–protein interactions and the detailed characterization of the inhibition. Our platform is constructed in a modular way consisting of orthogonal in vitro binding assays for ligand screening and verification of initial hits and biophysical, label-free techniques for further kinetic characterization of confirmed ligands. A stability assay for the investigation of target engagement in a cellular context complements the platform. We applied the complete evaluation chain to the Tudor domain containing protein Spindlin1 and established the in vitro test systems for the double Tudor domain of the histone demethylase JMJD2C. We finally conducted an exploratory screen for inhibitors of the interaction between Spindlin1 and H3K4me3 and identified A366 as the first nanomolar small-molecule ligand of a Tudor domain containing methyl lysine reader.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 69
    Publication Date: 2016-03-19
    Description: The yeast mutant collections are a fundamental tool in deciphering genomic organization and function. Over the last decade, they have been used for the systematic exploration of ~6 000 000 double gene mutants, identifying and cataloging genetic interactions among them. Here we studied the extent to which these data are prone to neighboring gene effects (NGEs), a phenomenon by which the deletion of a gene affects the expression of adjacent genes along the genome. Analyzing ~90 000 negative genetic interactions observed to date, we found that more than 10% of them are incorrectly annotated due to NGEs. We developed a novel algorithm, GINGER, to identify and correct erroneous interaction annotations. We validated the algorithm using a comparative analysis of interactions from Schizosaccharomyces pombe. We further showed that our predictions are significantly more concordant with diverse biological data compared to their mis-annotated counterparts. Our work uncovered about 9500 new genetic interactions in yeast.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 70
    Publication Date: 2016-03-19
    Description: Transfer RNAs (tRNAs) are essential for encoding the transcribed genetic information from DNA into proteins. Variations in the human tRNAs are involved in diverse clinical phenotypes. Interestingly, all pathogenic variations in tRNAs are located in mitochondrial tRNAs (mt-tRNAs). Therefore, it is crucial to identify pathogenic variations in mt-tRNAs for disease diagnosis and proper treatment. We collected mt-tRNA variations using a classification based on evidence from several sources and used the data to develop a multifactorial probability-based prediction method, PON-mt-tRNA, for classification of mt-tRNA single nucleotide substitutions. We integrated a machine learning-based predictor and an evidence-based likelihood ratio for pathogenicity using evidence of segregation, biochemistry and histochemistry to predict the posterior probability of pathogenicity of variants. The accuracy and Matthews correlation coefficient (MCC) of PON-mt-tRNA are 1.00 and 0.99, respectively. In the absence of evidence from segregation, biochemistry and histochemistry, PON-mt-tRNA classifies variations based on the machine learning method with an accuracy and MCC of 0.69 and 0.39, respectively. We classified all possible single nucleotide substitutions in all human mt-tRNAs using PON-mt-tRNA. The variations in the loops are more often tolerated compared to the variations in stems. The anticodon loop contains comparatively more predicted pathogenic variations than the other loops. PON-mt-tRNA is available at http://structure.bmc.lu.se/PON-mt-tRNA/.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
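    Example: two quantities named in the abstract above can be illustrated with standard formulas. The first function computes the Matthews correlation coefficient from confusion-matrix counts. The second combines a prior probability with a likelihood ratio using Bayes' rule in odds form. Treating the machine-learning score as the prior is an assumption made here for illustration; the abstract does not spell out how PON-mt-tRNA combines the two, and the function names are hypothetical.

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0                     # common convention for empty margins
    return (tp * tn - fp * fn) / denom


def posterior_probability(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * LR.

    prior must lie strictly between 0 and 1.
    """
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# An uninformative prior with supporting evidence three times more likely
# under pathogenicity than under neutrality.
p = posterior_probability(prior=0.5, likelihood_ratio=3.0)
```

    A likelihood ratio of 1 leaves the prior unchanged, which corresponds to the case in the abstract where no segregation, biochemical or histochemical evidence is available.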
  • 71
    Publication Date: 2016-05-06
    Description: Sequence Logos and their variants are the most commonly used methods for visualization of multiple sequence alignments (MSAs) and sequence motifs. They provide consensus-based summaries of the sequences in the alignment. Consequently, individual sequences cannot be identified in the visualization and covariant sites are not easily discernible. We recently proposed Sequence Bundles, a motif visualization technique that maintains a one-to-one relationship between sequences and their graphical representation and visualizes covariant sites. We here present Alvis, an open-source platform for the joint explorative analysis of MSAs and phylogenetic trees, employing Sequence Bundles as its main visualization method. Alvis combines the power of the visualization method with an interactive toolkit allowing detection of covariant sites, annotation of trees with synapomorphies and homoplasies, and motif detection. It also offers numerical analysis functionality, such as dimension reduction and classification. Alvis is user-friendly, highly customizable and can export results in publication-quality figures. It is available as a full-featured standalone version (http://www.bitbucket.org/rfs/alvis) and its Sequence Bundles visualization module is further available as a web application (http://science-practice.com/projects/sequence-bundles).
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 72
    Publication Date: 2016-04-08
    Description: The brain is built from a large number of cell types which have been historically classified using location, morphology and molecular markers. Recent research suggests an important role of epigenetics in shaping and maintaining cell identity in the brain. To elucidate the role of DNA methylation in neuronal differentiation, we developed a new protocol for separation of nuclei from the two major populations of human prefrontal cortex neurons—GABAergic interneurons and glutamatergic (GLU) projection neurons. Major differences between the neuronal subtypes were revealed in CpG, non-CpG and hydroxymethylation (hCpG). A dramatically greater number of undermethylated CpG sites in GLU versus GABA neurons were identified. These differences did not directly translate into differences in gene expression and did not stem from the differences in hCpG methylation, as more hCpG methylation was detected in GLU versus GABA neurons. Notably, a comparable number of undermethylated non-CpG sites were identified in GLU and GABA neurons, and non-CpG methylation was a better predictor of subtype-specific gene expression compared to CpG methylation. Regions that are differentially methylated in GABA and GLU neurons were significantly enriched for schizophrenia risk loci. Collectively, our findings suggest that functional differences between neuronal subtypes are linked to their epigenetic specification.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 73
    Publication Date: 2016-04-21
    Description: Chromatin immunoprecipitation followed by next generation sequencing (ChIP-seq) is a key technique in chromatin research. Although heavily applied, existing ChIP-seq protocols are often highly fine-tuned workflows, optimized for specific experimental requirements. Especially the initial steps of ChIP-seq, particularly chromatin shearing, are deemed to be exceedingly cell-type-specific, thus impeding any protocol standardization efforts. Here we demonstrate that harmonization of ChIP-seq workflows across cell types and conditions is possible when obtaining chromatin from properly isolated nuclei. We established an ultrasound-based nuclei extraction method (NEXSON: Nuclei EXtraction by SONication) that is highly effective across various organisms, cell types and cell numbers. The described method has the potential to replace complex cell-type-specific, but largely ineffective, nuclei isolation protocols. By including NEXSON in ChIP-seq workflows, we completely eliminate the need for extensive optimization and sample-dependent adjustments. Apart from this significant simplification, our approach also provides the basis for a fully standardized ChIP-seq and yields highly reproducible transcription factor and histone modifications maps for a wide range of different cell types. Even small cell numbers (~10 000 cells per ChIP) can be easily processed without application of modified chromatin or library preparation protocols.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 74
    Publication Date: 2013-07-16
    Description: The coupling of chromosome conformation capture (3C) with next-generation sequencing technologies enables the high-throughput detection of long-range genomic interactions, via the generation of ligation products between DNA sequences, which are closely juxtaposed in vivo. These interactions involve promoter regions, enhancers and other regulatory and structural elements of chromosomes and can reveal key details of the regulation of gene expression. 3C-seq is a variant of the method for the detection of interactions between one chosen genomic element (viewpoint) and the rest of the genome. We present r3Cseq, an R/Bioconductor package designed to perform 3C-seq data analysis in a number of different experimental designs. The package reads a common aligned read input format, provides data normalization, allows the visualization of candidate interaction regions and detects statistically significant chromatin interactions, thus greatly facilitating hypothesis generation and the interpretation of experimental results. We further demonstrate its use on a series of real-world applications.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 75
    Publication Date: 2012-06-28
    Description: Bromodeoxyuridine (5-bromo-2'-deoxyuridine, BrdU) is a halogenated nucleotide of low toxicity commonly used to monitor DNA replication. It is considered a valuable tool for in vitro and in vivo studies, including the detection of the small population of neural stem cells (NSC) in the mammalian brain. Here, we show that NSC grown in self-renewing conditions in vitro, when exposed to BrdU, lose the expression of stem cell markers like Nestin, Sox2 and Pax6 and undergo glial differentiation, strongly up-regulating the astrocytic marker GFAP. The onset of GFAP expression in BrdU-exposed NSC was paralleled by a reduced expression of key DNA methyltransferases (DNMT) and a rapid loss of global DNA CpG methylation, as we determined by our specially developed analytic assay. Remarkably, a known DNA demethylating compound, 5-aza-2'-deoxycytidine (Decitabine), had a similar effect on demethylation and differentiation of NSC. Since our key findings also apply to NSC derived from murine forebrain, our observations strongly suggest more caution in the use of BrdU in stem cell research. We also propose that BrdU and its related substances may open new opportunities for differentiation therapy in oncology.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 76
    Publication Date: 2012-06-28
    Description: In Escherichia coli, the SeqA protein binds specifically to GATC sequences which are methylated on the A of the old strand but not on the new strand. Such hemimethylated DNA is produced by progression of the replication forks and lasts until Dam methyltransferase methylates the new strand. It is therefore believed that a region of hemimethylated DNA covered by SeqA follows the replication fork. We show that this is, indeed, the case by using global ChIP on Chip analysis of SeqA in cells synchronized regarding DNA replication. To assess hemimethylation, we developed the first genome-wide method for methylation analysis in bacteria. Since loss of the SeqA protein affects growth rate only during rapid growth when cells contain multiple replication forks, a comparison of rapid and slow growth was performed. In cells with six replication forks per chromosome, the two old forks were found to bind surprisingly little SeqA protein. Cell cycle analysis showed that loss of SeqA from the old forks did not occur at initiation of the new forks, but instead occurred at a time point coinciding with the end of SeqA-dependent origin sequestration. The finding suggests simultaneous origin de-sequestration and loss of SeqA from old replication forks.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 77
    Publication Date: 2012-08-23
    Description: Live-cell measurement of protein binding to chromatin allows probing cellular biochemistry in physiological conditions, which are difficult to mimic in vitro. However, different studies have yielded widely discrepant predictions, and so it remains uncertain how to make the measurements accurately. To establish a benchmark we measured binding of the transcription factor p53 to chromatin by three approaches: fluorescence recovery after photobleaching (FRAP), fluorescence correlation spectroscopy (FCS) and single-molecule tracking (SMT). Using new procedures to analyze the SMT data and to guide the FRAP and FCS analysis, we show how all three approaches yield similar estimates for both the fraction of p53 molecules bound to chromatin (only about 20%) and the residence time of these bound molecules (~1.8 s). We also apply these procedures to mutants in p53 chromatin binding. Our results support the model that p53 locates specific sites by first binding at sequence-independent sites.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 78
    Publication Date: 2016-12-01
    Description: The study of changes in protein–DNA interactions measured by ChIP-seq on dynamic systems, such as cell differentiation, response to treatments or the comparison of healthy and diseased individuals, is still an open challenge. There are few computational methods comparing changes in ChIP-seq signals with replicates. Moreover, none of these previous approaches addresses ChIP-seq specific experimental artefacts arising from studies with biological replicates. We propose THOR, a Hidden Markov Model based approach, to detect differential peaks between pairs of biological conditions with replicates. THOR provides all pre- and post-processing steps required in ChIP-seq analyses. Moreover, we propose a novel normalization approach based on housekeeping genes to deal with cases where replicates have distinct signal-to-noise ratios. To evaluate differential peak calling methods, we delineate a methodology using both biological and simulated data. This includes an evaluation procedure that associates differential peaks with changes in gene expression as well as histone modifications close to these peaks. We evaluate THOR and seven competing methods on data sets with distinct characteristics, ranging from in vitro studies with technical replicates to clinical studies of cancer patients. Our evaluation analysis comprises 13 comparisons between pairs of biological conditions. We show that THOR performs best in all scenarios.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 79
    Publication Date: 2016-10-14
    Description: Functional RNA regions are often related to recurrent secondary structure patterns (or motifs), which can exert their role in several different ways, particularly in dictating the interaction with RNA-binding proteins, and acting in the regulation of a large number of cellular processes. Among the available motif-finding tools, the majority focus on sequence patterns, sometimes including secondary structure as an additional constraint to improve their performance. Nonetheless, secondary structure motifs may be concurrent with their sequence counterparts or even encode a stronger functional signal. Current methods for searching structural motifs generally require long pipelines and/or high computational efforts or previously aligned sequences. Here, we present BEAM (BEAr Motif finder), a novel method for structural motif discovery from a set of unaligned RNAs, taking advantage of a recently developed encoding for RNA secondary structure named BEAR (Brand nEw Alphabet for RNAs) and of evolutionary substitution rates of secondary structure elements. Tested in a varied set of scenarios, from small- to large-scale, BEAM is successful in retrieving structural motifs even in highly noisy data sets, such as those that can arise in CLIP-Seq or other high-throughput experiments.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 80
    Publication Date: 2016-12-01
    Description: Whole exome sequencing (WES) accelerates disease gene discovery using rare genetic variants, but further statistical and functional evidence is required to avoid false discovery. To complement variant-driven disease gene discovery, here we present function-driven disease gene discovery in zebrafish (Danio rerio), a promising human disease model owing to its high anatomical and genomic similarity to humans. To facilitate zebrafish-based function-driven disease gene discovery, we developed a genome-scale co-functional network of zebrafish genes, DanioNet (www.inetbio.org/danionet), which was constructed by Bayesian integration of genomics big data. Rigorous statistical assessment confirmed the high prediction capacity of DanioNet for a wide variety of human diseases. To demonstrate the feasibility of function-driven disease gene discovery using DanioNet, we predicted genes for ciliopathies and performed experimental validation for eight candidate genes. We also validated the existence of heterozygous rare variants in the candidate genes of individuals with ciliopathies yet not in controls derived from the UK10K consortium, suggesting that these variants are potentially involved in enhancing the risk of ciliopathies. These results showed that integrated genomics big data for a disease model animal can expand our opportunity for harnessing WES data in disease gene discovery.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 81
    Publication Date: 2016-12-04
    Description: Recently, a number of advances have been implemented into the core ChIP-seq (chromatin immunoprecipitation coupled with next-generation sequencing) methodology to streamline the process, reduce costs or improve data resolution. Several of these emerging ChIP-based methods perform additional chemical steps on bead-bound immunoprecipitated chromatin, posing a challenge for generating similarly treated input controls required for artifact removal during bioinformatics analyses. Here we present a versatile method for producing technique-specific input controls for ChIP-based methods that utilize additional bead-bound processing steps. This reported method, termed protein attached chromatin capture (PAtCh-Cap), relies on the non-specific capture of chromatin-bound proteins via their carboxylate groups, leaving the DNA accessible for subsequent chemical treatments in parallel with chromatin separately immunoprecipitated for the target protein. Application of this input strategy not only significantly enhanced artifact removal from ChIP-exo data, increasing confidence in peak identification and allowing for de novo motif searching, but also afforded discovery of a novel CTCF binding motif.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 82
    Publication Date: 2016-12-04
    Description: Population-scale sequencing is increasingly uncovering large numbers of rare single-nucleotide variants (SNVs) in coding regions of the genome. The rarity of these variants makes it challenging to evaluate their deleteriousness with conventional phenotype–genotype associations. Protein structures provide a way of addressing this challenge. Previous efforts have focused on globally quantifying the impact of SNVs on protein stability. However, local perturbations may severely impact protein functionality without strongly disrupting global stability (e.g. in relation to catalysis or allostery). Here, we describe a workflow in which localized frustration, quantifying unfavorable local interactions, is employed as a metric to investigate such effects. Using this workflow on the Protein Databank, we find that frustration produces many immediately intuitive results: for instance, disease-related SNVs create stronger changes in localized frustration than non-disease related variants, and rare SNVs tend to disrupt local interactions to a larger extent than common variants. Less obviously, we observe that somatic SNVs associated with oncogenes and tumor suppressor genes (TSGs) induce very different changes in frustration. In particular, those associated with TSGs change the frustration more in the core than the surface (by introducing loss-of-function events), whereas those associated with oncogenes manifest the opposite pattern, creating gain-of-function events.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 83
    Publication Date: 2016-12-17
    Description: A complex disease generally results not from malfunction of individual molecules but from dysfunction of the relevant system or network, which dynamically changes with time and conditions. Thus, estimating a condition-specific network from a single sample is crucial to elucidating the molecular mechanisms of complex diseases at the system level. However, there is currently no effective way to construct such an individual-specific network by expression profiling of a single sample because of the requirement of multiple samples for computing correlations. Here we developed a statistical method, the sample-specific network (SSN) method, which allows us to construct individual-specific networks based on the molecular expression of a single sample. Using this method, we can characterize various human diseases at a network level. In particular, such SSNs can lead to the identification of individual-specific disease modules as well as driver genes, even without gene sequencing information. Extensive analysis using the Cancer Genome Atlas data not only demonstrated the effectiveness of the method, but also found new individual-specific driver genes and network patterns for various types of cancer. Biological experiments on drug resistance further validated one important advantage of our method over traditional methods: owing to the additional network information, we can identify drug resistance genes that show no clear differential expression between samples with and without the resistance.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 84
    Publication Date: 2016-12-17
    Description: Motivation: Many biological processes, such as the cell cycle, the circadian clock and the menstrual cycle, are governed by oscillatory systems consisting of numerous components that exhibit rhythmic patterns over time. It is not always easy to identify such rhythmic components. For example, it is a challenging problem to identify circadian genes in a given tissue using time-course gene expression data. There is a great potential for misclassifying non-rhythmic as rhythmic genes and vice versa. This has been a problem of considerable interest in recent years. In this article we develop a constrained inference based methodology called Order Restricted Inference for Oscillatory Systems (ORIOS) to detect rhythmic signals. Instead of using mathematical functions (e.g. sinusoidal) to describe the shape of rhythmic signals, ORIOS uses mathematical inequalities. Consequently, it is robust and not limited by the biologist's choice of the mathematical model. We studied the performance of ORIOS using simulated as well as real data obtained from mouse liver and pituitary gland and from NIH3T3 and U2OS cell lines. Our results suggest that, for a broad collection of patterns of gene expression, ORIOS has substantially higher power to detect true rhythmic genes in comparison to some popular methods, while also declaring substantially fewer non-rhythmic genes as rhythmic. Availability and Implementation: User-friendly code implemented in R can be downloaded from http://www.niehs.nih.gov/research/atniehs/labs/bb/staff/peddada/index.cfm. Contact: peddada@niehs.nih.gov
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 85
    Publication Date: 2012-08-08
    Description: Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present a novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. An initial 3D structure is composed within seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 86
    Publication Date: 2013-01-20
    Description: Genomic deletions induced by imprecise excision of transposons have been used to disrupt gene functions in Drosophila. To determine the excision properties of Tol2, a popular transposon in zebrafish, we took advantage of two transgenic zebrafish lines Et(gata2a:EGFP)pku684 and Et(gata2a:EGFP)pku760, and mobilized the transposon by injecting transposase mRNA into homozygous transgenic embryos. Footprint analysis showed that the Tol2 transposons were excised in either a precise or an imprecise manner. Furthermore, we identified 1093-bp and 1253-bp genomic deletions in Et(gata2a:EGFP)pku684 founder embryos flanking the 5' end of the original Tol2 insertion site, and a 1340-bp deletion in the Et(gata2a:EGFP)pku760 founder embryos flanking the 3' end of the insertion site. The mosaic Et(gata2a:EGFP)pku684 embryos were raised to adulthood and screened for germline transmission of Tol2 excision in their F1 progeny. On average, ~42% of the F1 embryos displayed loss or altered EGFP patterns, demonstrating that this transposon could be efficiently excised from the zebrafish genome in the germline. Furthermore, from 59 founders, we identified one that transmitted the 1093-bp genomic deletion to its offspring. These results suggest that imprecise Tol2 transposon excision can be used as an alternative strategy to achieve gene targeting in zebrafish.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 87
    Publication Date: 2012-09-27
    Description: DNA methylation plays a key role in epigenetic regulation of eukaryotic genomes. Hence the genome-wide distribution of 5-methylcytosine, or the methylome, has been attracting intense attention. In recent years, whole-genome bisulfite sequencing (WGBS) has enabled methylome analysis at single-base resolution. However, WGBS typically requires microgram quantities of DNA as well as global PCR amplification, thereby precluding its application to samples of limited amounts. This is presumably because bisulfite treatment of adaptor-tagged templates, which is inherent to current WGBS methods, leads to substantial DNA fragmentation. To circumvent the bisulfite-induced loss of intact sequencing templates, we conceived an alternative method termed Post-Bisulfite Adaptor Tagging (PBAT) wherein bisulfite treatment precedes adaptor tagging by two rounds of random primer extension. The PBAT method can generate a substantial number of unamplified reads from as little as subnanogram quantities of DNA. It requires only 100 ng of DNA for amplification-free WGBS of mammalian genomes. Thus, the PBAT method will enable various novel applications that would not otherwise be possible, thereby contributing to the rapidly growing field of epigenomics.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 88
    Publication Date: 2012-09-27
    Description: Programmed –1 ribosomal frameshifting is employed in the expression of a number of viral and cellular genes. In this process, the ribosome slips backwards by a single nucleotide and continues translation of an overlapping reading frame, generating a fusion protein. Frameshifting signals comprise a heptanucleotide slippery sequence, where the ribosome changes frame, and a stimulatory RNA structure, a stem–loop or RNA pseudoknot. Antisense oligonucleotides annealed appropriately 3' of a slippery sequence have also shown activity in frameshifting, at least in vitro. Here we examined frameshifting at the U6A slippery sequence of the HIV gag/pol signal and found high levels of both –1 and –2 frameshifting with stem–loop, pseudoknot or antisense oligonucleotide stimulators. By examining –1 and –2 frameshifting outcomes on mRNAs with varying slippery sequence-stimulatory RNA spacing distances, we found that –2 frameshifting was optimal at a spacer length 1–2 nucleotides shorter than that optimal for –1 frameshifting with all stimulatory RNAs tested. We propose that the shorter spacer increases the tension on the mRNA such that when the tRNA detaches, it more readily enters the –2 frame on the U6A heptamer. We propose that mRNA tension is central to frameshifting, whether promoted by stem–loop, pseudoknot or antisense oligonucleotide stimulator.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 89
    Publication Date: 2012-10-24
    Description: Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 90
    Publication Date: 2012-11-04
    Description: The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 91
    Publication Date: 2012-11-04
    Description: The mammalian thymine DNA glycosylase (TDG) is implicated in active DNA demethylation via the base excision repair pathway. TDG excises the mismatched base from G:X mismatches, where X is uracil, thymine or 5-hydroxymethyluracil (5hmU). These are, respectively, the deamination products of cytosine, 5-methylcytosine (5mC) and 5-hydroxymethylcytosine (5hmC). In addition, TDG excises the Tet protein products 5-formylcytosine (5fC) and 5-carboxylcytosine (5caC) but not 5hmC and 5mC, when paired with a guanine. Here we present a post-reactive complex structure of the human TDG domain with a 28-base pair DNA containing a G:5hmU mismatch. TDG flips the target nucleotide from the double-stranded DNA, cleaves the N-glycosidic bond and leaves the C1' hydrolyzed abasic sugar in the flipped state. The cleaved 5hmU base remains in a binding pocket of the enzyme. TDG allows hydrogen-bonding interactions to both T/U-based (5hmU) and C-based (5caC) modifications, thus enabling its activity on a wider range of substrates. We further show that the TDG catalytic domain has higher activity for 5caC at a lower pH (5.5) as compared to the activities at higher pH (7.5 and 8.0) and that the structurally related Escherichia coli mismatch uracil glycosylase can excise 5caC as well. We discuss several possible mechanisms, including the amino-imino tautomerization of the substrate base that may explain how TDG discriminates against 5hmC and 5mC.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 92
    Publication Date: 2012-11-25
    Description: Identifying cancer driver genes and pathways among all somatic mutations detected in a cohort of tumors is a key challenge in cancer genomics. Traditionally, this is done by prioritizing genes according to the recurrence of alterations that they bear. However, this approach has some known limitations, such as the difficulty of correctly estimating the background mutation rate, and the fact that it cannot identify driver genes that are mutated with low recurrence. Here we present a novel approach, Oncodrive-fm, to detect candidate cancer drivers which does not rely on recurrence. First, we hypothesized that any bias toward the accumulation of variants with high functional impact observed in a gene or group of genes may be an indication of positive selection and can thus be used to detect candidate driver genes or gene modules. Next, we developed a method to measure this bias (FM bias) and applied it to three datasets of tumor somatic variants. As a proof of concept of our hypothesis we show that most of the highly recurrent and well-known cancer genes exhibit a clear FM bias. Moreover, this novel approach avoids some known limitations of recurrence-based approaches, and can successfully identify candidate cancer drivers with low recurrence.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 93
    Publication Date: 2013-08-28
    Description: Combinations of histone modifications have significant biological roles, such as maintenance of pluripotency and cancer development, but cannot be analyzed at the single cell level. Here, we visualized a combination of histone modifications by applying the in situ proximity ligation assay, which detects two proteins in close vicinity (~30 nm). The specificity of the method [designated as imaging of a combination of histone modifications (iChmo)] was confirmed by positive signals from H3K4me3/acetylated H3K9, H3K4me3/RNA polymerase II and H3K9me3/H4K20me3, and negative signals from H3K4me3/H3K9me3. Bivalent modification was clearly visualized by iChmo in wild-type embryonic stem cells (ESCs) known to have it, whereas rarely in Suz12 knockout ESCs and mouse embryonic fibroblasts known to have little of it. iChmo was applied to analysis of epigenetic and phenotypic changes of heterogeneous cell population, namely, ESCs at an early stage of differentiation, and this revealed that the bivalent modification disappeared in a highly concerted manner, whereas phenotypic differentiation proceeded with large variations among cells. Also, using this method, we were able to visualize a combination of repressive histone marks in tissue samples. The application of iChmo to samples with heterogeneous cell population and tissue samples is expected to clarify unknown biological and pathological significance of various combinations of epigenetic modifications.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 94
    Publication Date: 2013-03-13
    Description: Nucleosome positioning on the chromatin strand plays a critical role in regulating accessibility of DNA to transcription factors and chromatin modifying enzymes. Hence, detailed information on nucleosome depletion or movement at cis-acting regulatory elements has the potential to identify predicted binding sites for trans-acting factors. Using a novel method based on enrichment of mononucleosomal DNA by bacterial artificial chromosome hybridization, we mapped nucleosome positions by deep sequencing across 250 kb, encompassing the cystic fibrosis transmembrane conductance regulator (CFTR) gene. CFTR shows tight tissue-specific regulation of expression, which is largely determined by cis-regulatory elements that lie outside the gene promoter. Although multiple elements are known, the repertoire of transcription factors that interact with these sites to activate or repress CFTR expression remains incomplete. Here, we show that specific nucleosome depletion corresponds to well-characterized binding sites for known trans-acting factors, including hepatocyte nuclear factor 1, Forkhead box A1 and CCCTC-binding factor. Moreover, the cell-type selective nucleosome positioning is effective in predicting binding sites for novel interacting factors, such as BAF155. Finally, we identify transcription factor binding sites that are overrepresented in regions where nucleosomes are depleted in a cell-specific manner. This approach recognizes the glucocorticoid receptor as a novel trans-acting factor that regulates CFTR expression in vivo.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 95
    Publication Date: 2013-11-02
    Description: Here, we describe an approach to isolate native chromatin sections without genomic engineering for label-free proteomic identification of associated proteins and histone post-translational modifications. A transcription activator-like (TAL) protein A fusion protein was designed to recognize a unique site in the yeast GAL1 promoter. The TAL-PrA fusion enabled chromatin affinity purification (ChAP) of a small section of native chromatin upstream from the GAL1 locus, permitting mass spectrometric (MS) identification of proteins and histone post-translational modifications regulating galactose-induced transcription. This TAL-ChAP-MS approach allows the biochemical isolation of a specific native genomic locus for proteomic studies and will provide for unprecedented objective insight into protein and epigenetic mechanisms regulating site-specific chromosome metabolism.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 96
    Publication Date: 2015-11-17
    Description: Microbial natural products are an invaluable source of evolved bioactive small molecules and pharmaceutical agents. Next-generation and metagenomic sequencing indicates untapped genomic potential, yet high rediscovery rates of known metabolites increasingly frustrate conventional natural product screening programs. New methods to connect biosynthetic gene clusters to novel chemical scaffolds are therefore critical to enable the targeted discovery of genetically encoded natural products. Here, we present PRISM, a computational resource for the identification of biosynthetic gene clusters, prediction of genetically encoded nonribosomal peptides and type I and II polyketides, and bio- and cheminformatic dereplication of known natural products. PRISM implements novel algorithms which render it uniquely capable of predicting type II polyketides, deoxygenated sugars, and starter units, making it a comprehensive genome-guided chemical structure prediction engine. A library of 57 tailoring reactions is leveraged for combinatorial scaffold library generation when multiple potential substrates are consistent with biosynthetic logic. We compare the accuracy of PRISM to existing genomic analysis platforms. PRISM is an open-source, user-friendly web application available at http://magarveylab.ca/prism/.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 97
    Publication Date: 2015-11-17
    Description: Sequencing DNA fragments associated with proteins following in vivo cross-linking with formaldehyde (known as ChIP-seq) has been used extensively to describe the distribution of proteins across genomes. It is not widely appreciated that this method merely estimates a protein's distribution and cannot reveal changes in occupancy between samples. To do this, we tagged with the same epitope orthologous proteins in Saccharomyces cerevisiae and Candida glabrata, whose sequences have diverged to a degree that most DNA fragments longer than 50 bp are unique to just one species. By mixing defined numbers of C. glabrata cells (the calibration genome) with S. cerevisiae samples (the experimental genomes) prior to chromatin fragmentation and immunoprecipitation, it is possible to derive a quantitative measure of occupancy (the occupancy ratio – OR) that enables a comparison of occupancies not only within but also between genomes. We demonstrate for the first time that this ‘internal standard’ calibration method satisfies the sine qua non for quantifying ChIP-seq profiles, namely linearity over a wide range. Crucially, by employing functional tagged proteins, our calibration process describes a method that distinguishes genuine association within ChIP-seq profiles from background noise. Our method is applicable to any protein, not merely highly conserved ones, and obviates the need for the time consuming, expensive, and technically demanding quantification of ChIP using qPCR, which can only be performed on individual loci. As we demonstrate for the first time in this paper, calibrated ChIP-seq represents a major step towards documenting the quantitative distributions of proteins along chromosomes in different cell states, which we term biological chromodynamics.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 98
    Publication Date: 2015-08-18
    Description: Stochastic epigenetic changes drive biological processes, such as development, aging and disease. Yet, epigenetic information is typically collected from millions of cells, thereby precluding a more precise understanding of cell-to-cell variability and the pathogenic history of epimutations. Here we present a novel procedure for directly detecting epimutations in DNA methylation patterns using single-cell, locus-specific bisulfite sequencing (SLBS). We show that within gene promoter regions of mouse hepatocytes the epimutation rate is two orders of magnitude higher than the mutation rate.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 99
    Publication Date: 2015-07-25
    Description: The cMonkey integrated biclustering algorithm identifies conditionally co-regulated modules of genes (biclusters). cMonkey integrates various orthogonal pieces of information which support evidence of gene co-regulation, and optimizes biclusters to be supported simultaneously by one or more of these prior constraints. The algorithm served as the cornerstone for constructing the first global, predictive Environmental Gene Regulatory Influence Network (EGRIN) model for a free-living cell, and has now been applied to many more organisms. However, due to its computational inefficiencies, long run-time and complexity of various input data types, cMonkey was not readily usable by the wider community. To address these primary concerns, we have significantly updated the cMonkey algorithm and refactored its implementation, improving its usability and extendibility. These improvements provide a fully functioning and user-friendly platform for building co-regulated gene modules and the tools necessary for their exploration and interpretation. We show, via three separate analyses of data for E. coli, M. tuberculosis and H. sapiens, that the updated algorithm and inclusion of novel scoring functions for new data types (e.g. ChIP-seq and transcription factor over-expression [TFOE]) improve discovery of biologically informative co-regulated modules. The complete cMonkey 2 software package, including source code, is available at https://github.com/baliga-lab/cmonkey2.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 100
    Publication Date: 2016-01-09
    Description: Hi-C experiments produce large numbers of DNA sequence read pairs that are typically analyzed to deduce genomewide interactions between arbitrary loci. A key step in these experiments is the cleavage of cross-linked chromatin with a restriction endonuclease. Although this cleavage should happen specifically at the enzyme's recognition sequence, an unknown proportion of cleavage events may involve other sequences, owing to the enzyme's star activity or to random DNA breakage. A quantitative estimation of these non-specific cleavages may enable simulating realistic Hi-C read pairs for validation of downstream analyses, monitoring the reproducibility of experimental conditions and investigating biophysical properties that correlate with DNA cleavage patterns. Here we describe a computational method for analyzing Hi-C read pairs to estimate the fractions of cleavages at different possible targets. The method relies on expressing an observed local target distribution downstream of aligned reads as a linear combination of known conditional local target distributions. We validated this method using Hi-C read pairs obtained by computer simulation. Application of the method to experimental Hi-C datasets from murine cells revealed interesting similarities and differences in patterns of cleavage across the various experiments considered.
    Keywords: Chromatin and Epigenetics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology