ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (267)
  • Computational Methods, Genomics  (140)
  • Synthetic Biology and Assembly Cloning  (70)
  • Computational Methods, Massively Parallel (Deep) Sequencing, Genomics  (34)
  • RNA characterisation and manipulation  (23)
  • Oxford University Press  (267)
  • Nucleic Acids Research  (267)
  • Biology  (267)
  • 1
    Publication Date: 2015-09-19
    Description: Recent releases of genome three-dimensional (3D) structures have the potential to transform our understanding of genomes. Nonetheless, storage technology and visualization tools need to evolve to offer the scientific community fast and convenient access to these data. We simultaneously introduce a database system to store and query 3D genomic data (3DBG) and a 3D genome browser to visualize and explore 3D genome structures (3DGB). We benchmark 3DBG against state-of-the-art systems and demonstrate that it is faster than previous solutions and, importantly, scales gracefully with the size of the data. We also illustrate the usefulness of our 3D genome Web browser to explore human genome structures. The 3D genome browser is available at http://3dgb.cs.mcgill.ca/.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 2
    Publication Date: 2015-09-19
    Description: Telomerase is a reverse transcriptase that maintains telomeres on the ends of chromosomes, allowing rapidly dividing cells to proliferate while avoiding senescence and apoptosis. Understanding telomerase gene expression and splicing at the single cell level could yield insights into the roles of telomerase during normal cell growth as well as cancer development. Here we use droplet-based single cell culture followed by single cell or colony transcript abundance analysis to investigate the relationship between cell growth and transcript abundance of the telomerase genes encoding the RNA component (hTR) and protein component (hTERT) as well as hTERT splicing. Jurkat and K562 cells were examined under normal cell culture conditions and during exposure to curcumin, a natural compound with anti-carcinogenic and telomerase activity-reducing properties. Individual cells predominantly express single hTERT splice variants, with the α+/β– variant exhibiting significant transcript abundance bimodality that is sustained through cell division. Sub-lethal curcumin exposure results in reduced bimodality of all hTERT splice variants and significant upregulation of alpha splicing, suggesting a possible role in cellular stress response. The single cell culture and transcript abundance analysis method presented here provides the tools necessary for multiparameter single cell analysis which will be critical for understanding phenotypes of heterogeneous cell populations, disease cell populations and their drug response.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 3
    Publication Date: 2015-08-29
    Description: Small RNAs, between 18 nt and 30 nt in length, are a diverse class of non-coding RNAs that mediate a range of cellular processes, from gene regulation to pathogen defense. They guide ribonucleoprotein complexes to their target nucleic acids by Watson–Crick base pairing. We report here that current techniques for small RNA detection and library generation are biased by formation of RNA duplexes. To address this problem, we established FDF-PAGE (fully-denaturing formaldehyde polyacrylamide gel electrophoresis) to prevent annealing of sRNAs to their complement. By applying FDF-PAGE, we provide evidence that both strands of viral small RNA are present in near equimolar ratios, indicating that the predominant precursor is a long double-stranded RNA. Comparing non-denaturing conditions to FDF-PAGE uncovered extensive sequestration of miRNAs in model organisms and allowed us to identify candidate small RNAs under the control of competing endogenous RNAs (ceRNAs). By revealing the full repertoire of small RNAs, we can begin to create a better understanding of small RNA-mediated interactions.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 4
    Publication Date: 2015-05-29
    Description: Identification of transcription units (TUs) encoded in a bacterial genome is essential to elucidation of transcriptional regulation of the organism. To gain a detailed understanding of the dynamically composed TU structures, we have used four strand-specific RNA-seq (ssRNA-seq) datasets collected under two experimental conditions to derive the genomic TU organization of Clostridium thermocellum using a machine-learning approach. Our method accurately predicted the genomic boundaries of individual TUs based on two sets of parameters measuring the RNA-seq expression patterns across the genome: expression-level continuity and variance. A total of 2590 distinct TUs are predicted based on the four RNA-seq datasets. Among the predicted TUs, 44% have multiple genes. We assessed our prediction method on an independent set of RNA-seq data with longer reads. The evaluation confirmed the high quality of the predicted TUs. Functional enrichment analyses on a selected subset of the predicted TUs revealed interesting biology. To demonstrate the generality of the prediction method, we have also applied the method to RNA-seq data collected on Escherichia coli and achieved high prediction accuracies. The TU prediction program named SeqTU is publicly available at https://code.google.com/p/seqtu/. We expect that the predicted TUs can serve as the baseline information for studying transcriptional and post-transcriptional regulation in C. thermocellum and other bacteria.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 5
    Publication Date: 2015-05-29
    Description: Detecting genetic variation is one of the main applications of high-throughput sequencing, but is still challenging wherever aligning short reads poses ambiguities. Current state-of-the-art variant calling approaches avoid such regions, arguing that it is necessary to sacrifice detection sensitivity to limit false discovery. We developed a method that links candidate variant positions within repetitive genomic regions into clusters. The technique relies on a resource, a thesaurus of genetic variation, that enumerates genomic regions with similar sequence. The resource is computationally intensive to generate, but once compiled can be applied efficiently to annotate and prioritize variants in repetitive regions. We show that thesaurus annotation can reduce the rate of false variant calls due to mappability by up to three orders of magnitude. We apply the technique to whole genome datasets and establish that called variants in low mappability regions annotated using the thesaurus can be experimentally validated. We then extend the analysis to a large panel of exomes to show that the annotation technique opens possibilities to study variation in hitherto hidden and under-studied parts of the genome.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 6
    Publication Date: 2016-07-28
    Description: Genetic engineering projects often require control over when a protein is degraded. To this end, we use a fusion between a degron and an inactivating peptide that can be added to the N-terminus of a protein. When the corresponding protease is expressed, it cleaves the peptide and the protein is degraded. Three protease:cleavage site pairs from Potyvirus are shown to be orthogonal and active in exposing degrons, releasing inhibitory domains and cleaving polyproteins. This toolbox is applied to the design of genetic circuits as a means to control regulator activity and degradation. First, we demonstrate that a gate can be constructed by constitutively expressing an inactivated repressor and having an input promoter drive the expression of the protease. It is also shown that the proteolytic release of an inhibitory domain can improve the dynamic range of a transcriptional gate (200-fold repression). Next, we design polyproteins containing multiple repressors and show that their cleavage can be used to control multiple outputs. Finally, we demonstrate that the dynamic range of an output can be improved (8-fold to 190-fold) with the addition of a protease-cleaved degron. Thus, controllable proteolysis offers a powerful tool for modulating and expanding the function of synthetic gene circuits.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 7
    Publication Date: 2016-06-21
    Description: Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach for predicting the status of driver cancer pathways based on signature functions derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian and colon cancer tumors. The activation status of these driver pathways was then characterized using RNA sequencing data by constructing classification signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions differentiate well between tumors with nominated pathway activation and tumors with no signs of activation: the average AUC is 0.83. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that the transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
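The abstract of record 7 reports classifier quality as an AUC. As a reading aid, here is a minimal Python sketch of how an AUC can be computed from signature scores and known activation labels, using the pairwise (Mann-Whitney) definition. The function name `auc` and the example scores are illustrative assumptions; nothing here is taken from the authors' pipeline.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: classifier outputs (higher means "pathway activated").
    labels: 1 for tumors with nominated activation, 0 otherwise.
    Returns the fraction of (positive, negative) pairs in which the
    positive sample scores higher; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four tumors, two activated and two not.
print(auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported average of 0.83 in context.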
  • 8
    Publication Date: 2016-06-21
    Description: Modeling the properties and functions of DNA sequences is an important but challenging task in the broad field of genomics. This task is particularly difficult for non-coding DNA, the vast majority of which is still poorly understood in terms of function. A powerful predictive model for the function of non-coding DNA can have enormous benefit for both basic science and translational research because over 98% of the human genome is non-coding and 93% of disease-associated variants lie in these regions. To address this need, we propose DanQ, a novel hybrid convolutional and bi-directional long short-term memory recurrent neural network framework for predicting non-coding function de novo from sequence. In the DanQ model, the convolution layer captures regulatory motifs, while the recurrent layer captures long-term dependencies between the motifs in order to learn a regulatory ‘grammar’ to improve predictions. DanQ improves considerably upon other models across several metrics. For some regulatory markers, DanQ can achieve over a 50% relative improvement in the area under the precision-recall curve metric compared to related models. We have made the source code available at the GitHub repository http://github.com/uci-cbcl/DanQ.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 9
    Publication Date: 2016-06-21
    Description: Molecular sequences in public databases are mostly annotated by the submitting authors without further validation. This procedure can generate erroneous taxonomic sequence labels. Mislabeled sequences are hard to identify, and they can induce downstream errors because new sequences are typically annotated using existing ones. Furthermore, taxonomic mislabelings in reference sequence databases can bias metagenetic studies which rely on the taxonomy. Despite significant efforts to improve the quality of taxonomic annotations, the curation rate is low because of the labor-intensive manual curation process. Here, we present SATIVA, a phylogeny-aware method to automatically identify taxonomically mislabeled sequences (‘mislabels’) using statistical models of evolution. We use the Evolutionary Placement Algorithm (EPA) to detect and score sequences whose taxonomic annotation is not supported by the underlying phylogenetic signal, and automatically propose a corrected taxonomic classification for those. Using simulated data, we show that our method attains high accuracy for identification (96.9% sensitivity/91.7% precision) as well as correction (94.9% sensitivity/89.9% precision) of mislabels. Furthermore, an analysis of four widely used microbial 16S reference databases (Greengenes, LTP, RDP and SILVA) indicates that they currently contain between 0.2% and 2.5% mislabels. Finally, we use SATIVA to perform an in-depth evaluation of alternative taxonomies for Cyanobacteria. SATIVA is freely available at https://github.com/amkozlov/sativa.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 10
    Publication Date: 2016-06-21
    Description: DNA microarrays and RNAseq are complementary methods for studying RNA molecules. Current computational methods to determine alternative exon usage (AEU) using such data require impractical visual inspection and still yield high false-positive rates. Integrated Gene and Exon Model of Splicing (iGEMS) adapts a gene-level residuals model with a gene size adjusted false discovery rate and exon-level analysis to circumvent these limitations. iGEMS was applied to two new DNA microarray datasets, including the high coverage Human Transcriptome Arrays 2.0, and performance was validated using RT-qPCR. First, AEU was studied in adipocytes treated with (n = 9) or without (n = 8) the anti-diabetes drug rosiglitazone. iGEMS identified 555 genes with AEU, with robust verification by RT-qPCR (~90%). Second, in a three-way human tissue comparison (muscle, adipose and blood, n = 41) iGEMS identified 4421 genes with at least one AEU event, with excellent RT-qPCR verification (95%, n = 22). Importantly, iGEMS identified a variety of AEU events, including 3'UTR extension, as well as exon inclusion/exclusion impacting on protein kinase and extracellular matrix domains. In conclusion, iGEMS is a robust method for identification of AEU while the variety of exon usage between human tissues is 5–10 times more prevalent than reported by the Genotype-Tissue Expression consortium using RNA sequencing.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 11
    Publication Date: 2016-05-06
    Description: The Cancer Genome Atlas (TCGA) research network has made public a large collection of clinical and molecular phenotypes of more than 10 000 tumor patients across 33 different tumor types. Using this cohort, TCGA has published over 20 marker papers detailing the genomic and epigenomic alterations associated with these tumor types. Although many important discoveries have been made by TCGA's research network, opportunities still exist to implement novel methods, thereby elucidating new biological pathways and diagnostic markers. However, mining the TCGA data presents several bioinformatics challenges, such as data retrieval and integration with clinical data and other molecular data types (e.g. RNA and DNA methylation). We developed an R/Bioconductor package called TCGAbiolinks to address these challenges and offer bioinformatics solutions by using a guided workflow to allow users to query, download and perform integrative analyses of TCGA data. We combined methods from computer science and statistics into the pipeline and incorporated methodologies developed in previous TCGA marker studies and in our own group. Using four different TCGA tumor types (Kidney, Brain, Breast and Colon) as examples, we provide case studies to illustrate examples of reproducibility, integrative analysis and utilization of different Bioconductor packages to advance and accelerate novel discoveries.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 12
    Publication Date: 2016-05-06
    Description: Single cell RNA-seq experiments provide valuable insight into cellular heterogeneity but suffer from low coverage, 3' bias and technical noise. These unique properties of single cell RNA-seq data make study of alternative splicing difficult, and thus most single cell studies have restricted analysis of transcriptome variation to the gene level. To address these limitations, we developed SingleSplice, which uses a statistical model to detect genes whose isoform usage shows biological variation significantly exceeding technical noise in a population of single cells. Importantly, SingleSplice is tailored to the unique demands of single cell analysis, detecting isoform usage differences without attempting to infer expression levels for full-length transcripts. Using data from spike-in transcripts, we found that our approach detects variation in isoform usage among single cells with high sensitivity and specificity. We also applied SingleSplice to data from mouse embryonic stem cells and discovered a set of genes that show significant biological variation in isoform usage across the set of cells. A subset of these isoform differences is linked to cell cycle stage, suggesting a novel connection between alternative splicing and the cell cycle.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 13
    Publication Date: 2016-07-09
    Description: The recent super-exponential growth in the amount of sequencing data generated worldwide has brought techniques for compressed storage into focus. Most available solutions, however, are strictly tied to specific bioinformatics formats, sometimes inheriting from them suboptimal design choices; this hinders flexible and effective data sharing. Here, we present CARGO (Compressed ARchiving for GenOmics), a high-level framework to automatically generate software systems optimized for the compressed storage of arbitrary types of large genomic data collections. Straightforward applications of our approach to FASTQ and SAM archives require a few lines of code, produce solutions that match and sometimes outperform specialized format-tailored compressors and scale well to multi-TB datasets. All CARGO software components can be freely downloaded for academic and non-commercial use from http://bio-cargo.sourceforge.net.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 14
    Publication Date: 2016-07-09
    Description: Phasing of single nucleotide variations (SNVs) and structural variations (SVs) into chromosome-wide haplotypes in humans has been challenging, and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single individual SNV phasing is possible with proximity ligated (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data is noisy, and SV calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes withheld from training were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top ranked deletions phased by only HiC were validated using long range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 15
    Publication Date: 2016-07-09
    Description: Many genomes display high levels of heterozygosity (i.e. presence of different alleles at the same loci in homologous chromosomes), with those of hybrid organisms being an extreme case. The assembly of highly heterozygous genomes from short sequencing reads is a challenging task because it is difficult to accurately recover the different haplotypes. When confronted with highly heterozygous genomes, the standard assembly process tends to collapse homozygous regions and report heterozygous regions in alternative contigs. The boundaries between homozygous and heterozygous regions result in multiple assembly paths that are hard to resolve, which leads to highly fragmented assemblies with a total size larger than expected. This, in turn, causes numerous problems in downstream analyses such as fragmented gene models, wrong gene copy number, or broken synteny. To circumvent these problems we have developed a pipeline that specifically deals with the assembly of heterozygous genomes by introducing a step to recognise and selectively remove alternative heterozygous contigs. We tested our pipeline on simulated and naturally-occurring heterozygous genomes and compared its accuracy to other existing tools. Our method is freely available at https://github.com/Gabaldonlab/redundans.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 16
    Publication Date: 2013-09-26
    Description: Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries, and yet are of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 17
    Publication Date: 2013-09-26
    Description: A renewed interest in non-coding RNA (ncRNA) has led to the discovery of novel RNA species and post-transcriptional ribonucleoside modifications, and an emerging appreciation for the role of ncRNA in RNA epigenetics. Although much can be learned by amplification-based analysis of ncRNA sequence and quantity, there is a significant need for direct analysis of RNA, which has led to numerous methods for purification of specific ncRNA molecules. However, no single method allows purification of the full range of cellular ncRNA species. To this end, we developed a multidimensional chromatographic platform to resolve, isolate and quantify all canonical ncRNAs in a single sample of cells or tissue, as well as novel ncRNA species. The applicability of the platform is demonstrated in analyses of ncRNA from bacteria, human cells and plasmodium-infected reticulocytes, as well as a viral RNA genome. Among the many potential applications of this platform are a system-level analysis of the dozens of modified ribonucleosides in ncRNA, characterization of novel long ncRNA species, enhanced detection of rare transcript variants and analysis of viral genomes.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 18
    Publication Date: 2013-09-26
    Description: In reverse genetics, a gene’s function is elucidated through targeted modifications in the coding region or associated DNA cis-regulatory elements. For this purpose, recently developed customizable transcription activator-like effector nucleases (TALENs) have proven an invaluable tool, allowing introduction of double-strand breaks at predetermined sites in the genome. Here we describe a practical and efficient method for targeted genome engineering in Drosophila. We demonstrate TALEN-mediated targeted gene integration and efficient identification of mutant flies using a traceable marker phenotype. Furthermore, we developed an easy TALEN assembly (easyT) method relying on simultaneous reactions of DNA BaeI digestion and ligation, enabling construction of complete TALENs from a monomer unit library in a single day. Taken together, our strategy with easyT and TALEN-plasmid microinjection simplifies mutant generation and enables isolation of desired mutant fly lines in the F1 generation.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 19
    Publication Date: 2013-06-08
    Description: Transcription activator-like effector nucleases (TALENs) are a powerful new approach for targeted gene disruption in various animal models, but little is known about their activities in Mus musculus, the widely used mammalian model organism. Here, we report that direct injection of in vitro transcribed messenger RNA of TALEN pairs into mouse zygotes induced somatic mutations, which were stably passed to the next generation through germ-line transmission. With one TALEN pair constructed for each of 10 target genes, mutant F0 mice for each gene were obtained, with mutation rates ranging from 13 to 67% and averaging ~40% of total healthy newborns, with no significant differences between the C57BL/6 and FVB/N genetic backgrounds. One TALEN pair with a single mismatch to its intended target sequence on each side failed to yield any mutation. Furthermore, highly efficient germ-line transmission was obtained, as all the F0 founders tested transmitted the mutations to F1 mice. In addition, we observed that one bi-allele mutant founder of the Lepr gene, encoding the Leptin receptor, had a diabetic phenotype similar to that of the db/db mouse. Together, our results suggest that TALENs are an effective genetic tool for rapid gene disruption with high efficiency and heritability in mice with distinct genetic backgrounds.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 20
    Publication Date: 2013-06-08
    Description: We describe a new cell-free protein synthesis (CFPS) method for site-specific incorporation of non-natural amino acids (nnAAs) into proteins in which the orthogonal tRNA (o-tRNA) and the modified protein (i.e. the protein containing the nnAA) are produced simultaneously. Using this method, 0.9–1.7 mg/ml of modified soluble super-folder green fluorescent protein (sfGFP) containing either p-azido-L-phenylalanine (pAzF) or p-propargyloxy-L-phenylalanine (pPaF) accumulated in the CFPS solutions; these yields correspond to 50–88% suppression efficiency. The o-tRNA can be transcribed either from a linearized plasmid or from a crude PCR product. Comparison of two different o-tRNAs suggests that the new platform is not limited by Ef-Tu recognition of the acylated o-tRNA at sufficiently high o-tRNA template concentrations. Analysis of nnAA incorporation across 12 different sites in sfGFP suggests that modified protein yields and suppression efficiencies (i.e. the position effect) do not correlate with any of the reported trends. Sites that were ineffectively suppressed with the original o-tRNA were better suppressed with an optimized o-tRNA (o-tRNAopt) that was evolved to be better recognized by Ef-Tu. This new platform can also be used to screen scissile ribozymes for improved catalysis.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 21
    Publication Date: 2013-06-08
    Description: The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome-wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome, so direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and pair-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 22
    Publication Date: 2015-05-03
    Description: Inversion polymorphisms have important phenotypic and evolutionary consequences in humans. Two different methodologies have been used to infer inversions from SNP-dense data, enabling the use of large cohorts for their study. One approach relies on the differences in linkage disequilibrium across breakpoints; the other captures the internal haplotype groups that tag the inversion status of chromosomes. In this article, we assessed the convergence of the two methods in the detection of 20 human inversions that have been reported in the literature. The methods converged in four inversions, including inv-8p23, whose association with low BMI we studied in American children. Using a novel haplotype tagging method with control on inversion ancestry, we computed the frequency of inv-8p23 in two American cohorts and observed inversion haplotype admixture. Accounting for haplotype ancestry, we found that the European inverted allele in children carries a recessive risk of underweight, validated in an independent Spanish cohort (combined: OR = 2.00, P = 0.001). While the footprints of inversions on SNP data are complex, we show that systematic analyses, such as convergence of different methods and controlling for ancestry, can reveal the contribution of inversions to the ancestral composition of populations and to the heritability of human disease.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 23
    Publication Date: 2015-05-03
    Description: The Metabolic Models Reconstruction Using Genome-Scale Information (merlin) tool is a user-friendly Java application that aids the reconstruction of genome-scale metabolic models for any organism that has its genome sequenced. It performs the major steps of the reconstruction process, including the functional genomic annotation of the whole genome and subsequent construction of the portfolio of reactions. Moreover, merlin includes tools for the identification and annotation of genes encoding transport proteins, generating the transport reactions for those carriers. It also performs the compartmentalisation of the model, predicting the organelle localisation of the proteins encoded in the genome and thus the localisation of the metabolites involved in the reactions promoted by such enzymes. The gene-proteins-reactions (GPR) associations are automatically generated and included in the model. Finally, merlin expedites the transition from genomic data to draft metabolic models reconstructions exported in the SBML standard format, allowing the user to have a preliminary view of the biochemical network, which can be manually curated within the environment provided by merlin.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 24
    Publication Date: 2015-05-03
    Description: For eukaryotic cells, the biological processes involving regulatory DNA elements play an important role in the cell cycle. Understanding 3D spatial arrangements of chromosomes and revealing long-range chromatin interactions are critical to decipher these biological processes. In recent years, chromosome conformation capture (3C) related techniques have been developed to measure the interaction frequencies between long-range genome loci, which have provided a great opportunity to decode the 3D organization of the genome. In this paper, we develop a new Bayesian framework to derive the 3D architecture of a chromosome from 3C-based data. By modeling each chromosome as a polymer chain, we define the conformational energy based on our current knowledge of polymer physics and use it as prior information in the Bayesian framework. We also propose an expectation-maximization (EM) based algorithm to estimate the unknown parameters of the Bayesian model and infer an ensemble of chromatin structures based on interaction frequency data. We have validated our Bayesian inference approach through cross-validation and verified the computed chromatin conformations using the geometric constraints derived from fluorescence in situ hybridization (FISH) experiments. We have further confirmed the inferred chromatin structures using the known genetic interactions derived from other studies in the literature. Our test results have indicated that our Bayesian framework can compute an accurate ensemble of 3D chromatin conformations that best interpret the distance constraints derived from 3C-based data and also agree with other sources of geometric constraints derived from experimental evidence in the previous studies. The source code of our approach can be found at https://github.com/wangsy11/InfMod3DGen .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 25
    Publication Date: 2015-05-03
    Description: The transformation-associated recombination (TAR) protocol, which allows the selective isolation of full-length genes complete with their distal enhancer regions and entire genomic loci with sizes up to 250 kb from complex genomes in yeast S. cerevisiae, was developed more than a decade ago. However, its widespread usage has been impeded by a low efficiency (0.5–2%) of chromosomal region capture during yeast transformation, which in turn requires a time-consuming screen of hundreds of colonies. Here, we demonstrate that pre-treatment of genomic DNA with CRISPR-Cas9 nucleases to generate double-strand breaks near the targeted genomic region results in a dramatic increase in the fraction of gene-positive colonies (up to 32%). As only a dozen or fewer yeast transformants need to be screened to obtain a clone with the desired chromosomal region, extensive experience with yeast is no longer required. A TAR-CRISPR protocol may help to create a bank of human genes, each represented by a genomic copy containing its native regulatory elements, that would lead to a significant advance in functional, structural and comparative genomics, in diagnostics, gene replacement and generation of animal models for human diseases, and has potential for gene therapy.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 26
    Publication Date: 2015-05-03
    Description: Characterization of cell type specific regulatory networks and elements is a major challenge in genomics, and emerging strategies frequently employ high-throughput genome-wide assays of transcription factor (TF) to DNA binding, histone modifications or chromatin state. However, these experiments remain too difficult/expensive for many laboratories to apply comprehensively to their system of interest. Here, we explore the potential of elucidating regulatory systems in varied cell types using computational techniques that rely on only data of gene expression, low-resolution chromatin accessibility, and TF–DNA binding specificities (‘motifs’). We show that static computational motif scans overlaid with chromatin accessibility data reasonably approximate experimentally measured TF–DNA binding. We demonstrate that predicted binding profiles and expression patterns of hundreds of TFs are sufficient to identify major regulators of ~200 spatiotemporal expression domains in the Drosophila embryo. We are then able to learn reliable statistical models of enhancer activity for over 70 expression domains and apply those models to annotate domain specific enhancers genome-wide. Throughout this work, we apply our motif and accessibility based approach to comprehensively characterize the regulatory network of fruitfly embryonic development and show that the accuracy of our computational method compares favorably to approaches that rely on data from many experimental assays.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 27
    Publication Date: 2014-11-07
    Description: A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/ .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 28
    Publication Date: 2015-04-21
    Description: RNA research and therapy rely primarily on synthetic RNAs. We employed recombinant RNA technology toward large-scale production of pre-miRNA agents in bacteria, but found that the majority of target RNAs were not expressed or only negligibly expressed. We thus developed a novel strategy to achieve consistent high-yield biosynthesis of chimeric RNAs carrying various small RNAs (e.g. miRNAs, siRNAs and RNA aptamers), which was based upon an optimal noncoding RNA scaffold (OnRS) derived from tRNA fusion pre-miR-34a (tRNA/mir-34a). Multi-milligram quantities of chimeric RNAs (e.g. OnRS/miR-124, OnRS/GFP-siRNA, OnRS/Neg (scrambled RNA) and OnRS/MGA (malachite green aptamer)) were readily obtained from 1 L of bacterial culture. Deep sequencing analyses revealed that mature miR-124 and target GFP-siRNA were selectively released from chimeric RNAs in human cells. Consequently, OnRS/miR-124 was active in suppressing miR-124 target gene expression and controlling cellular processes, and OnRS/GFP-siRNA was effective in knocking down GFP mRNA levels and fluorescent intensity in ES-2/GFP cells and GFP-transgenic mice. Furthermore, the OnRS/MGA sensor offered a specific strong fluorescence upon binding MG, which was utilized as label-free substrate to accurately determine serum RNase activities in pancreatic cancer patients. These results demonstrate that OnRS-based bioengineering is a common, robust and versatile strategy to assemble various types of small RNAs for broad applications.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 29
    Publication Date: 2016-01-09
    Description: Proteins adhere to DNA at locations and with strengths that depend on the protein conformation, the underlying DNA sequence and the ionic content of the solution. A facile technique to probe the positions and strengths of protein-DNA binding would aid in understanding these important interactions. Here, we describe a ‘DNA pulley’ for position-resolved nano-mechanical measurements of protein-DNA interactions. A molecule of DNA is tethered by one end to a glass surface, and by the other end to a magnetic bead. The DNA is stretched horizontally by a magnet, and a nanoscale knife made of silicon nitride is manipulated to contact, bend and scan along the DNA. The mechanical profile of the DNA at the contact with the knife is probed via nanometer-precision optical tracking of the magnetic bead. This system enables detection of protein bumps on the DNA and localization of their binding sites. We study theoretically the technical requirements to detect mechanical heterogeneities in the DNA itself.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 30
    Publication Date: 2016-01-09
    Description: Synthetic biology seeks to envision living cells as a matter of engineering. However, increasing evidence suggests that the genetic load imposed by the incorporation of synthetic devices in a living organism introduces a degree of unpredictability in the design process. As a result, individual part characterization is not enough to predict the behavior of designed circuits and thus, a costly trial-and-error process is eventually required. In this work, we provide a new theoretical framework for the predictive treatment of the genetic load. We mathematically and experimentally demonstrate that dependences among genes follow a quantitatively predictable behavior. Our theory predicts the observed reduction of the expression of a given synthetic gene when an extra genetic load is introduced in the circuit. The theory also explains that such dependence differs qualitatively depending on whether the extra load is added by transcriptional or translational modifications. We finally show that the limitation of the cellular resources for gene expression leads to a mathematical formulation that converges to an expression analogous to Ohm's law for electric circuits. Similarities to and divergences from this law are outlined. Our work provides a suitable framework with predictive character for the design process of complex genetic devices in synthetic biology.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 31
    Publication Date: 2015-06-24
    Description: Much of the inter-individual variation in gene expression is triggered via perturbations of signaling networks by DNA variants. We present a novel probabilistic approach for identifying the particular pathways by which DNA variants perturb the signaling network. Our procedure, called PINE, relies on a systematic integration of established biological knowledge of signaling networks with data on transcriptional responses to various experimental conditions. Unlike previous approaches, PINE provides statistical measures that are critical for prioritizing hypotheses for follow-up experiments. Using simulated data, we show that higher accuracy is attained with PINE than with existing methods. We used PINE to analyze transcriptional responses of immune dendritic cells to several pathogenic stimulations. PINE identified statistically significant genetic perturbations in the pathogen-sensing signaling network, suggesting previously uncharacterized regulatory mechanisms for functional DNA variants.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 32
    Publication Date: 2015-06-24
    Description: RNA sequencing (RNA-Seq) is a powerful tool for analyzing the identity of cellular RNAs but is often limited by the amount of material available for analysis. In spite of extensive efforts employing existing protocols, we observed that it was not possible to obtain useful sequencing libraries from nuclear RNA derived from cultured human cells after crosslinking and immunoprecipitation (CLIP). Here, we report a method for obtaining strand-specific small RNA libraries for RNA sequencing that requires picograms of RNA. We employ an intramolecular circularization step that increases the efficiency of library preparation and avoids the need for intermolecular ligations of adaptor sequences. Other key features include random priming for full-length cDNA synthesis and gel-free library purification. Using our method, we generated CLIP-Seq libraries from nuclear RNA that had been UV-crosslinked and immunoprecipitated with anti-Argonaute 2 (Ago2) antibody. Computational protocols were developed to enable analysis of raw sequencing data and we observe substantial differences between recognition by Ago2 of RNA species in the nucleus relative to the cytoplasm. This RNA self-circularization approach to RNA sequencing (RC-Seq) allows data to be obtained using small amounts of input RNA that cannot be sequenced by standard methods.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 33
    Publication Date: 2015-08-29
    Description: Variations in sample quality are frequently encountered in small RNA-sequencing experiments, and pose a major challenge in a differential expression analysis. Removal of high variation samples reduces noise, but at a cost of reducing power, thus limiting our ability to detect biologically meaningful changes. Similarly, retaining these samples in the analysis may not reveal any statistically significant changes due to the higher noise level. A compromise is to use all available data, but to down-weight the observations from more variable samples. We describe a statistical approach that facilitates this by modelling heterogeneity at both the sample and observational levels as part of the differential expression analysis. At the sample level this is achieved by fitting a log-linear variance model that includes common sample-specific or group-specific parameters that are shared between genes. The estimated sample variance factors are then converted to weights and combined with observational level weights obtained from the mean–variance relationship of the log-counts-per-million using ‘voom’. A comprehensive analysis involving both simulations and experimental RNA-sequencing data demonstrates that this strategy leads to a universally more powerful analysis and fewer false discoveries when compared to conventional approaches. This methodology has wide application and is implemented in the open-source ‘limma’ package.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 34
    Publication Date: 2015-08-29
    Description: Most mammalian genes have mRNA variants due to alternative promoter usage, alternative splicing, and alternative cleavage and polyadenylation. Expression of alternative RNA isoforms has been found to be associated with tumorigenesis, proliferation and differentiation. Detection of condition-associated transcription variation requires association methods. Traditional association methods such as the Pearson chi-square test and Fisher's exact test are single-test methods and do not work on count data with replicates. Although the Cochran–Mantel–Haenszel (CMH) approach can handle replicated count data, our simulations showed that multiple CMH tests still had very low power. To identify condition-associated variation of transcription, we here propose a ranking analysis of chi-squares (RAX2) for large-scale association analysis. RAX2 is a nonparametric method and provides accurate and conservative estimation of the FDR profile. Simulations demonstrated that RAX2 performs well in finding condition-associated transcription variants. We applied RAX2 to primary T-cell transcriptomic data and identified 1610 (16.3%) tags associated in transcription with immune stimulation at FDR < 0.05. Most of these tags also had differential expression. Analysis of two and three tags within genes revealed that under immune stimulation short RNA isoforms were preferably used.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 35
    Publication Date: 2016-08-20
    Description: Light-regulated modules offer unprecedented new ways to control cellular behavior with precise spatial and temporal resolution. The availability of such tools may dramatically accelerate the progression of synthetic biology applications. Nonetheless, the current optogenetic toolbox for prokaryotes has potential issues such as a lack of rapid and switchable control, limited portability, low dynamic expression and a limited number of parts. To address these shortcomings, we have engineered a novel bidirectional promoter system for Escherichia coli that can be induced or repressed rapidly and reversibly using the blue light dependent DNA-binding protein EL222. We demonstrated that by modulating the dosage of light pulses or intensity we could control the level of gene expression precisely. We show that both the light-inducible and the light-repressible systems can function in parallel with high spatial precision in a single cell and can be switched stably between ON- and OFF-states by repetitive pulses of blue light. In addition, the light-inducible and repressible expression kinetics were quantitatively analysed using a mathematical model. We further apply the system, for the first time, to optogenetically synchronize two receiver cells performing different logic behaviors over time using blue light as a molecular clock signal. Overall, our modular approach lays out a transformative platform for next-generation light-controllable synthetic biology systems in prokaryotes.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 36
    Publication Date: 2016-08-20
    Description: Current DNA assembly methods for preparing highly purified linear subassemblies require complex and time-consuming in vitro manipulations that hinder their ability to construct megabase-sized DNAs (e.g. synthetic genomes). We have developed a new method designated ‘CasHRA’ (Cas9-facilitated Homologous Recombination Assembly) that directly uses large circular DNAs in a one-step in vivo assembly process. The large circular DNAs are co-introduced into Saccharomyces cerevisiae by protoplast fusion, and they are cleaved by RNA-guided Cas9 nuclease to release the linear DNA segments for subsequent assembly by the endogenous homologous recombination system. The CasHRA method allows efficient assembly of multiple large DNA segments in vivo; thus, this approach should be useful in the last stage of genome construction. As a proof of concept, we combined CasHRA with an upstream assembly method (Gibson procedure of genome assembly) and successfully constructed a 1.03 Mb MGE-syn1.0 (Minimal Genome of Escherichia coli) that contained 449 essential genes and 267 important growth genes. We expect that CasHRA will be widely used in megabase-sized genome constructions.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 37
    Publication Date: 2015-10-15
    Description: Natural regulatory networks contain many interacting components that allow for fine-tuning of switching and memory properties. Building simple bistable switches, synthetic biologists have learned the design principles of complex natural regulatory networks. However, most switches constructed so far are so simple (e.g. comprising two regulators) that they are functional only within a limited parameter range. Here, we report the construction of robust, tunable bistable switches in Escherichia coli using three heterologous protein regulators (ExsADC) that are sequestered into an inactive complex through a partner swapping mechanism. On the basis of mathematical modeling, we accurately predict and experimentally verify that the hysteretic region can be fine-tuned by controlling the interactions of the ExsADC regulatory cascade using the third member ExsC as a tuning knob. Additionally, we confirm that a dual-positive feedback switch can markedly increase the hysteretic region, compared to its single-positive feedback counterpart. The dual-positive feedback switch displays bistability over a 10^6-fold range of inducer concentrations, to our knowledge, the largest range reported so far. This work demonstrates the successful interlocking of sequestration-based ultrasensitivity and positive feedback, a design principle that can be applied to the construction of robust, tunable, and predictable genetic programs to achieve increasingly sophisticated biological behaviors.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 38
    Publication Date: 2015-12-16
    Description: To understand how transposon landscapes (TLs) vary across animal genomes, we describe a new method called the Transposon Insertion and Depletion AnaLyzer (TIDAL) and a database of >300 TLs in Drosophila melanogaster (TIDAL-Fly). Our analysis reveals pervasive TL diversity across cell lines and fly strains, even for identically named sub-strains from different laboratories such as the ISO1 strain used for the reference genome sequence. On average, >500 novel insertions exist in every lab strain, inbred strains of the Drosophila Genetic Reference Panel (DGRP), and fly isolates in the Drosophila Genome Nexus (DGN). A minority (<25%) of transposon families comprise the majority (>70%) of TL diversity across fly strains. A sharp contrast between insertion and depletion patterns indicates that many transposons are unique to the ISO1 reference genome sequence. Although TL diversity from fly strains reaches asymptotic limits with increasing sequencing depth, rampant TL diversity causes unsaturated detection of TLs in pools of flies. Finally, we show novel transposon insertions negatively correlate with Piwi-interacting RNA (piRNA) levels for most transposon families, except for the highly-abundant roo retrotransposon. Our study provides a useful resource for Drosophila geneticists to understand how transposons create extensive genomic diversity in fly cell lines and strains.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 39
    Publication Date: 2016-06-03
    Description: Understanding telomere length maintenance mechanisms is central in cancer biology as their dysregulation is one of the hallmarks for immortalization of cancer cells. Important for this well-balanced control is the transcriptional regulation of the telomerase genes. We integrated Mixed Integer Linear Programming models into a comparative machine learning based approach to identify regulatory interactions that best explain the discrepancy of telomerase transcript levels in yeast mutants with deleted regulators showing aberrant telomere length, when compared to mutants with normal telomere length. We uncover novel regulators of telomerase expression, several of which affect histone levels or modifications. In particular, our results point to the transcription factors Sum1, Hst1 and Srb2 as being important for the regulation of EST1 transcription, and we validated the effect of Sum1 experimentally. We packaged our machine learning method as a user-friendly R package that can be applied straightforwardly to similar problems integrating gene regulator binding information and expression profiles of samples of, e.g., different phenotypes, diseases or treatments.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 40
    Publication Date: 2016-06-03
    Description: We have investigated transcriptional interference between convergent genes in E. coli and demonstrate substantial interference for inter-promoter distances of as far as 3 kb. Interference can be elicited by both strong σ70-dependent and T7 promoters. In the presented design, a strong promoter driving gene expression of a ‘forward’ gene interferes with the expression of a ‘reverse’ gene by a weak promoter. This arrangement allows inversely correlated gene expression without requiring further regulatory components. Thus, modulation of the activity of the strong promoter alters expression of both the forward and the reverse gene. We used this design to develop a dual selection system for conditional operator site binding, allowing positive selection both for binding and for non-binding to DNA. This study demonstrates the utility of this novel system using the Lac repressor as a model protein for conditional DNA binding, and spectinomycin and chloramphenicol resistance genes as positive selection markers in liquid culture. Randomized LacI libraries were created and subjected to subsequent dual selection, but with IPTG and selection cues mispaired with respect to the wild-type LacI response, allowing the isolation of a LacI variant with a reversed IPTG response within three rounds of library generation and dual selection.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 41
    Publication Date: 2016-06-03
    Description: The ability to integrate ‘omics’ (i.e. transcriptomics and proteomics) is becoming increasingly important to the understanding of regulatory mechanisms. There are currently no tools available to identify differentially expressed genes (DEGs) across different ‘omics’ data types or multi-dimensional data including time courses. We present fCI (f-divergence Cut-out Index), a model capable of simultaneously identifying DEGs from continuous and discrete transcriptomic, proteomic and integrated proteogenomic data. We show that fCI can be used across multiple diverse sets of data and can unambiguously find genes that show functional modulation, developmental changes or misregulation. Applying fCI to several proteogenomics datasets, we identified a number of important genes that showed distinctive regulation patterns. The package fCI is available at R Bioconductor and http://software.steenlab.org/fCI/ .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 42
    Publication Date: 2016-06-03
    Description: Next generation sequencing of cellular RNA is making it possible to characterize genes and alternative splicing in unprecedented detail. However, designing bioinformatics tools to accurately capture splicing variation has proven difficult. Current programs can find major isoforms of a gene but miss lower abundance variants, or are sensitive but imprecise. CLASS2 is a novel open source tool for accurate genome-guided transcriptome assembly from RNA-seq reads based on the model of splice graph. An extension of our program CLASS, CLASS2 jointly optimizes read patterns and the number of supporting reads to score and prioritize transcripts, implemented in a novel, scalable and efficient dynamic programming algorithm. When compared against reference programs, CLASS2 had the best overall accuracy and could detect up to twice as many splicing events with precision similar to the best reference program. Notably, it was the only tool to produce consistently reliable transcript models for a wide range of applications and sequencing strategies, including ribosomal RNA-depleted samples. Lightweight and multi-threaded, CLASS2 requires <3 GB RAM and can analyze a 350 million read set within hours, and can be widely applied to transcriptomics studies ranging from clinical RNA sequencing, to alternative splicing analyses, and to the annotation of new genomes.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 43
    Publication Date: 2016-06-03
    Description: N6-methyladenosine (m6A) is a prevalent RNA methylation modification involved in the regulation of degradation, subcellular localization, splicing and local conformation changes of RNA transcripts. High-throughput experiments have demonstrated that only a small fraction of the m6A consensus motifs in mammalian transcriptomes are modified. Therefore, accurate identification of RNA m6A sites has become urgently important. To this end, we established SRAMP, a computational predictor of mammalian m6A sites. To depict the sequence context around m6A sites, SRAMP combines three random forest classifiers that exploit the positional nucleotide sequence pattern, the K-nearest neighbor information and the position-independent nucleotide pair spectrum features, respectively. SRAMP uses either genomic sequences or cDNA sequences as its input. With either kind of input sequence, SRAMP achieves competitive performance in both cross-validation tests and rigorous independent benchmarking tests. Analyses of the informative features and overrepresented rules extracted from the random forest classifiers demonstrate that nucleotide usage preferences at the distal positions, in addition to those at the proximal positions, contribute to the classification. As a public prediction server, SRAMP is freely available at http://www.cuilab.cn/sramp/ .
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 44
    Publication Date: 2016-09-20
    Description: Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH) and allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of ‘actionable’ mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy- and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical sequencing. We developed FACETS, an open-source ASCN tool with broad application to whole-genome, whole-exome and targeted panel sequencing platforms. It is a fully integrated stand-alone pipeline that includes sequencing BAM file post-processing, joint segmentation of total- and allele-specific read counts, and integer copy number calls corrected for tumor purity, ploidy and clonal heterogeneity, with comprehensive output and integrated visualization. We demonstrate the application of FACETS using The Cancer Genome Atlas (TCGA) whole-exome sequencing of lung adenocarcinoma samples. We also demonstrate its application to a clinical sequencing platform based on a targeted gene panel.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 45
    Publication Date: 2016-09-03
    Description: We present SWAN, a statistical framework for robust detection of genomic structural variants in next-generation sequencing data and an analysis of mid-range size insertions and deletions (<10 kb) for whole genome analysis and DNA mixtures. To identify these mid-range size events, SWAN collectively uses information from read-pair, read-depth and one end mapped reads through statistical likelihoods based on Poisson field models. SWAN also uses soft-clip/split read remapping to supplement the likelihood analysis and determine variant boundaries. The accuracy of SWAN is demonstrated by in silico spike-ins and by identification of known variants in the NA12878 genome. We used SWAN to identify a set of novel mid-range insertions/deletions that were confirmed by targeted deep re-sequencing. An R package implementation of SWAN is open source and freely available.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 46
    Publication Date: 2016-08-20
    Description: High-throughput screening (HTS) is an indispensable tool for drug (target) discovery that currently lacks user-friendly software tools for the robust identification of putative hits from HTS experiments and for the interpretation of these findings in the context of systems biology. We developed HiTSeekR as a one-stop solution for chemical compound screens, siRNA knock-down and CRISPR/Cas9 knock-out screens, as well as microRNA inhibitor and -mimics screens. We chose three use cases that demonstrate the potential of HiTSeekR to fully exploit HTS screening data in quite heterogeneous contexts to generate novel hypotheses for follow-up experiments: (i) a genome-wide RNAi screen to uncover modulators of TNFα, (ii) a combined siRNA and miRNA mimics screen on vorinostat resistance and (iii) a small compound screen on KRAS synthetic lethality. HiTSeekR is publicly available at http://hitseekr.compbio.sdu.dk . It is the first approach to close the gap between raw data processing, network enrichment and wet lab target generation for various HTS screen types.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 47
    Publication Date: 2015-04-21
    Description: Next-generation sequencing (NGS) approaches rapidly produce millions to billions of short reads, which allow pathogen detection and discovery in human clinical, animal and environmental samples. A major limitation of sequence homology-based identification for highly divergent microorganisms is the short length of reads generated by most highly parallel sequencing technologies. Short reads require a high level of sequence similarity to annotated genes to confidently predict gene function or homology. Such recognition of highly divergent homologues can be improved by reference-free (de novo) assembly of short overlapping sequence reads into larger contigs. We describe an ensemble strategy that integrates the sequential use of various de Bruijn graph and overlap-layout-consensus assemblers with a novel partitioned sub-assembly approach. We also propose new quality metrics that are suitable for evaluating metagenome de novo assembly. We demonstrate that this new ensemble strategy, tested using in silico spike-in, clinical and environmental NGS datasets, achieved significantly better contigs than current approaches.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 48
    Publication Date: 2015-04-21
    Description: Distinguishing between promoter-like sequences in bacteria that belong to true or abortive promoters, or to those that do not initiate transcription at all, is one of the important challenges in transcriptomics. To address this problem, we have studied the genome-reduced bacterium Mycoplasma pneumoniae, for which the RNAs associated with transcriptional start sites have been recently experimentally identified. We determined the contribution to transcription events of different genomic features: the –10, extended –10 and –35 boxes, the UP element, the bases surrounding the –10 box and the nearest-neighbor free energy of the promoter region. Using a random forest classifier and the aforementioned features transformed into scores, we could distinguish between true, abortive promoters and non-promoters with good –10 box sequences. The methods used in this characterization of promoters can be extended to other bacteria and have important applications for promoter design in bacterial genome engineering.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 49
    Publication Date: 2015-04-21
    Description: We describe solid-phase cloning (SPC) for high-throughput assembly of expression plasmids. Our method allows PCR products to be put directly into a liquid handler for capture and purification using paramagnetic streptavidin beads and conversion into constructs by subsequent cloning reactions. We present a robust automated protocol for restriction enzyme based SPC and its performance for the cloning of >60 000 unique human gene fragments into expression vectors. In addition, we report on SPC-based single-strand assembly for applications where exact control of the sequence between fragments is needed or where multiple inserts are to be assembled. In this approach, the solid support allows for head-to-tail assembly of DNA fragments based on hybridization and polymerase fill-in. The usefulness of head-to-tail SPC was demonstrated by assembly of >150 constructs with up to four DNA parts at an average success rate above 80%. We report on several applications for SPC and we suggest it to be particularly suitable for high-throughput efforts using laboratory workstations.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 50
    Publication Date: 2015-04-21
    Description: MicroRNAs (miRNAs) are involved in the regulation of gene expression at a post-transcriptional level. As such, monitoring miRNA expression has been increasingly used to assess their role in regulatory mechanisms of biological processes. In large scale studies, once miRNAs of interest have been identified, the target genes they regulate are often inferred using algorithms or databases. A pathway analysis is then often performed in order to generate hypotheses about the relevant biological functions controlled by the miRNA signature. Here we show that the method widely used in scientific literature to identify these pathways is biased and leads to inaccurate results. In addition to describing the bias and its origin we present an alternative strategy to identify potential biological functions specifically impacted by a miRNA signature. More generally, our study exemplifies the crucial need of relevant negative controls when developing, and using, bioinformatics methods.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 51
    Publication Date: 2015-06-24
    Description: The cyanobacterial hsp17 ribonucleic acid thermometer (RNAT) is one of the smallest naturally occurring RNATs. It forms a single hairpin with an internal 1 × 3 bulge separating the start codon in stem I from the ribosome binding site (RBS) in stem II. We investigated the temperature-dependent regulation of hsp17 by mapping individual base-pair stabilities from solvent exchange nuclear magnetic resonance (NMR) spectroscopy. The wild-type RNAT was found to be stabilized by two critical CG base pairs (C14-G27 and C13-G28). Replacing the internal 1 × 3 bulge by a stable CG base pair in hsp17 rep significantly increased the global stability and unfolding cooperativity as evidenced by circular dichroism spectroscopy. From the NMR analysis, remote stabilization and non-nearest neighbour effects exist at the base-pair level, in particular for nucleotide G28 (five nucleotides apart from the site of mutation). Individual base-pair stabilities are coupled to the stability of the entire thermometer within both the natural and the stabilized RNATs by enthalpy–entropy compensation presumably mediated by the hydration shell. At the melting point the Gibbs energies of the individual nucleobases are equalized, suggesting a consecutive zipper-type unfolding mechanism of the RBS leading to a dimmer-like function of hsp17 and switch-like regulation behaviour of hsp17 rep. The data show how minor changes in the nucleotide sequence not only offset the melting temperature but also alter the mode of temperature sensing. The cyanobacterial thermosensor demonstrates the remarkable adjustment of natural RNATs to execute precise temperature control.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 52
    Publication Date: 2014-11-28
    Description: It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors to correct analyses. Here I describe a version of the sva approach specifically created for count data or FPKMs from sequencing experiments based on appropriate data transformation. I also describe the addition of supervised sva (ssva) for using control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package and I have made fully reproducible analysis using these methods available from: https://github.com/jtleek/svaseq .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 53
    Publication Date: 2014-11-28
    Description: High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght . The tool is freely downloadable for private data set analysis.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 54
    Publication Date: 2014-11-28
    Description: The σ54 promoters are unique to prokaryotic genomes and responsible for transcribing carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC . For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites is governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
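    Illustration (not part of the catalogued record): the abstract above encodes each DNA sequence as a ‘pseudo k-tuple nucleotide composition’ vector. The sketch below shows only the plain k-tuple composition part, i.e. the normalized frequency of every possible k-mer; the additional ‘pseudo’ components used by iPro54-PseKNC are not reproduced here. The function name ktuple_composition and the default k=2 are assumptions made for this example.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Hypothetical helper for illustration only. The vector has 4**k entries,
    ordered lexicographically over the alphabet ACGT (AA, AC, AG, ...).
    Windows containing any other character (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        # No valid window: return an all-zero vector instead of dividing by zero.
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]
```

    Such a fixed-length vector can be passed to any standard classifier, which is why compositions of this kind are a common first step before feature selection.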
  • 55
    Publication Date: 2014-11-28
    Description: We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and aside from binary contrasts also more complex data configurations can be utilized.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
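    Illustration (not part of the catalogued record): the abstract above scores motifs by the mutual information between condition (positive or negative sequence set) and motif occurrence. A minimal sketch of that quantity for a 2 × 2 contingency table follows. It is not the Discrover implementation; the function name and argument layout are assumptions made for this example.

```python
import math


def mutual_information(pos_with, pos_total, neg_with, neg_total):
    """Mutual information (in bits) between sequence set and motif presence.

    Hypothetical helper for illustration only.
    pos_with / neg_with: number of sequences containing the motif in the
    positive / negative set; pos_total / neg_total: sizes of the two sets.
    """
    counts = [
        [pos_with, pos_total - pos_with],   # positive set: motif present, absent
        [neg_with, neg_total - neg_with],   # negative set: motif present, absent
    ]
    n = pos_total + neg_total
    row = [sum(r) for r in counts]
    col = [counts[0][j] + counts[1][j] for j in range(2)]
    mi = 0.0
    for i in range(2):
        for j in range(2):
            c = counts[i][j]
            if c > 0:  # empty cells contribute nothing (0 * log 0 is taken as 0)
                mi += (c / n) * math.log2(c * n / (row[i] * col[j]))
    return mi
```

    A motif that occurs equally often in both sets scores 0 bits, while a motif present in every positive and no negative sequence of two equally sized sets scores 1 bit.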
  • 56
    Publication Date: 2013-02-20
    Description: While it has been long recognized that genes are not randomly positioned along the genome, the degree to which its 3D structure influences the arrangement of genes has remained elusive. In particular, several lines of evidence suggest that actively transcribed genes are spatially co-localized, forming transcription factories; however, a generalized systematic test has hitherto not been described. Here we reveal transcription factories using a rigorous definition of genomic structure based on Saccharomyces cerevisiae chromosome conformation capture data, coupled with an experimental design controlling for the primary gene order. We develop a data-driven method for the interpolation and the embedding of such datasets and introduce statistics that enable the comparison of the spatial and genomic densities of genes. Combining these, we report evidence that co-regulated genes are clustered in space, beyond their observed clustering in the context of gene order along the genome and show this phenomenon is significant for 64 out of 117 transcription factors. Furthermore, we show that those transcription factors with high spatially co-localized targets are expressed higher than those whose targets are not spatially clustered. Collectively, our results support the notion that, at a given time, the physical density of genes is intimately related to regulatory activity.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 57
    Publication Date: 2013-02-20
    Description: Diverse life forms are driven by the evolution of gene regulatory programs including changes in regulator proteins and cis-regulatory elements. Alterations of cis-regulatory elements are likely to dominate the evolution of the gene regulatory networks, as they are subjected to smaller selective constraints compared with proteins and hence may evolve quickly to adapt to the environment. Prior studies on cis-regulatory element evolution focus primarily on sequence substitutions of known transcription factor-binding motifs. However, evolutionary models for the dynamics of motif occurrence are relatively rare, and comprehensive characterization of the evolution of all possible motif sequences has not been pursued. In the present study, we propose an algorithm to estimate the strength of purifying selection of a motif sequence based on an evolutionary model capturing the birth and death of motif occurrences on promoters. We term this measure the ‘evolutionary retention coefficient’, as it is related to yet distinct from the canonical definition of selection coefficient in population genetics. Using this algorithm, we estimate and report the evolutionary retention coefficients of all possible 10-nucleotide sequences from the aligned promoter sequences of 27 748 orthologous gene families in 34 mammalian species. Intriguingly, the evolutionary retention coefficients of motifs are intimately associated with their functional relevance. Top-ranking motifs (sorted by evolutionary retention coefficients) are significantly enriched with transcription factor-binding sequences according to the curated knowledge from the TRANSFAC database and the ChIP-seq data generated from the ENCODE Consortium. Moreover, genes harbouring high-scoring motifs on their promoters retain significantly coherent expression profiles, and those genes are over-represented in the functional classes involved in gene regulation. The validation results reveal the dependencies between natural selection and functions of cis-regulatory elements and shed light on the evolution of gene regulatory networks.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 58
    Publication Date: 2012-12-14
    Description: Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ~70% of the clusters and ~86% of the proteins. The clusters that did not agree were inspected for evidence of correctness resulting in 85 high-confidence manually curated clusters that were used to compare all four methods.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 59
    Publication Date: 2012-09-27
    Description: Genome-scale engineering of living organisms requires precise and economical methods to efficiently modify many loci within chromosomes. One such example is the directed integration of chemically synthesized single-stranded deoxyribonucleic acid (oligonucleotides) into the chromosome of Escherichia coli during replication. Herein, we present a general co-selection strategy in multiplex genome engineering that yields highly modified cells. We demonstrate that disparate sites throughout the genome can be easily modified simultaneously by leveraging selectable markers within 500 kb of the target sites. We apply this technique to the modification of 80 sites in the E. coli genome.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 60
    Publication Date: 2012-10-10
    Description: A novel ab initio parameter-tuning-free system to identify transcriptional factor (TF) binding motifs (TFBMs) in genome DNA sequences was developed. It is based on the comparison of two types of frequency distributions with respect to the TFBM candidates in the target DNA sequences and the non-candidates in the background sequence, with the latter generated by utilizing the intergenic sequences. For benchmark tests, we used DNA sequence datasets extracted by ChIP-on-chip and ChIP-seq techniques and identified 65 yeast and four mammalian TFBMs, with the latter including gaps. The accuracy of our system was compared with those of other available programs (i.e. MEME, Weeder, BioProspector, MDscan and DME) and was the best among them, even without tuning of the parameter set for each TFBM and pre-treatment/editing of the target DNA sequences. Moreover, with respect to some TFs for which the identified motifs are inconsistent with those in the references, our results were revealed to be correct, by comparing them with other existing experimental data. Thus, our identification system does not need any other biological information except for gene positions, and is also expected to be applicable to genome DNA sequences to identify unknown TFBMs as well as known ones.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 61
    Publication Date: 2012-10-10
    Description: Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different principles (split mapping, reassembly, read depth, insert size, etc.). The improvement of individual predictors is thus an important objective. In this study, we propose a new method that combines deviations from expected library insert sizes and additional information from local patterns of read mapping and uses supervised learning to predict the position and nature of structural variants. We show that our approach provides greatly increased sensitivity with respect to other tools based on paired end read mapping at no cost in specificity, and it makes reliable predictions of very short insertions and deletions in repetitive and low-complexity genomic contexts that can confound tools based on split mapping of reads.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 62
    Publication Date: 2012-10-10
    Description: MicroRNAs (miRNAs) are major regulators of gene expression in multicellular organisms. They recognize their targets by sequence complementarity and guide them to cleavage or translational arrest. It is generally accepted that plant miRNAs have extensive complementarity to their targets and their prediction usually relies on the use of empirical parameters deduced from known miRNA–target interactions. Here, we developed a strategy to identify miRNA targets which is mainly based on the conservation of the potential regulation in different species. We applied the approach to expressed sequence tags datasets from angiosperms. Using this strategy, we predicted many new interactions and experimentally validated previously unknown miRNA targets in Arabidopsis thaliana. Newly identified targets that are broadly conserved include auxin regulators, transcription factors and transporters. Some of them might participate in the same pathways as the targets known before, suggesting that some miRNAs might control different aspects of a biological process. Furthermore, this approach can be used to identify targets present in a specific group of species, and, as a proof of principle, we analyzed Solanaceae-specific targets. The presented strategy can be used alone or in combination with other approaches to find miRNA targets in plants.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 63
    Publication Date: 2012-10-24
    Description: Mirtrons are a recently described category of microRNA (miRNA) relying on splicing rather than processing by the microprocessor complex to generate pre-miRNA precursors of the RNA interference (RNAi) pathway. Their discovery and subsequent verification provides important information about a distinct class of miRNA and inherent advantages that could be exploited to silence genes of interest. These include micro-processor-independent biogenesis, pol-II-dependent transcription, accurate species generation and the delivery of multiple artificial mirtrons as introns within a single host transcript. Here we determined the sequence motifs required for correct processing of the mmu-miR-1224 mirtron and incorporated these into artificial mirtrons targeting Parkinson’s disease-associated LRRK2 and α-synuclein genes. By incorporating these rules associated with processing and splicing, artificial mirtrons could be designed and made to silence complementary targets either at the mRNA or protein level. We further demonstrate with a LRRK2 targeting artificial mirtron that neuronal-specific silencing can be directed under the control of the human synapsin promoter. Finally, multiple mirtrons were co-delivered within a single host transcript, an eGFP reporter, to allow simultaneous targeting of two or more targets in a combinatorial approach. Thus, the unique characteristics of artificial mirtrons make this an attractive approach for future RNAi applications.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 64
    Publication Date: 2012-04-15
    Description: The exome sequencing strategy is promising for finding novel mutations of human monogenic disorders. However, pinpointing the causal mutation in a small number of samples is still a big challenge. Here, we propose a three-level filtration and prioritization framework to identify the causal mutation(s) in exome sequencing studies. This efficient and comprehensive framework successfully narrowed down whole exome variants to very small numbers of candidate variants in the proof-of-concept examples. The proposed framework, implemented in a user-friendly software package, named KGGSeq ( http://statgenpro.psychiatry.hku.hk/kggseq ), will play a very useful role in exome sequencing-based discovery of human Mendelian disease genes.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 65
    Publication Date: 2012-04-15
    Description: We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only has this interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 66
    Publication Date: 2012-04-15
    Description: MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/ .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 67
    Publication Date: 2015-09-30
    Description: In cancer research, background models for mutation rates have been extensively calibrated in coding regions, leading to the identification of many driver genes, recurrently mutated more than expected. Noncoding regions are also associated with disease; however, background models for them have not been investigated in as much detail. This is partially due to limited noncoding functional annotation. Also, great mutation heterogeneity and potential correlations between neighboring sites give rise to substantial overdispersion in mutation count, resulting in problematic background rate estimation. Here, we address these issues with a new computational framework called LARVA. It integrates variants with a comprehensive set of noncoding functional elements, modeling the mutation counts of the elements with a β-binomial distribution to handle overdispersion. LARVA, moreover, uses regional genomic features such as replication timing to better estimate local mutation rates and mutational hotspots. We demonstrate LARVA's effectiveness on 760 whole-genome tumor sequences, showing that it identifies well-known noncoding drivers, such as mutations in the TERT promoter. Furthermore, LARVA highlights several novel highly mutated regulatory sites that could potentially be noncoding drivers. We make LARVA available as a software tool and release our highly mutated annotations as an online resource ( larva.gersteinlab.org ).
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 68
    Publication Date: 2016-01-30
    Description: Disease-gene identification is a challenging process that has multiple applications within functional genomics and personalized medicine. Typically, this process involves both finding genes known to be associated with the disease (through literature search) and carrying out preliminary experiments or screens (e.g. linkage or association studies, copy number analyses, expression profiling) to determine a set of promising candidates for experimental validation. This requires extensive time and monetary resources. We describe Beegle, an online search and discovery engine that attempts to simplify this process by automating the typical approaches. It starts by mining the literature to quickly extract a set of genes known to be linked with a given query, then it integrates the learning methodology of Endeavour (a gene prioritization tool) to train a genomic model and rank a set of candidate genes to generate novel hypotheses. In a realistic evaluation setup, Beegle has an average recall of 84% in the top 100 returned genes as a search engine, which improves the discovery engine by 12.6% in the top 5% prioritized genes. Beegle is publicly available at http://beegle.esat.kuleuven.be/ .
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 69
    Publication Date: 2016-01-30
    Description: Alternative splicing is an important mechanism in eukaryotes that expands the transcriptome and proteome significantly. It plays an important role in a number of biological processes. Understanding its regulation is hence an important challenge. Recently, increasing evidence has been collected that supports an involvement of intragenic DNA methylation in the regulation of alternative splicing. The exact mechanisms of regulation, however, are largely unknown, and speculated to be complex: different methylation profiles might exist, each of which could be associated with a different regulation mechanism. We present a computational technique that is able to determine such stable methylation patterns and allows these patterns to be correlated with the inclusion propensity of exons. Pattern detection is based on dynamic time warping (DTW) of methylation profiles, a sophisticated similarity measure for signals that can be non-trivially transformed. We design a flexible self-organizing map approach to pattern grouping. Exemplary application on available data sets indicates that stable patterns which correlate non-trivially with exon inclusion do indeed exist. To improve the reliability of these predictions, further studies on larger data sets will be required. We have thus taken great care that our software runs efficiently on modern hardware, so that it can support future studies on large-scale data sets.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 70
    Publication Date: 2016-03-01
    Description: Microfluidics may revolutionize our ability to write synthetic DNA by addressing several fundamental limitations associated with generating novel genetic constructs. Here we report the first de novo synthesis and cell-free cloning of custom DNA libraries in sub-microliter reaction droplets using programmable digital microfluidics. Specifically, we developed Programmable Order Polymerization (POP), Microfluidic Combinatorial Assembly of DNA (M-CAD) and Microfluidic In-vitro Cloning (MIC) and applied them to de novo synthesis, combinatorial assembly and cell-free cloning of genes, respectively. Proof-of-concept for these methods was demonstrated by programming an autonomous microfluidic system to construct and clone libraries of yeast ribosome binding sites and bacterial Azurine, which were then retrieved in individual droplets and validated. The ability to rapidly and robustly generate designer DNA molecules in an autonomous manner should have wide application in biological research and development.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 71
    Publication Date: 2016-03-01
    Description: Recent developments in synthetic biology have positioned lactic acid bacteria (LAB) as a major class of cellular chassis for applications. To achieve the full potential of LAB, one fundamental prerequisite is the capacity for rapid engineering of complex gene networks, such as natural biosynthetic pathways and multicomponent synthetic circuits, into which cellular functions are encoded. Here, we present a synthetic biology platform for rapid construction and optimization of large-scale gene networks in LAB. The platform involves a copy-controlled shuttle for hosting target networks and two associated strategies that enable efficient genetic editing and phenotypic validation. By using a nisin biosynthesis pathway and its variants as examples, we demonstrated multiplex, continuous editing of small DNA parts, such as ribosome-binding sites, as well as efficient manipulation of large building blocks such as genes and operons. To showcase the platform, we applied it to expand the phenotypic diversity of the nisin pathway by quickly generating a library of 63 pathway variants. We further demonstrated its utility by altering the regulatory topology of the nisin pathway for constitutive bacteriocin biosynthesis. This work demonstrates the feasibility of rapid and advanced engineering of gene networks in LAB, fostering their applications in biomedicine and other areas.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 72
    Publication Date: 2016-03-01
    Description: Tumors are characterized by properties of genetic instability, heterogeneity, and significant oligoclonality. Elucidating this intratumoral heterogeneity is challenging but important. In this study, we propose a framework, BubbleTree, to characterize the tumor clonality using next generation sequencing (NGS) data. BubbleTree simultaneously elucidates the complexity of a tumor biopsy, estimating cancerous cell purity, tumor ploidy, allele-specific copy number, and clonality and represents this in an intuitive graph. We further developed a three-step heuristic method to automate the interpretation of the BubbleTree graph, using a divide-and-conquer strategy. In this study, we demonstrated the performance of BubbleTree with comparisons to similar commonly used tools such as THetA2, ABSOLUTE, AbsCN-seq and ASCAT, using both simulated and patient-derived data. BubbleTree outperformed these tools, particularly in identifying tumor subclonal populations and polyploidy. We further demonstrated BubbleTree's utility in tracking clonality changes from patients’ primary to metastatic tumor and dating somatic single nucleotide and copy number variants along the tumor clonal evolution. Overall, the BubbleTree graph and corresponding model is a powerful approach to provide a comprehensive spectrum of the heterogeneous tumor karyotype in human tumors. BubbleTree is R-based and freely available to the research community ( https://www.bioconductor.org/packages/release/bioc/html/BubbleTree.html ).
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 73
    Publication Date: 2015-12-02
    Description: Alu insertions have contributed to >11% of the human genome and ~30–35 Alu subfamilies remain actively mobile, yet the characterization of polymorphic Alu insertions from short-read data remains a challenge. We build on existing computational methods to combine Alu detection and de novo assembly of WGS data as a means to reconstruct the full sequence of insertion events from Illumina paired end reads. Comparison with published calls obtained using PacBio long-reads indicates a false discovery rate below 5%, at the cost of reduced sensitivity due to the colocation of reference and non-reference repeats. We generate a highly accurate call set of 1614 completely assembled Alu variants from 53 samples from the Human Genome Diversity Project (HGDP) panel. We utilize the reconstructed alternative insertion haplotypes to genotype 1010 fully assembled insertions, obtaining >99% agreement with genotypes obtained by PCR. In our assembled sequences, we find evidence of premature insertion mechanisms and observe 5' truncation in 16% of Alu Ya5 and Alu Yb8 insertions. The sites of truncation coincide with stem-loop structures and SRP9/14 binding sites in the Alu RNA, implicating L1 ORF2p pausing in the generation of 5' truncations. Additionally, we identified variable Alu J and Alu S elements that likely arose due to non-retrotransposition mechanisms.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 74
    Publication Date: 2015-12-02
    Description: Optimizing bio-production involves strain and process improvements performed as discrete steps. However, environment impacts genotype and a strain that is optimal under one set of conditions may not be under different conditions. We present a methodology to simultaneously vary genetic and process factors, so that both can be guided by design of experiments (DOE). Advances in DNA assembly and gene insulation facilitate this approach by accelerating multi-gene pathway construction and the statistical interpretation of screening data. This is applied to a 6-aminocaproic acid (6-ACA) pathway in Escherichia coli consisting of six heterologous enzymes. A 32-member fractional factorial library is designed that simultaneously perturbs expression and media composition. This is compared to a 64-member full factorial library just varying expression (0.64 Mb of DNA assembly). Statistical analysis of the screening data from these libraries leads to different predictions as to whether the expression of enzymes needs to increase or decrease. Therefore, if genotype and media were varied separately this would lead to a suboptimal combination. This is applied to the design of a strain and media composition that increases 6-ACA from 9 to 48 mg/l in a single optimization step. This work introduces a generalizable platform to co-optimize genetic and non-genetic factors.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 75
    Publication Date: 2012-07-22
    Description: Cytosines in genomic DNA are sometimes methylated. This affects many biological processes and diseases. The standard way of measuring methylation is to use bisulfite, which converts unmethylated cytosines to thymines, then sequence the DNA and compare it to a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to the correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns: this should be informative for other special cases such as ancient DNA and AT-rich genomes.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 76
    Publication Date: 2012-07-22
    Description: Despite the many advantages of Caenorhabditis elegans, biochemical approaches to study tissue-specific gene expression in post-embryonic stages are challenging. Here, we report a novel experimental approach for efficient determination of tissue-specific transcriptomes involving the rapid release and purification of nuclei from major tissues of post-embryonic animals by fluorescence-activated nuclei sorting (FANS), followed by deep sequencing of linearly amplified 3'-end regions of transcripts (3'-end-seq). We employed these approaches to compile the transcriptome of the developed C. elegans intestine and used this to analyse tissue-specific cleavage and polyadenylation. In agreement with intestinal-specific gene expression, highly expressed genes have enriched GATA-elements in their promoter regions and their functional properties are associated with processes that are characteristic for the intestine. We systematically mapped pre-mRNA cleavage and polyadenylation sites, or polyA sites, including more than 3000 sites that have previously not been identified. The detailed analysis of the 3'-ends of the nuclear mRNA revealed widespread alternative polyA site use (APA) in intestinally expressed genes. Importantly, we found that intestinal polyA sites that undergo APA tend to have U-rich and/or A-rich upstream auxiliary elements that may contribute to the regulation of 3'-end formation in the intestine.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 77
    Publication Date: 2012-07-22
    Description: Small RNAs (sRNAs) are a class of short (20–25 nt) non-coding RNAs that play important regulatory roles in gene expression. An essential first step in understanding their function is to confidently identify sRNA targets. In plants, several classes of sRNAs such as microRNAs (miRNAs) and trans-acting small interfering RNAs have been shown to bind with near-perfect complementarity to their messenger RNA (mRNA) targets, generally leading to cleavage of the mRNA. Recently, a high-throughput technique known as Parallel Analysis of RNA Ends (PARE) has made it possible to sequence mRNA cleavage products on a large-scale. Computational methods now exist to use these data to find targets of conserved and newly identified miRNAs. Due to speed limitations such methods rely on the user knowing which sRNA sequences are likely to target a transcript. By limiting the search to a tiny subset of sRNAs it is likely that many other sRNA/mRNA interactions will be missed. Here, we describe a new software tool called PAREsnip that allows users to search for potential targets of all sRNAs obtained from high-throughput sequencing experiments. By searching for targets of a complete ‘sRNAome’ we can facilitate large-scale identification of sRNA targets, allowing us to discover regulatory interaction networks.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 78
    Publication Date: 2012-09-13
    Description: Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 79
    Publication Date: 2012-09-13
    Description: Prophages are phages in lysogeny that are integrated into, and replicated as part of, the host bacterial genome. These mobile elements can have tremendous impact on their bacterial hosts’ genomes and phenotypes, which may lead to strain emergence and diversification, increased virulence or antibiotic resistance. However, finding prophages in microbial genomes remains a problem with no definitive solution. The majority of existing tools rely on detecting genomic regions enriched in protein-coding genes with known phage homologs, which hinders the de novo discovery of phage regions. In this study, a weighted phage detection algorithm, PhiSpy, was developed based on seven distinctive characteristics of prophages, i.e. protein length, transcription strand directionality, customized AT and GC skew, the abundance of unique phage words, phage insertion points and the similarity of phage proteins. The first five characteristics are capable of identifying prophages without any sequence similarity with known phage genes. PhiSpy locates prophages by ranking genomic regions enriched in distinctive phage traits, which leads to the successful prediction of 94% of prophages in 50 complete bacterial genomes with a 6% false-negative rate and a 0.66% false-positive rate.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 80
    Publication Date: 2012-09-13
    Description: The use of a priori knowledge in the alignment of targeted sequencing data is investigated using computational experiments. Adapting a Needleman–Wunsch algorithm to incorporate the genomic position information from the targeted capture, we demonstrate that alignment can be done to just the target region of interest. When in addition use is made of direct string comparison, an improvement of up to a factor of 8 in alignment speed compared to the fastest conventional aligner (Bowtie) is obtained. This results in a total alignment time in targeted sequencing of around 7 min for aligning approximately 56 million captured reads. For conventional aligners such as Bowtie, BWA or MAQ, alignment to just the target region is not feasible as experiments show that this leads to an additional 88% SNP calls, the vast majority of which are false positives (~92%).
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 81
    Publication Date: 2012-06-28
    Description: We introduce Grinder ( http://sourceforge.net/projects/biogrinder/ ), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, α and β diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 82
    Publication Date: 2012-06-06
    Description: The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping and high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses hotspot score to prioritize the processing of highly possible matches and implements modified Smith–Waterman refinement with reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs which is important for quantitative processing of RNA-Seq datasets.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 83
    Publication Date: 2012-06-06
    Description: A chemistry-based artificial restriction DNA cutter (ARCUT) was recently prepared from Ce(IV)/EDTA complex and a pair of pseudo-complementary peptide nucleic acids. This cutter has freely tunable scission-site and site specificity. In this article, homologous recombination (HR) in human cells was promoted by cutting a substrate DNA with ARCUT, and the efficiency of this bioprocess was optimized by various chemical and biological approaches. Of two kinds of terminal structure formed by ARCUT, 3'-overhang termini provided 1.7-fold higher efficiency than 5'-overhang termini. A longer homology length (e.g. 698 bp) was about 2-fold more favorable than a shorter one (e.g. 100 bp). When the cell cycle was synchronized to G2/M phase with nocodazole, the HR was promoted by about 2-fold. Repression of the NHEJ-relevant proteins Ku70 and Ku80 by siRNA increased the efficiency by 2- to 3-fold. It was indicated that appropriate combination of all these chemical and biological approaches should be very effective to promote ARCUT-mediated HR in human cells.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 84
    Publication Date: 2012-06-06
    Description: Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the utilization of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. In order to analyze nucleotide sequences of this length, their information content is first reduced by conversion into shorter binary patterns via the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code such as ncRNAs and 5'-untranslated regions.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 85
    Publication Date: 2012-04-24
    Description: We describe a novel cloning method termed SLiCE (Seamless Ligation Cloning Extract) that utilizes easy to generate bacterial cell extracts to assemble multiple DNA fragments into recombinant DNA molecules in a single in vitro recombination reaction. SLiCE overcomes the sequence limitations of traditional cloning methods, facilitates seamless cloning by recombining short end homologies (≥15 bp) with or without flanking heterologous sequences and provides an effective strategy for directional subcloning of DNA fragments from Bacterial Artificial Chromosomes (BACs) or other sources. SLiCE is highly cost effective as a number of standard laboratory bacterial strains can serve as sources for SLiCE extract. In addition, the cloning efficiencies and capabilities of these strains can be greatly improved by simple genetic modifications. As an example, we modified the DH10B Escherichia coli strain to express an optimized prophage Red recombination system. This strain, termed PPY, facilitates SLiCE with very high efficiencies and demonstrates the versatility of the method.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 86
    Publication Date: 2012-04-24
    Description: A novel biosensing approach for the label-free detection of nucleic acid sequences of short and large lengths has been implemented, with special emphasis on targeting RNA sequences with secondary structures. The approach is based on selecting 8-aminoadenine-modified parallel-stranded DNA tail-clamps as affinity bioreceptors. These receptors have the ability of creating a stable triplex-stranded helix at neutral pH upon hybridization with the nucleic acid target. A surface plasmon resonance biosensor has been used for the detection. With this strategy, we have detected short DNA sequences (32-mer) and purified RNA (103-mer) at the femtomol level in a few minutes in an easy and label-free way. This approach is particularly suitable for the detection of RNA molecules with predicted secondary structures, reaching a limit of detection of 50 fmol without any label or amplification steps. Our methodology has shown a marked enhancement for the detection (18% for short DNA and 54% for RNA), when compared with the conventional duplex approach, highlighting the large difficulty of the duplex approach to detect nucleic acid sequences, especially those exhibiting stable secondary structures. We believe that our strategy could be of great interest to the RNA field.
    Keywords: RNA characterisation and manipulation
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 87
    Publication Date: 2012-04-24
    Description: Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P-value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypotheses of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT–PCR validation rate of 86% for differential exon skipping events with a MATS FDR of <10%. Additionally, over the full list of RT–PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 88
    Publication Date: 2012-04-24
    Description: Targeted gene addition to mammalian genomes is central to biotechnology, basic research and gene therapy. For example, gene targeting to the ROSA26 locus by homologous recombination in embryonic stem cells is commonly used for mouse transgenesis to achieve ubiquitous and persistent transgene expression. However, conventional methods are not readily adaptable to gene targeting in other cell types. The emerging zinc finger nuclease (ZFN) technology facilitates gene targeting in diverse species and cell types, but an optimal strategy for engineering highly active ZFNs is still unclear. We used a modular assembly approach to build ZFNs that target the ROSA26 locus. ZFN activity was dependent on the number of modules in each zinc finger array. The ZFNs were active in a variety of cell types in a time- and dose-dependent manner. The ZFNs directed gene addition to the ROSA26 locus, which enhanced the level of sustained gene expression, the uniformity of gene expression within clonal cell populations and the reproducibility of gene expression between clones. These ZFNs are a promising resource for cell engineering, mouse transgenesis and pre-clinical gene therapy studies. Furthermore, this characterization of the modular assembly method provides general insights into the implementation of the ZFN technology.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 89
    Publication Date: 2012-05-13
    Description: Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that typically optimal parameters in one dataset do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 90
    Publication Date: 2012-05-13
    Description: A simple approach for creating libraries of circularly permuted proteins is described that is called PERMutation Using Transposase Engineering (PERMUTE). In PERMUTE, the transposase MuA is used to randomly insert a minitransposon that can function as a protein expression vector into a plasmid that contains the open reading frame (ORF) being permuted. A library of vectors that express different permuted variants of the ORF-encoded protein is created by: (i) using bacteria to select for target vectors that acquire an integrated minitransposon; (ii) excising the ensemble of ORFs that contain an integrated minitransposon from the selected vectors; and (iii) circularizing the ensemble of ORFs containing integrated minitransposons using intramolecular ligation. Construction of a Thermotoga neapolitana adenylate kinase (AK) library using PERMUTE revealed that this approach produces vectors that express circularly permuted proteins with distinct sequence diversity from existing methods. In addition, selection of this library for variants that complement the growth of Escherichia coli with a temperature-sensitive AK identified functional proteins with novel architectures, suggesting that PERMUTE will be useful for the directed evolution of proteins with new functions.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 91
    Publication Date: 2012-05-13
    Description: Insertional mutagenesis screens in mice are used to identify individual genes that drive tumor formation. In these screens, candidate cancer genes are identified if their genomic location is proximal to a common insertion site (CIS) defined by high rates of transposon or retroviral insertions in a given genomic window. In this article, we describe a new method for defining CISs based on a Poisson distribution, the Poisson Regression Insertion Model, and show that this new method is an improvement over previously described methods. We also describe a modification of the method that can identify pairs and higher orders of co-occurring common insertion sites. We apply these methods to two data sets, one generated in a transposon-based screen for gastrointestinal tract cancer genes and another based on the set of retroviral insertions in the Retroviral Tagged Cancer Gene Database. We show that the new methods identify more relevant candidate genes and candidate gene pairs than found using previous methods. Identification of the biologically relevant set of mutations that occur in a single cell and cause tumor progression will aid in the rational design of single and combinatorial therapies in the upcoming age of personalized cancer therapy.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
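The idea in record 91 of flagging a genomic window as a common insertion site when its insertion count is improbably high under a Poisson background can be illustrated with a minimal sketch. This is not the Poisson Regression Insertion Model from the article: it assumes equal-sized windows, a single genome-wide background rate, and a Bonferroni correction, and the function names `poisson_sf` and `candidate_windows` are hypothetical.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    cdf = 0.0
    term = math.exp(-lam)  # P(X = 0)
    for i in range(k):
        cdf += term                # accumulate P(X = i)
        term *= lam / (i + 1)      # advance to P(X = i + 1)
    return max(0.0, 1.0 - cdf)


def candidate_windows(counts, alpha=0.05):
    """Indices of windows whose insertion count is unexpectedly high.

    Assumption for illustration: the background rate is the mean count
    per window, and significance is Bonferroni-corrected by the number
    of windows.
    """
    n = len(counts)
    lam = sum(counts) / n
    threshold = alpha / n
    return [i for i, c in enumerate(counts) if poisson_sf(c, lam) < threshold]


# Toy insertion counts for ten equal-sized windows.
counts = [1, 0, 2, 1, 0, 1, 15, 0, 1, 1]
print(candidate_windows(counts))
```

A regression-based model would replace the single background rate with a rate predicted per window from covariates; the tail test itself would stay the same.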
  • 92
    Publication Date: 2012-05-13
    Description: The informational content of RNA sequencing is currently far from being completely explored. Most of the analyses focus on processing tables of counts or finding isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression, based on their coverage function shapes. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments have been done using synthetic data, and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator x 4 normalizations x 8 difference measures) are compared. As a result, the best analysis pipelines are selected based on linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out the exons with differential expression or internal splicing, even if the counts of reads may not show this. The areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for QPCR primers.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 93
    Publication Date: 2012-05-13
    Description: The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist and most of them lack inherent parallel processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision making procedure to accurately characterize various classes of transcripts and achieves a near linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 94
    Publication Date: 2012-05-23
    Description: Deciphering the structure of gene regulatory networks across the tree of life remains one of the major challenges in postgenomic biology. We present a novel ChIP-seq workflow for the archaea using the model organism Halobacterium salinarum sp. NRC-1 and demonstrate its application for mapping the genome-wide binding sites of natively expressed transcription factors. This end-to-end pipeline is the first protocol for ChIP-seq in archaea, with methods and tools for each stage from gene tagging to data analysis and biological discovery. Genome-wide binding sites for transcription factors with many binding sites (TfbD) are identified with sensitivity, while retaining specificity in the identification of the smaller regulons (bacteriorhodopsin-activator protein). Chromosomal tagging of target proteins with a compact epitope facilitates a standardized and cost-effective workflow that is compatible with high-throughput immunoprecipitation of natively expressed transcription factors. The Pique package, an open-source bioinformatics method, is presented for identification of binding events. Relative to ChIP-Chip and qPCR, this workflow offers a robust catalog of protein–DNA binding events with improved spatial resolution and significantly decreased cost. While this study focuses on the application of ChIP-seq in H. salinarum sp. NRC-1, our workflow can also be adapted for use in other archaea and bacteria with basic genetic tools.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 95
    Publication Date: 2012-05-23
    Description: A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimators of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 96
    Publication Date: 2012-02-28
    Description: ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128 000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 97
    Publication Date: 2012-02-28
    Description: Synthetic scaffolds that permit spatial and temporal organization of enzymes in living cells are a promising post-translational strategy for controlling the flow of information in both metabolic and signaling pathways. Here, we describe the use of plasmid DNA as a stable, robust and configurable scaffold for arranging biosynthetic enzymes in the cytoplasm of Escherichia coli. This involved conversion of individual enzymes into custom DNA-binding proteins by genetic fusion to zinc-finger domains that specifically bind unique DNA sequences. When expressed in cells that carried a rationally designed DNA scaffold comprising corresponding zinc finger binding sites, the titers of diverse metabolic products, including resveratrol, 1,2-propanediol and mevalonate were increased as a function of the scaffold architecture. These results highlight the utility of DNA scaffolds for assembling biosynthetic enzymes into functional metabolic structures. Beyond metabolism, we anticipate that DNA scaffolds may be useful in sequestering different types of enzymes for specifying the output of biological signaling pathways or for coordinating other assembly-line processes such as protein folding, degradation and post-translational modifications.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 98
    Publication Date: 2014-03-13
    Description: To reveal the full potential of human pluripotent stem cells, new methods for rapid, site-specific genomic engineering are needed. Here, we describe a system for precise genetic modification of human embryonic stem cells (ESCs) and induced pluripotent stem cells (iPSCs). We identified a novel human locus, H11, located in a safe, intergenic, transcriptionally active region of chromosome 22, as the recipient site, to provide robust, ubiquitous expression of inserted genes. Recipient cell lines were established by site-specific placement of a ‘landing pad’ cassette carrying attP sites for phiC31 and Bxb1 integrases at the H11 locus by spontaneous or TALEN-assisted homologous recombination. Dual integrase cassette exchange (DICE) mediated by phiC31 and Bxb1 integrases was used to insert genes of interest flanked by phiC31 and Bxb1 attB sites at the H11 locus, replacing the landing pad. This system provided complete control over content, direction and copy number of inserted genes, with a specificity of 100%. A series of genes, including mCherry and various combinations of the neural transcription factors LMX1a, FOXA2 and OTX2, were inserted in recipient cell lines derived from H9 ESC, as well as iPSC lines derived from a Parkinson’s disease patient and a normal sibling control. The DICE system offers rapid, efficient and precise gene insertion in ESC and iPSC and is particularly well suited for repeated modifications of the same locus.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 99
    Publication Date: 2014-03-13
    Description: Genetic disorders can be detected by prenatal diagnosis using Chorionic Villus Sampling, but the 1:100 chance of miscarriage restricts its use to fetuses that are suspected to have an aberration. Detection of trisomy 21 cases noninvasively is now possible owing to the upswing of next-generation sequencing (NGS) because a small percentage of fetal DNA is present in maternal plasma. However, detecting other trisomies and smaller aberrations can only be realized using high-coverage NGS, making it too expensive for routine practice. We present a method, WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR), which detects small aberrations using low-coverage NGS. The increased detection resolution was achieved by comparing read counts within the tested sample of each genomic region with regions on other chromosomes that behave similarly in control samples. This within-sample comparison avoids the need to re-sequence control samples. WISECONDOR correctly identified all T13, T18 and T21 cases while coverages were as low as 0.15–1.66. No false positives were identified. Moreover, WISECONDOR also identified smaller aberrations, down to 20 Mb, such as del(13)(q12.3q14.3), +i(12)(p10) and i(18)(q10). This shows that prevalent fetal copy number aberrations can be detected accurately and affordably by shallow sequencing maternal plasma. WISECONDOR is available at bioinformatics.tudelft.nl/wisecondor.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
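The within-sample comparison described in record 99 can be sketched as scoring one genomic bin against reference bins from other chromosomes in the same sample. The sketch below is an illustration, not the WISECONDOR implementation: it assumes the reference bins have already been chosen (in the article they are regions that behave similarly in control samples), uses a plain z-score as the test statistic, and the function name `within_sample_z` is hypothetical.

```python
from statistics import mean, stdev


def within_sample_z(bin_count, reference_counts):
    """Z-score of one bin's read count against reference bins.

    Assumption for illustration: reference_counts holds read counts of
    bins on other chromosomes of the same sample that were pre-selected
    as behaving like the tested bin.
    """
    if len(reference_counts) < 2:
        raise ValueError("need at least two reference bins")
    mu = mean(reference_counts)
    sigma = stdev(reference_counts)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference bins have zero variance")
    return (bin_count - mu) / sigma


# Toy read counts: the tested bin is elevated relative to its references.
references = [100, 102, 98, 101, 99]
print(round(within_sample_z(110, references), 2))
```

Because both the tested bin and its references come from the same sequencing run, run-to-run effects influence them together, which is why the comparison does not require re-sequencing controls.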
  • 100
    Publication Date: 2014-03-13
    Description: Recombineering, which is the use of homologous recombination for DNA engineering in Escherichia coli, usually uses antibiotic selection to identify the intended recombinant. When combined in a second step with counterselection using a small molecule toxin, seamless products can be obtained. Here, we report the advantages of a genetic strategy using CcdB as the counterselectable agent. Expression of CcdB is toxic to E. coli in the absence of the CcdA antidote so counterselection is initiated by the removal of CcdA expression. CcdB counterselection is robust and does not require titrations or experiment-to-experiment optimization. Because counterselection strategies necessarily differ according to the copy number of the target, we describe two variations. For multi-copy targets, we use two E. coli hosts so that counterselection is exerted by the transformation step that is needed to separate the recombined and unrecombined plasmids. For single copy targets, we put the ccdA gene onto the temperature-sensitive pSC101 Red expression plasmid so that counterselection is exerted by the standard temperature shift to remove the expression plasmid. To reduce unwanted intramolecular recombination, we also combined CcdB counterselection with Redα omission. These options improve the use of counterselection in recombineering with BACs, plasmids and the E. coli chromosome.
    Keywords: Synthetic Biology and Assembly Cloning
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology