ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Books
  • Articles  (51)
  • Massively Parallel (Deep) Sequencing  (30)
  • Polymorphism/mutation detection  (21)
  • Oxford University Press  (51)
  • MDPI Publishing
  • Nucleic Acids Research  (51)
  • 1
    Publication Date: 2016-06-21
    Description: Identifying coding genes is an essential step in genome annotation. Here, we utilize existing whole genome alignments to detect conserved coding exons and then map gene annotations from one genome to many aligned genomes. We show that genome alignments contain thousands of spurious frameshifts and splice site mutations in exons that are truly conserved. To overcome these limitations, we have developed CESAR (Coding Exon-Structure Aware Realigner) that realigns coding exons, while considering reading frame and splice sites of each exon. CESAR effectively avoids spurious frameshifts in conserved genes and detects 91% of shifted splice sites. This results in the identification of thousands of additional conserved exons and 99% of the exons that lack inactivating mutations match real exons. Finally, to demonstrate the potential of using CESAR for comparative gene annotation, we applied it to 188 788 exons of 19 865 human genes to annotate human genes in 99 other vertebrates. These comparative gene annotations are available as a resource (http://bds.mpi-cbg.de/hillerlab/CESAR/). CESAR (https://github.com/hillerlab/CESAR/) can readily be applied to other alignments to accurately annotate coding genes in many other vertebrate and invertebrate genomes.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 2
    Publication Date: 2016-06-21
    Description: Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 3
    Publication Date: 2016-05-06
    Description: So far, there has been no report on molecularly resolved discrimination of single nucleobase mismatches using surface-confined single stranded locked nucleic acid (ssLNA) probes. Herein, it is shown using a label-independent force-sensing approach that an optimal coverage of 12-mer ssLNA sensor probes formed on a gold(111) surface allows recognition of ssDNA targets with twice the force sensitivity of 12-mer ssDNA sensor probes. The force distributions are reproducible and the molecule-by-molecule force measurements are largely in agreement with ensemble on-surface melting temperature data. Importantly, the molecularly resolved detection is responsive to the presence of single nucleobase mismatches in target sequences. Since the labelling steps can be eliminated from the protocol, and each force-based detection event occurs on a millisecond time scale, the force-sensing assay is potentially capable of rapid detection. The LNA probe performance is indicative of versatility in terms of substrate choice, be it gold (for basic research and array-based applications) or silicon (for ‘lab-on-a-chip’ type devices). Nucleic acid microarray technologies could therefore generally benefit from adopting LNA films in place of DNA. Since LNA is nuclease-resistant, unlike DNA, and the LNA-based assay is sensitive to single nucleobase mismatches, the possibilities for label-free in vitro rapid diagnostics based on the LNA probes may be explored.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 4
    Publication Date: 2016-07-09
    Description: Epistasis plays an essential role in the development of complex diseases. Interaction methods face the common challenge of seeking a balance between persistent power, model complexity, computation efficiency, and validity of identified bio-markers. We introduce a novel W-test to identify pairwise epistasis effects, which measures the distributional difference between cases and controls through a combined log odds ratio. The test is model-free, fast, and inherits a Chi-squared distribution with data adaptive degrees of freedom. No permutation is needed to obtain the P-values. Simulation studies demonstrated that the W-test is more powerful in low frequency variant settings than alternative methods, namely the Chi-squared test, logistic regression and multifactor-dimensionality reduction (MDR). In two independent real bipolar disorder genome-wide association study (GWAS) datasets, the W-test identified significant interaction pairs that can be replicated, including SLIT3-CENPN, SLIT3-TMEM132D, CNTNAP2-NDST4 and CNTNAP2-RTN4R. The genes in the pairs play central roles in neurotransmission and synapse formation. A majority of the identified loci are undiscoverable by main effect and are low frequency variants. The proposed method offers a powerful alternative tool for mapping the genetic puzzle underlying complex disorders.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 5
    Publication Date: 2016-03-01
    Description: Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational Taxonomic Units (OTUs) and therefore leading to the overestimation of microbial diversity. Sequencing errors will also result in OTUs that are not accurate reconstructions of the original biological sequences. Herein we present the Poisson binomial filtering algorithm (PBF), which minimizes both problems by calculating the error-probability distribution of a sequence from its quality scores. In order to validate our method, we quality-filtered 37 publicly available datasets obtained by sequencing mock and environmental microbial communities with the Roche 454, Illumina MiSeq and IonTorrent PGM platforms, and compared our results to those obtained with previous approaches such as the ones included in mothur, QIIME and USEARCH. Our algorithm retained substantially more reads than its predecessors, while resulting in fewer and more accurate OTUs. This improved sensitivity produced more faithful representations, both quantitatively and qualitatively, of the true microbial diversity present in the studied samples. Furthermore, the method introduced in this work is computationally inexpensive and can be readily applied in conjunction with any existing analysis pipeline.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 6
    Publication Date: 2016-02-20
    Description: Somatic mosaicism refers to the existence of somatic mutations in a fraction of somatic cells in a single biological sample. Its importance has mainly been discussed in theory although experimental work has started to emerge linking somatic mosaicism to disease diagnosis. Through novel statistical modeling of paired-end DNA-sequencing data using blood-derived DNA from healthy donors as well as DNA from tumor samples, we present an ultra-fast computational pipeline, LocHap that searches for multiple single nucleotide variants (SNVs) that are scaffolded by the same reads. We refer to scaffolded SNVs as local haplotypes (LH). When an LH exhibits more than two genotypes, we call it a local haplotype variant (LHV). The presence of LHVs is considered evidence of somatic mosaicism because a genetically homogeneous cell population will not harbor LHVs. Applying LocHap to whole-genome and whole-exome sequence data in DNA from normal blood and tumor samples, we find wide-spread LHVs across the genome. Importantly, we find more LHVs in tumor samples than in normal samples, and more in older adults than in younger ones. We confirm the existence of LHVs and somatic mosaicism by validation studies in normal blood samples. LocHap is publicly available at http://www.compgenome.org/lochap .
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 7
    Publication Date: 2016-02-20
    Description: Next-generation sequencing (NGS) technologies have transformed genomic research and have the potential to revolutionize clinical medicine. However, the background error rates of sequencing instruments and limitations in targeted read coverage have precluded the detection of rare DNA sequence variants by NGS. Here we describe a method, termed CypherSeq, which combines double-stranded barcoding error correction and rolling circle amplification (RCA)-based target enrichment to vastly improve NGS-based rare variant detection. The CypherSeq methodology involves the ligation of sample DNA into circular vectors, which contain double-stranded barcodes for computational error correction and adapters for library preparation and sequencing. CypherSeq is capable of detecting rare mutations genome-wide as well as those within specific target genes via RCA-based enrichment. We demonstrate that CypherSeq is capable of correcting errors incurred during library preparation and sequencing to reproducibly detect mutations down to a frequency of 2.4 × 10⁻⁷ per base pair, and report the frequency and spectra of spontaneous and ethyl methanesulfonate-induced mutations across the Saccharomyces cerevisiae genome.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 8
    Publication Date: 2016-03-19
    Description: With the wide availability of whole-genome sequencing (WGS), genetic mapping has become the rate-limiting step, inhibiting unbiased forward genetics in even the most tractable model organisms. We introduce a rapid deconvolution resource and method for untagged causative mutations after mutagenesis, screens, and WGS in Escherichia coli. We created Deconvoluter, ordered libraries with selectable insertions every 50 kb in the E. coli genome. The Deconvoluter method uses these for replacement of untagged mutations in the genome using a phage-P1-based gene-replacement strategy. We validate the Deconvoluter resource by deconvolution of 17 of 17 phenotype-altering mutations from a screen of N-ethyl-N-nitrosourea-induced mutants. The Deconvoluter resource permits rapid unbiased screens and gene/function identification and will enable exploration of functions of essential genes and undiscovered genes/sites/alleles not represented in existing deletion collections. This resource for unbiased forward-genetic screens with mapping-by-sequencing (‘forward genomics’) demonstrates a strategy that could similarly enable rapid screens in many other microbes.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 9
    Publication Date: 2016-12-01
    Description: Copy Number Variants (CNVs) are structural rearrangements contributing to phenotypic variation that have been proved to be associated with many disease states. Over the last years, the identification of CNVs from whole-exome sequencing (WES) data has become a common practice for research and clinical purposes and, consequently, the demand for more efficient and accurate methods has increased. In this paper, we demonstrate that more than 30% of WES data map outside the targeted regions and that these reads, usually discarded, can be exploited to enhance the identification of CNVs from WES experiments. Here, we present EXCAVATOR2, the first read count based tool that exploits all the reads produced by WES experiments to detect CNVs with a genome-wide resolution. To evaluate the performance of our novel tool we use it for analysing two WES data sets, a population data set sequenced by the 1000 Genomes Project and a tumor data set made of bladder cancer samples. The results obtained from these analyses demonstrate that EXCAVATOR2 outperforms four other state-of-the-art methods and that our combined approach enlarges the spectrum of detectable CNVs from WES data with an unprecedented resolution. EXCAVATOR2 is freely available at http://sourceforge.net/projects/excavator2tool/.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 10
    Publication Date: 2016-11-01
    Description: Genome assemblies that are accurate, complete and contiguous are essential for identifying important structural and functional elements of genomes and for identifying genetic variation. Nevertheless, most recent genome assemblies remain incomplete and fragmented. While long molecule sequencing promises to deliver more complete genome assemblies with fewer gaps, concerns about error rates, low yields, stringent DNA requirements and uncertainty about best practices may discourage many investigators from adopting this technology. Here, in conjunction with the platinum standard Drosophila melanogaster reference genome, we analyze recently published long molecule sequencing data to identify what governs completeness and contiguity of genome assemblies. We also present a hybrid meta-assembly approach that achieves remarkable assembly contiguity for both Drosophila and human assemblies with only modest long molecule sequencing coverage. Our results motivate a set of preliminary best practices for obtaining accurate and contiguous assemblies, a ‘missing manual’ that guides key decisions in building high quality de novo genome assemblies, from DNA isolation to polishing the assembly.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 11
    Publication Date: 2016-11-01
    Description: SNPs (Single Nucleotide Polymorphisms) are genetic markers whose precise identification is a prerequisite for association studies. Methods to identify them are currently well developed for model species, but rely on the availability of a (good) reference genome, and therefore cannot be applied to non-model species. They are also mostly tailored for whole genome (re-)sequencing experiments, whereas in many cases, transcriptome sequencing can be used as a cheaper alternative which already enables the identification of SNPs located in transcribed regions. In this paper, we propose a method that identifies, quantifies and annotates SNPs without any reference genome, using RNA-seq data only. Individuals can be pooled prior to sequencing, if not enough material is available from one individual. Using pooled human RNA-seq data, we clarify the precision and recall of our method and discuss them with respect to other methods which use a reference genome or an assembled transcriptome. We then validate experimentally the predictions of our method using RNA-seq data from two non-model species. The method can be used for any species to annotate SNPs and predict their impact on the protein sequence. We further enable testing for the association of the identified SNPs with a phenotype of interest.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 12
    Publication Date: 2016-08-20
    Description: Various types of mutation and editing (M/E) events in microRNAs (miRNAs) can change the stabilities of pre-miRNAs and/or complementarities between miRNAs and their targets. Small RNA (sRNA) high-throughput sequencing (HTS) profiles can contain many mutated and edited miRNAs. Systematic detection of miRNA mutation and editing sites from the huge volume of sRNA HTS profiles is computationally difficult, as high sensitivity and low false positive rate (FPR) are both required. We propose a novel method (named MiRME) for accurate and fast detection of miRNA M/E sites using a progressive sequence alignment approach which refines sensitivity and improves FPR step-by-step. From 70 sRNA HTS profiles with over 1.3 billion reads, MiRME has detected thousands of statistically significant M/E sites, including 3'-editing sites, 57 A-to-I editing sites (of which 32 are novel), as well as some putative non-canonical editing sites. By integrating the analysis of genome HTS profiles of two human cell lines, we demonstrated that a few non-canonical editing sites did not result from mutations in the genome, suggesting the existence of new editing types to further diversify the functions of miRNAs. Compared with six existing studies or methods, MiRME has shown much superior performance for the identification and visualization of the M/E sites of miRNAs from the ever-increasing sRNA HTS profiles.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 13
    Publication Date: 2016-10-14
    Description: The accumulation of somatic mitochondrial DNA (mtDNA) mutations contributes to the pathogenesis of human disease. Currently, mitochondrial mutations are largely considered results of inaccurate processing of its heavily damaged genome. However, mainly from a lack of methods to monitor mtDNA mutations with sufficient sensitivity and accuracy, a link between mtDNA damage and mutation has not been established. To test the hypothesis that mtDNA-damaging agents induce mtDNA mutations, we exposed Muta™Mouse mice to benzo[a]pyrene (B[a]P) or N-ethyl-N-nitrosourea (ENU), daily for 28 consecutive days, and quantified mtDNA point and deletion mutations in bone marrow and liver using our newly developed Digital Random Mutation Capture (dRMC) and Digital Deletion Detection (3D) assays. Surprisingly, our results demonstrate mutagen treatment did not increase mitochondrial point or deletion mutation frequencies, despite evidence both compounds increase nuclear DNA mutations and demonstrated B[a]P adduct formation in mtDNA. These findings contradict models of mtDNA mutagenesis that assert the elevated rate of mtDNA mutation stems from damage sensitivity and abridged repair capacity. Rather, our results demonstrate induced mtDNA damage does not readily convert into mutation. These findings suggest robust mitochondrial damage responses repress induced mutations after mutagen exposure.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 14
    Publication Date: 2015-12-16
    Description: The enrichment of targeted regions within complex next generation sequencing libraries commonly uses biotinylated baits to capture the desired sequences. This method results in high read coverage over the targets and their flanking regions. Oxford Nanopore Technologies recently released a USB 3.0-interfaced sequencer, the MinION. To date no particular method for enriching MinION libraries has been standardized. Here, using biotinylated PCR-generated baits in a novel approach, we describe a simple and efficient way for multiplexed enrichment of MinION libraries, overcoming technical limitations related to the chemistry of the sequencing adapters and the length of the DNA fragments. Using Phage Lambda and Escherichia coli as models we selectively enrich for specific targets, significantly increasing the corresponding read coverage and eliminating unwanted regions. We show that by capturing genomic fragments that contain the target sequences, we recover reads extending beyond the targeted regions, which can thus be used for the determination of potentially unknown flanking sequences. By pooling enriched libraries derived from two distinct E. coli strains and analyzing them in parallel, we demonstrate the efficiency of this method in multiplexed format. Crucially, we evaluated the optimal bait size for large fragment libraries and we describe for the first time a standardized method for target enrichment on the MinION platform.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 15
    Publication Date: 2015-12-16
    Description: In clinical diagnostics a great need exists for targeted in situ multiplex nucleic acid analysis as the mutational status can offer guidance for effective treatment. One well-established method uses padlock probes for mutation detection and multiplex expression analysis directly in cells and tissues. Here, we use oligonucleotide gap-fill ligation to further increase specificity and to capture molecular substrates for in situ sequencing. Short oligonucleotides are joined at both ends of a padlock gap probe by two ligation events and are then locally amplified by target-primed rolling circle amplification (RCA) preserving spatial information. We demonstrate the specific detection of the A3243G mutation of mitochondrial DNA and we successfully characterize a single nucleotide variant in the ACTB mRNA in cells by in situ sequencing of RCA products generated by padlock gap-fill ligation. To demonstrate the clinical applicability of our assay, we show specific detection of a point mutation in the EGFR gene in fresh frozen and formalin-fixed, paraffin-embedded (FFPE) lung cancer samples and confirm the detected mutation by in situ sequencing. This approach presents several advantages over conventional padlock probes allowing simpler assay design for multiplexed mutation detection to screen for the presence of mutations in clinically relevant mutational hotspots directly in situ .
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 16
    Publication Date: 2015-05-20
    Description: Somatic variant analysis of a tumour sample and its matched normal has been widely used in cancer research to distinguish germline polymorphisms from somatic mutations. However, due to the extensive intratumour heterogeneity of cancer, sequencing data from a single tumour sample may greatly underestimate the overall mutational landscape. In recent studies, multiple spatially or temporally separated tumour samples from the same patient were sequenced to identify the regional distribution of somatic mutations and study intratumour heterogeneity. There are a number of tools to perform somatic variant calling from matched tumour-normal next-generation sequencing (NGS) data; however none of these allow joint analysis of multiple same-patient samples. We discuss the benefits and challenges of multisample somatic variant calling and present multiSNV, a software package for calling single nucleotide variants (SNVs) using NGS data from multiple same-patient samples. Instead of performing multiple pairwise analyses of a single tumour sample and a matched normal, multiSNV jointly considers all available samples under a Bayesian framework to increase sensitivity of calling shared SNVs. By leveraging information from all available samples, multiSNV is able to detect rare mutations with variant allele frequencies down to 3% from whole-exome sequencing experiments.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 17
    Publication Date: 2015-05-20
    Description: Single-cell mRNA sequencing (RNA-seq) methods have undergone rapid development in recent years, and transcriptome analysis of relevant cell populations at single-cell resolution has become a key research area of biomedical sciences. We here present single-cell mRNA 3-prime end sequencing (SC3-seq), a practical methodology based on PCR amplification followed by 3-prime-end enrichment for highly quantitative, parallel and cost-effective measurement of gene expression in single cells. The SC3-seq allows excellent quantitative measurement of mRNAs ranging from the 10,000-cell to 1-cell level, and accordingly, allows an accurate estimate of the transcript levels by a regression of the read counts of spike-in RNAs with defined copy numbers. The SC3-seq has clear advantages over other typical single-cell RNA-seq methodologies for the quantitative measurement of transcript levels and at a sequence depth required for the saturation of transcript detection. The SC3-seq distinguishes four distinct cell types in the peri-implantation mouse blastocysts. Furthermore, the SC3-seq reveals the heterogeneity in human-induced pluripotent stem cells (hiPSCs) cultured under on-feeder as well as feeder-free conditions, demonstrating a more homogeneous property of the feeder-free hiPSCs. We propose that SC3-seq might be used as a powerful strategy for single-cell transcriptome analysis in a broad range of investigations in biomedical sciences.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 18
    Publication Date: 2015-06-24
    Description: Whole exome sequencing (WES) is increasingly used in research and diagnostics. WES users expect coverage of the entire coding region of known genes as well as sufficient read depth for the covered regions. It is, however, unknown which recent WES platform is most suitable to meet these expectations. We present insights into the performance of the most recent standard exome enrichment platforms from Agilent, NimbleGen and Illumina applied to six different DNA samples by two sequencing vendors per platform. Our results suggest that both Agilent and NimbleGen overall perform better than Illumina and that the high enrichment performance of Agilent is stable among samples and between vendors, whereas NimbleGen is only able to achieve vendor- and sample-specific best exome coverage. Moreover, the recent Agilent platform overall captures more coding exons with sufficient read depth than NimbleGen and Illumina. Due to considerable gaps in effective exome coverage, however, the three platforms cannot capture all known coding exons alone or in combination, requiring improvement. Our data emphasize the importance of evaluation of updated platform versions and suggest that enrichment-free whole genome sequencing can overcome the limitations of WES in sufficiently covering coding exons, especially GC-rich regions, and in characterizing structural variants.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 19
    Publication Date: 2015-12-02
    Description: Highly abundant microRNAs (miRNAs) in small RNA sequencing libraries make it difficult to obtain efficient measurements of more lowly expressed species. We present a new method that allows for the selective blocking of specific, abundant miRNAs during preparation of sequencing libraries. This technique is specific with few off-target effects and has no impact on the reproducibility of the measurement of non-targeted species. In human plasma samples, we demonstrate that blocking of highly abundant hsa-miR-16-5p leads to improved detection of lowly expressed miRNAs and more precise measurement of differential expression overall. Furthermore, we establish the ability to target a second abundant miRNA and to multiplex the blocking of two miRNAs simultaneously. For small RNA sequencing, this technique could fill a similar role as do ribosomal or globin removal technologies in messenger RNA sequencing.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 20
    Publication Date: 2015-04-02
    Description: Next-generation sequencing has been widely used for the genome-wide profiling of histone modifications, transcription factor binding and gene expression through chromatin immunoprecipitated DNA sequencing (ChIP-seq) and cDNA sequencing (RNA-seq). Here, we describe a versatile library construction method that can be applied to both ChIP-seq and RNA-seq on the widely used Illumina platforms. Standard methods for ChIP-seq library construction require nanograms of starting DNA, substantially limiting its application to rare cell types or limited clinical samples. By minimizing the DNA purification steps that cause major sample loss, our method achieved a high sensitivity in ChIP-seq library preparation. Using this method, we achieved the following: (i) generated high-quality epigenomic and transcription factor-binding maps using ChIP-seq for murine adipocytes; (ii) successfully prepared a ChIP-seq library from as little as 25 pg of starting DNA; (iii) achieved paired-end sequencing of the ChIP-seq libraries; (iv) systematically profiled gene expression dynamics during murine adipogenesis using RNA-seq and (v) preserved the strand specificity of the transcripts in RNA-seq. Given its sensitivity and versatility in both double-stranded and single-stranded DNA library construction, this method has wide applications in genomic, epigenomic, transcriptomic and interactomic studies.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 21
    Publication Date: 2015-01-10
    Description: Deep sequencing of strand-specific cDNA libraries is now a ubiquitous tool for identifying and quantifying RNAs in diverse sample types. The accuracy of conclusions drawn from these analyses depends on precise and quantitative conversion of the RNA sample into a DNA library suitable for sequencing. Here, we describe an optimized method of preparing strand-specific RNA deep sequencing libraries from small RNAs and variably sized RNA fragments obtained from ribonucleoprotein particle footprinting experiments or fragmentation of long RNAs. Our approach works across a wide range of input amounts (400 pg to 200 ng), is easy to follow and produces a library in 2–3 days at relatively low reagent cost, all while giving the user complete control over every step. Because all enzymatic reactions were optimized and driven to apparent completion, sequence diversity and species abundance in the input sample are well preserved.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 22
    Publication Date: 2015-11-17
    Description: Single Molecule, Real-Time (SMRT®) Sequencing (Pacific Biosciences, Menlo Park, CA, USA) provides the longest continuous DNA sequencing reads currently available. However, the relatively high error rate in the raw read data requires novel analysis methods to deconvolute sequences derived from complex samples. Here, we present a workflow of novel computer algorithms able to reconstruct viral variant genomes present in mixtures with an accuracy of >QV50. This approach relies exclusively on Continuous Long Reads (CLR), which are the raw reads generated during SMRT Sequencing. We successfully implement this workflow for simultaneous sequencing of mixtures containing up to forty different >9 kb HIV-1 full genomes. This was achieved using a single SMRT Cell for each mixture and desktop computing power. This novel approach opens the possibility of solving complex sequencing tasks that currently lack a solution.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 23
    Publication Date: 2015-11-17
    Description: Various biases affect high-throughput sequencing read counts. Contrary to the general assumption, we show that bias does not always cancel out when fold changes are computed and that bias affects more than 20% of genes that are called differentially regulated in RNA-seq experiments with drastic effects on subsequent biological interpretation. Here, we propose a novel approach to estimate fold changes. Our method is based on a probabilistic model that directly incorporates count ratios instead of read counts. It provides a theoretical foundation for pseudo-counts and can be used to estimate fold change credible intervals as well as normalization factors that outperform currently used normalization methods. We show that fold change estimates are significantly improved by our method by comparing RNA-seq derived fold changes to qPCR data from the MAQC/SEQC project as a reference and analyzing random barcoded sequencing data. Our software implementation is freely available from the project website http://www.bio.ifi.lmu.de/software/lfc .
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
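    The link between pseudo-counts and a probabilistic model of count ratios, mentioned in the abstract above, can be illustrated with a generic Beta-binomial sketch. This is not the authors' software: the function names, the symmetric Beta(pseudo, pseudo) prior and the Monte Carlo interval are assumptions made for illustration only. Under that prior, the ratio of pseudo-counted reads equals the odds at the posterior mean of the proportion of reads from condition A.

```python
import math
import random

def log2_fold_change(count_a, count_b, pseudo=1.0):
    """Point estimate: log2 ratio of the counts after adding a pseudo-count to each."""
    return math.log2((count_a + pseudo) / (count_b + pseudo))

def credible_interval(count_a, count_b, pseudo=1.0, level=0.95, draws=20000, seed=0):
    """Monte Carlo credible interval for the log2 fold change.

    Assumption (illustrative): the proportion p of reads from condition A has a
    Beta(pseudo, pseudo) prior, so its posterior is
    Beta(count_a + pseudo, count_b + pseudo); the fold change is p / (1 - p).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        p = rng.betavariate(count_a + pseudo, count_b + pseudo)
        samples.append(math.log2(p / (1.0 - p)))
    samples.sort()
    lower = samples[int((1.0 - level) / 2.0 * draws)]
    upper = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper

# Hypothetical read counts for one gene in two conditions.
estimate = log2_fold_change(30, 10)
low, high = credible_interval(30, 10)
```

    With larger counts at the same ratio (for example 300 versus 100), the interval narrows, which is the kind of information a bare fold change does not convey.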
  • 24
    Publication Date: 2015-11-17
    Description: The human reference assembly remains incomplete due to the underrepresentation of repeat-rich sequences that are found within centromeric regions and acrocentric short arms. Although these sequences are marginally represented in the assembly, they are often fully represented in whole-genome short-read datasets and contribute to inappropriate alignments and high read-depth signals that localize to a small number of assembled homologous regions. As a consequence, these regions often provide artifactual peak calls that confound hypothesis testing and large-scale genomic studies. To address this problem, we have constructed mapping targets that represent roughly 8% of the human genome generally omitted from the human reference assembly. By integrating these data into standard mapping and peak-calling pipelines we demonstrate a 10-fold reduction in signals in regions common to the blacklisted region and identify a comprehensive set of regions that exhibit mapping sensitivity with the presence of the repeat-rich targets.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 25
    Publication Date: 2015-08-18
    Description: There is an increasing interest in complementing RNA-seq experiments with small-RNA (sRNA) expression data to obtain a comprehensive view of a transcriptome. Currently, two main experimental challenges concerning sRNA-seq exist: how to check the size distribution of isolated sRNAs, given the sensitive size-selection steps in the protocol; and how to normalize data between samples, given the low complexity of sRNA types. We here present two separate sets of synthetic RNA spike-ins for monitoring size-selection and for performing data normalization in sRNA-seq. The size-range quality control (SRQC) spike-in set, consisting of 11 oligoribonucleotides (10–70 nucleotides), was tested by intentionally altering the size-selection protocol and verified via several comparative experiments. We demonstrate that the SRQC set is useful to reproducibly track down biases in the size-selection in sRNA-seq. The external reference for data-normalization (ERDN) spike-in set, consisting of 19 oligoribonucleotides, was developed for sample-to-sample normalization in differential-expression analysis of sRNA-seq data. Testing and applying the ERDN set showed that it can reproducibly detect differential expression over a dynamic range of 2^18. Hence, biological variation in sRNA composition and content between samples is preserved while technical variation is effectively minimized. Together, both spike-in sets can significantly improve the technical reproducibility of sRNA-seq.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 26
    Publication Date: 2014-10-10
    Description: The sequencing of libraries containing molecules shorter than the read length, such as in ancient or forensic applications, may result in the production of reads that include the adaptor, and in paired reads that overlap one another. Challenges for the processing of such reads are the accurate identification of the adaptor sequence and accurate reconstruction of the original sequence most likely to have given rise to the observed read(s). We introduce an algorithm that removes the adaptors and reconstructs the original DNA sequences using a Bayesian maximum a posteriori probability approach. Our algorithm is faster, and provides a more accurate reconstruction of the original sequence for both simulated and ancient DNA data sets, than other approaches. leeHom is released under the GPLv3 and is freely available from: https://bioinf.eva.mpg.de/leehom/
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 27
    Publication Date: 2014-04-15
    Description: Pyrosequencing of the 16S ribosomal RNA gene (16S) has become one of the most popular methods to assess microbial diversity. Pyrosequencing reads containing ambiguous bases (Ns) are generally discarded based on the assumptions of their non-sequence-dependent formation and high error rates. However, removing reads with Ns altered the taxonomic composition. We determined whether Ns from pyrosequencing occur in a sequence-dependent manner. Our reads and the corresponding flow value data revealed occurrence of sequence-specific N errors with a common sequential pattern (a homopolymer + a few nucleotides with bases other than the homopolymer + N) and revealed that the nucleotide base of the homopolymer is the true base for the following N. Using an algorithm reflecting this sequence-dependent pattern, we corrected the Ns in the 16S (86.54%), bphD (81.37%) and nifH (81.55%) amplicon reads from a mock community with high precisions of 95.4, 96.9 and 100%, respectively. The new N correction method resolved most Ns in amplicon reads from a soil sample, reducing taxonomic biases associated with N errors, and was also applicable to shotgun sequencing reads from public metagenome data. The method improves the accuracy and precision of microbial community analysis and genome sequencing using 454 pyrosequencing.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
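    The sequence pattern described in the abstract above (a homopolymer, then a few bases other than the homopolymer base, then N) can be sketched as a regular-expression substitution. This is an illustrative toy, not the published algorithm, which also draws on flow value data. The thresholds are assumptions: the abstract does not state a minimum homopolymer length (taken here as 2) or how many bases count as "a few" (taken here as 1 to 3).

```python
import re

# Assumed parameters (not from the abstract): homopolymer run of at least
# 2 identical bases, followed by 1-3 bases that differ from the homopolymer
# base, followed by a single N.
N_PATTERN = re.compile(r"([ACGT])\1+((?:(?!\1)[ACGT]){1,3})N")

def correct_ns(read):
    """Replace each N that fits the pattern with the preceding homopolymer base."""
    def fix(match):
        # Keep the matched text except the trailing N, then append the
        # homopolymer base captured in group 1.
        return match.group(0)[:-1] + match.group(1)
    return N_PATTERN.sub(fix, read)

corrected = correct_ns("ACGGGTANCC")   # N follows "GGG" + "TA"
unchanged = correct_ns("ACGTNACGT")    # no homopolymer before the N
```

    Reads whose Ns do not follow the pattern are returned unchanged, so a caller could still discard or flag them separately.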
  • 28
    Publication Date: 2014-11-12
    Description: Targeted resequencing technologies have allowed for efficient and cost-effective detection of genomic variants in specific regions of interest. Although capture sequencing has been primarily used for investigating single nucleotide variants and indels, it has the potential to elucidate a broader spectrum of genetic variation, including copy number variants (CNVs). Various methods exist for detecting CNV in whole-genome and exome sequencing datasets. However, no algorithms have been specifically designed for contiguous target sequencing, despite its increasing importance in clinical and research applications. We have developed cnvCapSeq, a novel method for accurate and sensitive CNV discovery and genotyping in long-range targeted resequencing. cnvCapSeq was benchmarked using a simulated contiguous capture sequencing dataset comprising 21 genomic loci of various lengths. cnvCapSeq was shown to outperform the best existing exome CNV method by a wide margin both in terms of sensitivity (92.0 versus 48.3%) and specificity (99.8 versus 70.5%). We also applied cnvCapSeq to a real capture sequencing cohort comprising a contiguous 358 kb region that contains the Complement Factor H gene cluster. In this dataset, cnvCapSeq identified 41 samples with CNV, including two with duplications, with a genotyping accuracy of 99%, as ascertained by quantitative real-time PCR.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 29
    Publication Date: 2014-09-17
    Description: Heterogeneity is a ubiquitous feature of biological systems. A complete understanding of such systems requires a method for uniquely identifying and tracking individual components and their interactions with each other. We have developed a novel method of uniquely tagging individual cells in vivo with a genetic ‘barcode’ that can be recovered by DNA sequencing. Our method is a two-component system comprised of a genetic barcode cassette whose fragments are shuffled by Rci, a site-specific DNA invertase. The system is highly scalable, with the potential to generate theoretical diversities in the billions. We demonstrate the feasibility of this technique in Escherichia coli. Currently, this method could be employed to track the dynamics of populations of microbes through various bottlenecks. Advances of this method should prove useful in tracking interactions of cells within a network, and/or heterogeneity within complex biological samples.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 30
    Publication Date: 2014-09-17
    Description: Barcoded vectors are promising tools for investigating clonal diversity and dynamics in hematopoietic gene therapy. Analysis of clones marked with barcoded vectors requires accurate identification of potentially large numbers of individually rare barcodes, when the exact number, sequence identity and abundance are unknown. This is an inherently challenging application, and the feasibility of using contemporary next-generation sequencing technologies is unresolved. To explore this potential application empirically, without prior assumptions, we sequenced barcode libraries of known complexity. Libraries containing 1, 10 and 100 Sanger-sequenced barcodes were sequenced using an Illumina platform, with a 100-barcode library also sequenced using a SOLiD platform. Libraries containing 1 and 10 barcodes were distinguished from false barcodes generated by sequencing error by a several log-fold difference in abundance. In 100-barcode libraries, however, expected and false barcodes overlapped and could not be resolved by bioinformatic filtering and clustering strategies. In independent sequencing runs multiple false-positive barcodes appeared to be represented at higher abundance than known barcodes, despite their confirmed absence from the original library. Such errors, which potentially impact barcoding studies in an application-dependent manner, are consistent with the existence of both stochastic and systematic error, the mechanism of which is yet to be fully resolved.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 31
    Publication Date: 2014-08-01
    Description: Identifying somatic mutations is critical for cancer genome characterization and for prioritizing patient treatment. DNA whole exome sequencing (DNA-WES) is currently the most popular technology; however, this yields low sensitivity in low purity tumors. RNA sequencing (RNA-seq) covers the expressed exome with depth proportional to expression. We hypothesized that integrating DNA-WES and RNA-seq would enable superior mutation detection versus DNA-WES alone. We developed a first-of-its-kind method, called UNCeqR, that detects somatic mutations by integrating patient-matched RNA-seq and DNA-WES. In simulation, the integrated DNA and RNA model outperformed the DNA-WES only model. Validation by patient-matched whole genome sequencing demonstrated superior performance of the integrated model over DNA-WES only models, including a published method and published mutation profiles. Genome-wide mutational analysis of breast and lung cancer cohorts (n = 871) revealed remarkable tumor genomics properties. Low purity tumors experienced the largest gains in mutation detection by integrating RNA-seq and DNA-WES. RNA provided greater mutation signal than DNA in expressed mutations. Compared to earlier studies on this cohort, UNCeqR increased mutation rates of driver and therapeutically targeted genes (e.g. PIK3CA, ERBB2 and FGFR2). In summary, integrating RNA-seq with DNA-WES increases mutation detection performance, especially for low purity tumors.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 32
    Publication Date: 2013-04-02
    Description: As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C>A/G>T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 33
    Publication Date: 2013-07-16
    Description: We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasite species Gyrodactylus thymalli and Gyrodactylus derjavinoides including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy to use tool even for biologists with modest bioinformatics experience. The software is made available as open source pipeline under the MIT license at https://github.com/chrishah/MITObim .
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 34
    Publication Date: 2013-07-16
    Description: Both 454 and Ion Torrent sequencers are capable of producing large amounts of long high-quality sequencing reads. However, as both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors could shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier against the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) to statistically and explicitly formulate homopolymer sequencing errors by the overcall, undercall, insertion and deletion. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate parameters of the HMM from resequencing data by an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype by using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of the human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity ( http://code.google.com/p/pyrohmmsnp/ ).
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 35
    Publication Date: 2013-05-29
    Description: Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 36
    Publication Date: 2013-05-29
    Description: Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
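    The seed-and-vote idea in the abstract above can be sketched with a toy mapper. This is a minimal illustration under stated assumptions, not the published aligner: the function names, the seed length `k`, the spacing `step` and the plain dictionary index are all hypothetical, seeds are looked up by exact match only, and the final step that fills in mismatches and indels is omitted. Each seed hit votes for the read start position it implies, and the position with the most votes wins.

```python
from collections import Counter

def build_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = {}
    for pos in range(len(reference) - k + 1):
        index.setdefault(reference[pos:pos + k], []).append(pos)
    return index

def seed_and_vote(read, index, k, step):
    """Return (best_start, votes) for the read, or (None, 0) if no seed hits."""
    votes = Counter()
    for offset in range(0, len(read) - k + 1, step):
        seed = read[offset:offset + k]
        for pos in index.get(seed, []):
            # A seed found at reference position pos implies that the read
            # starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None, 0
    return votes.most_common(1)[0]

# Hypothetical data: the read is reference[9:23] with one substituted base.
reference = "ACGTTGCAAGGCTTACCGATAGGCTAACGTTAGC"
read = "GGCTTACCGTTAGG"
index = build_index(reference, k=4)
best_start, n_votes = seed_and_vote(read, index, k=4, step=2)
```

    In this toy case the seed overlapping the substituted base fails to vote for the true location and two seeds hit elsewhere by chance, yet the true start still collects the most votes, which is the robustness the abstract attributes to voting.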
  • 37
    Publication Date: 2013-01-20
    Description: The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice .
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 38
    Publication Date: 2013-09-06
    Description: Cancers are heterogeneous and genetically unstable. New methods are needed that provide the sensitivity and specificity to query single cells at the genetic loci that drive cancer progression, thereby enabling researchers to study the progression of individual tumors. Here, we report the development and application of a bead-based hemi-nested microfluidic droplet digital PCR (dPCR) technology to achieve ‘quantitative’ measurement and single-molecule sequencing of somatically acquired carcinogenic translocations at extremely low levels (<10^-6) in healthy subjects. We use this technique in our healthy study population to determine the overall concentration of the t(14;18) translocation, which is strongly associated with follicular lymphoma. The nested dPCR approach improves the detection limit to 1 × 10^-7 or lower while maintaining the analysis efficiency and specificity. Further, the bead-based dPCR enabled us to isolate and quantify the relative amounts of the various clonal forms of t(14;18) translocation in these subjects, and the single-molecule sensitivity and resolution of dPCR led to the discovery of new clonal forms of t(14;18) that were otherwise masked by the conventional quantitative PCR measurements. In this manner, we created a quantitative map for this carcinogenic mutation in this healthy population and identified the positions on chromosomes 14 and 18 where the vast majority of these t(14;18) events occur.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 39
    Publication Date: 2013-02-02
    Description: Template switching (TS) is an inherent mechanism of reverse transcriptase that has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option, given the simplicity of the protocol, which does not require an adaptor mediated step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as in single cell studies. Additionally, TS has also been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows one to pool several samples into one sequencing flow cell, increasing the data throughput of sequencing and taking advantage of the increasing throughput of current sequencers. Here, we report TS artifacts that form owing to a process called strand invasion. Due to the way in which barcodes/indexes are introduced by TS, strand invasion becomes more problematic by introducing unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses biases from TS.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 40
    Publication Date: 2013-02-02
    Description: Detection of low-level DNA variations in the presence of wild-type DNA is important in several fields of medicine, including cancer, prenatal diagnosis and infectious diseases. PCR-based methods to enrich mutations during amplification have limited multiplexing capability, are mostly restricted to known mutations and are prone to polymerase or mis-priming errors. Here, we present Differential Strand Separation at Critical Temperature (DISSECT), a method that enriches unknown mutations of targeted DNA sequences purely based on thermal denaturation of DNA heteroduplexes without the need for enzymatic reactions. Target DNA is pre-amplified in a multiplex reaction and hybridized onto complementary probes immobilized on magnetic beads that correspond to wild-type DNA sequences. Presence of any mutation on the target DNA forms heteroduplexes that are subsequently denatured from the beads at a critical temperature and selectively separated from wild-type DNA. We demonstrate multiplexed enrichment by 100- to 400-fold for KRAS and TP53 mutations at multiple positions of the targeted sequence using two to four successive cycles of DISSECT. Cancer and plasma-circulating DNA samples containing traces of mutations undergo mutation enrichment allowing detection via Sanger sequencing or high-resolution melting. The simplicity, scalability and reliability of DISSECT make it a powerful method for mutation enrichment that integrates well with existing downstream detection methods.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 41
    Publication Date: 2013-08-09
    Description: Human leukocyte antigen (HLA) typing at the allelic level can in theory be achieved using whole exome sequencing (exome-seq) data with no added cost but has been hindered by its computational challenge. We developed ATHLATES, a program that applies assembly, allele identification and allelic pair inference to short read sequences, and applied it to data from Illumina platforms. In 15 data sets with adequate coverage for HLA-A, -B, -C, -DRB1 and -DQB1 genes, ATHLATES correctly reported 74 out of 75 allelic pairs with an overall concordance rate of 99% compared with conventional typing. This novel approach should be broadly applicable to research and clinical laboratories.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 42
    Publication Date: 2013-08-09
    Description: In developing B cells, the immunoglobulin heavy chain (IgH) locus is thought to move from repressive to permissive chromatin compartments to facilitate its scheduled rearrangement. In mature B cells, maintenance of allelic exclusion has been proposed to involve recruitment of the non-productive IgH allele to pericentromeric heterochromatin. Here, we used an allele-specific chromosome conformation capture combined with sequencing (4C-seq) approach to unambiguously follow the individual IgH alleles in mature B lymphocytes. Despite their physical and functional difference, productive and non-productive IgH alleles in B cells and unrearranged IgH alleles in T cells share many chromosomal contacts and largely reside in active chromatin. In brain, however, the locus resides in a different repressive environment. We conclude that IgH adopts a lymphoid-specific nuclear location that is, however, unrelated to maintenance of allelic exclusion. We additionally find that in mature B cells, but not in T cells, the distal VH regions of both IgH alleles position themselves away from active chromatin. This, we speculate, may help to restrict enhancer activity to the productively rearranged VH promoter element.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 43
    Publication Date: 2013-04-14
    Description: Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (QTL) signals with any other polymorphisms in linkage disequilibrium. We provide the most complete solution to this problem, by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels (the gene-based Illumina Human HT12 array, which uses 50-mer probes, and exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes). We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. We find that although only a small proportion of the probes contain polymorphisms, they account for a large proportion of apparent expression QTL signals, and therefore result in many false signals being declared as real. We find that the polymorphism-in-probe problem is insufficiently controlled by previous protocols, and illustrate this using some notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets should be carefully checked in order to adequately address this issue.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 44
    Publication Date: 2013-04-14
    Description: We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, and 2–4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 45
    Publication Date: 2012-09-13
    Description: Recent advances in RNA sequencing technology (RNA-Seq) enable comprehensive profiling of RNAs by producing millions of short sequence reads from size-fractionated RNA libraries. Although conventional tools for detecting and distinguishing non-coding RNAs (ncRNAs) from reference-genome data can be applied to sequence data, ncRNA detection can be improved by harnessing the full information content provided by this new technology. Here we present NorahDesk, the first unbiased and universally applicable method for small ncRNA detection from RNA-Seq data. NorahDesk utilizes the coverage distribution of small RNA sequence data as well as thermodynamic assessments of secondary structure to reliably predict and annotate ncRNA classes. Using publicly available mouse sequence data from brain, skeletal muscle, testis and ovary, we evaluated our method with an emphasis on the performance for microRNAs (miRNAs) and piwi-interacting small RNA (piRNA). We compared our method with Dario and mirDeep2 and found that NorahDesk produces longer transcripts with higher read coverage. This feature makes it the first method particularly suitable for the prediction of both known and novel piRNAs.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 46
    Publication Date: 2012-06-06
    Description: Non-coding RNAs (ncRNAs) account for a large portion of the transcribed genomic output. This diverse family of untranslated RNA molecules plays a crucial role in cellular function. The use of ‘deep sequencing’ technology (also known as ‘next generation sequencing’) to infer transcript expression levels in general, and ncRNA specifically, is becoming increasingly common in molecular and clinical laboratories. We developed software termed ‘RandA’ (which stands for ncRNA Read-and-Analyze) that performs comprehensive ncRNA profiling and differential expression analysis on deep sequencing generated data through a graphical user interface running on a local personal computer. Using RandA, we reveal the complexity of the ncRNA repertoire in a given cell population. We further demonstrate the relevance of such an extensive ncRNA analysis by elucidating a multitude of characterizing features in pathogen-infected mammalian cells. RandA is available for download at http://ibis.tau.ac.il/RandA.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 47
    Publication Date: 2012-03-14
    Description: The ‘Random Mutation Capture’ assay allows for the sensitive quantitation of DNA mutations at extremely low mutation frequencies. This method is based on PCR detection of mutations that render the mutated target sequence resistant to restriction enzyme digestion. The original protocol prescribes an end-point dilution to about 0.1 mutant DNA molecules per PCR well, such that the mutation burden can be simply calculated by counting the number of amplified PCR wells. However, the statistical aspects associated with the single-molecule nature of this protocol and several other molecular approaches relying on binary (on/off) output can significantly affect the quantification accuracy, and this issue has so far been ignored. The present work proposes a design of experiment (DoE) using statistical modeling and Monte Carlo simulations to obtain a statistically optimal sampling protocol, one that minimizes the coefficient of variation in the measurement estimates. Here, the DoE prescribed a dilution factor at about 1.6 mutant molecules per well. Theoretical results and experimental validation revealed an up to 10-fold improvement in the information obtained per PCR well, i.e. the optimal protocol achieves the same coefficient of variation using one-tenth the number of wells used in the original assay. Additionally, this optimization equally applies to any method that relies on binary detection of a small number of templates.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
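    Note: the dilution optimum described in this abstract can be illustrated with a textbook Poisson limiting-dilution model. The sketch below is an assumption, not the authors' statistical model or their Monte Carlo procedure. It supposes that each well receives a Poisson-distributed number of mutant templates with mean lam, that a well reads positive when it holds at least one template, and that lam is estimated as -ln(1 - k/n) from k positive wells out of n. The function names cv_of_estimate and optimal_load are hypothetical.

```python
import math


def cv_of_estimate(lam: float, n_wells: int = 1) -> float:
    """Approximate coefficient of variation of lam_hat = -ln(1 - k/n).

    Delta-method result under the assumed Poisson/binary-well model:
    Var(lam_hat) ~ (exp(lam) - 1) / n_wells.
    """
    return math.sqrt(math.expm1(lam) / n_wells) / lam


def optimal_load(lo: float = 0.01, hi: float = 10.0, tol: float = 1e-9) -> float:
    """Ternary search for the mean load per well that minimizes the CV.

    The CV is unimodal in lam on this interval, so the search converges
    to the single minimum.
    """
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if cv_of_estimate(m1) < cv_of_estimate(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2.0


if __name__ == "__main__":
    best = optimal_load()
    print(f"load minimizing the CV: {best:.3f} templates per well")
    # Wells needed for a fixed CV scale with (exp(lam) - 1) / lam**2.
    ratio = (cv_of_estimate(0.1) / cv_of_estimate(best)) ** 2
    print(f"relative wells needed at 0.1 vs optimum: {ratio:.2f}")
```

    Under these assumptions the minimum falls close to the value of about 1.6 molecules per well quoted in the abstract. The fold improvement printed by the sketch need not match the published figure, which rests on the authors' own model and experiments.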
  • 48
    Publication Date: 2012-03-14
    Description: DNA mutations are the inevitable consequences of errors that arise during replication and repair of DNA damage. Because of their random and infrequent occurrence, quantification and characterization of DNA mutations in the genome of somatic cells has been difficult. Random, low-abundance mutations are currently inaccessible by standard high-throughput sequencing approaches because they cannot be distinguished from sequencing errors. One way to circumvent this problem and simultaneously account for the mutational heterogeneity within tissues is whole genome sequencing of a representative number of single cells. Here, we show elevated mutation levels in single cells from Drosophila melanogaster S2 and mouse embryonic fibroblast populations after treatment with the powerful mutagen N-ethyl-N-nitrosourea. This method can be applied as a direct measure of exposure to mutagenic agents and for assessing genotypic heterogeneity within tissues or cell populations.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 49
    Publication Date: 2012-02-17
    Description: Standard Illumina mate-paired libraries are constructed from 3- to 5-kb DNA fragments by a blunt-end circularization. Sequencing reads that pass through the junction of the two joined ends of a 3–5-kb DNA fragment are not easy to identify and pose problems during mapping and de novo assembly. Longer read lengths increase the possibility that a read will cross the junction. To solve this problem, we developed a mate-paired protocol for use with Illumina sequencing technology that uses Cre-Lox recombination instead of blunt-end circularization. In this method, a LoxP sequence is incorporated at the junction site. This sequence allows screening reads for junctions without using a reference genome. Junction reads can be trimmed or split at the junction. Moreover, the location of the LoxP sequence in the reads distinguishes mate-paired reads from spurious paired-end reads. We tested this new method by preparing and sequencing a mate-paired library with an insert size of 3 kb from Saccharomyces cerevisiae. We present an analysis of the library quality statistics and a new bioinformatics tool called DeLoxer that can be used to analyze an Illumina Cre-Lox mate-paired data set. We also demonstrate how the resulting data significantly improve a de novo assembly of the S. cerevisiae genome.
    Keywords: Massively Parallel (Deep) Sequencing
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
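    Note: the idea of screening reads for a junction without a reference genome can be sketched as a plain string search. This is a minimal illustration, not DeLoxer's algorithm. The 34-bp site below is the commonly cited loxP sequence and is an assumption here, since the abstract does not give the junction sequence used in the protocol. The function names are hypothetical. Only exact, full-length matches are handled, whereas real reads may carry sequencing errors or a site truncated at the read end.

```python
# Assumed canonical 34-bp loxP site; the protocol's actual junction
# sequence is not stated in the abstract.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_at_junction(read: str, site: str = LOXP):
    """Split a read at the first exact occurrence of the junction site.

    Both orientations of the site are tried. Returns the (left, right)
    flanking sequences, or None when the read contains no junction.
    """
    for candidate in (site, reverse_complement(site)):
        pos = read.find(candidate)
        if pos != -1:
            return read[:pos], read[pos + len(candidate):]
    return None


if __name__ == "__main__":
    example = "ACGTTGCA" + LOXP + "GGATCCAA"
    print(split_at_junction(example))
    print(split_at_junction("ACGTTGCAGGATCCAA"))
```

    A read that returns None can be treated as an ordinary read, while a split read yields two flanks that could be mapped or assembled separately.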
  • 50
    Publication Date: 2012-08-08
    Description: The utilization of archived, formalin-fixed paraffin-embedded (FFPE) tumor samples for massive parallel sequencing has been challenging due to DNA damage and contamination with normal stroma. Here, we perform whole genome sequencing of DNA isolated from two triple-negative breast cancer tumors archived for >11 years as 5 µm FFPE sections and matched germline DNA. The tumor samples show differing amounts of FFPE-damaged DNA sequencing reads revealed as relatively high alignment mismatch rates enriched for C·G > T·A substitutions compared to germline samples. This increase in mismatch rate is observable with as few as one million reads, allowing for an upfront evaluation of the sample integrity before whole genome sequencing. By applying innovative quality filters incorporating global nucleotide mismatch rates and local mismatch rates, we present a method to identify high-confidence somatic mutations even in the presence of FFPE-induced DNA damage. This results in a breast cancer mutational profile consistent with previous studies and revealing potentially important functional mutations. Our study demonstrates the feasibility of performing genome-wide deep sequencing analysis of FFPE archived tumors of limited sample size such as residual cancer after treatment or metastatic biopsies.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
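    Note: the global mismatch rate and its enrichment for C·G > T·A substitutions can be computed from aligned base pairs as sketched below. This is an illustrative assumption about how such a summary might be tallied, not the authors' quality filter, and the abstract gives no thresholds. The function name mismatch_profile and the input format, an iterable of (reference base, read base) pairs, are hypothetical.

```python
from collections import Counter


def mismatch_profile(aligned_pairs):
    """Summarize mismatches over aligned (reference_base, read_base) pairs.

    Returns the overall mismatch rate and the fraction of mismatches that
    are C>T or G>A, i.e. the C.G > T.A class seen from either strand.
    """
    total = 0
    substitutions = Counter()
    for ref_base, read_base in aligned_pairs:
        total += 1
        if ref_base != read_base:
            substitutions[(ref_base, read_base)] += 1

    mismatches = sum(substitutions.values())
    cg_to_ta = substitutions[("C", "T")] + substitutions[("G", "A")]
    return {
        "mismatch_rate": mismatches / total if total else 0.0,
        "cg_to_ta_fraction": cg_to_ta / mismatches if mismatches else 0.0,
    }


if __name__ == "__main__":
    pairs = [("C", "T"), ("G", "A"), ("A", "A"), ("T", "G")]
    print(mismatch_profile(pairs))
```

    Comparing these two numbers between a tumor sample and its matched germline sample is one simple way to look for the kind of enrichment the abstract describes.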
  • 51
    Publication Date: 2012-11-25
    Description: Life abounds with genetic variations writ in sequences that are often only a few hundred nucleotides long. Rapid detection of these variations for identification of genetic diseases, pathogens and organisms has become the mainstay of molecular science and medicine. This report describes a new, highly informative closed-tube polymerase chain reaction (PCR) strategy for analysis of both known and unknown sequence variations. It combines efficient quantitative amplification of single-stranded DNA targets through LATE-PCR with sets of Lights-On/Lights-Off probes that hybridize to their target sequences over a broad temperature range. Contiguous pairs of Lights-On/Lights-Off probes of the same fluorescent color are used to scan hundreds of nucleotides for the presence of mutations. Sets of probes in different colors can be combined in the same tube to analyze even longer single-stranded targets. Each set of hybridized Lights-On/Lights-Off probes generates a composite fluorescent contour, which is mathematically converted to a sequence-specific fluorescent signature. The versatility and broad utility of this new technology are illustrated in this report by characterization of variant sequences in three different DNA targets: the rpoB gene of Mycobacterium tuberculosis, a sequence in the mitochondrial cytochrome C oxidase subunit 1 gene of nematodes and the V3 hypervariable region of the bacterial 16S ribosomal RNA gene. We anticipate widespread use of these technologies for diagnostics, species identification and basic research.
    Keywords: Polymorphism/mutation detection
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology