ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Discovery and characterization of artifactual mutations in deep coverage targeted capture sequencing data due to oxidative DNA damage during sample preparation (2013)

Costello, M., Pugh, T. J., Fennell, T. J., Stewart, C., Lichtenstein, L., Meldrim, J. C., Fostel, J. L., Friedrich, D. C., Perrin, D., Dionne, D., Kim, S., Gabriel, S. B., Lander, E. S., Fisher, S., Getz, G.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-04-02

Description: As researchers begin probing deep coverage sequencing data for increasingly rare mutations and subclonal events, the fidelity of next generation sequencing (NGS) laboratory methods will become increasingly critical. Although error rates for sequencing and polymerase chain reaction (PCR) are well documented, the effects that DNA extraction and other library preparation steps could have on downstream sequence integrity have not been thoroughly evaluated. Here, we describe the discovery of novel C > A/G > T transversion artifacts found at low allelic fractions in targeted capture data. Characteristics such as sequencer read orientation and presence in both tumor and normal samples strongly indicated a non-biological mechanism. We identified the source as oxidation of DNA during acoustic shearing in samples containing reactive contaminants from the extraction process. We show generation of 8-oxoguanine (8-oxoG) lesions during DNA shearing, present analysis tools to detect oxidation in sequencing data and suggest methods to reduce DNA oxidation through the introduction of antioxidants. Further, informatics methods are presented to confidently filter these artifacts from sequencing data sets. Though only seen in a low percentage of reads in affected samples, such artifacts could have profoundly deleterious effects on the ability to confidently call rare mutations, and eliminating other possible sources of artifacts should become a priority for the research community.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

2

Reconstructing mitochondrial genomes directly from genomic next-generation sequencing reads--a baiting and iterative mapping approach (2013)

Hahn, C., Bachmann, L., Chevreux, B.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-07-16

Description: We present an in silico approach for the reconstruction of complete mitochondrial genomes of non-model organisms directly from next-generation sequencing (NGS) data: mitochondrial baiting and iterative mapping (MITObim). The method is straightforward even if only (i) distantly related mitochondrial genomes or (ii) mitochondrial barcode sequences are available as starting-reference sequences or seeds, respectively. We demonstrate the efficiency of the approach in case studies using real NGS data sets of the two monogenean ectoparasites species Gyrodactylus thymalli and Gyrodactylus derjavinoides including their respective teleost hosts European grayling (Thymallus thymallus) and Rainbow trout (Oncorhynchus mykiss). MITObim appeared superior to existing tools in terms of accuracy, runtime and memory requirements and fully automatically recovered mitochondrial genomes exceeding 99.5% accuracy from total genomic DNA derived NGS data sets in <24 h using a standard desktop computer. The approach overcomes the limitations of traditional strategies for obtaining mitochondrial genomes for species with little or no mitochondrial sequence information at hand and represents a fast and highly efficient in silico alternative to laborious conventional strategies relying on initial long-range PCR. We furthermore demonstrate the applicability of MITObim for metagenomic/pooled data sets using simulated data. MITObim is an easy to use tool even for biologists with modest bioinformatics experience. The software is made available as open source pipeline under the MIT license at https://github.com/chrishah/MITObim.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

3

PyroHMMsnp: an SNP caller for Ion Torrent and 454 sequencing data (2013)

Zeng, F., Jiang, R., Chen, T.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-07-16

Description: Both 454 and Ion Torrent sequencers are capable of producing large amounts of long high-quality sequencing reads. However, as both methods sequence homopolymers in one cycle, they both suffer from homopolymer uncertainty and incorporation asynchronization. In mapping, such sequencing errors could shift alignments around homopolymers and thus induce incorrect mismatches, which have become a critical barrier against the accurate detection of single nucleotide polymorphisms (SNPs). In this article, we propose a hidden Markov model (HMM) to statistically and explicitly formulate homopolymer sequencing errors by the overcall, undercall, insertion and deletion. We use a hierarchical model to describe the sequencing and base-calling processes, and we estimate parameters of the HMM from resequencing data by an expectation-maximization algorithm. Based on the HMM, we develop a realignment-based SNP-calling program, termed PyroHMMsnp, which realigns read sequences around homopolymers according to the error model and then infers the underlying genotype by using a Bayesian approach. Simulation experiments show that the performance of PyroHMMsnp is exceptional across various sequencing coverages in terms of sensitivity, specificity and F1 measure, compared with other tools. Analysis of the human resequencing data shows that PyroHMMsnp predicts 12.9% more SNPs than Samtools while achieving a higher specificity. (http://code.google.com/p/pyrohmmsnp/).

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

4

Crass: identification and reconstruction of CRISPR from unassembled metagenomic data (2013)

Skennerton, C. T., Imelfort, M., Tyson, G. W.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-05-29

Description: Clustered regularly interspaced short palindromic repeats (CRISPR) constitute a bacterial and archaeal adaptive immune system that protects against bacteriophage (phage). Analysis of CRISPR loci reveals the history of phage infections and provides a direct link between phage and their hosts. All current tools for CRISPR identification have been developed to analyse completed genomes and are not well suited to the analysis of metagenomic data sets, where CRISPR loci are difficult to assemble owing to their repetitive structure and population heterogeneity. Here, we introduce a new algorithm, Crass, which is designed to identify and reconstruct CRISPR loci from raw metagenomic data without the need for assembly or prior knowledge of CRISPR in the data set. CRISPR in assembled data are often fragmented across many contigs/scaffolds and do not fully represent the population heterogeneity of CRISPR loci. Crass identified substantially more CRISPR in metagenomes previously analysed using assembly-based approaches. Using Crass, we were able to detect CRISPR that contained spacers with sequence homology to phage in the system, which would not have been identified using other approaches. The increased sensitivity, specificity and speed of Crass will facilitate comprehensive analysis of CRISPRs in metagenomic data sets, increasing our understanding of phage-host interactions and co-evolution within microbial communities.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

5

The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote (2013)

Liao, Y., Smyth, G. K., Shi, W.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-05-29

Description: Read alignment is an ongoing challenge for the analysis of data from sequencing technologies. This article proposes an elegantly simple multi-seed strategy, called seed-and-vote, for mapping reads to a reference genome. The new strategy chooses the mapped genomic location for the read directly from the seeds. It uses a relatively large number of short seeds (called subreads) extracted from each read and allows all the seeds to vote on the optimal location. When the read length is <160 bp, overlapping subreads are used. More conventional alignment algorithms are then used to fill in detailed mismatch and indel information between the subreads that make up the winning voting block. The strategy is fast because the overall genomic location has already been chosen before the detailed alignment is done. It is sensitive because no individual subread is required to map exactly, nor are individual subreads constrained to map close by other subreads. It is accurate because the final location must be supported by several different subreads. The strategy extends easily to find exon junctions, by locating reads that contain sets of subreads mapping to different exons of the same gene. It scales up efficiently for longer reads.
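
Illustrative note: the voting idea in the abstract above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the Subread implementation: the function names `build_index` and `seed_and_vote`, the subread length `k`, the spacing `step`, and the example sequences are all hypothetical, and the detailed alignment step that follows voting is omitted.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Map every k-mer of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def seed_and_vote(read, index, k, step):
    """Let exact-matching subreads vote on where the read starts.

    Returns (candidate_start, votes) for the best-supported location,
    or None if no subread matches the reference at all.
    """
    votes = Counter()
    # Extract overlapping subreads (seeds) at regular offsets along the read.
    for offset in range(0, len(read) - k + 1, step):
        subread = read[offset:offset + k]
        for pos in index.get(subread, ()):
            # A hit at reference position pos implies the read starts at pos - offset.
            votes[pos - offset] += 1
    if not votes:
        return None
    return votes.most_common(1)[0]


# Hypothetical example data: a read copied from position 8 with one mismatch.
reference = "ACGTTGCATGCCGATAGCTTAGGCTAACGT"
read = "TGCCGCTAGCTTAGGC"  # reference[8:24] with A -> C at read offset 5
index = build_index(reference, k=4)
result = seed_and_vote(read, index, k=4, step=2)
```

In this toy run the subreads that overlap the mismatch fail to vote for the true location, yet the remaining subreads still agree on it, which mirrors the abstract's point that no individual subread is required to map exactly.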

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

6

DiffSplice: the genome-wide detection of differential splicing events with RNA-seq (2013)

Hu, Y., Huang, Y., Du, Y., Orellana, C. F., Singh, D., Johnson, A. R., Monroy, A., Kuan, P.-F., Hammond, S. M., Makowski, L., Randell, S. H., Chiang, D. Y., Hayes, D. N., Jones, C., Liu, Y., Prins, J. F., Liu, J.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-01-20

Description: The RNA transcriptome varies in response to cellular differentiation as well as environmental factors, and can be characterized by the diversity and abundance of transcript isoforms. Differential transcription analysis, the detection of differences between the transcriptomes of different cells, may improve understanding of cell differentiation and development and enable the identification of biomarkers that classify disease types. The availability of high-throughput short-read RNA sequencing technologies provides in-depth sampling of the transcriptome, making it possible to accurately detect the differences between transcriptomes. In this article, we present a new method for the detection and visualization of differential transcription. Our approach does not depend on transcript or gene annotations. It also circumvents the need for full transcript inference and quantification, which is a challenging problem because of short read lengths, as well as various sampling biases. Instead, our method takes a divide-and-conquer approach to localize the difference between transcriptomes in the form of alternative splicing modules (ASMs), where transcript isoforms diverge. Our approach starts with the identification of ASMs from the splice graph, constructed directly from the exons and introns predicted from RNA-seq read alignments. The abundance of alternative splicing isoforms residing in each ASM is estimated for each sample and is compared across sample groups. A non-parametric statistical test is applied to each ASM to detect significant differential transcription with a controlled false discovery rate. The sensitivity and specificity of the method have been assessed using simulated data sets and compared with other state-of-the-art approaches. Experimental validation using qRT-PCR confirmed a selected set of genes that are differentially expressed in a lung differentiation study and a breast cancer data set, demonstrating the utility of the approach applied on experimental biological data sets. The software of DiffSplice is available at http://www.netlab.uky.edu/p/bioinfo/DiffSplice.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

7

Suppression of artifacts and barcode bias in high-throughput transcriptome analyses utilizing template switching (2013)

Tang, D. T. P., Plessy, C., Salimullah, M., Suzuki, A. M., Calligaris, R., Gustincich, S., Carninci, P.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Template switching (TS) has been an inherent mechanism of reverse transcriptase, which has been exploited in several transcriptome analysis methods, such as CAGE, RNA-Seq and short RNA sequencing. TS is an attractive option, given the simplicity of the protocol, which does not require an adaptor mediated step and thus minimizes sample loss. As such, it has been used in several studies that deal with limited amounts of RNA, such as in single cell studies. Additionally, TS has also been used to introduce DNA barcodes or indexes into different samples, cells or molecules. This labeling allows one to pool several samples into one sequencing flow cell, increasing the data throughput of sequencing and taking advantage of the increasing throughput of current sequencers. Here, we report TS artifacts that form owing to a process called strand invasion. Due to the way in which barcodes/indexes are introduced by TS, strand invasion becomes more problematic by introducing unsystematic biases. We describe a strategy that eliminates these artifacts in silico and propose an experimental solution that suppresses biases from TS.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

8

ATHLATES: accurate typing of human leukocyte antigen through exome sequencing (2013)

Liu, C., Yang, X., Duffy, B., Mohanakumar, T., Mitra, R. D., Zody, M. C., Pfeifer, J. D.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-08-09

Description: Human leukocyte antigen (HLA) typing at the allelic level can in theory be achieved using whole exome sequencing (exome-seq) data with no added cost but has been hindered by its computational challenge. We developed ATHLATES, a program that applies assembly, allele identification and allelic pair inference to short read sequences, and applied it to data from Illumina platforms. In 15 data sets with adequate coverage for HLA-A, -B, -C, -DRB1 and -DQB1 genes, ATHLATES correctly reported 74 out of 75 allelic pairs with an overall concordance rate of 99% compared with conventional typing. This novel approach should be broadly applicable to research and clinical laboratories.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

9

Allelic exclusion of the immunoglobulin heavy chain locus is independent of its nuclear localization in mature B cells (2013)

Holwerda, S. J. B., van de Werken, H. J. G., Ribeiro de Almeida, C., Bergen, I. M., de Bruijn, M. J. W., Verstegen, M. J. A. M., Simonis, M., Splinter, E., Wijchers, P. J., Hendriks, R. W., de Laat, W.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-08-09

Description: In developing B cells, the immunoglobulin heavy chain (IgH) locus is thought to move from repressive to permissive chromatin compartments to facilitate its scheduled rearrangement. In mature B cells, maintenance of allelic exclusion has been proposed to involve recruitment of the non-productive IgH allele to pericentromeric heterochromatin. Here, we used an allele-specific chromosome conformation capture combined with sequencing (4C-seq) approach to unambiguously follow the individual IgH alleles in mature B lymphocytes. Despite their physical and functional difference, productive and non-productive IgH alleles in B cells and unrearranged IgH alleles in T cells share many chromosomal contacts and largely reside in active chromatin. In brain, however, the locus resides in a different repressive environment. We conclude that IgH adopts a lymphoid-specific nuclear location that is, however, unrelated to maintenance of allelic exclusion. We additionally find that in mature B cells, but not in T cells, the distal VH regions of both IgH alleles position themselves away from active chromatin. This, we speculate, may help to restrict enhancer activity to the productively rearranged VH promoter element.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

10

Resolving the polymorphism-in-probe problem is critical for correct interpretation of expression QTL studies (2013)

Ramasamy, A., Trabzuni, D., Gibbs, J. R., Dillman, A., Hernandez, D. G., Arepalli, S., Walker, R., Smith, C., Ilori, G. P., Shabalin, A. A., Li, Y., Singleton, A. B., Cookson, M. R., for NABEC, Hardy, J., for UKBEC, Ryten, M., Weale, M. E.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-04-14

Description: Polymorphisms in the target mRNA sequence can greatly affect the binding affinity of microarray probe sequences, leading to false-positive and false-negative expression quantitative trait locus (QTL) signals with any other polymorphisms in linkage disequilibrium. We provide the most complete solution to this problem, by using the latest genome and exome sequence reference data to identify almost all common polymorphisms (frequency >1% in Europeans) in probe sequences for two commonly used microarray panels (the gene-based Illumina Human HT12 array, which uses 50-mer probes, and exon-based Affymetrix Human Exon 1.0 ST array, which uses 25-mer probes). We demonstrate the impact of this problem using cerebellum and frontal cortex tissues from 438 neuropathologically normal individuals. We find that although only a small proportion of the probes contain polymorphisms, they account for a large proportion of apparent expression QTL signals, and therefore result in many false signals being declared as real. We find that the polymorphism-in-probe problem is insufficiently controlled by previous protocols, and illustrate this using some notable false-positive and false-negative examples in MAPT and PRICKLE1 that can be found in many eQTL databases. We recommend that both new and existing eQTL data sets should be carefully checked in order to adequately address this issue.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

11

Fast and accurate read mapping with approximate seeds and multiple backtracking (2013)

Siragusa, E., Weese, D., Reinert, K.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-04-14

Description: We present Masai, a read mapper representing the state-of-the-art in terms of speed and accuracy. Our tool is an order of magnitude faster than RazerS 3 and mrFAST, 2–4 times faster and more accurate than Bowtie 2 and BWA. The novelties of our read mapper are filtration with approximate seeds and a method for multiple backtracking. Approximate seeds, compared with exact seeds, increase filtration specificity while preserving sensitivity. Multiple backtracking amortizes the cost of searching a large set of seeds by taking advantage of the repetitiveness of next-generation sequencing data. Combined together, these two methods significantly speed up approximate search on genomic data sets. Masai is implemented in C++ using the SeqAn library. The source code is distributed under the BSD license and binaries for Linux, Mac OS X and Windows can be freely downloaded from http://www.seqan.de/projects/masai.

Keywords: Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
