ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
  • 2
    Publication Date: 2012-07-22
    Description: Cytosines in genomic DNA are sometimes methylated. This affects many biological processes and diseases. The standard way of measuring methylation is to use bisulfite, which converts unmethylated cytosines to thymines, then sequence the DNA and compare it to a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to the correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns: this should be informative for other special cases such as ancient DNA and AT-rich genomes.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 3
    Publication Date: 2014-05-01
    Description: Proximal promoters are fundamental genomic elements for gene expression. They vary in terms of GC percentage, CpG abundance, presence of TATA signal, evolutionary conservation, chromosomal spread of transcription start sites and breadth of expression across cell types. These properties are correlated, and it has been suggested that there are two classes of promoters: one class with high CpG, widely spread transcription start sites and broad expression, and another with TATA signals, narrow spread and restricted expression. However, it has been unclear why these properties are correlated in this way. We reexamined these features using the deep FANTOM5 CAGE data from hundreds of cell types. First, we point out subtle but important biases in previous definitions of promoters and of expression breadth. Second, we show that most promoters are rather nonspecifically expressed across many cell types. Third, promoters’ expression breadth is independent of maximum expression level, and therefore correlates with average expression level. Fourth, the data show a more complex picture than two classes, with a network of direct and indirect correlations among promoter properties. By tentatively distinguishing the direct from the indirect correlations, we reveal simple explanations for them.
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 4
    Publication Date: 2014-12-04
    Description: Motivation: The alignment of DNA sequences to proteins, allowing for frameshifts, is a classic method in sequence analysis. It can help identify pseudogenes (which accumulate mutations), analyze raw DNA and RNA sequence data (which may have frameshift sequencing errors), investigate ribosomal frameshifts, etc. Often, however, only ad hoc approximations or simulations are available to provide the statistical significance of a frameshift alignment score. Results: We describe a method to estimate statistical significance of frameshift alignments, similar to classic BLAST statistics. (BLAST presently does not permit its alignments to include frameshifts.) We also illustrate the continuing usefulness of frameshift alignment with two ‘post-genomic’ applications: (i) when finding pseudogenes within the human genome, frameshift alignments show that most anciently conserved non-coding human elements are recent pseudogenes with conserved ancestral genes; and (ii) when analyzing metagenomic DNA reads from polluted soil, frameshift alignments show that most alignable metagenomic reads contain frameshifts, suggesting that metagenomic analysis needs to use frameshift alignment to derive accurate results. Availability and implementation: The statistical calculation is available in FALP (http://www.ncbi.nlm.nih.gov/CBBresearch/Spouge/html_ncbi/html/index/software.html), and giga-scale frameshift alignment is available in LAST (http://last.cbrc.jp/falp). Contact: spouge@ncbi.nlm.nih.gov or martin@cbrc.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 5
    Publication Date: 2012-10-10
    Description: It is well known that remnants of partial or whole copies of mitochondrial DNA, known as Nuclear MiTochondrial sequences (NUMTs), are found in nuclear genomes. Since whole genome sequences have become available, many bioinformatics studies have identified putative NUMTs and from those attempted to infer the factors involved in NUMT creation. These studies conclude that NUMTs represent randomly chosen regions of the mitochondrial genome. There is less consensus regarding the nuclear insertion sites of NUMTs: previous studies have discussed the possible role of retrotransposons, but some recent ones have reported no correlation or even anti-correlation between NUMT sites and retrotransposons. These studies have generally defined NUMT sites using BLAST with default parameters. We analyze a redefined set of human NUMTs, computed with a carefully considered protocol. We discover that the inferred insertion points of NUMTs have a strong tendency to have high predicted DNA curvature, occur in experimentally defined open chromatin regions and often occur immediately adjacent to A + T oligomers. We also show clear evidence that their flanking regions are indeed rich in retrotransposons. Finally, we show that parts of the mitochondrial genome D-loop are under-represented as a source of NUMTs in primate evolution.
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 6
    Publication Date: 2014-03-18
    Description: The suffix array and its variants are text-indexing data structures that have become indispensable in the field of bioinformatics. With the uninitiated in mind, we provide an accessible exposition of the SA-IS algorithm, which is the state of the art in suffix array construction. We also describe DisLex, a technique that allows standard suffix array construction algorithms to create modified suffix arrays designed to enable a simple form of inexact matching needed to support ‘spaced seeds’ and ‘subset seeds’ used in many biological applications.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 7
    Publication Date: 2014-04-15
    Description: Sequence similarity search is a fundamental way of analyzing nucleotide sequences. Despite decades of research, this is not a solved problem because there exist many similarities that are not found by current methods. Search methods are typically based on a seed-and-extend approach, which has many variants (e.g. spaced seeds, transition seeds), and it remains unclear how to optimize this approach. This study designs and tests seeding methods for inter-mammal and inter-insect genome comparison. By considering substitution patterns of real genomes, we design sets of multiple complementary transition seeds, which have better performance (sensitivity per run time) than previous seeding strategies. Often the best seed patterns have more transition positions than those used previously. We also point out that recent computer memory sizes (e.g. 60 GB) make it feasible to use multiple (e.g. eight) seeds for whole mammal genomes. Interestingly, the most sensitive settings achieve diminishing returns for human–dog and melanogaster–pseudoobscura comparisons, but not for human–mouse, which suggests that we still miss many human–mouse alignments. Our optimized heuristics find ~20 000 new human–mouse alignments that are missing from the standard UCSC alignments. We tabulate seed patterns and parameters that work well so they can be used in future research.
    Keywords: Computational Methods, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 8
    Publication Date: 2013-04-13
    Description: Many high-throughput sequencing experiments produce paired DNA reads. Paired-end DNA reads provide extra positional information that is useful in reliable mapping of short reads to a reference genome, as well as in downstream analyses of structural variations. Given the importance of paired-end alignments, it is surprising that there have been no previous publications focusing on this topic. In this article, we present a new probabilistic framework to predict the alignment of paired-end reads to a reference genome. Using both simulated and real data, we compare the performance of our method with six other read-mapping tools that provide a paired-end option. We show that our method provides a good combination of accuracy, error rate and computation time, especially in more challenging and practical cases, such as when the reference genome is incomplete or unavailable for the sample, or when there are large variations between the reference genome and the source of the reads. An open-source implementation of our method is available as part of Last, a multi-purpose alignment program freely available at http://last.cbrc.jp. Contact: martin@cbrc.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 9
    Publication Date: 2012-11-29
    Description: Two methods to add unaligned sequences into an existing multiple sequence alignment have been implemented as the ‘--add’ and ‘--addfragments’ options in the MAFFT package. The former option is a basic one and applicable only to full-length sequences, whereas the latter option is applicable even when the unaligned sequences are short and fragmentary. These methods internally infer the phylogenetic relationship among the sequences in the existing alignment and the phylogenetic positions of unaligned sequences. Benchmarks based on two independent simulations consistently suggest that the ‘--addfragments’ option outperforms recent methods, PaPaRa and PAGAN, in accuracy for difficult problems and that these three methods appropriately handle easy problems. Availability: http://mafft.cbrc.jp/alignment/software/ Contact: katoh@ifrec.osaka-u.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 10
    Publication Date: 2016-01-10
    Description: Motivation: Pairwise local alignment is an indispensable tool for molecular biologists. In real time (i.e. in about 1 s), ALP (Ascending Ladder Program) calculates the E-values for protein–protein or DNA–DNA local alignments of random sequences, for arbitrary substitution score matrix, gap costs and letter abundances; and FALP (Frameshift Ascending Ladder Program) performs a similar task, although more slowly, for frameshifting DNA–protein alignments. Availability and implementation: To permit other C++ programmers to implement the computational efficiencies in ALP and FALP directly within their own programs, C++ source codes are available in the public domain at http://go.usa.gov/3GTSW under ‘ALP’ and ‘FALP’, along with the standalone programs ALP and FALP. Contact: spouge@nih.gov Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
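
Illustration for record 2 (bisulfite sequencing): the description there says that bisulfite converts unmethylated cytosines to thymines, and that methylation is then read off by comparing the sequenced DNA to a reference. The Python sketch below shows only that comparison step, under strong simplifying assumptions: the read is already aligned to the reference without gaps, only the forward strand is considered, and there are no sequencing errors. It is not the aligner described in the record, and the function names bisulfite_convert and call_methylation are hypothetical.

```python
def bisulfite_convert(seq, methylated):
    """Simulate bisulfite treatment: each unmethylated C is read as T.

    `methylated` is a set of 0-based positions whose C is protected.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """Compare a gap-free aligned read to the reference at each reference C.

    Returns {position: bool}; True means the C survived (methylated),
    False means it was converted to T (unmethylated).
    """
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base != "C":
            continue
        if read_base == "C":
            calls[i] = True
        elif read_base == "T":
            calls[i] = False
        # Any other base is treated as uninformative: no call is made.
    return calls


reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated={1})
print(read)                               # ACGTTTGA
print(call_methylation(reference, read))  # {1: True, 4: False, 5: False}
```

Because converted reads no longer match the reference at unmethylated cytosines, a general-purpose aligner would count those positions as mismatches; that is the "distorted base pattern" the record's description refers to.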