ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Graph-based modeling of tandem repeats improves global multiple sequence alignment (2013)

Szalkowski, A. M., Anisimova, M.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-09-26

Description: Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries, and yet are of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in agriculturally important bacteria Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

2

Unknown

Meander: visually exploring the structural variome using space-filling curves (2013)

Pavlopoulos, G. A., Kumar, P., Sifrim, A., Sakai, R., Lin, M. L., Voet, T., Moreau, Y., Aerts, J.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-06-08

Description: The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and pair-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander .

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

3

Unknown

FOAM (Functional Ontology Assignments for Metagenomes): a Hidden Markov Model (HMM) database with environmental focus (2014)

Prestat, E., David, M. M., Hultman, J., Taş, N., Lamendella, R., Dvornik, J., Mackelprang, R., Myrold, D. D., Jumpponen, A., Tringe, S. G., Holman, E., Mavromatis, K., Jansson, J. K.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2014-11-07

Description: A new functional gene database, FOAM (Functional Ontology Assignments for Metagenomes), was developed to screen environmental metagenomic sequence datasets. FOAM provides a new functional ontology dedicated to classify gene functions relevant to environmental microorganisms based on Hidden Markov Models (HMMs). Sets of aligned protein sequences (i.e. ‘profiles’) were tailored to a large group of target KEGG Orthologs (KOs) from which HMMs were trained. The alignments were checked and curated to make them specific to the targeted KO. Within this process, sequence profiles were enriched with the most abundant sequences available to maximize the yield of accurate classifier models. An associated functional ontology was built to describe the functional groups and hierarchy. FOAM allows the user to select the target search space before HMM-based comparison steps and to easily organize the results into different functional categories and subcategories. FOAM is publicly available at http://portal.nersc.gov/project/m1317/FOAM/ .

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

4

Unknown

svaseq: removing batch effects and other unwanted noise from sequencing data (2014)

Leek, J. T.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2014-11-28

Description: It is now known that unwanted noise and unmodeled artifacts such as batch effects can dramatically reduce the accuracy of statistical inference in genomic experiments. These sources of noise must be modeled and removed to accurately measure biological variability and to obtain correct statistical inference when performing high-throughput genomic analysis. We introduced surrogate variable analysis (sva) for estimating these artifacts by (i) identifying the part of the genomic data only affected by artifacts and (ii) estimating the artifacts with principal components or singular vectors of the subset of the data matrix. The resulting estimates of artifacts can be used in subsequent analyses as adjustment factors to correct analyses. Here I describe a version of the sva approach specifically created for count data or FPKMs from sequencing experiments based on appropriate data transformation. I also describe the addition of supervised sva (ssva) for using control probes to identify the part of the genomic data only affected by artifacts. I present a comparison between these versions of sva and other methods for batch effect estimation on simulated data, real count-based data and FPKM-based data. These updates are available through the sva Bioconductor package and I have made fully reproducible analysis using these methods available from: https://github.com/jtleek/svaseq .

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

5

Unknown

Insyght: navigating amongst abundant homologues, syntenies and gene functional annotations in bacteria, it's that symbolic! (2014)

Lacroix, T., Loux, V., Gendrault, A., Hoebeke, M., Gibrat, J.-F.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2014-11-28

Description: High-throughput techniques have considerably increased the potential of comparative genomics whilst simultaneously posing many new challenges. One of those challenges involves efficiently mining the large amount of data produced and exploring the landscape of both conserved and idiosyncratic genomic regions across multiple genomes. Domains of application of these analyses are diverse: identification of evolutionary events, inference of gene functions, detection of niche-specific genes or phylogenetic profiling. Insyght is a comparative genomic visualization tool that combines three complementary displays: (i) a table for thoroughly browsing amongst homologues, (ii) a comparator of orthologue functional annotations and (iii) a genomic organization view designed to improve the legibility of rearrangements and distinctive loci. The latter display combines symbolic and proportional graphical paradigms. Synchronized navigation across multiple species and interoperability between the views are core features of Insyght. A gene filter mechanism is provided that helps the user to build a biologically relevant gene set according to multiple criteria such as presence/absence of homologues and/or various annotations. We illustrate the use of Insyght with scenarios. Currently, only Bacteria and Archaea are supported. A public instance is available at http://genome.jouy.inra.fr/Insyght . The tool is freely downloadable for private data set analysis.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

6

Unknown

iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition (2014)

Lin, H., Deng, E.-Z., Ding, H., Chen, W., Chou, K.-C.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2014-11-28

Description: The σ54 promoters are unique in prokaryotic genome and responsible for transcripting carbon and nitrogen-related genes. With the avalanche of genome sequences generated in the postgenomic age, it is highly desired to develop automated methods for rapidly and effectively identifying the σ54 promoters. Here, a predictor called ‘iPro54-PseKNC’ was developed. In the predictor, the samples of DNA sequences were formulated by a novel feature vector called ‘pseudo k-tuple nucleotide composition’, which was further optimized by the incremental feature selection procedure. The performance of iPro54-PseKNC was examined by the rigorous jackknife cross-validation tests on a stringent benchmark data set. As a user-friendly web-server, iPro54-PseKNC is freely accessible at http://lin.uestc.edu.cn/server/iPro54-PseKNC . For the convenience of the vast majority of experimental scientists, a step-by-step protocol guide was provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented in this paper just for its integrity. Meanwhile, we also discovered through an in-depth statistical analysis that the distribution of distances between the transcription start sites and the translation initiation sites were governed by the gamma distribution, which may provide a fundamental physical principle for studying the σ54 promoters.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

7

Unknown

Binding site discovery from nucleic acid sequences by discriminative learning of hidden Markov models (2014)

Maaskola, J., Rajewsky, N.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2014-11-28

Description: We present a discriminative learning method for pattern discovery of binding sites in nucleic acid sequences based on hidden Markov models. Sets of positive and negative example sequences are mined for sequence motifs whose occurrence frequency varies between the sets. The method offers several objective functions, but we concentrate on mutual information of condition and motif occurrence. We perform a systematic comparison of our method and numerous published motif-finding tools. Our method achieves the highest motif discovery performance, while being faster than most published methods. We present case studies of data from various technologies, including ChIP-Seq, RIP-Chip and PAR-CLIP, of embryonic stem cell transcription factors and of RNA-binding proteins, demonstrating practicality and utility of the method. For the alternative splicing factor RBM10, our analysis finds motifs known to be splicing-relevant. The motif discovery method is implemented in the free software package Discrover. It is applicable to genome- and transcriptome-scale data, makes use of available repeat experiments and aside from binary contrasts also more complex data configurations can be utilized.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

8

Unknown

Spatial localization of co-regulated genes exceeds genomic gene clustering in the Saccharomyces cerevisiae genome (2013)

Ben-Elazar, S., Yakhini, Z., Yanai, I.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-20

Description: While it has been long recognized that genes are not randomly positioned along the genome, the degree to which its 3D structure influences the arrangement of genes has remained elusive. In particular, several lines of evidence suggest that actively transcribed genes are spatially co-localized, forming transcription factories; however, a generalized systematic test has hitherto not been described. Here we reveal transcription factories using a rigorous definition of genomic structure based on Saccharomyces cerevisiae chromosome conformation capture data, coupled with an experimental design controlling for the primary gene order. We develop a data-driven method for the interpolation and the embedding of such datasets and introduce statistics that enable the comparison of the spatial and genomic densities of genes. Combining these, we report evidence that co-regulated genes are clustered in space, beyond their observed clustering in the context of gene order along the genome and show this phenomenon is significant for 64 out of 117 transcription factors. Furthermore, we show that those transcription factors with high spatially co-localized targets are expressed higher than those whose targets are not spatially clustered. Collectively, our results support the notion that, at a given time, the physical density of genes is intimately related to regulatory activity.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

9

Unknown

PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species (2012)

Fouts, D. E., Brinkac, L., Beck, E., Inman, J., Sutton, G.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-12-14

Description: Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ~70% of the clusters and ~86% of the proteins. The clusters that did not agree were inspected for evidence of correctness resulting in 85 high-confidence manually curated clusters that were used to compare all four methods.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

10

Unknown

A novel ab initio identification system of transcriptional regulation motifs in genome DNA sequences based on direct comparison scheme of signal/noise distributions (2012)

Nakaki, R., Kang, J., Tateno, M.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-10-10

Description: A novel ab initio parameter-tuning-free system to identify transcriptional factor (TF) binding motifs (TFBMs) in genome DNA sequences was developed. It is based on the comparison of two types of frequency distributions with respect to the TFBM candidates in the target DNA sequences and the non-candidates in the background sequence, with the latter generated by utilizing the intergenic sequences. For benchmark tests, we used DNA sequence datasets extracted by ChIP-on-chip and ChIP-seq techniques and identified 65 yeast and four mammalian TFBMs, with the latter including gaps. The accuracy of our system was compared with those of other available programs (i.e. MEME, Weeder, BioProspector, MDscan and DME) and was the best among them, even without tuning of the parameter set for each TFBM and pre-treatment/editing of the target DNA sequences. Moreover, with respect to some TFs for which the identified motifs are inconsistent with those in the references, our results were revealed to be correct, by comparing them with other existing experimental data. Thus, our identification system does not need any other biological information except for gene positions, and is also expected to be applicable to genome DNA sequences to identify unknown TFBMs as well as known ones.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press
