ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

101

Identifying protein interaction subnetworks by a bagging Markov random field-based method (2013)

Chen, L., Xuan, J., Riggins, R. B., Wang, Y., Clarke, R.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-01-20

Description: Identification of differentially expressed subnetworks from protein–protein interaction (PPI) networks has become increasingly important to our global understanding of the molecular mechanisms that drive cancer. Several methods have been proposed for PPI subnetwork identification, but the dependency among network member genes is not explicitly considered, leaving many important hub genes largely unidentified. We present a new method, based on a bagging Markov random field (BMRF) framework, to improve subnetwork identification for mechanistic studies of breast cancer. The method follows a maximum a posteriori principle to form a novel network score that explicitly considers pairwise gene interactions in PPI networks, and it searches for subnetworks with maximal network scores. To improve the robustness of the subnetworks across data sets, a bagging scheme based on bootstrapping samples is implemented to statistically select high-confidence subnetworks. We first compared the BMRF-based method with existing methods on simulation data to demonstrate its improved performance. We then applied our method to breast cancer data to identify PPI subnetworks associated with breast cancer progression and/or tamoxifen resistance. The experimental results show that the BMRF approach not only achieves improved prediction performance when tested on independent data sets but also reveals biologically meaningful subnetworks that are relevant to breast cancer and tamoxifen resistance.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

102

miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data (2013)

An, J., Lai, J., Lehman, M. L., Nelson, C. C.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-01-20

Description: miRDeep and its variants are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of detecting novel miRNAs by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are implemented purely in Java. The application can be run on a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

103

Aly and THO are required for assembly of the human TREX complex and association of TREX components with the spliced mRNA (2013)

Chi, B., Wang, Q., Wu, G., Tan, M., Wang, L., Shi, M., Chang, X., Cheng, H.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-01-20

Description: The mRNA export complex TREX is known to contain Aly, UAP56, Tex1 and the THO complex, among which UAP56 is required for TREX assembly. Here, we systematically investigated the role of each human TREX component in TREX assembly and its association with the mRNA. We found that Tex1 is essentially a subunit of the THO complex. Aly, THO and UAP56 are all required for assembly of TREX, in which Aly directly interacts with THO subunits Thoc2 and Thoc5. Both Aly and THO function in linking UAP56 to the cap-binding protein CBP80. Interestingly, association of UAP56 with the spliced mRNA, but not with the pre-mRNA, requires Aly and THO. Unexpectedly, we found that Aly and THO require each other to associate with the spliced mRNA. Consistent with these biochemical results, THO, like Aly and UAP56, plays critical roles in mRNA export. Together, we propose that Aly, THO and UAP56 form a highly integrated unit to associate with the spliced mRNA and function in mRNA export.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

104

GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data (2013)

Benoukraf, T., Wongphayak, S., Hadi, L. H. A., Wu, M., Soong, R.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-20

Description: High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis of non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA; http://ctrad-csi.nus.edu.sg/gbsa), a free, open-source software package capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all other genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

105

ellipsoidFN: a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions (2013)

Ren, X., Wang, Y., Chen, L., Zhang, X.-S., Jin, Q.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-20

Description: Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers, which often involves the dysfunction of many genes and regulatory interactions. Thus, a sophisticated classification model is urgently needed. In this study, we propose an efficient approach, called ellipsoidFN (ellipsoid Feature Net), to model the disease complexity by ellipsoids and seek a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for the mixed samples by the ellipsoid concept, and at the same time uses a linear programming framework to efficiently select biomarkers from high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementarity between the identified biomarkers, thus significantly enhancing the distinctiveness between cancers and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggested that ellipsoidFN outperforms the state-of-the-art biomarker identification methods, and it can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

106

Volatility in mRNA secondary structure as a design principle for antisense (2013)

Johnson, E., Srivastava, R.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: Designing effective antisense sequences is a formidable problem. A method for predicting efficacious antisense holds the potential to provide fundamental insight into this biophysical process. More practically, such an understanding increases the chance of successful antisense design and saves considerable time, money and labor. The secondary structure of an mRNA molecule is believed to be in a constant state of flux, sampling several different suboptimal states. We hypothesized that particularly volatile regions might provide better accessibility for antisense targeting. A computational framework, GenAVERT, was developed to evaluate this hypothesis. GenAVERT used UNAFold and RNAforester to generate and compare the predicted suboptimal structures of mRNA sequences. Subsequent analysis revealed regions that were particularly volatile in terms of intramolecular hydrogen bonding, and thus potentially superior antisense targets due to their high accessibility. Several mRNA sequences with known natural antisense target sites as well as artificial antisense target sites were evaluated. Upon comparison, antisense sequences predicted based upon the volatility hypothesis closely matched those of the naturally occurring antisense, as well as those artificial target sites that provided efficient down-regulation. These results suggest that this strategy may provide a powerful new approach to antisense design.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

107

EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes (2013)

Santoni, F. A.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: The existence of extra-genetic (epigenetic) codes has been postulated since the discovery of the primary genetic code. The evident effects of histone post-translational modifications or DNA methylation on the efficiency and regulation of DNA processes support this postulate. EMdeCODE is an original algorithm that approximates the genomic distribution of given DNA features (e.g. promoter, enhancer, viral integration) by identifying relevant ChIP-seq profiles of post-translational histone marks or DNA-binding proteins and combining them in a supermark. The EMdeCODE kernel is essentially a two-step procedure: (i) an expectation-maximization process calculates the mixture of epigenetic factors that maximizes the sensitivity (recall) of the association with the feature under study; (ii) the approximated density is then recursively trimmed with respect to a control dataset to increase the precision by reducing the number of false positives. EMdeCODE densities significantly improve the prediction of enhancer loci and retroviral integration sites with respect to previous methods. Importantly, it can also be used to extract distinctive factors between two arbitrary conditions. Indeed, EMdeCODE identifies unexpected epigenetic profiles specific for coding versus non-coding RNA, pointing towards a new role for H3R2me1 in coding regions.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

108

A population model for genotyping indels from next-generation sequence data (2013)

Shao, H., Bellos, E., Yin, H., Liu, X., Zou, J., Li, Y., Wang, J., Coin, L. J. M.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: Insertion and deletion polymorphisms (indels) are an important source of genomic variation in plant and animal genomes, but accurate genotyping from low-coverage and exome next-generation sequence data remains challenging. We introduce an efficient population clustering algorithm for diploids and polyploids, which was tested on a dataset of 2000 exomes. Compared with existing methods, we report a 4-fold reduction in overall indel genotype error rates with a 9-fold reduction in low-coverage regions.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

109

PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION (2013)

Elati, M., Nicolle, R., Junier, I., Fernandez, D., Fekih, R., Font, J., Kepes, F.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: Conventional approaches to predicting transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or novel criteria. With the current state of the art, PreCisIon consistently improves on methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

110

miRNA target enrichment analysis reveals directly active miRNAs in health and disease (2013)

Steinfeld, I., Navon, R., Ach, R., Yakhini, Z.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: MicroRNAs (miRNAs) are short non-coding regulatory RNA molecules. The activity of a miRNA in a biological process can often be reflected in the expression program that characterizes the outcome of the activity. We introduce a computational approach that infers such activity from high-throughput data using a novel statistical methodology, called minimum-mHG (mmHG), that examines mutual enrichment in two ranked lists. Based on this methodology, we provide a user-friendly web application that supports the statistical assessment of miRNA target enrichment analysis (miTEA) in the top of a ranked list of genes or proteins. Using miTEA, we analyze several target prediction tools by examining performance on public miRNA constitutive expression data. We also apply miTEA to analyze several integrative biology data sets, including a novel matched miRNA/mRNA data set covering nine human tissue types. Our novel findings include proposed direct activity of miR-519 in placenta, a direct activity of the oncogenic miR-15 in different healthy tissue types and a direct activity of the poorly characterized miR-768 in both healthy tissue types and cancer cell lines. The miTEA web application is available at http://cbl-gorilla.cs.technion.ac.il/miTEA/.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

111

The twilight zone of cis element alignments (2013)

Sebastian, A., Contreras-Moreira, B.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not yet been fully addressed. Here we compare the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

112

TEAK: Topology Enrichment Analysis frameworK for detecting activated biological subpathways (2013)

Judeh, T., Johnson, C., Kumar, A., Zhu, D.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: To mine gene expression data sets effectively, analysis frameworks need to incorporate methods that identify intergenic relationships within enriched biologically relevant subpathways. For this purpose, we developed the Topology Enrichment Analysis frameworK (TEAK). TEAK employs a novel in-house algorithm and a tailor-made Clique Percolation Method to extract linear and nonlinear KEGG subpathways, respectively. TEAK scores subpathways using the Bayesian Information Criterion for context-specific data and the Kullback-Leibler divergence for case–control data. In this article, we utilized TEAK with experimental studies to analyze microarray data sets profiling stress responses in the model eukaryote Saccharomyces cerevisiae. Using a public microarray data set, we identified via TEAK linear sphingolipid metabolic subpathways activated during the yeast response to nitrogen stress, and phenotypic analyses of the corresponding deletion strains indicated previously unreported fitness defects for the dpl1 and lag1 mutants under conditions of nitrogen limitation. In addition, we studied the yeast filamentous response to nitrogen stress by profiling changes in transcript levels upon deletion of two key filamentous growth transcription factors, FLO8 and MSS11. Via TEAK we identified a nonlinear glycerophospholipid metabolism subpathway involving the SLC1 gene, which we found via mutational analysis to be required for yeast filamentous growth.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

113

Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation (2013)

Szatkiewicz, J. P., Wang, W., Sullivan, P. F., Wang, W., Sun, W.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-02-02

Description: Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need to be considered, and that sophisticated methods are needed for more accurate CNV detection. We observe that various sources of experimental bias in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

114

A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control (2013)

van Dyk, E., Reinders, M. J. T., Wessels, L. F. A.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-05-04

Description: Tumor formation is partially driven by DNA copy number changes, which are typically measured using array comparative genomic hybridization, SNP arrays and DNA sequencing platforms. Many techniques are available for detecting recurring aberrations across multiple tumor samples, including CMAR, STAC, GISTIC and KC-SMART. GISTIC is widely used and detects both broad and focal (potentially overlapping) recurring events. However, GISTIC performs false discovery rate control on probes instead of events. Here we propose Analytical Multi-scale Identification of Recurrent Events (ADMIRE), a multi-scale Gaussian smoothing approach, for the detection of both broad and focal (potentially overlapping) recurring copy number alterations. Importantly, false discovery rate control is performed analytically (no need for permutations) on events rather than probes. The method does not require segmentation or calling on the input dataset and therefore reduces the potential loss of information due to discretization. An important characteristic of the approach is that the error rate is controlled across all scales and that the algorithm outputs a single profile of significant events selected from the appropriate scales. We perform extensive simulations and showcase its utility on a glioblastoma SNP array dataset. Importantly, ADMIRE detects focal events that are missed by GISTIC, including two events involving known glioma tumor-suppressor genes: CDKN2C and NF1.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

115

Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential (2013)

Tripathi, S., Glazko, G. V., Emmert-Streib, F.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2013-04-14

Description: In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data, in such a way that the analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results, in the worst case, uninformative. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise against using GSEA, GSEArot and GAGE for such data sets.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

116

PanOCT: automated clustering of orthologs using conserved gene neighborhood for pan-genomic analysis of bacterial strains and closely related species (2012)

Fouts, D. E., Brinkac, L., Beck, E., Inman, J., Sutton, G.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-12-14

Description: Pan-genome ortholog clustering tool (PanOCT) is a tool for pan-genomic analysis of closely related prokaryotic species or strains. PanOCT uses conserved gene neighborhood information to separate recently diverged paralogs into orthologous clusters where homology-only clustering methods cannot. The results from PanOCT and three commonly used graph-based ortholog-finding programs were compared using a set of four publicly available strains of the same bacterial species. All four methods agreed on ~70% of the clusters and ~86% of the proteins. The clusters that did not agree were inspected for evidence of correctness, resulting in 85 high-confidence manually curated clusters that were used to compare all four methods.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

117

A novel ab initio identification system of transcriptional regulation motifs in genome DNA sequences based on direct comparison scheme of signal/noise distributions (2012)

Nakaki, R., Kang, J., Tateno, M.

Oxford University Press

In: Nucleic Acids Research

Publikationsdatum: 2012-10-10

Beschreibung: A novel ab initio parameter-tuning-free system to identify transcriptional factor (TF) binding motifs (TFBMs) in genome DNA sequences was developed. It is based on the comparison of two types of frequency distributions with respect to the TFBM candidates in the target DNA sequences and the non-candidates in the background sequence, with the latter generated by utilizing the intergenic sequences. For benchmark tests, we used DNA sequence datasets extracted by ChIP-on-chip and ChIP-seq techniques and identified 65 yeast and four mammalian TFBMs, with the latter including gaps. The accuracy of our system was compared with those of other available programs (i.e. MEME, Weeder, BioProspector, MDscan and DME) and was the best among them, even without tuning of the parameter set for each TFBM and pre-treatment/editing of the target DNA sequences. Moreover, with respect to some TFs for which the identified motifs are inconsistent with those in the references, our results were revealed to be correct, by comparing them with other existing experimental data. Thus, our identification system does not need any other biological information except for gene positions, and is also expected to be applicable to genome DNA sequences to identify unknown TFBMs as well as known ones.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

118

Unknown

Identification of new microRNA-regulated genes by conserved targeting in plant species (2012)

Chorostecki, U., Crosa, V. A., Lodeyro, A. F., Bologna, N. G., Martin, A. P., Carrillo, N., Schommer, C., Palatnik, J. F.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-10-10

Description: MicroRNAs (miRNAs) are major regulators of gene expression in multicellular organisms. They recognize their targets by sequence complementarity and guide them to cleavage or translational arrest. It is generally accepted that plant miRNAs have extensive complementarity to their targets and their prediction usually relies on the use of empirical parameters deduced from known miRNA–target interactions. Here, we developed a strategy to identify miRNA targets which is mainly based on the conservation of the potential regulation in different species. We applied the approach to expressed sequence tags datasets from angiosperms. Using this strategy, we predicted many new interactions and experimentally validated previously unknown miRNA targets in Arabidopsis thaliana. Newly identified targets that are broadly conserved include auxin regulators, transcription factors and transporters. Some of them might participate in the same pathways as the targets known before, suggesting that some miRNAs might control different aspects of a biological process. Furthermore, this approach can be used to identify targets present in a specific group of species, and, as a proof of principle, we analyzed Solanaceae-specific targets. The presented strategy can be used alone or in combination with other approaches to find miRNA targets in plants.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

119

Unknown

Use of ChIP-Seq data for the design of a multiple promoter-alignment method (2012)

Erb, I., Gonzalez-Vallinas, J. R., Bussotti, G., Blanco, E., Eyras, E., Notredame, C.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-04-15

Description: We address the challenge of regulatory sequence alignment with a new method, Pro-Coffee, a multiple aligner specifically designed for homologous promoter regions. Pro-Coffee uses a dinucleotide substitution matrix estimated on alignments of functional binding sites from TRANSFAC. We designed a validation framework using several thousand families of orthologous promoters. This dataset was used to evaluate the accuracy for predicting true human orthologs among their paralogs. We found that whereas other methods achieve on average 73.5% accuracy, and 77.6% when trained on that same dataset, the figure goes up to 80.4% for Pro-Coffee. We then applied a novel validation procedure based on multi-species ChIP-seq data. Trained and untrained methods were tested for their capacity to correctly align experimentally detected binding sites. Whereas the average number of correctly aligned sites for two transcription factors is 284 for default methods and 316 for trained methods, Pro-Coffee achieves 331, 16.5% above the default average. We find a high correlation between a method's performance when classifying orthologs and its ability to correctly align proven binding sites. Not only does this have interesting biological consequences, it also allows us to conclude that any method that is trained on the ortholog data set will result in functionally more informative alignments.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

120

Unknown

MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity (2012)

Wang, Y., Tang, H., DeBarry, J. D., Tan, X., Li, J., Wang, X., Lee, T.-h., Jin, H., Marler, B., Guo, H., Kissinger, J. C., Paterson, A. H.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-04-15

Description: MCScan is an algorithm able to scan multiple genomes or subgenomes in order to identify putative homologous chromosomal regions, and align these regions using genes as anchors. The MCScanX toolkit implements an adjusted MCScan algorithm for detection of synteny and collinearity that extends the original software by incorporating 14 utility programs for visualization of results and additional downstream analyses. Applications of MCScanX to several sequenced plant genomes and gene families are shown as examples. MCScanX can be used to effectively analyze chromosome structural changes, and reveal the history of gene family expansions that might contribute to the adaptation of lineages and taxa. An integrated view of various modes of gene duplication can supplement the traditional gene tree analysis in specific families. The source code and documentation of MCScanX are freely available at http://chibba.pgml.uga.edu/mcscan2/.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

121

Unknown

A mostly traditional approach improves alignment of bisulfite-converted DNA (2012)

Frith, M. C., Mori, R., Asai, K.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-07-22

Description: Cytosines in genomic DNA are sometimes methylated. This affects many biological processes and diseases. The standard way of measuring methylation is to use bisulfite, which converts unmethylated cytosines to thymines, then sequence the DNA and compare it to a reference genome sequence. We describe a method for the critical step of aligning the DNA reads to the correct genomic locations. Our method builds on classic alignment techniques, including likelihood-ratio scores and spaced seeds. In a realistic benchmark, our method has a better combination of sensitivity, specificity and speed than nine other high-throughput bisulfite aligners. This study enables more accurate and rational analysis of DNA methylation. It also illustrates how to adapt general-purpose alignment methods to a special case with distorted base patterns: this should be informative for other special cases such as ancient DNA and AT-rich genomes.
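The base distortion described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' aligner: it only simulates the C-to-T conversion of unmethylated cytosines on a single strand and checks whether a converted read is still compatible with a reference of the same length. The function names and the toy sequence are assumptions made for illustration.

```python
def bisulfite_convert(read: str, methylated: set) -> str:
    """Turn every unmethylated C into T; positions listed in `methylated` stay C."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(read)
    )


def matches_reference(converted: str, reference: str) -> bool:
    """A converted read fits the reference if each base is identical,
    or the read shows T where the reference has C (a conversion event)."""
    return len(converted) == len(reference) and all(
        ref == obs or (ref == "C" and obs == "T")
        for obs, ref in zip(converted, reference)
    )


reference = "ACGTCCGA"                                # toy sequence (hypothetical)
read = bisulfite_convert(reference, methylated={1})   # only the C at index 1 is methylated
print(read, matches_reference(read, reference))
```

A real aligner has to handle this asymmetry during seeding and scoring rather than by exact position-wise comparison, which is the step the abstract addresses.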

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

122

Unknown

PhiSpy: a novel algorithm for finding prophages in bacterial genomes that combines similarity- and composition-based strategies (2012)

Akhter, S., Aziz, R. K., Edwards, R. A.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-09-13

Description: Prophages are phages in lysogeny that are integrated into, and replicated as part of, the host bacterial genome. These mobile elements can have tremendous impact on their bacterial hosts’ genomes and phenotypes, which may lead to strain emergence and diversification, increased virulence or antibiotic resistance. However, finding prophages in microbial genomes remains a problem with no definitive solution. The majority of existing tools rely on detecting genomic regions enriched in protein-coding genes with known phage homologs, which hinders the de novo discovery of phage regions. In this study, a weighted phage detection algorithm, PhiSpy, was developed based on seven distinctive characteristics of prophages, i.e. protein length, transcription strand directionality, customized AT and GC skew, the abundance of unique phage words, phage insertion points and the similarity of phage proteins. The first five characteristics are capable of identifying prophages without any sequence similarity with known phage genes. PhiSpy locates prophages by ranking genomic regions enriched in distinctive phage traits, which leads to the successful prediction of 94% of prophages in 50 complete bacterial genomes with a 6% false-negative rate and a 0.66% false-positive rate.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

123

Unknown

Exploiting mid-range DNA patterns for sequence classification: binary abstraction Markov models (2012)

Shepard, S. S., McSweeny, A., Serpen, G., Fedorov, A.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-06-06

Description: Messenger RNA sequences possess specific nucleotide patterns distinguishing them from non-coding genomic sequences. In this study, we explore the utilization of modified Markov models to analyze sequences up to 44 bp, far beyond the 8-bp limit of conventional Markov models, for exon/intron discrimination. In order to analyze nucleotide sequences of this length, their information content is first reduced by conversion into shorter binary patterns via the application of numerous abstraction schemes. After the conversion of genomic sequences to binary strings, homogeneous Markov models trained on the binary sequences are used to discriminate between exons and introns. We term this approach the Binary Abstraction Markov Model (BAMM). High-quality abstraction schemes for exon/intron discrimination are selected using optimization algorithms on supercomputers. The best MM classifiers are then combined using support vector machines into a single classifier. With this approach, over 95% classification accuracy is achieved without taking reading frame into account. With further development, the BAMM approach can be applied to sequences lacking the genetic code such as ncRNAs and 5'-untranslated regions.
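The general idea of abstracting nucleotides to a binary alphabet and scoring the result with a Markov model can be sketched in Python as follows. This is an illustration only and does not reproduce the paper's abstraction schemes or their optimization: the purine/pyrimidine mapping, the model order, the add-one smoothing and the toy training sequences are all assumptions.

```python
from collections import Counter
from math import log

# Hypothetical abstraction scheme: purines -> 1, pyrimidines -> 0.
PURINE_SCHEME = {"A": "1", "G": "1", "C": "0", "T": "0"}


def abstract(seq, scheme=PURINE_SCHEME):
    """Map a nucleotide sequence onto a binary string."""
    return "".join(scheme[base] for base in seq)


def train(binary_seqs, order=2):
    """Log-probabilities of the next bit given the previous `order` bits
    (homogeneous Markov model with add-one smoothing)."""
    counts = Counter()
    for s in binary_seqs:
        for i in range(order, len(s)):
            counts[(s[i - order:i], s[i])] += 1
    model = {}
    for n in range(2 ** order):
        ctx = format(n, f"0{order}b")
        total = counts[(ctx, "0")] + counts[(ctx, "1")] + 2
        for bit in "01":
            model[(ctx, bit)] = log((counts[(ctx, bit)] + 1) / total)
    return model


def log_likelihood(model, s, order=2):
    """Sum of transition log-probabilities along a binary string."""
    return sum(model[(s[i - order:i], s[i])] for i in range(order, len(s)))


# Toy training data (made up): purine-rich "exons", pyrimidine-rich "introns".
exon_model = train([abstract(s) for s in ["AGAGGAAGGA", "GGAAGAGGAG"]])
intron_model = train([abstract(s) for s in ["CTTCCTCTTC", "TCCTTCTCCT"]])

query = abstract("GAAGGA")
label = "exon" if log_likelihood(exon_model, query) > log_likelihood(intron_model, query) else "intron"
print(label)
```

Working on a two-letter alphabet keeps the number of contexts at 2^order instead of 4^order, which is what makes longer contexts tractable.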

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

124

Unknown

New methods for finding common insertion sites and co-occurring common insertion sites in transposon- and virus-based genetic screens (2012)

Bergemann, T. L., Starr, T. K., Yu, H., Steinbach, M., Erdmann, J., Chen, Y., Cormier, R. T., Largaespada, D. A., Silverstein, K. A. T.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-05-13

Description: Insertional mutagenesis screens in mice are used to identify individual genes that drive tumor formation. In these screens, candidate cancer genes are identified if their genomic location is proximal to a common insertion site (CIS) defined by high rates of transposon or retroviral insertions in a given genomic window. In this article, we describe a new method for defining CISs based on a Poisson distribution, the Poisson Regression Insertion Model, and show that this new method is an improvement over previously described methods. We also describe a modification of the method that can identify pairs and higher orders of co-occurring common insertion sites. We apply these methods to two data sets, one generated in a transposon-based screen for gastrointestinal tract cancer genes and another based on the set of retroviral insertions in the Retroviral Tagged Cancer Gene Database. We show that the new methods identify more relevant candidate genes and candidate gene pairs than found using previous methods. Identification of the biologically relevant set of mutations that occur in a single cell and cause tumor progression will aid in the rational design of single and combinatorial therapies in the upcoming age of personalized cancer therapy.
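To make the notion of a Poisson-based window test concrete, here is a deliberately simplified Python sketch. It is a plain Poisson tail test with a uniform background rate, not the Poisson Regression Insertion Model described in the abstract; the genome size, window size and insertion counts are hypothetical numbers chosen for illustration.

```python
from math import exp


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail.
    Adequate for small k; very small tails lose precision to cancellation."""
    term = exp(-lam)          # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term           # add P(X = i)
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


# Hypothetical screen: insertions assumed to fall uniformly across the genome.
total_insertions = 10_000
genome_size = 2.7e9           # assumed genome length in bp
window = 10_000               # assumed window width in bp
expected = total_insertions * window / genome_size

observed = 5                  # insertions seen in one window (made up)
p_value = poisson_sf(observed, expected)
print(f"expected per window = {expected:.4f}, p = {p_value:.3g}")
```

In a genome-wide scan the resulting p-values would still need correction for the number of windows tested.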

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

125

Unknown

Efficient large-scale protein sequence comparison and gene matching to identify orthologs and co-orthologs (2012)

Mahmood, K., Webb, G. I., Song, J., Whisstock, J. C., Konagurthu, A. S.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-03-29

Description: Broadly, computational ortholog assignment is a three-step process: (i) identify all putative homologs between the genomes, (ii) identify gene anchors and (iii) link anchors to identify best gene matches given their order and context. In this article, we engineer two methods to improve two important aspects of this pipeline [specifically steps (ii) and (iii)]. First, computing sequence similarity data [step (i)] is a computationally intensive task for large sequence sets, creating a bottleneck in the ortholog assignment pipeline. We have designed a fast and highly scalable sort-join method (afree) based on k-mer counts to rapidly compare all pairs of sequences in a large protein sequence set to identify putative homologs. Second, availability of complex genomes containing large gene families with prevalence of complex evolutionary events, such as duplications, has made the task of assigning orthologs and co-orthologs difficult. Here, we have developed an iterative graph matching strategy where at each iteration the best gene assignments are identified, resulting in a set of orthologs and co-orthologs. We find that the afree algorithm is faster than existing methods and maintains high accuracy in identifying similar genes. The iterative graph matching strategy also showed high accuracy in identifying complex gene relationships. Standalone afree is available from http://vbc.med.monash.edu.au/~kmahmood/afree. EGM2, the complete ortholog assignment pipeline (including afree and the iterative graph matching method), is available from http://vbc.med.monash.edu.au/~kmahmood/EGM2.
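The following Python sketch illustrates alignment-free comparison by shared k-mer counts. It is not the afree sort-join implementation: it uses a simple inverted index instead, and the sequence names, toy sequences and k = 3 are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def kmers(seq: str, k: int) -> set:
    """Set of distinct k-mers occurring in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_counts(seqs: dict, k: int = 3) -> Counter:
    """Count distinct k-mers shared by each pair of sequences.
    Only pairs that co-occur under at least one k-mer are ever touched."""
    index = defaultdict(list)                 # k-mer -> names of sequences containing it
    for name, seq in seqs.items():
        for km in kmers(seq, k):
            index[km].append(name)
    pair_counts = Counter()
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Toy protein sequences (hypothetical).
proteins = {"p1": "MKVLAAG", "p2": "MKVLTAG", "p3": "GGGGGGG"}
for pair, n in shared_kmer_counts(proteins).items():
    print(pair, n)
```

Pairs with a high shared count would be treated as putative homologs and passed to the later steps; the threshold for "high" is a tuning choice not specified here.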

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

126

Unknown

The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process (2012)

Heinrich, V., Stange, J., Dickhaus, T., Imkeller, P., Kruger, U., Bauer, S., Mundlos, S., Robinson, P. N., Hecht, J., Krawitz, P. M.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-03-29

Description: With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement, that is, before sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

127

Unknown

Unsupervised discovery of microbial population structure within metagenomes using nucleotide base composition (2012)

Saeed, I., Tang, S.-L., Halgamuge, S. K.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-03-14

Description: An approach to infer the unknown microbial population structure within a metagenome is to cluster nucleotide sequences based on common patterns in base composition, otherwise referred to as binning. When functional roles are assigned to the identified populations, a deeper understanding of microbial communities can be attained, more so than gene-centric approaches that explore overall functionality. In this study, we propose an unsupervised, model-based binning method with two clustering tiers, which uses a novel transformation of the oligonucleotide frequency-derived error gradient and GC content to generate coarse groups at the first tier of clustering; and tetranucleotide frequency to refine these groups at the secondary clustering tier. The proposed method has a demonstrated improvement over PhyloPythia, S-GSOM, TACOA and TaxSOM on all three benchmarks that were used for evaluation in this study. The proposed method is then applied to a pyrosequenced metagenomic library of mud volcano sediment sampled in southwestern Taiwan, with the inferred population structure validated against complementary sequencing of 16S ribosomal RNA marker genes. Finally, the proposed method was further validated against four publicly available metagenomes, including a highly complex Antarctic whale-fall bone sample, which was previously assumed to be too complex for binning prior to functional analysis.
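Two of the composition features named in this abstract, GC content and tetranucleotide frequency, can be computed with a few lines of Python. The sketch covers only the feature extraction, not the two-tier clustering or the error-gradient transformation, and the function names and toy contig are assumptions.

```python
from collections import Counter
from itertools import product

# Fixed ordering of all 256 tetranucleotides so vectors are comparable across sequences.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def tetranucleotide_frequencies(seq: str) -> list:
    """Relative frequency of each tetranucleotide, counted with a sliding window.
    Windows containing characters other than A, C, G, T are ignored."""
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[t] for t in TETRAMERS)
    return [counts[t] / total if total else 0.0 for t in TETRAMERS]


contig = "ACGTACGTACGGTTAACC"   # toy contig (hypothetical)
features = [gc_content(contig)] + tetranucleotide_frequencies(contig)
print(len(features))
```

Feature vectors like this one are what a clustering algorithm would group into bins; many tools also merge reverse-complement tetramers, which this sketch does not do.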

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

128

Unknown

NTRFinder: a software tool to find nested tandem repeats (2012)

Matroud, A. A., Hendy, M. D., Tuffley, C. P.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-02-17

Description: We introduce the software tool NTRFinder to search for a complex repetitive structure in DNA we call a nested tandem repeat (NTR). An NTR is a recurrence of two or more distinct tandem motifs interspersed with each other. We propose that NTRs can be used as phylogenetic and population markers. We have tested our algorithm on both real and simulated data, and present some real NTRs of interest. NTRFinder can be downloaded from http://www.maths.otago.ac.nz/~aamatroud/.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

129

Unknown

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation (2012)

Nikulova, A. A., Favorov, A. V., Sutormin, R. A., Makeev, V. J., Mironov, A. A.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-06-28

Description: Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory ‘grammar’, or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

130

Unknown

i-cisTarget: an integrative genomics method for the prediction of regulatory features and cis-regulatory modules (2012)

Herrmann, C., Van de Sande, B., Potier, D., Aerts, S.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-08-23

Description: The field of regulatory genomics today is characterized by the generation of high-throughput data sets that capture genome-wide transcription factor (TF) binding, histone modifications, or DNAseI hypersensitive regions across many cell types and conditions. In this context, a critical question is how to make optimal use of these publicly available datasets when studying transcriptional regulation. Here, we address this question in Drosophila melanogaster, for which a large number of high-throughput regulatory datasets are available. We developed i-cisTarget (where the ‘i’ stands for integrative), for the first time enabling the discovery of different types of enriched ‘regulatory features’ in a set of co-regulated sequences in one analysis, being either TF motifs or ‘in vivo’ chromatin features, or combinations thereof. We have validated our approach on 15 co-expressed gene sets, 21 ChIP data sets, 628 curated gene sets and multiple individual case studies, and show that meaningful regulatory features can be confidently discovered; that bona fide enhancers can be identified, both by in vivo events and by TF motifs; and that combinations of in vivo events and TF motifs further increase the performance of enhancer prediction.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

131

Unknown

Analyzing large biological datasets with association networks (2012)

Karpinets, T. V., Park, B. H., Uberbacher, E. C.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-09-27

Description: Due to advances in high-throughput biotechnologies, biological information is being collected in databases at an amazing rate, requiring novel computational approaches that process collected data into new knowledge in a timely manner. In this study, we propose a computational framework for discovering modular structure, relationships and regularities in complex data. The framework utilizes a semantic-preserving vocabulary to convert records of biological annotations of an object, such as an organism, gene, chemical or sequence, into networks (Anets) of the associated annotations. An association between a pair of annotations in an Anet is determined by the similarity of their co-occurrence pattern with all other annotations in the data. This feature captures associations between annotations that do not necessarily co-occur with each other and facilitates discovery of the most significant relationships in the collected data through clustering and visualization of the Anet. To demonstrate this approach, we applied the framework to the analysis of metadata from the Genomes OnLine Database and produced a biological map of sequenced prokaryotic organisms with three major clusters of metadata that represent pathogens, environmental isolates and plant symbionts.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

132

Unknown

Integrative analysis of gene and miRNA expression profiles with transcription factor-miRNA feed-forward loops identifies regulators in human cancers (2012)

Yan, Z., Shah, P. K., Amin, S. B., Samur, M. K., Huang, N., Wang, X., Misra, V., Ji, H., Gabuzda, D., Li, C.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-09-27

Description: We describe here a novel method for integrating gene and miRNA expression profiles in cancer using feed-forward loops (FFLs) consisting of transcription factors (TFs), miRNAs and their common target genes. The dChip-GemiNI (Gene and miRNA Network-based Integration) method statistically ranks computationally predicted FFLs by their explanatory power to account for differential gene and miRNA expression between two biological conditions such as normal and cancer. GemiNI integrates not only gene and miRNA expression data but also computationally derived information about TF–target gene and miRNA–mRNA interactions. Literature validation shows that the integrated modeling of expression data and FFLs better identifies cancer-related TFs and miRNAs compared to existing approaches. We have utilized GemiNI for analyzing six data sets of solid cancers (liver, kidney, prostate, lung and germ cell) and found that top-ranked FFLs account for ~20% of transcriptome changes between normal and cancer. We have identified common FFL regulators across multiple cancer types, such as known FFLs consisting of MYC and miR-15/miR-17 families, and novel FFLs consisting of ARNT, CREB1 and their miRNA partners. The results and analysis web server are available at http://www.canevolve.org/dChip-GemiNi.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

133

Unknown

Discovery of multi-dimensional modules by integrative analysis of cancer genomic data (2012)

Zhang, S., Liu, C.-C., Li, W., Shen, H., Laird, P. W., Zhou, X. J.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-10-24

Description: Recent technology has made it possible to simultaneously perform multi-platform genomic profiling (e.g. DNA methylation (DM) and gene expression (GE)) of biological samples, resulting in so-called ‘multi-dimensional genomic data’. Such data provide unique opportunities to study the coordination between regulatory mechanisms on multiple levels. However, integrative analysis of multi-dimensional genomics data for the discovery of combinatorial patterns is currently lacking. Here, we adopt a joint matrix factorization technique to address this challenge. This method projects multiple types of genomic data onto a common coordinate system, in which heterogeneous variables weighted highly in the same projected direction form a multi-dimensional module (md-module). Genomic variables in such modules are characterized by significant correlations and likely functional associations. We applied this method to the DM, GE, and microRNA expression data of 385 ovarian cancer samples from The Cancer Genome Atlas project. These md-modules revealed perturbed pathways that would have been overlooked with only a single type of data, uncovered associations between different layers of cellular activities and allowed the identification of clinically distinct patient subgroups. Our study provides a useful protocol for uncovering hidden patterns and their biological implications in multi-dimensional ‘omic’ data.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

134

Unknown

Detection of dispersed short tandem repeats using reversible jump Markov chain Monte Carlo (2012)

Liang, T., Fan, X., Li, Q., Li, S.-y. R.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-10-24

Description: Tandem repeats occur frequently in biological sequences. They are important for studying genome evolution and human disease. A number of methods have been designed to detect a single tandem repeat in a sliding window. In this article, we focus on the case in which an unknown number of tandem repeat segments of the same pattern are dispersed throughout a sequence. We construct a probabilistic generative model for the tandem repeats, where the sequence pattern is represented by a motif matrix. A Bayesian approach is adopted to compute this model. Markov chain Monte Carlo (MCMC) algorithms are used to explore the posterior distribution in an effort to infer both the motif matrix of tandem repeats and the location of repeat segments. Reversible jump Markov chain Monte Carlo (RJMCMC) algorithms are used to address the transdimensional model selection problem raised by the variable number of repeat segments. Experiments on both synthetic data and real data show that this new approach is powerful in detecting dispersed short tandem repeats. As far as we know, it is the first work to adopt RJMCMC algorithms in the detection of tandem repeats.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

135

Unknown

Discovering the hidden sub-network component in a ranked list of genes or proteins derived from genomic experiments (2012)

Garcia-Alonso, L., Alonso, R., Vidal, E., Amadoz, A., de Maria, A., Minguez, P., Medina, I., Dopazo, J.

Oxford University Press

In: Nucleic Acids Research

Publication date: 2012-11-04

Description: Genomic experiments (e.g. differential gene expression, single-nucleotide polymorphism association) typically produce ranked lists of genes. We present a simple but powerful approach which uses protein–protein interaction data to detect sub-networks within such ranked lists of genes or proteins. We performed an exhaustive study of network parameters that allowed us to conclude that the average number of components and the average number of nodes per component are the parameters that best discriminate between real and random networks. A novel aspect that increases the efficiency of this strategy in finding sub-networks is that, in addition to direct connections, connections mediated by intermediate nodes are also considered to build up the sub-networks. The possibility of using such intermediate nodes makes this approach more robust to noise. It also overcomes some limitations intrinsic to experimental designs based on differential expression, in which some nodes are invariant across conditions. The proposed approach can also be used for candidate disease-gene prioritization. Here, we demonstrate the usefulness of the approach by means of several case examples that include a differential expression analysis in Fanconi Anemia, a genome-wide association study of bipolar disorder and a genome-scale study of essentiality in cancer genes. An efficient and easy-to-use web interface (available at http://www.babelomics.org) based on HTML5 technologies is also provided to run the algorithm and represent the network.

Schlagwort(e): Computational Methods, Genomics

Print ISSN: 0305-1048

Digitale ISSN: 1362-4962

Thema: Biologie

Publiziert von Oxford University Press
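
The description of record 135 mentions grouping listed genes into sub-networks through direct interactions and through connections mediated by intermediate nodes. The following is a minimal sketch of that idea, not the authors' implementation: the function names, the toy edge list and the restriction to a single intermediate node are all assumptions made for illustration. It reports the number of components and the mean number of nodes per component for one gene list.

```python
from collections import deque


def subnetwork_components(edges, seeds, allow_intermediates=True):
    """Group seed genes into connected components of an interaction graph.

    Hypothetical helper: with allow_intermediates=True, two seeds are also
    linked when they share one non-seed neighbour (a single intermediate).
    """
    seeds = set(seeds)
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Seed-to-seed adjacency, direct or via one intermediate node.
    linked = {s: set() for s in seeds}
    for s in seeds:
        for n in neighbours.get(s, ()):
            if n in seeds:
                linked[s].add(n)
            elif allow_intermediates:
                linked[s].update((neighbours[n] & seeds) - {s})

    # Breadth-first search over the seed adjacency.
    components, seen = [], set()
    for start in sorted(seeds):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in linked[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def component_summary(components):
    """Return (number of components, mean nodes per component)."""
    count = len(components)
    mean_size = sum(len(c) for c in components) / count if count else 0.0
    return count, mean_size


if __name__ == "__main__":
    # Toy data (assumed): X is an unlisted protein bridging B and C.
    toy_edges = [("A", "B"), ("B", "X"), ("X", "C"), ("D", "E")]
    toy_seeds = ["A", "B", "C", "D"]
    print(component_summary(subnetwork_components(toy_edges, toy_seeds)))
    print(component_summary(subnetwork_components(toy_edges, toy_seeds, False)))
```

In the toy data, allowing the intermediate node X merges C into the component that contains A and B, so the list yields fewer and larger components than with direct connections only.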

136

Unknown

MetaVelvet: an extension of Velvet assembler to de novo metagenome assembly from short sequence reads (2012)

Namiki, T., Hachiya, T., Tanaka, H., Sakakibara, Y.

Oxford University Press

In: Nucleic Acids Research

Details

Publication date: 2012-11-04

Description: An important step in ‘metagenomics’ analysis is the assembly of multiple genomes from mixed sequence reads of multiple species in a microbial community. Most conventional pipelines use a single-genome assembler with carefully optimized parameters. A limitation of a single-genome assembler for de novo metagenome assembly is that sequences of highly abundant species are likely to be misidentified as repeats in a single genome, resulting in a number of small fragmented scaffolds. We extended a single-genome assembler for short reads, known as ‘Velvet’, to metagenome assembly, which we called ‘MetaVelvet’, for mixed short reads of multiple species. Our fundamental concept was first to decompose a de Bruijn graph constructed from mixed short reads into individual sub-graphs, and second to build scaffolds based on each decomposed de Bruijn sub-graph as an isolate species genome. We made use of two features, the coverage (abundance) difference and graph connectivity, for the decomposition of the de Bruijn graph. For simulated datasets, MetaVelvet succeeded in generating significantly higher N50 scores than any single-genome assembler. MetaVelvet also reconstructed relatively low-coverage genome sequences as scaffolds. On real datasets of human gut microbial read data, MetaVelvet produced longer scaffolds and increased the number of predicted genes.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press
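
The description of record 136 compares assemblies by their N50 scores. As a brief illustration of that metric, here is a minimal sketch using the common definition: the length of the scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembled length. The function name and the example lengths are assumptions, and the record does not say which N50 variant the authors used.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths.

    Scaffolds are visited from longest to shortest; the N50 is the length
    of the scaffold at which the running total first reaches half of the
    total assembled length. Returns 0 for an empty collection.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0


if __name__ == "__main__":
    # Two made-up assemblies with the same total length (290).
    fragmented = [80, 70, 50, 40, 30, 20]
    contiguous = [100, 100, 90]
    print(n50(fragmented), n50(contiguous))
```

For the same total length, the assembly with fewer and longer scaffolds gets the higher N50, which is why the metric is used as a rough indicator of contiguity.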

137

Unknown

Repeat or not repeat?--Statistical validation of tandem repeat prediction in genomic sequences (2012)

Schaper, E., Kajava, A. V., Hauser, A., Anisimova, M.

Oxford University Press

In: Nucleic Acids Research

Details

Publication date: 2012-11-04

Description: Tandem repeats (TRs) represent one of the most prevalent features of genomic sequences. Due to their abundance and functional significance, a plethora of detection tools has been devised over the last two decades. Despite the longstanding interest, TR detection is still not resolved. Our large-scale tests reveal that current detectors produce different, often nonoverlapping inferences, reflecting characteristics of the underlying algorithms rather than the true distribution of TRs in genomic data. Our simulations show that the power of detecting TRs depends on the degree of their divergence and on repeat characteristics such as the length of the minimal repeat unit and the number of units in tandem. To reconcile the diverse predictions of current algorithms, we propose and evaluate several statistical criteria for measuring the quality of predicted repeat units. In particular, we propose a model-based phylogenetic classifier, entailing a maximum-likelihood estimation of the repeat divergence. Applied in conjunction with state-of-the-art detectors, our statistical classification scheme for inferred repeats allows false-positive predictions to be filtered out. Since different algorithms appear to specialize in predicting TRs with certain properties, we advise applying multiple detectors with subsequent filtering to obtain the most complete set of genuine repeats.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

138

Unknown

Context-specific microRNA analysis: identification of functional microRNAs and their mRNA targets (2012)

Bossel Ben-Moshe, N., Avraham, R., Kedmi, M., Zeisel, A., Yitzhaky, A., Yarden, Y., Domany, E.

Oxford University Press

In: Nucleic Acids Research

Details

Publication date: 2012-11-25

Description: MicroRNAs (miRs) function primarily as post-transcriptional negative regulators of gene expression through binding to their mRNA targets. Reliable prediction of a miR’s targets is a considerable bioinformatic challenge of great importance for inferring the miR’s function. Sequence-based prediction algorithms have high false-positive rates, are not in agreement, and are not biological context specific. Here we introduce CoSMic (Context-Specific MicroRNA analysis), an algorithm that combines sequence-based prediction with miR and mRNA expression data. CoSMic differs from existing methods: it identifies miRs that play active roles in the specific biological system of interest and predicts their functional targets with fewer false positives. We applied CoSMic to search for miRs that regulate the migratory response of human mammary cells to epidermal growth factor (EGF) stimulation. Several such miRs, whose putative targets were significantly enriched by migration processes, were identified. We tested three of these miRs experimentally, and showed that they indeed affected the migratory phenotype; we also tested three negative controls. In comparison to other algorithms, CoSMic indeed filters out false positives and allows improved identification of context-specific targets. CoSMic can greatly facilitate miR research in general and, in particular, advance our understanding of individual miRs’ function in a specific context.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

139

Unknown

SLiMPrints: conservation-based discovery of functional motif fingerprints in intrinsically disordered protein regions (2012)

Davey, N. E., Cowan, J. L., Shields, D. C., Gibson, T. J., Coldwell, M. J., Edwards, R. J.

Oxford University Press

In: Nucleic Acids Research

Details

Publication date: 2012-11-25

Description: Large portions of higher eukaryotic proteomes are intrinsically disordered, and abundant evidence suggests that these unstructured regions of proteins are rich in regulatory interaction interfaces. A major class of disordered interaction interfaces are the compact and degenerate modules known as short linear motifs (SLiMs). As a result of the difficulties associated with the experimental identification and validation of SLiMs, our understanding of these modules is limited, advocating the use of computational methods to focus experimental discovery. This article evaluates the use of evolutionary conservation as a discriminatory technique for motif discovery. A statistical framework is introduced to assess the significance of relatively conserved residues, quantifying the likelihood that a residue will have a particular level of conservation given the conservation of the surrounding residues. The framework is expanded to assess the significance of groupings of conserved residues, a metric that forms the basis of SLiMPrints (short linear motif fingerprints), a de novo motif discovery tool. SLiMPrints identifies relatively overconstrained proximal groupings of residues within intrinsically disordered regions, indicative of putatively functional motifs. Finally, the human proteome is analysed to create a set of highly conserved putative motif instances, including a novel site on translation initiation factor eIF2A that may regulate translation through binding of eIF4E.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press

140

Unknown

Reconstructing dynamic gene regulatory networks from sample-based transcriptional data (2012)

Zhu, H., Rao, R. S. P., Zeng, T., Chen, L.

Oxford University Press

In: Nucleic Acids Research

Details

Publication date: 2012-11-25

Description: The current method for reconstructing gene regulatory networks faces a dilemma concerning the study of bio-medical problems. On the one hand, static approaches assume that genes are expressed in a steady state and thus cannot exploit and describe the dynamic patterns of an evolving process. On the other hand, approaches that can describe the dynamical behaviours require time-course data, which are normally not available in many bio-medical studies. To overcome the limitations of both the static and dynamic approaches, we propose a dynamic cascaded method (DCM) to reconstruct dynamic gene networks from sample-based transcriptional data. Our method is based on the intra-stage steady-rate assumption and the continuity assumption, which can properly characterize the dynamic and continuous nature of gene transcription in a biological process. Our simulation study showed that, compared with static approaches, the DCM can not only reconstruct dynamical networks but also significantly improve network inference performance. We further applied our method to reconstruct the dynamic gene networks of hepatocellular carcinoma (HCC) progression. The derived HCC networks were verified by functional analysis and network enrichment analysis. Furthermore, it was shown that the modularity and network rewiring in the HCC networks can clearly characterize the dynamic patterns of HCC progression.

Keyword(s): Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Subject: Biology

Published by Oxford University Press
