ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Graph-based modeling of tandem repeats improves global multiple sequence alignment (2013)

Szalkowski, A. M., Anisimova, M.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-09-26

Description: Tandem repeats (TRs) are often present in proteins with crucial functions, responsible for resistance, pathogenicity and associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, requiring accurate multiple sequence alignment. TRs may be lost or inserted at any position of a TR region by replication slippage or recombination, but current methods assume fixed unit boundaries, and yet are of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels by unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This ensures enhanced accuracy of reconstructed alignments, disentangling TRs and measuring indel events and rates in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events inserting or deleting TR units. We evaluate our method by simulation incorporating TR evolution, by either sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in agriculturally important bacteria Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

2

Unknown

Meander: visually exploring the structural variome using space-filling curves (2013)

Pavlopoulos, G. A., Kumar, P., Sifrim, A., Sakai, R., Lin, M. L., Voet, T., Moreau, Y., Aerts, J.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-06-08

Description: The introduction of next generation sequencing methods in genome studies has made it possible to shift research from a gene-centric approach to a genome wide view. Although methods and tools to detect single nucleotide polymorphisms are becoming more mature, methods to identify and visualize structural variation (SV) are still in their infancy. Most genome browsers can only compare a given sequence to a reference genome; therefore, direct comparison of multiple individuals still remains a challenge. Therefore, the implementation of efficient approaches to explore and visualize SVs and directly compare two or more individuals is desirable. In this article, we present a visualization approach that uses space-filling Hilbert curves to explore SVs based on both read-depth and pair-end information. An interactive open-source Java application, called Meander, implements the proposed methodology, and its functionality is demonstrated using two cases. With Meander, users can explore variations at different levels of resolution and simultaneously compare up to four different individuals against a common reference. The application was developed using Java version 1.6 and Processing.org and can be run on any platform. It can be found at http://homes.esat.kuleuven.be/~bioiuser/meander.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
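
Note: the space-filling layout mentioned in the Meander abstract can be sketched as follows. This is a minimal illustration and not Meander's own Java code. It uses the standard index-to-coordinate conversion for a Hilbert curve; the function names `hilbert_d2xy` and `layout_on_hilbert` and the toy read-depth values are assumptions made for this example.

```python
def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so that sub-curves join end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def layout_on_hilbert(values, order):
    """Place a 1D series (e.g. per-bin read depth) on a 2D grid along the curve."""
    n = 1 << order
    if len(values) > n * n:
        raise ValueError("too many values for this curve order")
    grid = [[None] * n for _ in range(n)]
    for d, value in enumerate(values):
        x, y = hilbert_d2xy(order, d)
        grid[y][x] = value
    return grid


if __name__ == "__main__":
    toy_depths = [10, 20, 30, 40]  # hypothetical per-bin read depths
    for row in layout_on_hilbert(toy_depths, order=1):
        print(row)
```

Because consecutive positions on the curve are always neighbouring grid cells, bins that are close on the chromosome stay close in the picture, which is the property that makes this layout useful for viewing long genomic tracks at several resolutions.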

3

Unknown

Spatial localization of co-regulated genes exceeds genomic gene clustering in the Saccharomyces cerevisiae genome (2013)

Ben-Elazar, S., Yakhini, Z., Yanai, I.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-20

Description: While it has been long recognized that genes are not randomly positioned along the genome, the degree to which its 3D structure influences the arrangement of genes has remained elusive. In particular, several lines of evidence suggest that actively transcribed genes are spatially co-localized, forming transcription factories; however, a generalized systematic test has hitherto not been described. Here we reveal transcription factories using a rigorous definition of genomic structure based on Saccharomyces cerevisiae chromosome conformation capture data, coupled with an experimental design controlling for the primary gene order. We develop a data-driven method for the interpolation and the embedding of such datasets and introduce statistics that enable the comparison of the spatial and genomic densities of genes. Combining these, we report evidence that co-regulated genes are clustered in space, beyond their observed clustering in the context of gene order along the genome and show this phenomenon is significant for 64 out of 117 transcription factors. Furthermore, we show that those transcription factors with high spatially co-localized targets are expressed higher than those whose targets are not spatially clustered. Collectively, our results support the notion that, at a given time, the physical density of genes is intimately related to regulatory activity.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

4

Unknown

DNA motif elucidation using belief propagation (2013)

Wong, K.-C., Chan, T.-M., Peng, C., Li, Y., Zhang, Z.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-09-06

Description: Protein-binding microarray (PBM) is a high-throughput platform that can measure the DNA-binding preference of a protein in a comprehensive and unbiased manner. A typical PBM experiment can measure binding signal intensities of a protein to all the possible DNA k-mers (k = 8 to 10); such comprehensive binding affinity data usually need to be reduced and represented as motif models before they can be further analyzed and applied. Since proteins can often bind to DNA in multiple modes, one of the major challenges is to decompose the comprehensive affinity data into multimodal motif representations. Here, we describe a new algorithm that uses Hidden Markov Models (HMMs) and can derive precise and multimodal motifs using belief propagations. We describe an HMM-based approach using belief propagations (kmerHMM), which accepts and preprocesses PBM probe raw data into median-binding intensities of individual k-mers. The k-mers are ranked and aligned for training an HMM as the underlying motif representation. Multiple motifs are then extracted from the HMM using belief propagations. Comparisons of kmerHMM with other leading methods on several data sets demonstrated its effectiveness and uniqueness. In particular, it achieved the best performance on more than half of the data sets. In addition, the multiple binding modes derived by kmerHMM are biologically meaningful and will be useful in interpreting other genome-wide data such as those generated from ChIP-seq. The executables and source codes are available at the authors’ websites: e.g. http://www.cs.toronto.edu/~wkc/kmerHMM.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
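
Note: the preprocessing step named in the kmerHMM abstract (reducing probe-level PBM data to a median binding intensity per k-mer, then ranking the k-mers) can be sketched as below. This is only an illustration under simplifying assumptions and not the authors' implementation: the function names and toy probes are hypothetical, reverse complements are not merged, and the HMM training and belief propagation steps are not shown.

```python
from collections import defaultdict
from statistics import median


def kmer_median_intensities(probes, k):
    """Return {k-mer: median intensity over all probes that contain it}.

    `probes` is an iterable of (sequence, intensity) pairs. Each probe
    contributes at most once per k-mer, even if the k-mer occurs twice in it.
    """
    by_kmer = defaultdict(list)
    for seq, intensity in probes:
        kmers_in_probe = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        for kmer in kmers_in_probe:
            by_kmer[kmer].append(intensity)
    return {kmer: median(vals) for kmer, vals in by_kmer.items()}


def rank_kmers(medians):
    """Order k-mers from strongest to weakest median signal (ties alphabetical)."""
    return sorted(medians, key=lambda kmer: (-medians[kmer], kmer))


if __name__ == "__main__":
    # Hypothetical toy probes: (probe sequence, raw signal intensity).
    toy_probes = [("ACGT", 10.0), ("CGTA", 30.0), ("ACGA", 20.0)]
    medians = kmer_median_intensities(toy_probes, k=3)
    for kmer in rank_kmers(medians):
        print(kmer, medians[kmer])
```

The median is used here because it is less sensitive than the mean to a single unusually bright or dim probe.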

5

Unknown

Computational identification of functional introns: high positional conservation of introns that harbor RNA genes (2013)

Chorev, M., Carmel, L.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-06-08

Description: An appreciable fraction of introns is thought to have some function, but there is no obvious way to predict which specific intron is likely to be functional. We hypothesize that functional introns experience a different selection regime than non-functional ones and will therefore show distinct evolutionary histories. In particular, we expect functional introns to be more resistant to loss, and that this would be reflected in high conservation of their position with respect to the coding sequence. To test this hypothesis, we focused on introns whose function comes about from microRNAs and snoRNAs that are embedded within their sequence. We built a data set of orthologous genes across 28 eukaryotic species, reconstructed the evolutionary histories of their introns and compared functional introns with the rest of the introns. We found that, indeed, the position of microRNA- and snoRNA-bearing introns is significantly more conserved. In addition, we found that both families of RNA genes settled within introns early during metazoan evolution. We identified several easily computable intronic properties that can be used to detect functional introns in general, thereby suggesting a new strategy to pinpoint non-coding cellular functions.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

6

Unknown

Identifying subgroup markers in heterogeneous populations (2013)

de Ronde, J. J., Rigaill, G., Rottenberg, S., Rodenhuis, S., Wessels, L. F. A.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-11-21

Description: Traditional methods that aim to identify biomarkers that distinguish between two groups, like Significance Analysis of Microarrays or the t-test, perform optimally when such biomarkers show homogeneous behavior within each group and differential behavior between the groups. However, in many applications, this is not the case. Instead, a subgroup of samples in one group shows differential behavior with respect to all other samples. To successfully detect markers showing such imbalanced patterns of differential signal, a different approach is required. We propose a novel method, specifically designed for the Detection of Imbalanced Differential Signal (DIDS). We use an artificial dataset and a human breast cancer dataset to measure its performance and compare it with three traditional methods and four approaches that take imbalanced signal into account. Supported by extensive experimental results, we show that DIDS outperforms all other approaches in terms of power and positive predictive value. In a mouse breast cancer dataset, DIDS is the only approach that detects a functionally validated marker of chemotherapy resistance. DIDS can be applied to any continuous value data, including gene expression data, and in any context where imbalanced differential signal is manifested.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

7

Unknown

Identifying protein interaction subnetworks by a bagging Markov random field-based method (2013)

Chen, L., Xuan, J., Riggins, R. B., Wang, Y., Clarke, R.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-01-20

Description: Identification of differentially expressed subnetworks from protein–protein interaction (PPI) networks has become increasingly important to our global understanding of the molecular mechanisms that drive cancer. Several methods have been proposed for PPI subnetwork identification, but the dependency among network member genes is not explicitly considered, leaving many important hub genes largely unidentified. We present a new method, based on a bagging Markov random field (BMRF) framework, to improve subnetwork identification for mechanistic studies of breast cancer. The method follows a maximum a posteriori principle to form a novel network score that explicitly considers pairwise gene interactions in PPI networks, and it searches for subnetworks with maximal network scores. To improve their robustness across data sets, a bagging scheme based on bootstrapping samples is implemented to statistically select high confidence subnetworks. We first compared the BMRF-based method with existing methods on simulation data to demonstrate its improved performance. We then applied our method to breast cancer data to identify PPI subnetworks associated with breast cancer progression and/or tamoxifen resistance. The experimental results show that not only an improved prediction performance can be achieved by the BMRF approach when tested on independent data sets, but biologically meaningful subnetworks can also be revealed that are relevant to breast cancer and tamoxifen resistance.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

8

Unknown

miRDeep*: an integrated application tool for miRNA identification from RNA sequencing data (2013)

An, J., Lai, J., Lehman, M. L., Nelson, C. C.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-01-20

Description: miRDeep and its varieties are widely used to quantify known and novel microRNA (miRNA) from small RNA sequencing (RNAseq). This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep, but the precision of detecting novel miRNAs is improved by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphic interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or the binary equivalent (BAM) format. Known and novel miRNA expression levels, as measured by the number of reads, are displayed in an interface, which shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to the confidence score. miRDeep* is an integrated standalone application where sequence alignment, pre-miRNA secondary structure calculation and graphical display are purely Java coded. This application tool can be executed using a normal personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools using our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

9

Unknown

Aly and THO are required for assembly of the human TREX complex and association of TREX components with the spliced mRNA (2013)

Chi, B., Wang, Q., Wu, G., Tan, M., Wang, L., Shi, M., Chang, X., Cheng, H.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-01-20

Description: The mRNA export complex TREX (TREX) is known to contain Aly, UAP56, Tex1 and the THO complex, among which UAP56 is required for TREX assembly. Here, we systematically investigated the role of each human TREX component in TREX assembly and its association with the mRNA. We found that Tex1 is essentially a subunit of the THO complex. Aly, THO and UAP56 are all required for assembly of TREX, in which Aly directly interacts with THO subunits Thoc2 and Thoc5. Both Aly and THO function in linking UAP56 to the cap-binding protein CBP80. Interestingly, association of UAP56 with the spliced mRNA, but not with the pre-mRNA, requires Aly and THO. Unexpectedly, we found that Aly and THO require each other to associate with the spliced mRNA. Consistent with these biochemical results, similar to Aly and UAP56, THO plays critical roles in mRNA export. Together, we propose that Aly, THO and UAP56 form a highly integrated unit to associate with the spliced mRNA and function in mRNA export.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

10

Unknown

GBSA: a comprehensive software for analysing whole genome bisulfite sequencing data (2013)

Benoukraf, T., Wongphayak, S., Hadi, L. H. A., Wu, M., Soong, R.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-20

Description: High-throughput sequencing is increasingly being used in combination with bisulfite (BS) assays to study DNA methylation at nucleotide resolution. Although several programmes provide genome-wide alignment of BS-treated reads, the resulting information is not readily interpretable and often requires further bioinformatic steps for meaningful analysis. Current post-alignment BS-sequencing programmes are generally focused on the gene-specific level, a restrictive feature when analysis in the non-coding regions, such as enhancers and intergenic microRNAs, is required. Here, we present Genome Bisulfite Sequencing Analyser (GBSA— http://ctrad-csi.nus.edu.sg/gbsa ), a free open-source software capable of analysing whole-genome bisulfite sequencing data with either a gene-centric or gene-independent focus. Through analysis of the largest published data sets to date, we demonstrate GBSA’s features in providing sequencing quality assessment, methylation scoring, functional data management and visualization of genomic methylation at nucleotide resolution. Additionally, we show that GBSA’s output can be easily integrated with other high-throughput sequencing data, such as RNA-Seq or ChIP-seq, to elucidate the role of methylated intergenic regions in gene regulation. In essence, GBSA allows an investigator to explore not only known loci but also all the genomic regions, for which methylation studies could lead to the discovery of new regulatory mechanisms.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

11

Unknown

ellipsoidFN: a tool for identifying a heterogeneous set of cancer biomarkers based on gene expressions (2013)

Ren, X., Wang, Y., Chen, L., Zhang, X.-S., Jin, Q.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-20

Description: Computationally identifying effective biomarkers for cancers from gene expression profiles is an important and challenging task. The challenge lies in the complicated pathogenesis of cancers that often involve the dysfunction of many genes and regulatory interactions. Thus, a sophisticated classification model is urgently needed. In this study, we proposed an efficient approach, called ellipsoidFN (ellipsoid Feature Net), to model the disease complexity by ellipsoids and seek a set of heterogeneous biomarkers. Our approach achieves a non-linear classification scheme for the mixed samples by the ellipsoid concept, and at the same time uses a linear programming framework to efficiently select biomarkers from high-dimensional space. ellipsoidFN reduces the redundancy and improves the complementariness between the identified biomarkers, thus significantly enhancing the distinctiveness between cancers and normal samples, and even between cancer types. Numerical evaluation on real prostate cancer, breast cancer and leukemia gene expression datasets suggested that ellipsoidFN outperforms the state-of-the-art biomarker identification methods, and it can serve as a useful tool for cancer biomarker identification in the future. The Matlab code of ellipsoidFN is freely available from http://doc.aporc.org/wiki/EllipsoidFN.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

12

Unknown

Volatility in mRNA secondary structure as a design principle for antisense (2013)

Johnson, E., Srivastava, R.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Designing effective antisense sequences is a formidable problem. A method for predicting efficacious antisense holds the potential to provide fundamental insight into this biophysical process. More practically, such an understanding increases the chance of successful antisense design as well as saving considerable time, money and labor. The secondary structure of an mRNA molecule is believed to be in a constant state of flux, sampling several different suboptimal states. We hypothesized that particularly volatile regions might provide better accessibility for antisense targeting. A computational framework, GenAVERT, was developed to evaluate this hypothesis. GenAVERT used UNAFold and RNAforester to generate and compare the predicted suboptimal structures of mRNA sequences. Subsequent analysis revealed regions that were particularly volatile in terms of intramolecular hydrogen bonding, and thus potentially superior antisense targets due to their high accessibility. Several mRNA sequences with known natural antisense target sites as well as artificial antisense target sites were evaluated. Upon comparison, antisense sequences predicted based upon the volatility hypothesis closely matched those of the naturally occurring antisense, as well as those artificial target sites that provided efficient down-regulation. These results suggest that this strategy may provide a powerful new approach to antisense design.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

13

Unknown

EMdeCODE: a novel algorithm capable of reading words of epigenetic code to predict enhancers and retroviral integration sites and to identify H3R2me1 as a distinctive mark of coding versus non-coding genes (2013)

Santoni, F. A.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Existence of some extra-genetic (epigenetic) codes has been postulated since the discovery of the primary genetic code. Evident effects of histone post-translational modifications or DNA methylation over the efficiency and the regulation of DNA processes are supporting this postulation. EMdeCODE is an original algorithm that approximates the genomic distribution of given DNA features (e.g. promoter, enhancer, viral integration) by identifying relevant ChIPSeq profiles of post-translational histone marks or DNA binding proteins and combining them in a supermark. The EMdeCODE kernel is essentially a two-step procedure: (i) an expectation-maximization process calculates the mixture of epigenetic factors that maximizes the sensitivity (recall) of the association with the feature under study; (ii) the approximated density is then recursively trimmed with respect to a control dataset to increase the precision by reducing the number of false positives. EMdeCODE densities improve significantly the prediction of enhancer loci and retroviral integration sites with respect to previous methods. Importantly, it can also be used to extract distinctive factors between two arbitrary conditions. Indeed EMdeCODE identifies unexpected epigenetic profiles specific for coding versus non-coding RNA, pointing towards a new role for H3R2me1 in coding regions.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

14

Unknown

A population model for genotyping indels from next-generation sequence data (2013)

Shao, H., Bellos, E., Yin, H., Liu, X., Zou, J., Li, Y., Wang, J., Coin, L. J. M.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Insertion and deletion polymorphisms (indels) are an important source of genomic variation in plant and animal genomes, but accurate genotyping from low-coverage and exome next-generation sequence data remains challenging. We introduce an efficient population clustering algorithm for diploids and polyploids which was tested on a dataset of 2000 exomes. Compared with existing methods, we report a 4-fold reduction in overall indel genotype error rates with a 9-fold reduction in low coverage regions.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

15

Unknown

PreCisIon: PREdiction of CIS-regulatory elements improved by gene's positION (2013)

Elati, M., Nicolle, R., Junier, I., Fernandez, D., Fekih, R., Font, J., Kepes, F.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Conventional approaches to predict transcriptional regulatory interactions usually rely on the definition of a shared motif sequence on the target genes of a transcription factor (TF). These efforts have been frustrated by the limited availability and accuracy of TF binding site motifs, usually represented as position-specific scoring matrices, which may match large numbers of sites and produce an unreliable list of target genes. To improve the prediction of binding sites, we propose to additionally use the unrelated knowledge of the genome layout. Indeed, it has been shown that co-regulated genes tend to be either neighbors or periodically spaced along the whole chromosome. This study demonstrates that respective gene positioning carries significant information. This novel type of information is combined with traditional sequence information by a machine learning algorithm called PreCisIon. To optimize this combination, PreCisIon builds a strong gene target classifier by adaptively combining weak classifiers based on either local binding sequence or global gene position. This strategy generically paves the way to the optimized incorporation of any future advances in gene target prediction based on local sequence, genome layout or on novel criteria. With the current state of the art, PreCisIon consistently improves methods based on sequence information only. This is shown by implementing a cross-validation analysis of the 20 major TFs from two phylogenetically remote model organisms. For Bacillus subtilis and Escherichia coli, respectively, PreCisIon achieves on average an area under the receiver operating characteristic curve of 70 and 60%, a sensitivity of 80 and 70% and a specificity of 60 and 56%. The newly predicted gene targets are demonstrated to be functionally consistent with previously known targets, as assessed by analysis of Gene Ontology enrichment or of the relevant literature and databases.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

16

Unknown

miRNA target enrichment analysis reveals directly active miRNAs in health and disease (2013)

Steinfeld, I., Navon, R., Ach, R., Yakhini, Z.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: microRNAs (miRNAs) are short non-coding regulatory RNA molecules. The activity of a miRNA in a biological process can often be reflected in the expression program that characterizes the outcome of the activity. We introduce a computational approach that infers such activity from high-throughput data using a novel statistical methodology, called minimum-mHG (mmHG), that examines mutual enrichment in two ranked lists. Based on this methodology, we provide a user-friendly web application that supports the statistical assessment of miRNA target enrichment analysis (miTEA) in the top of a ranked list of genes or proteins. Using miTEA, we analyze several target prediction tools by examining performance on public miRNA constitutive expression data. We also apply miTEA to analyze several integrative biology data sets, including a novel matched miRNA/mRNA data set covering nine human tissue types. Our novel findings include proposed direct activity of miR-519 in placenta, a direct activity of the oncogenic miR-15 in different healthy tissue types and a direct activity of the poorly characterized miR-768 in both healthy tissue types and cancer cell lines. The miTEA web application is available at http://cbl-gorilla.cs.technion.ac.il/miTEA/.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
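
Note: the idea of "mutual enrichment in two ranked lists" from the miTEA abstract can be illustrated with a brute-force sketch. The abstract does not give the formula, so the definition below is an assumption: for every pair of cutoffs (n1, n2), count the overlap between the top n1 items of one list and the top n2 items of the other, compute the hypergeometric tail probability of seeing at least that overlap, and keep the minimum. The function names are hypothetical, and the returned minimum is a raw statistic, not a corrected p-value.

```python
from math import comb


def hg_tail(N, K, n, b):
    """P(X >= b) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(b, upper + 1))
    return hits / comb(N, n)


def mmhg_statistic(ranking_a, ranking_b):
    """Return (min tail probability, n1, n2) over all pairs of top-list cutoffs.

    Both rankings must be orderings of the same set of items.
    Brute force, so only suitable for short lists.
    """
    if set(ranking_a) != set(ranking_b) or len(ranking_a) != len(ranking_b):
        raise ValueError("rankings must be permutations of the same items")
    N = len(ranking_a)
    best = (1.0, 0, 0)
    for n1 in range(1, N + 1):
        top_a = set(ranking_a[:n1])
        for n2 in range(1, N + 1):
            overlap = len(top_a & set(ranking_b[:n2]))
            p = hg_tail(N, n1, n2, overlap)
            if p < best[0]:
                best = (p, n1, n2)
    return best


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4", "g5", "g6"]  # hypothetical identifiers
    print(mmhg_statistic(genes, genes))
```

Because the minimum is taken over many cutoff pairs, the statistic needs a correction for that search before it can be read as a significance level; the sketch does not attempt one.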

17

Unknown

The twilight zone of cis element alignments (2013)

Sebastian, A., Contreras-Moreira, B.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-02-02

Description: Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein–DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein–DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

18

Unknown

TEAK: Topology Enrichment Analysis frameworK for detecting activated biological subpathways (2013)

Judeh, T., Johnson, C., Kumar, A., Zhu, D.

Oxford University Press

In: Nucleic Acids Research

Details

Publication Date: 2013-02-02

Description: To mine gene expression data sets effectively, analysis frameworks need to incorporate methods that identify intergenic relationships within enriched biologically relevant subpathways. For this purpose, we developed the Topology Enrichment Analysis frameworK (TEAK). TEAK employs a novel in-house algorithm and a tailor-made Clique Percolation Method to extract linear and nonlinear KEGG subpathways, respectively. TEAK scores subpathways using the Bayesian Information Criterion for context specific data and the Kullback-Leibler divergence for case–control data. In this article, we utilized TEAK with experimental studies to analyze microarray data sets profiling stress responses in the model eukaryote Saccharomyces cerevisiae. Using a public microarray data set, we identified via TEAK linear sphingolipid metabolic subpathways activated during the yeast response to nitrogen stress, and phenotypic analyses of the corresponding deletion strains indicated previously unreported fitness defects for the dpl1 and lag1 mutants under conditions of nitrogen limitation. In addition, we studied the yeast filamentous response to nitrogen stress by profiling changes in transcript levels upon deletion of two key filamentous growth transcription factors, FLO8 and MSS11. Via TEAK we identified a nonlinear glycerophospholipid metabolism subpathway involving the SLC1 gene, which we found via mutational analysis to be required for yeast filamentous growth.
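
The Kullback-Leibler divergence mentioned in this abstract measures how far one probability distribution departs from another. The sketch below computes it for two discrete distributions, which is the textbook definition only. How TEAK models a subpathway and derives the case and control distributions is not stated in the abstract, so the function name `kl_divergence` and the example inputs are assumptions for illustration.

```python
import math


def kl_divergence(p, q):
    """D_KL(p || q) in nats for two discrete distributions given as
    equal-length sequences of probabilities that each sum to 1."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a term with p_i = 0 contributes nothing
        if qi == 0:
            return math.inf  # p puts mass where q has none
        total += pi * math.log(pi / qi)
    return total


# Hypothetical discretized expression states for one gene.
case = [0.5, 0.5]
control = [0.9, 0.1]
divergence = kl_divergence(case, control)
```

The divergence is zero when the two distributions are identical and grows as they separate. It is also asymmetric, so `kl_divergence(case, control)` and `kl_divergence(control, case)` generally differ.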

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

19

Unknown

Improving detection of copy-number variation by simultaneous bias correction and read-depth segmentation (2013)

Szatkiewicz, J. P., Wang, W., Sullivan, P. F., Wang, W., Sun, W.

Oxford University Press

In: Nucleic Acids Research

Details

Publication Date: 2013-02-02

Description: Structural variation is an important class of genetic variation in mammals. High-throughput sequencing (HTS) technologies promise to revolutionize copy-number variation (CNV) detection but present substantial analytic challenges. Converging evidence suggests that multiple types of CNV-informative data (e.g. read-depth, read-pair, split-read) need be considered, and that sophisticated methods are needed for more accurate CNV detection. We observed that various sources of experimental biases in HTS confound read-depth estimation, and note that bias correction has not been adequately addressed by existing methods. We present a novel read-depth–based method, GENSENG, which uses a hidden Markov model and negative binomial regression framework to identify regions of discrete copy-number changes while simultaneously accounting for the effects of multiple confounders. Based on extensive calibration using multiple HTS data sets, we conclude that our method outperforms existing read-depth–based CNV detection algorithms. The concept of simultaneous bias correction and CNV detection can serve as a basis for combining read-depth with other types of information such as read-pair or split-read in a single analysis. A user-friendly and computationally efficient implementation of our method is freely available.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

20

Unknown

A scale-space method for detecting recurrent DNA copy number changes with analytical false discovery rate control (2013)

van Dyk, E., Reinders, M. J. T., Wessels, L. F. A.

Oxford University Press

In: Nucleic Acids Research

Details

Publication Date: 2013-05-04

Description: Tumor formation is partially driven by DNA copy number changes, which are typically measured using array comparative genomic hybridization, SNP arrays and DNA sequencing platforms. Many techniques are available for detecting recurring aberrations across multiple tumor samples, including CMAR, STAC, GISTIC and KC-SMART. GISTIC is widely used and detects both broad and focal (potentially overlapping) recurring events. However, GISTIC performs false discovery rate control on probes instead of events. Here we propose Analytical Multi-scale Identification of Recurrent Events (ADMIRE), a multi-scale Gaussian smoothing approach, for the detection of both broad and focal (potentially overlapping) recurring copy number alterations. Importantly, false discovery rate control is performed analytically (no need for permutations) on events rather than probes. The method does not require segmentation or calling on the input dataset and therefore reduces the potential loss of information due to discretization. An important characteristic of the approach is that the error rate is controlled across all scales and that the algorithm outputs a single profile of significant events selected from the appropriate scales. We perform extensive simulations and showcase its utility on a glioblastoma SNP array dataset. Importantly, ADMIRE detects focal events that are missed by GISTIC, including two events involving known glioma tumor-suppressor genes: CDKN2C and NF1.
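
The multi-scale Gaussian smoothing named in this abstract can be pictured as convolving one copy-number profile with Gaussian kernels of increasing width: narrow kernels preserve focal events, while wide kernels average them away and emphasize broad ones. The sketch below shows only that smoothing step, under stated assumptions. It uses zero padding at the profile edges, the names `gaussian_kernel`, `smooth` and `multiscale_smooth` are hypothetical, and it does not reproduce the paper's analytical false discovery rate control or its selection of events across scales.

```python
import math


def gaussian_kernel(sigma):
    """Normalized Gaussian weights truncated at about 3 sigma."""
    radius = max(1, round(3 * sigma))
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma):
    """Convolve a 1D profile with a Gaussian kernel (zero padding at edges)."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(profile)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - radius
            if 0 <= k < n:
                acc += w * profile[k]
        out.append(acc)
    return out


def multiscale_smooth(profile, sigmas):
    """Return {sigma: smoothed profile} for each requested scale."""
    return {s: smooth(profile, s) for s in sigmas}


# Hypothetical profile: flat baseline with one focal gain of three probes.
profile = [0.0] * 200
profile[100:103] = [1.0, 1.0, 1.0]
scales = multiscale_smooth(profile, [1, 4, 16])
```

In this toy profile the peak height of the focal gain shrinks as sigma grows, which is why a focal event is best detected at a small scale and a broad event at a large one.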

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

21

Unknown

Ensuring the statistical soundness of competitive gene set approaches: gene filtering and genome-scale coverage are essential (2013)

Tripathi, S., Glazko, G. V., Emmert-Streib, F.

Oxford University Press

In: Nucleic Acids Research

Details

Publication Date: 2013-04-14

Description: In this article, we focus on the analysis of competitive gene set methods for detecting the statistical significance of pathways from gene expression data. Our main result is to demonstrate that some of the most frequently used gene set methods, GSEA, GSEArot and GAGE, are severely influenced by the filtering of the data in a way that such an analysis is no longer reconcilable with the principles of statistical inference, rendering the obtained results in the worst case inexpressive. A possible consequence of this is that these methods can increase their power by the addition of unrelated data and noise. Our results are obtained within a bootstrapping framework that allows a rigorous assessment of the robustness of results and enables power estimates. Our results indicate that when using competitive gene set methods, it is imperative to apply a stringent gene filtering criterion. However, even when genes are filtered appropriately, for gene expression data from chips that do not provide a genome-scale coverage of the expression values of all mRNAs, this is not enough for GSEA, GSEArot and GAGE to ensure the statistical soundness of the applied procedure. For this reason, for biomedical and clinical studies, we strongly advise not to use GSEA, GSEArot and GAGE for such data sets.

Keywords: Computational Methods, Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press
