ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • 1
    Publication Date: 2013-09-26
    Description: Revealing the clonal composition of a single tumor is essential for identifying cell subpopulations with metastatic potential in primary tumors or with resistance to therapies in metastatic tumors. Sequencing technologies provide only an overview of the aggregate of numerous cells. Computational approaches to de-mix a collective signal composed of the aberrations of a mixed cell population of a tumor sample into its individual components are not available. We propose an evolutionary framework for deconvolving data from a single genome-wide experiment to infer the composition, abundance and evolutionary paths of the underlying cell subpopulations of a tumor. We have developed an algorithm (TrAp) for solving this mixture problem. In silico analyses show that TrAp correctly deconvolves mixed subpopulations when the number of subpopulations and the measurement errors are moderate. We demonstrate the applicability of the method using tumor karyotypes and somatic hypermutation data sets. We applied TrAp to an Exome-Seq experiment on a renal cell carcinoma tumor sample and compared the mutational profile of the inferred subpopulations to the mutational profiles of single cells of the same tumor. Finally, we deconvolve sequencing data from eight acute myeloid leukemia patients and three distinct metastases of one melanoma patient to exhibit the evolutionary relationships of their subpopulations.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 2
    Publication Date: 2013-04-02
    Description: MicroRNAs (miRNAs) constitute an important class of small regulatory RNAs that are derived from distinct hairpin precursors (pre-miRNAs). In contrast to mature miRNAs, which have been characterized in numerous genome-wide studies of different organisms, research on global profiling of pre-miRNAs is limited. Here, using massively parallel sequencing, we have performed global characterization of both mouse mature and precursor miRNAs. In total, 87 369 704 and 252 003 sequencing reads derived from 887 mature and 281 precursor miRNAs were obtained, respectively. Our analysis revealed new aspects of miRNA/pre-miRNA processing and modification, including eight Ago2-cleaved pre-miRNAs, eight new instances of miRNA editing and exclusively 5' tailed mirtrons. Furthermore, based on the sequences of both mature and precursor miRNAs, we developed a miRNA discovery pipeline, miRGrep, which does not rely on the availability of genome reference sequences. In addition to 239 known mouse pre-miRNAs, miRGrep predicted 41 novel ones with high confidence. Like the known ones, the mature miRNAs derived from most of these novel loci showed both reduced abundance following Dicer knockdown and binding to Argonaute2. Evaluation on data sets obtained from Caenorhabditis elegans and Caenorhabditis sp.11 demonstrated that miRGrep could be widely used for miRNA discovery in metazoans, especially in those without genome reference sequences.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 3
    Publication Date: 2013-09-26
    Description: It is a challenge to classify protein-coding or non-coding transcripts, especially those reconstructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense–antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, which demonstrated gene evolutionary divergence between vertebrates and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at http://www.bioinfo.org/software/cnci .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 4
    Publication Date: 2014-12-17
    Description: Measurements of ground deformation can be used to identify and interpret geophysical processes occurring at volcanoes. Most studies rely on a single geodetic technique, or fit a geophysical model to the results of multiple geodetic techniques. Here we present a methodology that combines GPS, Total Station measurements and InSAR into a single reference frame to produce an integrated 3-D geodetic velocity surface without any prior geophysical assumptions. The methodology consists of five steps: design of the network, acquisition and processing of the data, spatial integration of the measurements, time series computation and finally the integration of spatial and temporal measurements. The most significant improvements of this method are (1) the reduction of the required field time, (2) the unambiguous detection of outliers, (3) an increased measurement accuracy and (4) the construction of a 3-D geodetic velocity field. We apply this methodology to ongoing motion on Arenal's western flank. Integration of multiple measurement techniques at Arenal volcano revealed a deformation field that is more complex than that described by individual geodetic techniques, yet remains consistent with previous studies. This approach can be applied to volcano monitoring worldwide and has the potential to be extended to incorporate other geodetic techniques and to study transient deformation.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 5
    Publication Date: 2014-12-17
    Description: Combinatorial transcription factor (TF) binding is essential for cell-type-specific gene regulation. However, much remains to be learned about the mechanisms of TF interactions, including to what extent constrained spacing and orientation of interacting TFs are critical for regulatory element activity. To examine the relative prevalence of the ‘enhanceosome’ versus the ‘TF collective’ model of combinatorial TF binding, a comprehensive analysis of TF binding site sequences in large scale datasets is necessary. We developed a motif-pair discovery pipeline to identify motif co-occurrences with preferential distance(s) between motifs in TF-bound regions. Utilizing a compendium of 289 mouse haematopoietic TF ChIP-seq datasets, we demonstrate that haematopoietic-related motif-pairs commonly occur with highly conserved constrained spacing and orientation between motifs. Furthermore, motif clustering revealed specific associations for both heterotypic and homotypic motif-pairs with particular haematopoietic cell types. We also showed that disrupting the spacing between motif-pairs significantly affects transcriptional activity in a well-known motif-pair—E-box and GATA, and in two previously unknown motif-pairs with constrained spacing—Ets and Homeobox as well as Ets and E-box. In this study, we provide evidence for widespread sequence-specific TF pair interaction with DNA that conforms to the ‘enhanceosome’ model, and furthermore identify associations between specific haematopoietic cell-types and motif-pairs.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 6
    Publication Date: 2014-11-09
    Description: In autumn 2012, the new release 05 (RL05) of monthly geopotential spherical harmonics Stokes coefficients (SC) from the Gravity Recovery and Climate Experiment (GRACE) mission was published. This release reduces the noise in high degree and order SC, but they still need to be filtered. One of the most common filtering procedures is the combination of decorrelation and Gaussian filters. Both are parameter-dependent and must be tuned by the user. Previous studies have analyzed the parameter choice for the RL05 GRACE data for oceanic applications, and for RL04 data for global applications. This study updates the latter for RL05 data, extending the statistical analysis. The choice of the parameters of the decorrelation filter has been optimized to: (1) balance the noise reduction and the geophysical signal attenuation produced by the filtering process; (2) minimize the differences between GRACE and model-based data and (3) maximize the ratio of variability between continents and oceans. The Gaussian filter has been optimized following the latter criteria. In addition, an anisotropic filter, the fan filter, has been analyzed as an alternative to the Gaussian filter, producing better statistics.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 7
    Publication Date: 2014-12-17
    Description: Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 8
    Publication Date: 2012-10-10
    Description: Several bioinformatics methods have been proposed for the detection and characterization of genomic structural variation (SV) from ultra high-throughput genome resequencing data. Recent surveys show that comprehensive detection of SV events of different types between an individual resequenced genome and a reference sequence is best achieved through the combination of methods based on different principles (split mapping, reassembly, read depth, insert size, etc.). The improvement of individual predictors is thus an important objective. In this study, we propose a new method that combines deviations from expected library insert sizes and additional information from local patterns of read mapping and uses supervised learning to predict the position and nature of structural variants. We show that our approach provides greatly increased sensitivity with respect to other tools based on paired end read mapping at no cost in specificity, and it makes reliable predictions of very short insertions and deletions in repetitive and low-complexity genomic contexts that can confound tools based on split mapping of reads.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 9
    Publication Date: 2012-04-15
    Description: Exome sequencing is a promising strategy for finding novel mutations of human monogenic disorders. However, pinpointing the causal mutation in a small number of samples is still a big challenge. Here, we propose a three-level filtration and prioritization framework to identify the causal mutation(s) in exome sequencing studies. This efficient and comprehensive framework successfully narrowed down whole exome variants to very small numbers of candidate variants in the proof-of-concept examples. The proposed framework, implemented in a user-friendly software package named KGGSeq ( http://statgenpro.psychiatry.hku.hk/kggseq ), will play a very useful role in exome sequencing-based discovery of human Mendelian disease genes.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 10
    Publication Date: 2012-07-22
    Description: Sequence elements, at all levels—DNA, RNA and protein, play a central role in mediating molecular recognition and thereby molecular regulation and signaling. Studies that focus on measuring and investigating sequence-based recognition make use of statistical and computational tools, including approaches to searching sequence motifs. State-of-the-art motif searching tools are limited in their coverage and ability to address large motif spaces. We develop and present statistical and algorithmic approaches that take as input ranked lists of sequences and return significant motifs. The efficiency of our approach, based on suffix trees, allows searches over motif spaces that are not covered by existing tools. This includes searching variable gap motifs—two half sites with a flexible length gap in between—and searching long motifs over large alphabets. We used our approach to analyze several high-throughput measurement data sets and report some validation results as well as novel suggested motifs and motif refinements. We suggest a refinement of the known estrogen receptor 1 motif in humans, where we observe gaps other than three nucleotides that also serve as significant recognition sites, as well as a variable length motif related to potential tyrosine phosphorylation.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 11
    Publication Date: 2012-07-22
    Description: Cataloging the association of transcripts to genetic variants in recent years holds promise for functional dissection of the regulatory structure of human transcription. Here, we present a novel approach, which aims at elucidating the joint relationships between transcripts and single-nucleotide polymorphisms (SNPs). This entails detection and analysis of modules of transcripts, each weakly associated to a single genetic variant, together exposing a high-confidence association signal between the module and this ‘main’ SNP. To explore how transcripts in a module are related to causative loci for that module, we represent such dependencies by a graphical model. We applied our method to the existing data on genetics of gene expression in the liver. The modules are significantly more numerous, larger and denser than those found in permuted data. Quantification of the confidence in a module as a likelihood score allows us to detect transcripts that do not reach genome-wide significance level. Topological analysis of each module yields novel insights regarding the flow of causality between the main SNP and transcripts. We observe similar annotations of modules from two sources of information: the enrichment of a module in gene subsets and locus annotation of the genetic variants. This and further phenotypic analysis provide a validation for our methodology.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 12
    Publication Date: 2012-07-22
    Description: Small RNAs (sRNAs) are a class of short (20–25 nt) non-coding RNAs that play important regulatory roles in gene expression. An essential first step in understanding their function is to confidently identify sRNA targets. In plants, several classes of sRNAs such as microRNAs (miRNAs) and trans-acting small interfering RNAs have been shown to bind with near-perfect complementarity to their messenger RNA (mRNA) targets, generally leading to cleavage of the mRNA. Recently, a high-throughput technique known as Parallel Analysis of RNA Ends (PARE) has made it possible to sequence mRNA cleavage products on a large-scale. Computational methods now exist to use these data to find targets of conserved and newly identified miRNAs. Due to speed limitations such methods rely on the user knowing which sRNA sequences are likely to target a transcript. By limiting the search to a tiny subset of sRNAs it is likely that many other sRNA/mRNA interactions will be missed. Here, we describe a new software tool called PAREsnip that allows users to search for potential targets of all sRNAs obtained from high-throughput sequencing experiments. By searching for targets of a complete ‘sRNAome’ we can facilitate large-scale identification of sRNA targets, allowing us to discover regulatory interaction networks.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 13
    Publication Date: 2012-07-22
    Description: Phase variation of surface structures occurs in diverse bacterial species due to stochastic, high frequency, reversible mutations. Multiple genes of Campylobacter jejuni are subject to phase variable gene expression due to mutations in polyC/G tracts. A modal length of nine repeats was detected for polyC/G tracts within C. jejuni genomes. Switching rates for these tracts were measured using chromosomally-located reporter constructs and high rates were observed for cj1139 (G8) and cj0031 (G9). Alteration of the cj1139 tract from G8 to G11 increased mutability 10-fold and changed the mutational pattern from predominantly insertions to mainly deletions. Using a multiplex PCR, major changes were detected in ‘on/off’ status for some phase variable genes during passage of C. jejuni in chickens. Utilization of observed switching rates in a stochastic, theoretical model of phase variation demonstrated links between mutability and genetic diversity but could not replicate observed population diversity. We propose that modal repeat numbers have evolved in C. jejuni genomes due to molecular drivers associated with the mutational patterns of these polyC/G repeats, rather than by selection for particular switching rates, and that factors other than mutational drift are responsible for generating genetic diversity during host colonization by this bacterial pathogen.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 14
    Publication Date: 2012-09-13
    Description: Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 15
    Publication Date: 2012-09-13
    Description: Control of translation in eukaryotes is complex, depending on the binding of various factors to mRNAs. Available data for subsets of mRNAs that are translationally up- and down-regulated in yeast eIF4E-binding protein (4E-BP) deletion mutants are coupled with reported mRNA secondary structure measurements to investigate whether 5'-UTR secondary structure varies between the subsets. Genes with up-regulated translational efficiencies in the caf20 mutant have relatively high averaged 5'-UTR secondary structure. There is no apparent wide-scale correlation of RNA-binding protein preferences with the increased 5'-UTR secondary structure, leading us to speculate that the secondary structure itself may play a role in differential partitioning of mRNAs between eIF4E/4E-BP repression and eIF4E/eIF4G translation initiation. Both Caf20p and Eap1p contain stretches of positive charge in regions of predicted disorder. Such regions are also present in eIF4G and have been reported to associate with mRNA binding. The pattern of these segments, around the canonical eIF4E-binding motif, varies between each 4E-BP and eIF4G. Analysis of gene ontology shows that yeast proteins containing predicted disordered segments, with positive charge runs, are enriched for nucleic acid binding. We propose that the 4E-BPs act, in part, as differential, flexible, polyelectrostatic scaffolds for mRNAs.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 16
    Publication Date: 2012-09-13
    Description: The use of a priori knowledge in the alignment of targeted sequencing data is investigated using computational experiments. Adapting a Needleman–Wunsch algorithm to incorporate the genomic position information from the targeted capture, we demonstrate that alignment can be done to just the target region of interest. When in addition use is made of direct string comparison, an improvement of up to a factor of 8 in alignment speed compared to the fastest conventional aligner (Bowtie) is obtained. This results in a total alignment time in targeted sequencing of around 7 min for aligning approximately 56 million captured reads. For conventional aligners such as Bowtie, BWA or MAQ, alignment to just the target region is not feasible as experiments show that this leads to an additional 88% SNP calls, the vast majority of which are false positives (~92%).
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 17
    Publication Date: 2012-06-28
    Description: We introduce Grinder ( http://sourceforge.net/projects/biogrinder/ ), an open-source bioinformatic tool to simulate amplicon and shotgun (genomic, metagenomic, transcriptomic and metatranscriptomic) datasets from reference sequences. This is the first tool to simulate amplicon datasets (e.g. 16S rRNA) widely used by microbial ecologists. Grinder can create sequence libraries with a specific community structure, α and β diversities and experimental biases (e.g. chimeras, gene copy number variation) for commonly used sequencing platforms. This versatility allows the creation of simple to complex read datasets necessary for hypothesis testing when developing bioinformatic software, benchmarking existing tools or designing sequence-based experiments. Grinder is particularly useful for simulating clinical or environmental microbial communities and complements the use of in vitro mock communities.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 18
    Publication Date: 2012-06-06
    Description: The most crucial step in data processing from high-throughput sequencing applications is the accurate and sensitive alignment of the sequencing reads to reference genomes or transcriptomes. The accurate detection of insertions and deletions (indels) and errors introduced by the sequencing platform or by misreading of modified nucleotides is essential for the quantitative processing of the RNA-based sequencing (RNA-Seq) datasets and for the identification of genetic variations and modification patterns. We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and the ability to handle indels to accurately and quantitatively map millions of reads to small or large reference genomes. It is a seed-based algorithm which uses the whole read information for mapping; high sensitivity and low ambiguity are achieved by using short and non-overlapping reads. Furthermore, FANSe uses a hotspot score to prioritize the processing of highly probable matches and implements a modified Smith–Waterman refinement with a reduced scoring matrix to accelerate the calculation without compromising its sensitivity. The FANSe algorithm stably processes datasets from various sequencing platforms, masked or unmasked and small or large genomes. It shows a remarkable coverage of low-abundance mRNAs, which is important for quantitative processing of RNA-Seq datasets.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 19
    Publication Date: 2012-04-24
    Description: Ultra-deep RNA sequencing has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We develop MATS (multivariate analysis of transcript splicing), a Bayesian statistical framework for flexible hypothesis testing of differential alternative splicing patterns on RNA-Seq data. MATS uses a multivariate uniform prior to model the between-sample correlation in exon splicing patterns, and a Markov chain Monte Carlo (MCMC) method coupled with a simulation-based adaptive sampling procedure to calculate the P-value and false discovery rate (FDR) of differential alternative splicing. Importantly, the MATS approach is applicable to almost any type of null hypotheses of interest, providing the flexibility to identify differential alternative splicing events that match a given user-defined pattern. We evaluated the performance of MATS using simulated and real RNA-Seq data sets. In the RNA-Seq analysis of alternative splicing events regulated by the epithelial-specific splicing factor ESRP1, we obtained a high RT–PCR validation rate of 86% for differential exon skipping events with a MATS FDR of <10%. Additionally, over the full list of RT–PCR tested exons, the MATS FDR estimates matched well with the experimental validation rate. Our results demonstrate that MATS is an effective and flexible approach for detecting differential alternative splicing from RNA-Seq data.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 20
    Publication Date: 2012-05-13
    Description: Numerous algorithms have been developed to analyze ChIP-Seq data. However, the complexity of analyzing diverse patterns of ChIP-Seq signals, especially for epigenetic marks, still calls for the development of new algorithms and objective comparisons of existing methods. We developed Qeseq, an algorithm to detect regions of increased ChIP read density relative to background. Qeseq employs critical novel elements, such as iterative recalibration and neighbor joining of reads, to identify enriched regions of any length. To objectively assess its performance relative to 14 other ChIP-Seq peak finders, we designed a novel protocol based on Validation Discriminant Analysis (VDA) to optimally select validation sites and generated two validation datasets, which are the most comprehensive to date for algorithmic benchmarking of key epigenetic marks. In addition, we systematically explored a total of 315 diverse parameter configurations from these algorithms and found that parameters that are optimal in one dataset typically do not generalize to other datasets. Nevertheless, default parameters show the most stable performance, suggesting that they should be used. This study also provides a reproducible and generalizable methodology for unbiased comparative analysis of high-throughput sequencing tools that can facilitate future algorithmic development.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 21
    Publication Date: 2012-05-13
    Description: The informational content of RNA sequencing is currently far from being completely explored. Most of the analyses focus on processing tables of counts or finding isoform deconvolution via exon junctions. This article presents a comparison of several techniques that can be used to estimate differential expression of exons or small genomic regions of expression, based on their coverage function shapes. The problem is defined as finding the differentially expressed exons between two samples using local expression profile normalization and statistical measures to spot the differences between two profile shapes. Initial experiments have been done using synthetic data, and real data modified with synthetically created differential patterns. Then, 160 pipelines (5 types of generator x 4 normalizations x 8 difference measures) are compared. As a result, the best analysis pipelines are selected based on linearity of the differential expression estimation and the area under the ROC curve. These platform-independent techniques have been implemented in the Bioconductor package rnaSeqMap. They point out the exons with differential expression or internal splicing, even if the counts of reads may not show this. The areas of application include significant difference searches, splicing identification algorithms and finding suitable regions for QPCR primers.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 22
    Publication Date: 2012-05-13
    Description: The rapid expansion in the quantity and quality of RNA-Seq data requires the development of sophisticated high-performance bioinformatics tools capable of rapidly transforming this data into meaningful information that is easily interpretable by biologists. Currently available analysis tools are often not easily installed by the general biologist and most of them lack inherent parallel processing capabilities widely recognized as an essential feature of next-generation bioinformatics tools. We present here a user-friendly and fully automated RNA-Seq analysis pipeline (R-SAP) with built-in multi-threading capability to analyze and quantitate high-throughput RNA-Seq datasets. R-SAP follows a hierarchical decision making procedure to accurately characterize various classes of transcripts and achieves a near linear decrease in data processing time as a result of increased multi-threading. In addition, RNA expression level estimates obtained using R-SAP display high concordance with levels measured by microarrays.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 23
    Publication Date: 2012-05-23
    Description: The activation of cryptic 5' splice sites (5' SSs) is often related to human hereditary diseases. The DNA-based mutation screening strategies are commonly used to recognize the cryptic 5' SSs, because features of the local DNA sequence can influence the choice of cryptic 5' SSs. To improve the identification of the cryptic 5' SSs, we developed a structure-based method, named SPO (structure profiles and odds measure), which combines two parameters, the structural feature derived from hydroxyl radical cleavage pattern and odds measure, to assess the likelihood of a cryptic 5' SS activation in competing with its paired authentic 5' SS. Compared to the current tools for identifying activated cryptic 5' SSs, the SPO algorithm achieves higher prediction accuracy than the other methods, including MaxEnt, MDD, Markov model, weight matrix model, Shapiro and Senapathy matrix, R i and G . In addition, the predicted SPO scores from the SPO algorithm exhibited a greater degree of correlation with the strength of cryptic 5' SS activation than that measured from the other seven methods. In conclusion, the SPO algorithm provides an optimal identification of cryptic 5' SSs, can be applied in designing mutagenesis experiments for various splicing events and may be helpful to investigate the relationship between structural variants and human hereditary diseases.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 24
    Publication Date: 2012-05-23
    Description: Deciphering the structure of gene regulatory networks across the tree of life remains one of the major challenges in postgenomic biology. We present a novel ChIP-seq workflow for the archaea using the model organism Halobacterium salinarum sp. NRC-1 and demonstrate its application for mapping the genome-wide binding sites of natively expressed transcription factors. This end-to-end pipeline is the first protocol for ChIP-seq in archaea, with methods and tools for each stage from gene tagging to data analysis and biological discovery. Genome-wide binding sites for transcription factors with many binding sites (TfbD) are identified with sensitivity, while retaining specificity in the identification of the smaller regulons (bacteriorhodopsin-activator protein). Chromosomal tagging of target proteins with a compact epitope facilitates a standardized and cost-effective workflow that is compatible with high-throughput immunoprecipitation of natively expressed transcription factors. The Pique package, an open-source bioinformatics method, is presented for identification of binding events. Relative to ChIP-Chip and qPCR, this workflow offers a robust catalog of protein–DNA binding events with improved spatial resolution and significantly decreased cost. While this study focuses on the application of ChIP-seq in H. salinarum sp. NRC-1, our workflow can also be adapted for use in other archaea and bacteria with basic genetic tools.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 25
    Publication Date: 2012-05-23
    Description: A flexible statistical framework is developed for the analysis of read counts from RNA-Seq gene expression studies. It provides the ability to analyse complex experiments involving multiple treatment conditions and blocking variables while still taking full account of biological variation. Biological variation between RNA samples is estimated separately from the technical variation associated with sequencing technologies. Novel empirical Bayes methods allow each gene to have its own specific variability, even when there are relatively few biological replicates from which to estimate such variability. The pipeline is implemented in the edgeR package of the Bioconductor project. A case study analysis of carcinoma data demonstrates the ability of generalized linear model methods (GLMs) to detect differential expression in a paired design, and even to detect tumour-specific expression changes. The case study demonstrates the need to allow for gene-specific variability, rather than assuming a common dispersion across genes or a fixed relationship between abundance and variability. Genewise dispersions de-prioritize genes with inconsistent results and allow the main analysis to focus on changes that are consistent between biological replicates. Parallel computational approaches are developed to make non-linear model fitting faster and more reliable, making the application of GLMs to genomic data more convenient and practical. Simulations demonstrate the ability of adjusted profile likelihood estimators to return accurate estimators of biological variability in complex situations. When variation is gene-specific, empirical Bayes estimators provide an advantageous compromise between the extremes of assuming common dispersion or separate genewise dispersion. The methods developed here can also be applied to count data arising from DNA-Seq applications, including ChIP-Seq for epigenetic marks and DNA methylation analyses.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 26
    Publication Date: 2012-02-28
    Description: ChIP-seq is increasingly used to characterize transcription factor binding and chromatin marks at a genomic scale. Various tools are now available to extract binding motifs from peak data sets. However, most approaches are only available as command-line programs, or via a website but with size restrictions. We present peak-motifs, a computational pipeline that discovers motifs in peak sequences, compares them with databases, exports putative binding sites for visualization in the UCSC genome browser and generates an extensive report suited for both naive and expert users. It relies on time- and memory-efficient algorithms enabling the treatment of several thousand peaks within minutes. Regarding time efficiency, peak-motifs outperforms all comparable tools by several orders of magnitude. We demonstrate its accuracy by analyzing data sets ranging from 4000 to 128 000 peaks for 12 embryonic stem cell-specific transcription factors. In all cases, the program finds the expected motifs and returns additional motifs potentially bound by cofactors. We further apply peak-motifs to discover tissue-specific motifs in peak collections for the p300 transcriptional co-activator. To our knowledge, peak-motifs is the only tool that performs a complete motif analysis and offers a user-friendly web interface without any restriction on sequence size or number of peaks.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 27
    Publication Date: 2014-03-13
    Description: Genetic disorders can be detected by prenatal diagnosis using Chorionic Villus Sampling, but the 1:100 chance to result in miscarriage restricts the use to fetuses that are suspected to have an aberration. Detection of trisomy 21 cases noninvasively is now possible owing to the upswing of next-generation sequencing (NGS) because a small percentage of fetal DNA is present in maternal plasma. However, detecting other trisomies and smaller aberrations can only be realized using high-coverage NGS, making it too expensive for routine practice. We present a method, WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR), which detects small aberrations using low-coverage NGS. The increased detection resolution was achieved by comparing read counts within the tested sample of each genomic region with regions on other chromosomes that behave similarly in control samples. This within-sample comparison avoids the need to re-sequence control samples. WISECONDOR correctly identified all T13, T18 and T21 cases while coverages were as low as 0.15–1.66. No false positives were identified. Moreover, WISECONDOR also identified smaller aberrations, down to 20 Mb, such as del(13)(q12.3q14.3), +i(12)(p10) and i(18)(q10). This shows that prevalent fetal copy number aberrations can be detected accurately and affordably by shallow sequencing maternal plasma. WISECONDOR is available at bioinformatics.tudelft.nl/wisecondor.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
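The within-sample comparison described in the WISECONDOR abstract above can be illustrated with a small sketch. This is not the WISECONDOR implementation: the function name, the bin labels and the read counts are hypothetical. The choice of reference bins (which the abstract says are selected from control samples) is assumed to have been made already.

```python
from statistics import mean, stdev

def within_sample_z(counts, target, reference_bins):
    """Z-score of one bin's read count against reference bins of the SAME sample.

    counts: dict mapping bin label -> read count (hypothetical input layout)
    target: label of the bin under test
    reference_bins: labels of bins assumed to behave like the target in controls
    """
    ref = [counts[b] for b in reference_bins]
    sd = stdev(ref)  # sample standard deviation; needs at least two reference bins
    if sd == 0:
        raise ValueError("reference bins have zero variance")
    return (counts[target] - mean(ref)) / sd

# Hypothetical read counts, chosen only for illustration
counts = {"chr21:bin7": 130, "chr3:bin2": 98, "chr5:bin9": 102, "chr8:bin4": 100}
z = within_sample_z(counts, "chr21:bin7", ["chr3:bin2", "chr5:bin9", "chr8:bin4"])
print(z)  # 15.0
```

A strongly positive score marks a bin with more reads than its reference set, as would be expected for a gain. Because target and reference bins come from the same sequencing run, no control sample has to be re-sequenced for the comparison.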
  • 28
    Publication Date: 2014-05-01
    Description: Determining the taxonomic affiliation of sequences assembled from metagenomes remains a major bottleneck that affects research across the fields of environmental, clinical and evolutionary microbiology. Here, we introduce MyTaxa, a homology-based bioinformatics framework to classify metagenomic and genomic sequences with unprecedented accuracy. The distinguishing aspect of MyTaxa is that it employs all genes present in an unknown sequence as classifiers, weighting each gene based on its (predetermined) classifying power at a given taxonomic level and frequency of horizontal gene transfer. MyTaxa also implements a novel classification scheme based on the genome-aggregate average amino acid identity concept to determine the degree of novelty of sequences representing uncharacterized taxa, i.e. whether they represent novel species, genera or phyla. Application of MyTaxa on in silico generated (mock) and real metagenomes of varied read length (100–2000 bp) revealed that it correctly classified at least 5% more sequences than any other tool. The analysis also showed that ~10% of the assembled sequences from human gut metagenomes represent novel species with no sequenced representatives, several of which were highly abundant in situ such as members of the Prevotella genus. Thus, MyTaxa can find several important applications in microbial identification and diversity studies.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 29
    Publication Date: 2014-05-01
    Description: Alternative splicing is the main mechanism governing protein diversity. The recent developments in RNA-Seq technology have enabled the study of the global impact and regulation of this biological process. However, the lack of standardized protocols constitutes a major bottleneck in the analysis of alternative splicing. This is particularly important for the identification of exon–exon junctions, which is a critical step in any analysis workflow. Here we performed a systematic benchmarking of alignment tools to dissect the impact of design and method on the mapping, detection and quantification of splice junctions from multi-exon reads. Accordingly, we devised a novel pipeline based on TopHat2 combined with a splice junction detection algorithm, which we have named FineSplice. FineSplice allows effective elimination of spurious junction hits arising from artefactual alignments, achieving up to 99% precision in both real and simulated data sets and yielding superior F1 scores under most tested conditions. The proposed strategy conjugates an efficient mapping solution with a semi-supervised anomaly detection scheme to filter out false positives and allows reliable estimation of expressed junctions from the alignment output. Ultimately this provides more accurate information to identify meaningful splicing patterns. FineSplice is freely available at https://sourceforge.net/p/finesplice/.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
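For readers unfamiliar with the metrics quoted in the FineSplice abstract above, the following minimal sketch shows how precision, recall and the F1 score are computed from counts of true and false junction calls. The function name and the counts are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(true_pos, false_pos, false_neg):
    """Return (precision, recall, F1) from junction-level counts."""
    called = true_pos + false_pos   # junctions reported by the tool
    real = true_pos + false_neg     # junctions that truly exist
    precision = true_pos / called if called else 0.0
    recall = true_pos / real if real else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts, chosen only for illustration
p, r, f1 = precision_recall_f1(true_pos=990, false_pos=10, false_neg=110)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
# precision=0.990 recall=0.900 F1=0.943
```

Filtering out spurious junctions raises precision by lowering the false-positive count. The F1 score falls if that filtering also discards many true junctions, which is why the two figures are reported together.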
  • 30
    Publication Date: 2014-02-28
    Description: Combinatorial interactions among transcription factors (TFs) are critical for integrating diverse intrinsic and extrinsic signals, fine-tuning regulatory output and increasing the robustness and plasticity of regulatory systems. Current knowledge about combinatorial regulation is rather limited due to the lack of suitable experimental technologies and bioinformatics tools. The rapid accumulation of ChIP-Seq data has provided genome-wide occupancy maps for a large number of TFs and chromatin modification marks for identifying enhancers without knowing individual TF binding sites. Integration of the two data types has not been researched extensively, resulting in underused data and missed opportunities. We describe a novel method for discovering frequent combinatorial occupancy patterns by multiple TFs at enhancers. Our method is based on probabilistic item set mining and takes into account uncertainty in both types of ChIP-Seq data. By joint analysis of 108 TFs in four human cell types, we found that cell–type-specific interactions among TFs are abundant and that the majority of enhancers have flexible architecture. We show that several families of transposable elements disproportionally overlap with enhancers with combinatorial patterns, suggesting that these transposable element families play an important role in the evolution of combinatorial regulation.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 31
    Publication Date: 2014-02-11
    Description: Increasing numbers of protein structures are solved each year, but many of these structures belong to proteins whose sequences are homologous to sequences in the Protein Data Bank. Nevertheless, the structures of homologous proteins belonging to the same family contain useful information because functionally important residues are expected to preserve physico-chemical, structural and energetic features. This information forms the basis of our method, which detects RNA-binding residues of a given RNA-binding protein as those residues that preserve physico-chemical, structural and energetic features in its homologs. Tests on 81 RNA-bound and 35 RNA-free protein structures showed that our method yields a higher fraction of true RNA-binding residues (higher precision) than two structure-based and two sequence-based machine-learning methods. Because the method requires no training data set and has no parameters, its precision does not degrade when applied to ‘novel’ protein sequences unlike methods that are parameterized for a given training data set. It was used to predict the ‘unknown’ RNA-binding residues in the C-terminal RNA-binding domain of human CPEB3. The two predicted residues, F430 and F474, were experimentally verified to bind RNA, in particular F430, whose mutation to alanine or asparagine nearly abolished RNA binding. The method has been implemented in a webserver called DR_bind1, which is freely available with no login requirement at http://drbind.limlab.ibms.sinica.edu.tw .
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 32
    Publication Date: 2014-04-03
    Description: Recent advances in high-throughput sequencing (HTS) technologies and computing capacity have produced unprecedented amounts of genomic data that have unraveled the genetics of phenotypic variability in several species. However, operating and integrating current software tools for data analysis still require important investments in highly skilled personnel. Developing accurate, efficient and user-friendly software packages for HTS data analysis will lead to a more rapid discovery of genomic elements relevant to medical, agricultural and industrial applications. We therefore developed Next-Generation Sequencing Eclipse Plug-in (NGSEP), a new software tool for integrated, efficient and user-friendly detection of single nucleotide variants (SNVs), indels and copy number variants (CNVs). NGSEP includes modules for read alignment, sorting, merging, functional annotation of variants, filtering and quality statistics. Analysis of sequencing experiments in yeast, rice and human samples shows that NGSEP has superior accuracy and efficiency, compared with currently available packages for variants detection. We also show that only a comprehensive and accurate identification of repeat regions and CNVs allows researchers to properly separate SNVs from differences between copies of repeat elements. We expect that NGSEP will become a strong support tool to empower the analysis of sequencing data in a wide range of research projects on different species.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 33
    Publication Date: 2012-03-29
    Description: Next-generation sequencing (NGS) technologies-based transcriptomic profiling method often called RNA-seq has been widely used to study global gene expression, alternative exon usage, new exon discovery, novel transcriptional isoforms and genomic sequence variations. However, this technique also poses many biological and informatics challenges to extracting meaningful biological information. The RNA-seq data analysis is built on the foundation of high quality initial genome localization and alignment information for RNA-seq sequences. Toward this goal, we have developed RNASEQR to accurately and effectively map millions of RNA-seq sequences. We have systematically compared RNASEQR with four of the most widely used tools using a simulated data set created from the Consensus CDS project and two experimental RNA-seq data sets generated from a human glioblastoma patient. Our results showed that RNASEQR yields more accurate estimates for gene expression, complete gene structures and new transcript isoforms, as well as more accurate detection of single nucleotide variants (SNVs). RNASEQR analyzes raw data from RNA-seq experiments effectively and outputs results in a manner that is compatible with a wide variety of specialized downstream analyses on desktop computers.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 34
    Publication Date: 2014-10-10
    Description: Nanotechnology and synthetic biology currently constitute one of the most innovative, interdisciplinary fields of research, poised to radically transform society in the 21st century. This paper concerns the synthetic design of ribonucleic acid molecules, using our recent algorithm, RNAiFold, which can determine all RNA sequences whose minimum free energy secondary structure is a user-specified target structure. Using RNAiFold, we design ten cis-cleaving hammerhead ribozymes, all of which are shown to be functional by a cleavage assay. We additionally use RNAiFold to design a functional cis-cleaving hammerhead as a modular unit of a synthetic larger RNA. Analysis of kinetics on this small set of hammerheads suggests that cleavage rate of computationally designed ribozymes may be correlated with positional entropy, ensemble defect, structural flexibility/rigidity and related measures. Artificial ribozymes have been designed in the past either manually or by SELEX (Systematic Evolution of Ligands by Exponential Enrichment); however, this appears to be the first purely computational design and experimental validation of novel functional ribozymes. RNAiFold is available at http://bioinformatics.bc.edu/clotelab/RNAiFold/.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 35
    Publication Date: 2014-10-23
    Description: The paper in question by Van Camp and co-authors [MVC] challenges previous work showing that ground gravity data arising from hydrology can provide a consistent signal for the comparison with satellite gravity data. The data sets used are similar to those used previously, that is, the gravity field as measured by the GRACE satellites versus ground-based data from superconducting gravimeters (SGs) over the same continental area, in this case Central Europe. One of the main impediments in this paper is the presentation that is frequently confusing and misleading as to what the data analysis really shows, for example, the irregular treatment of annual components that are first subtracted then reappear in the analysis. More importantly, we disagree on specific points. Two calculations are included in our comment to illustrate where we believe that the processing in [MVC] paper is deficient. The first deals with their erroneous treatment of the global hydrology using a truncated spherical harmonic approach which explains almost a factor 2 error in their computation of the loading. The second shows the effect of making the wrong assumption in the GRACE/hydrology/surface gravity comparison by inverting the whole of the hydrology loading for underground stations. We also challenge their claims that empirical orthogonal function techniques cannot be done in the presence of periodic components, and that SG data cannot be corrected for comparisons with GRACE data. The main conclusion of their paper, that there is little coherence between ground gravity stations and this invalidates GRACE comparisons, is therefore questionable. There is nothing in [MVC] that contradicts any of the previous papers that have shown clearly a strong relation between seasonal signals obtained from both ground gravity and GRACE satellite data.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 36
    Publication Date: 2014-10-23
    Description: The influence of changes in surface ice-mass redistribution and associated viscoelastic response of the Earth, known as glacial isostatic adjustment (GIA), on the Earth's rotational dynamics has long been known. Equally important is the effect of the changes in the rotational dynamics on the viscoelastic deformation of the Earth. This signal, known as the rotational feedback, or more precisely, the rotational feedback on the sea level equation, has been mathematically described by the sea level equation extended for the term that is proportional to perturbation in the centrifugal potential and the second-degree tidal Love number. The perturbation in the centrifugal force due to changes in the Earth's rotational dynamics enters not only into the sea level equation, but also into the conservation law of linear momentum such that the internal viscoelastic force, the perturbation in the gravitational force and the perturbation in the centrifugal force are in balance. Adding the centrifugal-force perturbation to the linear-momentum balance creates an additional rotational feedback on the viscoelastic deformations of the Earth. We term this feedback mechanism, which is studied in this paper, as the rotational feedback on the linear-momentum balance. We extend both the time-domain method for modelling the GIA response of laterally heterogeneous earth models developed by Martinec and the traditional Laplace-domain method for modelling the GIA-induced rotational response to surface loading by considering the rotational feedback on linear-momentum balance. The correctness of the mathematical extensions of the methods is validated numerically by comparing the polar-motion response to the GIA process and the rotationally induced degree 2 and order 1 spherical harmonic component of the surface vertical displacement and gravity field. 
We present the difference between the case where the rotational feedback on linear-momentum balance is considered against that where it is not. Numerical simulations show that the resulting difference in radial displacement and sea level change between these situations since the Last Glacial Maximum reaches values of ±25 and ±1.8 m, respectively. Furthermore, the surface deformation pattern is modified by up to 10 per cent in areas of former or ongoing glaciation, but by up to 50 per cent at the bottom of the southern Indian ocean. This also results in the movement of coastlines during the last deglaciation to differ between the two cases due to the difference in the ocean loading, which is seen for instance in the area around Hudson Bay, Canada and along the Chinese, Australian or Argentinian coastlines.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 37
    Publication Date: 2014-09-20
    Description: During megathrust earthquakes, great ruptures are accompanied by large scale mass redistribution inside the solid Earth and by ocean mass redistribution due to bathymetry changes. These large scale mass displacements can be detected using the monthly gravity maps of the GRACE satellite mission. In recent years it has become increasingly common to use the long wavelength changes in the Earth's gravity field observed by GRACE to infer seismic source properties for large megathrust earthquakes. An important advantage of space gravimetry is that it is independent from the availability of land for its measurements. This is relevant for observation of megathrust earthquakes, which occur mostly offshore, such as the $M_{\text{w}} \sim 9$ 2004 Sumatra–Andaman, 2010 Maule (Chile) and 2011 Tohoku-Oki (Japan) events. In Broerse et al., we examined the effect of the presence of an ocean above the rupture on long wavelength gravity changes and showed it to be of the first order. Here we revisit the implementation of an ocean layer through the sea level equation and compare the results with approximated methods that have been used in the literature. One of the simplifications usually lies in the assumption of a globally uniform ocean layer. We show that especially in the case of the 2010 Maule earthquake, due to the closeness of the South American continent, the uniform ocean assumption is not valid and causes errors up to 57 per cent for modelled peak geoid height changes (expressed at a spherical harmonic truncation degree of 40). In addition, we show that when a large amount of slip occurs close to the trench, horizontal motions of the ocean floor play a major role in the ocean contribution to gravity changes. Using a slip model of the 2011 Tohoku-Oki earthquake that places the majority of slip close to the surface, the peak value in geoid height change increases by 50 per cent due to horizontal ocean floor motion. 
    Furthermore, we test the influence of the maximum spherical harmonic degree at which the sea level equation is solved, which proves to be important for relative sea level changes occurring along coastlines. Finally, we demonstrate that ocean floor loading, self-gravitation of water and conservation of water mass are of second order importance for coseismic gravity changes. When GRACE observations are used to determine earthquake parameters such as seismic moment or source depth, the uniform ocean layer method introduces large biases, depending on the location of the rupture with respect to the continent. The same holds for interpreting shallow slip when horizontal motions are not properly accounted for in the ocean contribution. In both cases the depth at which slip occurs will be underestimated.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 38
    Publication Date: 2014-09-27
    Description: While mRNA stability has been demonstrated to control rates of translation, generating both global and local synonymous codon biases in many unicellular organisms, this explanation cannot adequately explain why codon bias strongly tracks neighboring intergene GC content; suggesting that structural dynamics of DNA might also influence codon choice. Because minor groove width is highly governed by 3-base periodicity in GC, the existence of triplet-based codons might imply a functional role for the optimization of local DNA molecular dynamics via GC content at synonymous sites (GC3). We confirm a strong association between GC3-related intrinsic DNA flexibility and codon bias across 24 different prokaryotic multiple whole-genome alignments. We develop a novel test of natural selection targeting synonymous sites and demonstrate that GC3-related DNA backbone dynamics have been subject to moderate selective pressure, perhaps contributing to our observation that many genes possess extreme DNA backbone dynamics for their given protein space. This dual function of codons may impose universal functional constraints affecting the evolution of synonymous and non-synonymous sites. We propose that synonymous sites may have evolved as an ‘accessory’ during an early expansion of a primordial genetic code, allowing for multiplexed protein coding and structural dynamic information within the same molecular context.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 39
    Publication Date: 2014-09-07
    Description: Long-term volcanic subsidence provides insight into intereruptive processes, which comprise the longest portion of the eruptive cycle. Ground-based geodetic surveys of Medicine Lake Volcano (MLV), northern CA, document subsidence at rates of ~–10 mm yr⁻¹ between 1954 and 2004. The long observation period plus the duration and stable magnitude of this signal presents an ideal opportunity to study long-term volcanic deformation, but this first requires accurate knowledge of the geometry and magnitude of the source. Best-fitting analytical source models to past levelling and GPS data sets show conflicting source parameters, primarily the model depth. To overcome this, we combine multiple tracks of InSAR data, each with a different look angle, to improve upon the spatial resolution of ground-based measurements. We compare the results from InSAR to those of past geodetic studies, extending the geodetic record to 2011 and demonstrating that subsidence at MLV continues at ~–10 mm yr⁻¹. Using geophysical inversions, we obtain the best-fitting analytical source model: a sill located at 9–10 km depth beneath the caldera. This model geometry is similar to those of past studies, providing a good fit to the high spatial density of InSAR measurements, while accounting for the high ratio of vertical to horizontal deformation derived from InSAR and recorded by existing levelling and GPS data sets. We discuss possible causes of subsidence and show that this model supports the hypothesis that deformation at MLV is driven by tectonic extension, gravitational loading, plus a component of volume loss at depth, most likely due to cooling and crystallization within the intrusive complex that underlies the edifice. Past InSAR surveys at MLV, and throughout the Cascades, are of variable success due to dense vegetation, snow cover and atmospheric artefacts.
In this study, we demonstrate how InSAR may be successfully used in this setting by applying a suite of multitemporal analysis methods that account for atmospheric and orbital noise sources. These methods include: a stacking strategy based upon the noise characteristics of each data set; pixelwise rate-map formation (π-RATE) and persistent scatterer InSAR (StaMPS).
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 40
    Publication Date: 2014-09-11
    Description: In the literature, the inverted coseismic slip models from seismological and geodetic data for the 2011 Tohoku-Oki earthquake portray significant discrepancies, in particular regarding the intensity and the distribution of the rupture near the trench. For a megathrust earthquake, it is difficult to discern the slip along the shallow part of the fault from the geodetic data, which are often acquired on land. In this paper, we discuss the uncertainties in the slip distribution inversion using the geodetic data for the 2011 Tohoku earthquake and the Fully Bayesian Inversion method. These uncertainties are due to the prior information regarding the boundary conditions at the edges of the fault, the dip subduction angle and the smoothing operator. Using continuous GPS data from the Japan Island, the results for the rigid and free boundary conditions show that they produce remarkably different slip distributions at shallow depths, with the latter producing a large slip exceeding 30 m near the surface. These results indicate that the smoothing operator (gradient or Laplacian schemes) does not severely affect the slip pattern. To better invert the coseismic slip, we then introduce the ocean bottom GPS (OB-GPS) data, which improve the resolution of the shallow part of the fault. We obtain a near-trench slip greater than 40 m that reaches the Earth's surface, regardless of which boundary condition is used. Additionally, we show that using a mean dip angle for the fault as derived from subduction models is adequate if the goal is to invert for the general features of the slip pattern of this megathrust event.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 41
    Publication Date: 2014-12-17
    Description: The thermophilic fungus Chaetomium thermophilum holds great promise for structural biology. To increase the efficiency of its biochemical and structural characterization and to explore its thermophilic properties beyond those of individual proteins, we obtained transcriptomics and proteomics data, and integrated them with computational annotation methods and a multitude of biochemical experiments conducted by the structural biology community. We considerably improved the genome annotation of Chaetomium thermophilum and characterized the transcripts and expression of thousands of genes. We furthermore show that the composition and structure of the expressed proteome of Chaetomium thermophilum are similar to those of its mesophilic relatives. Data were deposited in a publicly available repository and provide a rich resource for the structural biology community.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 42
    Publication Date: 2014-10-16
    Description: Complications arise in the interpretation of gravity fields because of interference from systematic degradations, such as boundary blurring and distortion. The major sources of these degradations are the various systematic errors that inevitably occur during gravity field data acquisition, discretization and geophysical forward modelling. To address this problem, we evaluate a deconvolution method that aims to detect the clear horizontal boundaries of anomalous sources by suppressing systematic errors. A convolution-based multilayer projection model, based on the classical 3-D gravity field forward model, is derived to model the systematic error degradation. Our deconvolution algorithm is specifically designed based on this multilayer projection model, in which three types of systematic error are defined. The degradations of the different systematic errors are considered in the deconvolution algorithm. As the primary source of degradation, the convolution-based systematic error is the main object of the multilayer projection model. Both the random systematic error and the projection systematic error are shown to form an integral part of the multilayer projection model, and the mixed norm regularization method and the primal-dual optimization method are therefore employed to control these errors and stabilize the deconvolution solution. We herein analyse the parameter identification and convergence of the proposed algorithms, and synthetic and field data sets are both used to illustrate their effectiveness. Additional synthetic examples are specifically designed to analyse the effects of the projection systematic error, which is caused by the uncertainty associated with the estimation of the impulse response function.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 43
    Publication Date: 2012-10-10
    Description: The Joint BioEnergy Institute Inventory of Composable Elements (JBEI-ICEs) is an open source registry platform for managing information about biological parts. It is capable of recording information about ‘legacy’ parts, such as plasmids, microbial host strains and Arabidopsis seeds, as well as DNA parts in various assembly standards. ICE is built on the idea of a web of registries and thus provides strong support for distributed interconnected use. The information deposited in an ICE installation instance is accessible both via a web browser and through the web application programming interfaces, which allows automated access to parts via third-party programs. JBEI-ICE includes several useful web browser-based graphical applications for sequence annotation, manipulation and analysis that are also open source. As with open source software, users are encouraged to install, use and customize JBEI-ICE and its components for their particular purposes. As a web application programming interface, ICE provides well-developed parts storage functionality for other synthetic biology software projects. A public instance is available at public-registry.jbei.org, where users can try out features, upload parts or simply use it for their projects. The ICE software suite is available via Google Code, a hosting site for community-driven open source projects.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 44
    Publication Date: 2014-05-01
    Description: Shotgun metagenome sequencing has become a fast, cheap and high-throughput technology for characterizing microbial communities in complex environments and human body sites. However, accurate identification of microorganisms at the strain/species level remains extremely challenging. We present a novel k-mer-based approach, termed GSMer, that identifies genome-specific markers (GSMs) from currently sequenced microbial genomes, which were then used for strain/species-level identification in metagenomes. Using 5390 sequenced microbial genomes, 8 770 321 50-mer strain-specific and 11 736 360 species-specific GSMs were identified for 4088 strains and 2005 species (4933 strains), respectively. The GSMs were first evaluated against mock community metagenomes, recently sequenced genomes and real metagenomes from different body sites, suggesting that the identified GSMs were specific to their target genomes. Sensitivity evaluation against synthetic metagenomes with different coverage suggested that 50 GSMs per strain were sufficient to identify most microbial strains with ≥0.25× coverage, and 10% of selected GSMs in a database should be detected for confident positive calls. Application of GSMs identified 45 and 74 microbial strains/species significantly associated with type 2 diabetes patients and obese/lean individuals from corresponding gastrointestinal tract metagenomes, respectively. Our results agreed with previous studies but provided strain-level information. The approach can be directly applied to identify microbial strains/species from raw metagenomes, without the effort of complex data pre-processing.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
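The idea of a genome-specific k-mer marker described in this record can be illustrated with a toy sketch. This is not the GSMer implementation: the function names are hypothetical, reverse complements are ignored, and the only criterion used is exact uniqueness of a k-mer to one genome within the supplied set.

```python
from collections import defaultdict

def kmers(seq, k):
    """Yield every k-length substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]

def genome_specific_markers(genomes, k):
    """Return {genome_name: set of k-mers that occur in that genome only}.

    Hypothetical helper: exact matching, forward strand only.
    """
    owners = defaultdict(set)              # k-mer -> names of genomes containing it
    for name, seq in genomes.items():
        for kmer in kmers(seq.upper(), k):
            owners[kmer].add(name)
    markers = {name: set() for name in genomes}
    for kmer, names in owners.items():
        if len(names) == 1:                # seen in exactly one genome
            markers[next(iter(names))].add(kmer)
    return markers

def fraction_detected(markers, reads, k):
    """Fraction of one genome's markers that appear in a collection of reads."""
    seen = set()
    for read in reads:
        seen.update(km for km in kmers(read.upper(), k) if km in markers)
    return len(seen) / len(markers) if markers else 0.0
```

A detection rule in the spirit of the abstract's 10% threshold would then compare `fraction_detected(...)` for each genome against a chosen cut-off; the cut-off itself would have to be taken from the paper, not from this sketch.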
  • 45
    Publication Date: 2014-06-21
    Description: We propose to test if gravimetry can prove useful in discriminating different models of long-term deep crustal processes in the case of the Taiwan mountain belt. We discuss two existing tectonic models that differ in the deep processes proposed to sustain the long-term growth of the orogen. One model assumes underplating of the uppermost Eurasian crust with subduction of the deeper part of the crust into the mantle. The other one suggests the accretion of the whole Eurasian crust above crustal-scale ramps, the lower crust being accreted into the collisional orogen. We compute the temporal gravity changes caused only by long-term rock mass transfers at depth for each of them. We show that the underplating model implies a rate of gravity change of –6 × 10⁻² μGal yr⁻¹, a value that increases to 2 × 10⁻² μGal yr⁻¹ if crustal subduction is neglected. If the accretion of the whole Eurasian crust occurs, a rate of 7 × 10⁻² μGal yr⁻¹ is obtained. The two models tested differ both in signal amplitude and spatial distribution. The yearly gravity changes expected from long-term deep crustal mass processes in Taiwan are two orders of magnitude below the present-day uncertainty of land-based gravity measurements. Assuming that these annually averaged long-term gravity changes will linearly accumulate with ongoing mountain building, multidecadal time-series are needed to identify comparable rates of gravity change. However, as gravity is sensitive to any mass redistribution, effects of short-term processes such as seismicity and surface mass transfers (erosion, sedimentation, ground-water) may prevent the detection of any long-term deep signal. This study indicates that temporal gravity is not appropriate for deciphering the long-term deep crustal processes involved in the Taiwan mountain belt.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 46
    Publication Date: 2014-06-21
    Description: The computation of quasi-static deformation for axisymmetric viscoelastic structures on a gravitating spherical earth is addressed using the spectral element method (SEM). A 2-D spectral element domain is defined with respect to spherical coordinates of radius and angular distance from a pole of symmetry, and 3-D viscoelastic structure is assumed to be azimuthally symmetric with respect to this pole. A point dislocation source that is periodic in azimuth is implemented with a truncated sequence of azimuthal order numbers. Viscoelasticity is limited to linear rheologies and is implemented with the correspondence principle in the Laplace transform domain. This leads to a series of decoupled 2-D problems which are solved with the SEM. Inverse Laplace transform of the independent 2-D solutions leads to the time-domain solution of the 3-D equations of quasi-static equilibrium imposed on a 2-D structure. The numerical procedure is verified through comparison with analytic solutions for finite faults embedded in a laterally homogeneous viscoelastic structure. This methodology is applicable to situations where the predominant structure varies in one horizontal direction, such as a structural contrast across (or parallel to) a long strike-slip fault.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 47
    Publication Date: 2014-06-21
    Description: The geodetic rates for the gravity variation and vertical uplift in polar regions subject to past and present-day ice-mass changes (PDIMCs) provide important insight into the rheological structure of the Earth. We provide an update of the rates observed at Ny-Ålesund, Svalbard. To do so, we extract and remove the significant seasonal content from the observations. The rate of gravity variations, derived from absolute and relative gravity measurements, is –1.39 ± 0.11 μGal yr⁻¹. The rate of vertical displacements is estimated using GPS and tide gauge measurements. We obtain 7.94 ± 0.21 and 8.29 ± 1.60 mm yr⁻¹, respectively. We compare the extracted signal with that predicted by GLDAS/Noah and ERA-interim hydrology models. We find that the seasonal gravity variations are well-represented by local hydrology changes contained in the ERA-interim model. The phase of seasonal vertical displacements is due to non-local continental hydrology and non-tidal ocean loading. However, a large part of the amplitude of the seasonal vertical displacements remains unexplained. The geodetic rates are used to investigate the asthenosphere viscosity and lithosphere/asthenosphere thicknesses. We first correct the updated geodetic rates for those induced by PDIMCs in Svalbard, using published results, and the sea level change due to the melting of the major ice reservoirs. We show that the latter are at the level of the geodetic rate uncertainties and are responsible for rates of gravity variations and vertical displacements of –0.29 ± 0.03 μGal yr⁻¹ and 1.11 ± 0.10 mm yr⁻¹, respectively. To account for the late Pleistocene deglaciation, we use the global ice evolution model ICE-3G. The Little Ice Age (LIA) deglaciation in Svalbard is modelled using a disc load model with a simple linear temporal evolution. The geodetic rates at Ny-Ålesund induced by the past deglaciations depend on the viscosity structure of the Earth.
We find that viscous relaxation time due to the LIA deglaciation in Svalbard is more than 60 times shorter than that due to the Pleistocene deglaciation. We also find that the response to past and PDIMCs of an Earth model with asthenosphere viscosities ranging between 1.0 and 5.5 × 10¹⁸ Pa s and lithosphere (resp. asthenosphere) thicknesses ranging between 50 and 100 km (resp. 120 and 170 km) can explain the rates derived from geodetic observations.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 48
    Publication Date: 2014-11-16
    Description: The χ² principle and the unbiased predictive risk estimator are used to determine optimal regularization parameters in the context of 3-D focusing gravity inversion with the minimum support stabilizer. At each iteration of the focusing inversion the minimum support stabilizer is determined and then the fidelity term is updated using the standard form transformation. Solution of the resulting Tikhonov functional is found efficiently using the singular value decomposition of the transformed model matrix, which also provides for efficient determination of the updated regularization parameter at each step. Experimental 3-D simulations using synthetic data of a dipping dike and a cube anomaly demonstrate that both parameter estimation techniques outperform the Morozov discrepancy principle for determining the regularization parameter. Smaller relative errors of the reconstructed models are obtained with fewer iterations. Data acquired over the Gotvand dam site in the south-west of Iran are used to validate use of the methods for inversion of practical data and provide good estimates of anomalous structures within the subsurface.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
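The use of one singular value decomposition both to solve a standard-form Tikhonov problem and to score candidate regularization parameters, as described in this record, can be sketched as follows. This is a generic illustration, not the authors' code: the function names and the candidate grid are hypothetical, the data are assumed to be whitened (unit-variance noise), and the minimum support stabilizer and standard-form transformation are not shown.

```python
import numpy as np

def tikhonov_from_svd(U, s, Vt, d, alpha):
    """Minimise ||G m - d||^2 + alpha^2 ||m||^2 given G = U diag(s) Vt."""
    coeffs = U.T @ d
    return Vt.T @ (s / (s**2 + alpha**2) * coeffs)

def upre_from_svd(U, s, d, alpha):
    """Unbiased predictive risk estimate, assuming unit-variance white noise."""
    filt = s**2 / (s**2 + alpha**2)        # Tikhonov filter factors
    coeffs = U.T @ d
    # Squared residual: filtered part plus the part of d outside range(U).
    resid = np.sum(((1.0 - filt) * coeffs) ** 2) + (d @ d - coeffs @ coeffs)
    return resid + 2.0 * np.sum(filt) - d.size

def pick_alpha(G, d, alphas):
    """Compute the SVD once, then return the alpha with the smallest UPRE."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    best = min(alphas, key=lambda a: upre_from_svd(U, s, d, a))
    return best, tikhonov_from_svd(U, s, Vt, d, best)
```

Because the decomposition is computed once, evaluating many candidate values of `alpha` costs only a few vector operations each, which is the efficiency argument made in the abstract.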
  • 49
    Publication Date: 2014-11-19
    Description: Global navigation satellite systems (GNSSs) have revealed that a mega-thrust earthquake that occurs in an island-arc trench system causes post-seismic crustal deformation. Such crustal deformation data have been interpreted by combining three mechanisms: afterslip, poroelastic rebound and viscoelastic relaxation. It is seismologically important to determine the contribution of each mechanism because it provides frictional properties between the plate boundaries and viscosity estimates in the asthenosphere which are necessary to evaluate the stress behaviour during earthquake cycles. However, the observation sites of GNSS are mostly deployed over land and can detect only a small part of the large-scale deformation, which precludes a clear separation of the mechanisms. To extend the spatial coverage of the deformation area, recent studies started to use satellite gravity data that can detect long-wavelength deformations over the ocean. To date, compared with theoretical models for calculating the post-seismic crustal deformation, few models have been proposed to interpret the corresponding gravity variations. Previous approaches have adopted approximations for the effects of compressibility, sphericity and self-gravitation when computing gravity changes. In this study, a new spectral-finite element approach is presented to consider the effects of material compressibility for a Burgers viscoelastic earth model with a laterally heterogeneous viscosity distribution. After the basic principles are explained, it is applied to the 2004 Sumatra–Andaman earthquake. For this event, post-seismic deformation mechanisms are still a controversial topic. Using the developed approach, it is shown that the spatial patterns of gravity change generated by the above three mechanisms clearly differ from one another.
A comparison of the theoretical simulation results with the satellite gravity data obtained from the Gravity Recovery and Climate Experiment reveals that both afterslip and viscoelastic relaxation are occurring. Considering the spatial patterns in satellite gravity fields is an effective method for investigating post-seismic deformation mechanisms.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 50
    Publication Date: 2014-09-02
    Description: Conventionally, overall gene expressions from microarrays are used to infer gene networks, but it is challenging to account for splicing isoforms. High-throughput RNA Sequencing has made splice variant profiling practical. However, its true merit in quantifying splicing isoforms and isoform-specific exon expressions is not well explored in inferring gene networks. This study demonstrates SpliceNet, a method to infer isoform-specific co-expression networks from exon-level RNA-Seq data, using large dimensional trace. It goes beyond differentially expressed genes and infers splicing isoform network changes between normal and diseased samples. It eases the sample size bottleneck; evaluations on simulated data and lung cancer-specific ERBB2 and MAPK signaling pathways, with varying number of samples, evince the merit in handling high exon to sample size ratio datasets. Inferred network rewiring of well established Bcl-x and EGFR centered networks from lung adenocarcinoma expression data is in good agreement with literature. Gene level evaluations demonstrate a substantial performance gain of SpliceNet over canonical correlation analysis, a method that is currently applied to exon level RNA-Seq data. SpliceNet can also be applied to exon array data. SpliceNet is distributed as an R package available at http://www.jjwanglab.org/SpliceNet .
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 51
    Publication Date: 2014-09-02
    Description: We present a new approach to automatic training of a eukaryotic ab initio gene finding algorithm. With the advent of Next-Generation Sequencing, automatic training has become paramount, allowing genome annotation pipelines to keep pace with the speed of genome sequencing. Earlier we developed GeneMark-ES, currently the only gene finding algorithm for eukaryotic genomes that performs automatic training in unsupervised ab initio mode. The new algorithm, GeneMark-ET, augments GeneMark-ES with a novel method that integrates RNA-Seq read alignments into the self-training procedure. Use of ‘assembled’ RNA-Seq transcripts is far from trivial; a significant error rate of assembly was revealed in recent assessments. We demonstrated in computational experiments that the proposed method of incorporation of ‘unassembled’ RNA-Seq reads improves the accuracy of gene prediction; particularly, for the 1.3 GB genome of Aedes aegypti the mean value of prediction Sensitivity and Specificity at the gene level increased over GeneMark-ES by 24.5%. In the current surge of genomic data when the need for accurate sequence annotation is higher than ever, GeneMark-ET will be a valuable addition to the narrow arsenal of automatic gene prediction tools.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 52
    Publication Date: 2014-09-02
    Description: Functional mechanisms of biomolecules often manifest themselves precisely in transient conformational substates. Researchers have long sought to structurally characterize dynamic processes in non-coding RNA, combining experimental data with computer algorithms. However, adequate exploration of conformational space for these highly dynamic molecules, starting from static crystal structures, remains challenging. Here, we report a new conformational sampling procedure, KGSrna, which can efficiently probe the native ensemble of RNA molecules in solution. We found that KGSrna ensembles accurately represent the conformational landscapes of 3D RNA encoded by NMR proton chemical shifts. KGSrna resolves motionally averaged NMR data into structural contributions; when coupled with residual dipolar coupling data, a KGSrna ensemble revealed a previously uncharacterized transient excited state of the HIV-1 trans-activation response element stem–loop. Ensemble-based interpretations of averaged data can aid in formulating and testing dynamic, motion-based hypotheses of functional mechanisms in RNAs with broad implications for RNA engineering and therapeutic intervention.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 53
    Publication Date: 2014-09-02
    Description: Binding of transcription factors to their binding sites in promoter regions is the fundamental event in transcriptional gene regulation. When a transcription factor binding site is located within a nucleosome, the DNA has to partially unwrap from the nucleosome to allow transcription factor binding. This reduces the rate of transcription factor binding and is a known mechanism for regulation of gene expression via chromatin structure. Recently a second mechanism has been reported where transcription factor off-rates are dramatically increased when binding to target sites within the nucleosome. There are two possible explanations for such an increase in off-rate short of an active role of the nucleosome in pushing the transcription factor off the DNA: (i) for dimeric transcription factors the nucleosome can change the equilibrium between monomeric and dimeric binding or (ii) the nucleosome can change the equilibrium between specific and non-specific binding to the DNA. We explicitly model both scenarios and find that dimeric binding can explain a large increase in off-rate while the non-specific binding model cannot be reconciled with the large, experimentally observed increase. Our results suggest a general mechanism by which nucleosomes increase transcription factor dissociation to promote exchange of transcription factors and regulate gene expression.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 54
    Publication Date: 2014-08-07
    Description: Some of the major geothermal anomalies in central Europe are linked to tectonic structures within the top of crystalline basement, which strongly modify the top of this basement. Their assessment is a major challenge in exploration geophysics. Gravity has been proven to be suitable for the detection of mainly large scale lithological and structural inhomogeneities. Indeed, it is well known and proven by different wells that, for example, in northern Switzerland extended negative anomalies are linked to such structures. Due to the depth limitation of wells, their vertical extent is often unknown. In this study, we have investigated the potential of gravity for the geometrical characterization of such basement structures. Our approach consists of the combination of a series of Butterworth filters, geological modelling and best-fitting between observed and computed residual anomalies. In this respect, filters of variable wavelength are applied to observed and computed gravity data. The geological model is discretized into a finite element mesh. Near-surface anomalies and the effect of the sedimentary cover were eliminated using a cut-off wavelength of 10 km and geological and seismic information. We analysed the potential of preferential Butterworth filtering in a sensitivity study and applied the above mentioned approach to part of the Swiss Molasse Basin. Sensitivity analyses reveal that such sets of residual anomalies represent a pseudo-tomography revealing the distribution of different structures with depth. This finding allows for interpreting negative anomalies in terms of 3-D volumes. Best-fitting then permits determination of the most likely 3-D geometries of such basement structures. Our model fits both geological observations and gravity: among 10 deep boreholes in the studied area, six reach the respective units and confirm our distribution of the negative (and positive) anomalies.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
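Wavelength filtering of gravity data with a Butterworth response, as mentioned in this record, can be sketched for a 1-D profile as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function names, the filter order and the evenly sampled, periodic profile are all hypothetical choices, and the filter is applied as a zero-phase gain in the wavenumber domain.

```python
import numpy as np

def butterworth_lowpass(profile, dx_km, cutoff_km, order=4):
    """Keep wavelengths longer than cutoff_km in an evenly sampled profile."""
    n = profile.size
    k = np.fft.rfftfreq(n, d=dx_km)        # wavenumber in cycles per km
    kc = 1.0 / cutoff_km                   # cut-off wavenumber
    gain = 1.0 / np.sqrt(1.0 + (k / kc) ** (2 * order))
    return np.fft.irfft(np.fft.rfft(profile) * gain, n=n)

def residual_anomaly(profile, dx_km, cutoff_km, order=4):
    """Residual = observed profile minus its long-wavelength (regional) part."""
    return profile - butterworth_lowpass(profile, dx_km, cutoff_km, order)
```

Calling `residual_anomaly` with a series of cut-off wavelengths would give a set of residual anomalies of the kind the abstract compares between observed and computed data; which wavelengths to use is a modelling decision not covered here.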
  • 55
    Publication Date: 2014-08-15
    Description: High-throughput sequencing technologies, including RNA-seq, have made it possible to move beyond gene expression analysis to study transcriptional events including alternative splicing and gene fusions. Furthermore, recent studies in cancer have suggested the importance of identifying transcriptionally altered loci as biomarkers for improved prognosis and therapy. While many statistical methods have been proposed for identifying novel transcriptional events with RNA-seq, nearly all rely on contrasting known classes of samples, such as tumor and normal. Few tools exist for the unsupervised discovery of such events without class labels. In this paper, we present SigFuge for identifying genomic loci exhibiting differential transcription patterns across many RNA-seq samples. SigFuge combines clustering with hypothesis testing to identify genes exhibiting alternative splicing, or differences in isoform expression. We apply SigFuge to RNA-seq cohorts of 177 lung and 279 head and neck squamous cell carcinoma samples from the Cancer Genome Atlas, and identify several cases of differential isoform usage including CDKN2A, a tumor suppressor gene known to be inactivated in a majority of lung squamous cell tumors. By not restricting attention to known sample stratifications, SigFuge offers a novel approach to unsupervised screening of genetic loci across RNA-seq cohorts. SigFuge is available as an R package through Bioconductor.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 56
    Publication Date: 2014-08-15
    Description: Next-generation sequencing (NGS) technologies enable new insights into the diversity of virus populations within their hosts. Diversity estimation is currently restricted to single-nucleotide variants or to local fragments of no more than a few hundred nucleotides defined by the length of sequence reads. To study complex heterogeneous virus populations comprehensively, novel methods are required that allow for complete reconstruction of the individual viral haplotypes. Here, we show that assembly of whole viral genomes of ~8600 nucleotides length is feasible from mixtures of heterogeneous HIV-1 strains derived from defined combinations of cloned virus strains and from clinical samples of an HIV-1 superinfected individual. Haplotype reconstruction was achieved using optimized experimental protocols and computational methods for amplification, sequencing and assembly. We comparatively assessed the performance of the three NGS platforms 454 Life Sciences/Roche, Illumina and Pacific Biosciences for this task. Our results prove and delineate the feasibility of NGS-based full-length viral haplotype reconstruction and provide new tools for studying evolution and pathogenesis of viruses.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 57
    Publication Date: 2012-12-14
    Description: Insertion sequences (ISs) are simple transposable elements present in most bacterial and archaeal genomes and play an important role in genomic evolution. The recent expansion of sequenced genomes offers the opportunity to study ISs comprehensively, but this requires efficient and accurate tools for IS annotation. We have developed an open-source program called OASIS, or Optimized Annotation System for Insertion Sequences, which automatically annotates ISs within sequenced genomes. OASIS annotations of 1737 bacterial and archaeal genomes offered an unprecedented opportunity to examine IS evolution. At a broad scale, we found that most IS families are quite widespread; however, they are not present randomly across taxa. This may indicate differential loss, barriers to exchange and/or insufficient time to equilibrate across clades. The number of ISs increases with genome length, but there is both tremendous variation and no increase in IS density for genomes >2 Mb. At the finer scale of recently diverged genomes, the proportion of shared IS content falls sharply, suggesting loss and/or emergence of barriers to successful cross-infection occurs rapidly. Surprisingly, even after controlling for 16S rRNA sequence divergence, the same ISs were more likely to be shared between genomes labeled as the same species rather than as different species.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 58
    Publication Date: 2012-12-14
    Description: The study of cell-population heterogeneity in a range of biological systems, from viruses to bacterial isolates to tumor samples, has been transformed by recent advances in sequencing throughput. While the high-coverage afforded can be used, in principle, to identify very rare variants in a population, existing ad hoc approaches frequently fail to distinguish true variants from sequencing errors. We report a method (LoFreq) that models sequencing run-specific error rates to accurately call variants occurring in <0.05% of a population. Using simulated and real datasets (viral, bacterial and human), we show that LoFreq has near-perfect specificity, with significantly improved sensitivity compared with existing methods and can efficiently analyze deep Illumina sequencing datasets without resorting to approximations or heuristics. We also present experimental validation for LoFreq on two different platforms (Fluidigm and Sequenom) and its application to call rare somatic variants from exome sequencing datasets for gastric cancer. Source code and executables for LoFreq are freely available at http://sourceforge.net/projects/lofreq/.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
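The idea in the abstract above of separating rare variants from sequencing errors can be illustrated with a simple significance test. This is a hedged sketch, not LoFreq's code: it assumes each base in a pileup column has an independent error probability derived from its Phred quality, and asks how likely it is to see at least the observed number of mismatches from errors alone (a Poisson-binomial tail). The function names and example numbers are hypothetical.

```python
def phred_to_error_prob(quality):
    """Convert a Phred quality score to an error probability."""
    return 10.0 ** (-quality / 10.0)


def poisson_binomial_tail(error_probs, k):
    """Return P(X >= k), where X counts errors among independent bases.

    dist[j] holds the probability of exactly j errors so far; the last
    slot (index k) accumulates "k or more", so the table stays small.
    """
    dist = [1.0] + [0.0] * k
    for p in error_probs:
        new = [0.0] * (k + 1)
        for j, mass in enumerate(dist):
            if mass == 0.0:
                continue
            new[j] += mass * (1.0 - p)          # this base is correct
            new[min(j + 1, k)] += mass * p      # this base is an error
        dist = new
    return dist[k]


if __name__ == "__main__":
    # Hypothetical column: 1000 reads at Q30, 10 of them non-reference.
    probs = [phred_to_error_prob(30)] * 1000
    print(poisson_binomial_tail(probs, 10))
```

A small tail probability suggests the mismatches are unlikely to be sequencing errors alone. A real caller would also need multiple-testing correction and run-specific error estimates, which this sketch omits.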
  • 59
    Publication Date: 2013-07-16
    Description: The coupling of chromosome conformation capture (3C) with next-generation sequencing technologies enables the high-throughput detection of long-range genomic interactions, via the generation of ligation products between DNA sequences, which are closely juxtaposed in vivo. These interactions involve promoter regions, enhancers and other regulatory and structural elements of chromosomes and can reveal key details of the regulation of gene expression. 3C-seq is a variant of the method for the detection of interactions between one chosen genomic element (viewpoint) and the rest of the genome. We present r3Cseq, an R/Bioconductor package designed to perform 3C-seq data analysis in a number of different experimental designs. The package reads a common aligned read input format, provides data normalization, allows the visualization of candidate interaction regions and detects statistically significant chromatin interactions, thus greatly facilitating hypothesis generation and the interpretation of experimental results. We further demonstrate its use on a series of real-world applications.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 60
    Publication Date: 2013-05-29
    Description: Deep transcriptome sequencing (RNA-Seq) has become a vital tool for studying the state of cells in the context of varying environments, genotypes and other factors. RNA-Seq profiling data enable identification of novel isoforms, quantification of known isoforms and detection of changes in transcriptional or RNA-processing activity. Existing approaches to detect differential isoform abundance between samples either require a complete isoform annotation or fall short in providing statistically robust and calibrated significance estimates. Here, we propose a suite of statistical tests to address these open needs: a parametric test that uses known isoform annotations to detect changes in relative isoform abundance and a non-parametric test that detects differential read coverages and can be applied when isoform annotations are not available. Both methods account for the discrete nature of read counts and the inherent biological variability. We demonstrate that these tests compare favorably to previous methods, both in terms of accuracy and statistical calibrations. We use these techniques to analyze RNA-Seq libraries from Arabidopsis thaliana and Drosophila melanogaster. The identified differential RNA processing events were consistent with RT–qPCR measurements and previous studies. The proposed toolkit is available from http://bioweb.me/rdiff and enables in-depth analyses of transcriptomes, with or without available isoform annotation.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 61
    Publication Date: 2013-11-21
    Description: The absence of a quality control (QC) system is a major weakness for the comparative analysis of genome-wide profiles generated by next-generation sequencing (NGS). This concerns particularly genome binding/occupancy profiling assays like chromatin immunoprecipitation (ChIP-seq) but also related enrichment-based studies like methylated DNA immunoprecipitation/methylated DNA binding domain sequencing, global run on sequencing or RNA-seq. Importantly, QC assessment may significantly improve multidimensional comparisons that have great promise for extracting information from combinatorial analyses of the global profiles established for chromatin modifications, the bindings of epigenetic and chromatin-modifying enzymes/machineries, RNA polymerases and transcription factors and total, nascent or ribosome-bound RNAs. Here we present an approach that associates global and local QC indicators to ChIP-seq data sets as well as to a variety of enrichment-based studies by NGS. This QC system was used to certify >5600 publicly available data sets, hosted in a database for data mining and comparative QC analyses.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 62
    Publication Date: 2013-10-19
    Description: Identifying variants using high-throughput sequencing data is currently a challenge because true biological variants can be indistinguishable from technical artifacts. One source of technical artifact results from incorrectly aligning experimentally observed sequences to their true genomic origin (‘mismapping’) and inferring differences in mismapped sequences to be true variants. We developed BlackOPs, an open-source tool that simulates experimental RNA-seq and DNA whole exome sequences derived from the reference genome, aligns these sequences by custom parameters, detects variants and outputs a blacklist of positions and alleles caused by mismapping. Blacklists contain thousands of artifact variants that are indistinguishable from true variants and, for a given sample, are expected to be almost completely false positives. We show that these blacklist positions are specific to the alignment algorithm and read length used, and BlackOPs allows users to generate a blacklist specific to their experimental setup. We queried the dbSNP and COSMIC variant databases and found numerous variants indistinguishable from mapping errors. We demonstrate how filtering against blacklist positions reduces the number of potential false variants using an RNA-seq glioblastoma cell line data set. In summary, accounting for mapping-caused variants tuned to experimental setups reduces false positives and, therefore, improves genome characterization by high-throughput sequencing.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 63
    Publication Date: 2013-10-19
    Description: The 3D chromatin structure modeling by chromatin interactions derived from Hi-C experiments is significantly challenged by the intrinsic sequencing biases in these experiments. Conventional modeling methods only focus on the bias among different chromatin regions within the same experiment but neglect the bias arising from different experimental sequencing depth. We now show that the regional interaction bias is tightly coupled with the sequencing depth, and we further identify a chromatin structure parameter as the inherent characteristics of Hi-C derived data for chromatin regions. Then we present an approach for chromatin structure prediction capable of relaxing both kinds of sequencing biases by using this identified parameter. This method is validated by intra and inter cell-line comparisons among various chromatin regions for four human cell-lines (K562, GM12878, IMR90 and H1hESC), which shows that the openness of chromatin region is well correlated with chromatin function. This method has been executed by an automatic pipeline (AutoChrom3D) and thus can be conveniently used.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 64
    Publication Date: 2014-06-22
    Description: On 2008 October 5, a magnitude 6.6 earthquake struck the eastern termination of the intermontane Alai valley between the southern Tien Shan and the northern Pamir of Kyrgyzstan. The shallow thrust earthquake occurred in the footwall of the Main Pamir thrust, where the Pamir orogen is colliding with the southern Tien Shan mountains. We measure the coseismic surface displacements using SAR (Synthetic Aperture RADAR) data; the results show clear gradients in the vertical and horizontal directions along a complex pattern of surface ruptures and active faults. To integrate and to interpret these observations in the context of the regional tectonics, we complement the SAR data analysis with seismological data and geological field observations. While the main moment release of the Nura earthquake appears to be on the Pamir Frontal thrust, the main surface displacements and surface rupture occurred in the footwall along the NE–SW striking Irkeshtam fault. With InSAR data from ascending and descending tracks along with pixel offset measurements, we model the Nura earthquake source as a segmented rupture. One fault segment corresponds to high-angle brittle faulting at the Pamir Frontal thrust and two more fault segments show moderate-angle and low-friction thrusting at the Irkeshtam fault. Our integrated analysis of the coseismic deformation argues for rupture segmentation and strain partitioning associated to the earthquake. It possibly activated an orogenic wedge in the easternmost segment of the Pamir-Alai collision zone. Further, the style of the segmentation may be associated with the presence of Palaeogene evaporites.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 65
    Publication Date: 2014-06-28
    Description: The terrestrial reference frame is a cornerstone for modern geodesy and its applications for a wide range of Earth sciences. The underlying assumption for establishing a terrestrial reference frame is that the motion of the solid Earth's figure centre relative to the mass centre of the Earth system on a multidecadal timescale is linear. However, past international terrestrial reference frames (ITRFs) showed unexpected accelerated motion in their translation parameters. Based on this underlying assumption, the inconsistency of relative origin motions of the ITRFs has been attributed to data reduction imperfection. We investigated the impact of surface mass loading from atmosphere, ocean, snow, soil moisture, ice sheet, glacier and sea level from 1983 to 2008 on the geocentre variations. The resultant geocentre time-series display notable trend acceleration from 1998 onward, in particular in the z-component. This effect is primarily driven by the hydrological mass redistribution in the continents (soil moisture, snow, ice sheet and glacier). The acceleration is statistically significant at the 99 per cent confidence level as determined using the Mann–Kendall test, and it is highly correlated with the satellite laser ranging determined translation series. Our study, based on independent geophysical and hydrological models, demonstrates that, in addition to systematic errors from analysis procedures, the observed non-linearity of the Earth-system behaviour at interannual timescales is physically driven and is able to explain 42 per cent of the disparity between the origins of ITRF2000 and ITRF2005, as well as the high level of consistency between the ITRF2005 and ITRF2008 origins.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
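The Mann–Kendall trend test named in the abstract above can be sketched in a few lines. This is a textbook form under simplifying assumptions (no correction for tied values or serial correlation, normal approximation for the p-value); the function name and sample series are hypothetical and are not data from the study.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p) for the Mann-Kendall trend test.

    Assumes no tie correction and uses the normal approximation.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return s, z, p_value


if __name__ == "__main__":
    s, z, p = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print(s, round(z, 3), p, "significant at 99%" if p < 0.01 else "not significant")
```

A trend is declared significant at the 99 per cent level when the two-sided p-value falls below 0.01.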
  • 66
    Publication Date: 2014-07-29
    Description: This paper presents a novel mathematical reformulation of the theory of the free wobble/nutation of an axisymmetric reference earth model in hydrostatic equilibrium, using the linear momentum description. The new features of this work consist in the use of (i) Clairaut coordinates (rather than spherical polars), (ii) standard spherical harmonics (rather than generalized spherical surface harmonics), (iii) linear operators (rather than J-square symbols) to represent the effects of rotational and ellipticity coupling between dependent variables of different harmonic degree and (iv) a set of dependent variables all of which are continuous across material boundaries. The resulting infinite system of coupled ordinary differential equations is given explicitly, for an elastic solid mantle and inner core, an inviscid outer core and no magnetic field. The formulation is done to second order in the Earth's ellipticity. To this order it is shown that for wobble modes (in which the lowest harmonic in the displacement field is degree 1 toroidal, with azimuthal order m = ±1), it is sufficient to truncate the chain of coupled displacement fields at the toroidal harmonic of degree 5 in the solid parts of the earth model. In the liquid core, however, the harmonic expansion of displacement can in principle continue to indefinitely high degree at this order of accuracy. The full equations are shown to yield correct results in three simple cases amenable to analytic solution: a general earth model in rigid rotation, the tiltover mode in a homogeneous solid earth model and the tiltover and Chandler periods for an incompressible homogeneous solid earth model. Numerical results, from programmes based on this formulation, are presented in part II of this paper.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 67
    Publication Date: 2014-07-29
    Description: Numerical solutions are presented for the formulation of the linear momentum description of Earth's dynamics using Clairaut coordinates. We have developed a number of methods to integrate the equations of motion, including starting at the Earth's centre of mass, starting at finite radius and separating the displacement associated with the primary rigid rotation. We include rotation and ellipticity to second order up to spherical harmonic $T_5^m$, starting with the primary displacement $T_1^m$ with m = ±1. We are able to confirm many of the previous results for models PREM (with no surface ocean) and 1066A, both in their original form and with neutrally stratified liquid cores. Our period search ranges from the near-seismic band [0.1 sidereal days (sd)] to 3500 sd, within which we have identified the four well-known wobble-nutation modes: the Free Core Nutation (retrograde) at –456 sd, the Free Inner Core Nutation (FICN, prograde) at 468 sd, the Chandler Wobble (prograde) at 402 sd, and the Inner Core Wobble (ICW, prograde) at about 2842 sd (7.8 yr) for neutral PREM. The latter value varies significantly with earth model and integration method. In addition we have verified to high accuracy the tilt-over mode at 1 sd within a factor $10^{-6}$. In an exhaustive search we found no additional near-diurnal wobble modes that could be identified as nutations. We show that the eigenfunctions for the as-yet-unidentified ICW are extremely sensitive to the details of the earth model, especially the core stability profile and there is no well-defined sense of its wobble relative to the mantle. Calculations are also done for a range of models derived from PREM with homogeneous layers, as well as with incompressible cores. For this kind of model the ICW ceases to have just a simple IC rigid motion when the fluid compressibility is either unchanged or multiplied by a factor 10; in this case the outer core exhibits oscillations that arise from an unstable fluid density stratification. For the FICN our results for the truncation at harmonic $T_5$ show less change from the $T_3$ truncation than a similar result reported elsewhere. Finally, we give a thorough discussion of the complete spectrum of the characteristic determinant including the location of poles and non-wobble gravity modes, and discuss in general the dynamics of the inviscid core at periods short compared to those involved in the geodynamo.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
  • 68
    Publication Date: 2014-07-23
    Description: To address the drawbacks of existing global weighted mean temperature (Tm) models, this paper uses 2006–2012 NCEP reanalysis data to establish GTm_N, a global empirical model for mapping zenith wet delays onto precipitable water. The model takes the half-year periodicity of Tm into account and estimates the initial phase of each cycle. To evaluate the precision of GTm_N, we test the model with three different Tm data sets, from the NCEP during 2013, 650 radiosonde stations and COSMIC occultation in 2011. The results show that GTm_N has higher precision over both ocean and continental areas at all times. The accuracy of GTm_N is higher than that of the Bevis formulas and GTm_II models. In addition, the GTm_N model does not require the actual surface temperature, so it should find wide application in GPS meteorology.
    Keywords: Gravity, Geodesy and Tides
    Print ISSN: 0956-540X
    Electronic ISSN: 1365-246X
    Topics: Geosciences
    Published by Oxford University Press on behalf of The Deutsche Geophysikalische Gesellschaft (DGG) and the Royal Astronomical Society (RAS).
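The role of Tm in the abstract above can be made concrete with the conversion commonly used in GPS meteorology, PW = Π · ZWD with Π = 10^6 / (ρ_w R_v (k3/Tm + k2')). This is a sketch under stated assumptions, not the GTm_N model: the refractivity constants are the values commonly quoted from Bevis et al., and the seasonal function, its coefficient names and the example inputs are hypothetical placeholders rather than fitted GTm_N coefficients.

```python
import math

RHO_W = 1000.0    # density of liquid water, kg/m^3
R_V = 461.5       # specific gas constant of water vapour, J/(kg K)
K2_PRIME = 22.1   # K/hPa (commonly quoted value; an assumption here)
K3 = 3.739e5      # K^2/hPa (commonly quoted value; an assumption here)


def conversion_factor(tm_kelvin):
    """Dimensionless factor mapping zenith wet delay to precipitable water."""
    k2_si = K2_PRIME / 100.0  # convert K/hPa to K/Pa
    k3_si = K3 / 100.0        # convert K^2/hPa to K^2/Pa
    return 1.0e6 / (RHO_W * R_V * (k3_si / tm_kelvin + k2_si))


def zwd_to_pw(zwd_m, tm_kelvin):
    """Precipitable water (m) from zenith wet delay (m) and Tm (K)."""
    return conversion_factor(tm_kelvin) * zwd_m


def seasonal_tm(doy, mean, a1, d1, a2, d2):
    """Hypothetical Tm model with annual and half-year cosine terms.

    d1 and d2 are the initial phases (day of year) of the two cycles.
    """
    annual = a1 * math.cos(2.0 * math.pi * (doy - d1) / 365.25)
    semiannual = a2 * math.cos(4.0 * math.pi * (doy - d2) / 365.25)
    return mean + annual + semiannual


if __name__ == "__main__":
    tm = seasonal_tm(doy=200, mean=275.0, a1=8.0, d1=200.0, a2=1.0, d2=30.0)
    print(round(conversion_factor(tm), 4), zwd_to_pw(0.10, tm))
```

Because Π changes only slowly with Tm, a modest error in Tm produces a small relative error in PW, which is why an empirical Tm model can replace measured surface temperature in this step.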
  • 69
    Publication Date: 2014-08-01
    Description: The challenge presented by high-throughput sequencing necessitates the development of novel tools for accurate alignment of reads to reference sequences. Current approaches focus on using heuristics to map reads quickly to large genomes, rather than generating highly accurate alignments in coding regions. Such approaches are, thus, unsuited for applications such as amplicon-based analysis and the realignment phase of exome sequencing and RNA-seq, where accurate and biologically relevant alignment of coding regions is critical. To facilitate such analyses, we have developed a novel tool, RAMICS, that is tailored to mapping large numbers of sequence reads to short lengths (<10 000 bp) of coding DNA. RAMICS utilizes profile hidden Markov models to discover the open reading frame of each sequence and aligns to the reference sequence in a biologically relevant manner, distinguishing between genuine codon-sized indels and frameshift mutations. This approach facilitates the generation of highly accurate alignments, accounting for the error biases of the sequencing machine used to generate reads, particularly at homopolymer regions. Performance improvements are gained through the use of graphics processing units, which increase the speed of mapping through parallelization. RAMICS substantially outperforms all other mapping approaches tested in terms of alignment quality while maintaining highly competitive speed performance.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 70
    Publication Date: 2012-08-08
    Description: Determining the taxonomic lineage of DNA sequences is an important step in metagenomic analysis. Short DNA fragments from next-generation sequencing projects and microbes that lack close relatives in reference sequenced genome databases pose significant problems to taxonomic attribution methods. Our new classification algorithm, RITA (Rapid Identification of Taxonomic Assignments), uses the agreement between composition and homology to accurately classify sequences as short as 50 nt in length by assigning them to different classification groups with varying degrees of confidence. RITA is much faster than the hybrid PhymmBL approach when comparable homology search algorithms are used, and achieves slightly better accuracy than PhymmBL on an artificial metagenome. RITA can also incorporate prior knowledge about taxonomic distributions to increase the accuracy of assignments in data sets with varying degrees of taxonomic novelty, and classified sequences with higher precision than the current best rank-flexible classifier. The accuracy on short reads can be increased by exploiting paired-end information, if available, which we demonstrate on a recently published bovine rumen data set. Finally, we develop a variant of RITA that incorporates accelerated homology search techniques, and generate predictions on a set of human gut metagenomes that were previously assigned to different ‘enterotypes’. RITA is freely available in Web server and standalone versions.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 71
    Publication Date: 2012-08-08
    Description: Understanding the numerous functions that RNAs play in living cells depends critically on knowledge of their three-dimensional structure. Due to the difficulties in experimentally assessing structures of large RNAs, there is currently great demand for new high-resolution structure prediction methods. We present the novel method for the fully automated prediction of RNA 3D structures from a user-defined secondary structure. The concept is founded on the machine translation system. The translation engine operates on the RNA FRABASE database tailored to the dictionary relating the RNA secondary structure and tertiary structure elements. The translation algorithm is very fast. Initial 3D structure is composed in a range of seconds on a single processor. The method assures the prediction of large RNA 3D structures of high quality. Our approach needs neither structural templates nor RNA sequence alignment, required for comparative methods. This enables the building of unresolved yet native and artificial RNA structures. The method is implemented in a publicly available, user-friendly server RNAComposer. It works in an interactive mode and a batch mode. The batch mode is designed for large-scale modelling and accepts atomic distance restraints. Presently, the server is set to build RNA structures of up to 500 residues.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 72
    Publication Date: 2012-09-27
    Description: Programmed –1 ribosomal frameshifting is employed in the expression of a number of viral and cellular genes. In this process, the ribosome slips backwards by a single nucleotide and continues translation of an overlapping reading frame, generating a fusion protein. Frameshifting signals comprise a heptanucleotide slippery sequence, where the ribosome changes frame, and a stimulatory RNA structure, a stem–loop or RNA pseudoknot. Antisense oligonucleotides annealed appropriately 3' of a slippery sequence have also shown activity in frameshifting, at least in vitro. Here we examined frameshifting at the U6A slippery sequence of the HIV gag/pol signal and found high levels of both –1 and –2 frameshifting with stem–loop, pseudoknot or antisense oligonucleotide stimulators. By examining –1 and –2 frameshifting outcomes on mRNAs with varying slippery sequence-stimulatory RNA spacing distances, we found that –2 frameshifting was optimal at a spacer length 1–2 nucleotides shorter than that optimal for –1 frameshifting with all stimulatory RNAs tested. We propose that the shorter spacer increases the tension on the mRNA such that when the tRNA detaches, it more readily enters the –2 frame on the U6A heptamer. We propose that mRNA tension is central to frameshifting, whether promoted by stem–loop, pseudoknot or antisense oligonucleotide stimulator.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 73
    Publication Date: 2012-10-24
    Description: Understanding the categorization of human diseases is critical for reliably identifying disease causal genes. Recently, genome-wide studies of abnormal chromosomal locations related to diseases have mapped >2000 phenotype–gene relations, which provide valuable information for classifying diseases and identifying candidate genes as drug targets. In this article, a regularized non-negative matrix tri-factorization (R-NMTF) algorithm is introduced to co-cluster phenotypes and genes, and simultaneously detect associations between the detected phenotype clusters and gene clusters. The R-NMTF algorithm factorizes the phenotype–gene association matrix under the prior knowledge from phenotype similarity network and protein–protein interaction network, supervised by the label information from known disease classes and biological pathways. In the experiments on disease phenotype–gene associations in OMIM and KEGG disease pathways, R-NMTF significantly improved the classification of disease phenotypes and disease pathway genes compared with support vector machines and Label Propagation in cross-validation on the annotated phenotypes and genes. The newly predicted phenotypes in each disease class are highly consistent with human phenotype ontology annotations. The roles of the new member genes in the disease pathways are examined and validated in the protein–protein interaction subnetworks. Extensive literature review also confirmed many new members of the disease classes and pathways as well as the predicted associations between disease phenotype classes and pathways.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 74
    Publication Date: 2012-11-04
    Description: The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
  • 75
    Publication Date: 2012-11-25
    Description: Identifying cancer driver genes and pathways among all somatic mutations detected in a cohort of tumors is a key challenge in cancer genomics. Traditionally, this is done by prioritizing genes according to the recurrence of alterations that they bear. However, this approach has some known limitations, such as the difficulty to correctly estimate the background mutation rate, and the fact that it cannot identify lowly recurrently mutated driver genes. Here we present a novel approach, Oncodrive-fm, to detect candidate cancer drivers which does not rely on recurrence. First, we hypothesized that any bias toward the accumulation of variants with high functional impact observed in a gene or group of genes may be an indication of positive selection and can thus be used to detect candidate driver genes or gene modules. Next, we developed a method to measure this bias (FM bias) and applied it to three datasets of tumor somatic variants. As a proof of concept of our hypothesis we show that most of the highly recurrent and well-known cancer genes exhibit a clear FM bias. Moreover, this novel approach avoids some known limitations of recurrence-based approaches, and can successfully identify lowly recurrent candidate cancer drivers.
    Keywords: Computational Methods
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
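
The functional-impact bias idea in the description above can be illustrated with a small resampling test. This is only a minimal sketch under stated assumptions, not the published Oncodrive-fm procedure: the function name `fm_bias_pvalue`, the use of a single score per variant, and sampling with replacement from a pooled background are all hypothetical choices made for illustration.

```python
import random
from statistics import mean


def fm_bias_pvalue(gene_scores, background_scores, n_permutations=10000, seed=0):
    """Empirical p-value for a gene's mean functional-impact score.

    Assumption: higher scores mean higher predicted functional impact.
    The null distribution is built from random samples (with replacement)
    of the same size drawn from the pooled background scores.
    """
    rng = random.Random(seed)
    observed = mean(gene_scores)
    k = len(gene_scores)
    hits = 0
    for _ in range(n_permutations):
        sample = rng.choices(background_scores, k=k)
        if mean(sample) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Toy data: most background variants have low impact, a few have high impact.
background = [0.1] * 90 + [0.9] * 10
p_high = fm_bias_pvalue([0.9, 0.9, 0.9, 0.9], background, n_permutations=2000)
p_low = fm_bias_pvalue([0.1, 0.1, 0.1, 0.1], background, n_permutations=2000)
```

In this toy setting a gene whose variants all carry high scores gets a small p-value without any reference to how often the gene is mutated, which is the property the description emphasizes.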
  • 76
    Publication Date: 2014-02-11
    Description: The advances of high-throughput sequencing offer an unprecedented opportunity to study genetic variation. This is challenged by the difficulty of resolving variant calls in repetitive DNA regions. We present a Bayesian method to estimate repeat-length variation from paired-end sequence read data. The method makes variant calls based on deviations in sequence fragment sizes, allowing the analysis of repeats at lengths of relevance to a range of phenotypes. We demonstrate the method’s ability to detect and quantify changes in repeat lengths from short read genomic sequence data across genotypes. We use the method to estimate repeat variation among 12 strains of Arabidopsis thaliana and demonstrate experimentally that our method compares favourably against existing methods. Using this method, we have identified all repeats across the genome, which are likely to be polymorphic. In addition, our predicted polymorphic repeats also included the only known repeat expansion in A. thaliana, suggesting an ability to discover potential unstable repeats.
    Keywords: Computational Methods, Massively Parallel (Deep) Sequencing, Genomics
    Print ISSN: 0305-1048
    Electronic ISSN: 1362-4962
    Topics: Biology
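
To make the fragment-size idea in the description above concrete, the following is a minimal sketch of a conjugate normal update, not the authors' model. It assumes that read pairs spanning a repeat map with a span of roughly `library_mean - delta`, where `delta` is the repeat-length change relative to the reference, and that `delta` has a zero-centred normal prior. The function name, parameters, prior width and sign convention are all hypothetical.

```python
from statistics import mean


def repeat_length_shift(observed_spans, library_mean, library_sd, prior_sd=50.0):
    """Posterior mean and sd of the repeat-length change delta (in bp).

    Assumed model: span_i ~ Normal(library_mean - delta, library_sd),
    delta ~ Normal(0, prior_sd). A positive delta means the sample repeat
    is longer than the reference, so mapped spans look shorter.
    """
    n = len(observed_spans)
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / library_sd ** 2
    posterior_variance = 1.0 / (prior_precision + data_precision)
    # Average deviation of the mapped spans from the expected fragment size.
    delta_hat = library_mean - mean(observed_spans)
    posterior_mean = posterior_variance * data_precision * delta_hat
    return posterior_mean, posterior_variance ** 0.5


# Toy data: 20 spanning pairs that map 30 bp shorter than the library mean.
shift_mean, shift_sd = repeat_length_shift([270] * 20, library_mean=300, library_sd=20)
null_mean, _ = repeat_length_shift([300] * 10, library_mean=300, library_sd=20)
```

With more spanning pairs the posterior sd shrinks, so small but consistent deviations in fragment size become detectable; the prior pulls estimates from only a few pairs toward zero.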