ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

  • Articles  (1,331)
  • Oxford University Press  (1,331)
  • American Association for the Advancement of Science (AAAS)
  • 2020-2022  (389)
  • 2015-2019  (942)
  • Briefings in Bioinformatics  (217)
  • 9058
  • Biology  (1,331)
  • Agriculture, Forestry, Horticulture, Fishery, Domestic Science, Nutrition
Collection
  • Articles  (1,331)
Publisher
  • Oxford University Press  (1,331)
  • American Association for the Advancement of Science (AAAS)
Years
Year
Topic
  • Biology  (1,331)
  • Agriculture, Forestry, Horticulture, Fishery, Domestic Science, Nutrition
  • Computer Science  (1,331)
  • 1
    Publication Date: 2015-09-16
    Description: For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tunings, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures, which describe the overlap from different perspectives, are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 2
    Publication Date: 2015-11-20
    Description: The past two decades of microRNA (miRNA) research has solidified the role of these small non-coding RNAs as key regulators of many biological processes and promising biomarkers for disease. The concurrent development in high-throughput profiling technology has further advanced our understanding of the impact of their dysregulation on a global scale. Currently, next-generation sequencing is the platform of choice for the discovery and quantification of miRNAs. Despite this, there is no clear consensus on how the data should be preprocessed before conducting downstream analyses. Often overlooked, data preprocessing is an essential step in data analysis: the presence of unreliable features and noise can affect the conclusions drawn from downstream analyses. Using a spike-in dilution study, we evaluated the effects of several general-purpose aligners (BWA, Bowtie, Bowtie 2 and Novoalign), and normalization methods (counts-per-million, total count scaling, upper quartile scaling, Trimmed Mean of M, DESeq, linear regression, cyclic loess and quantile) with respect to the final miRNA count data distribution, variance, bias and accuracy of differential expression analysis. We make practical recommendations on the optimal preprocessing methods for the extraction and interpretation of miRNA count data from small RNA-sequencing experiments.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 3
    Publication Date: 2015-11-20
    Description: Parameter estimation is a challenging computational problem in the reverse engineering of biological systems. Because advances in biotechnology have facilitated wide availability of time-series gene expression data, systematic parameter estimation of gene circuit models from such time-series mRNA data has become an important method for quantitatively dissecting the regulation of gene expression. By focusing on the modeling of gene circuits, we examine here the performance of three types of state-of-the-art parameter estimation methods: population-based methods, online methods and model-decomposition-based methods. Our results show that certain population-based methods are able to generate high-quality parameter solutions. The performance of these methods, however, is heavily dependent on the size of the parameter search space, and their computational requirements substantially increase as the size of the search space increases. In comparison, online methods and model decomposition-based methods are computationally faster alternatives and are less dependent on the size of the search space. Among other things, our results show that a hybrid approach that augments computationally fast methods with local search as a subsequent refinement procedure can substantially increase the quality of their parameter estimates to the level on par with the best solution obtained from the population-based methods while maintaining high computational speed. These suggest that such hybrid methods can be a promising alternative to the more commonly used population-based methods for parameter estimation of gene circuit models when limited prior knowledge about the underlying regulatory mechanisms makes the size of the parameter search space vastly large.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 4
    Publication Date: 2015-11-20
    Description: Human housekeeping genes are often confused with essential human genes, and several studies regard both types of genes as having the same level of evolutionary conservation. However, this is not necessarily the case. To clarify this, we compared the differences between human housekeeping genes and essential human genes with respect to four aspects: the evolutionary rate (dN/dS), protein sequence identity, single-nucleotide polymorphism (SNP) density and level of linkage disequilibrium (LD). The results showed that housekeeping genes had lower evolutionary rates, higher sequence identities, lower SNP densities and higher levels of LD compared with essential genes. Together, these findings indicate that housekeeping and essential genes are two distinct types of genes, and that housekeeping genes have a higher level of evolutionary conservation. Therefore, we suggest that researchers should pay careful attention to the distinctions between housekeeping genes and essential genes. Moreover, it is still controversial whether we should substitute human orthologs of mouse essential genes for human essential genes. Therefore, we compared the evolutionary features between human orthologs of mouse essential genes and human housekeeping genes and we got inconsistent results in long-term and short-term evolutionary characteristics implying the irrationality of simply replacing human essential genes with human orthologs of mouse essential genes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 5
    Publication Date: 2015-11-20
    Description: Whole-genome search of genes is an essential approach to dissecting complex traits, but a marginal one-single-nucleotide polymorphism (SNP)/one-phenotype regression analysis widely used in current genome-wide association studies fails to estimate the net and cumulative effects of SNPs and reveal the developmental pattern of interplay between genes and traits. Here we describe a computational framework, which we refer to as two-side high-dimensional genome-wide association studies (2HiGWAS), to associate an ultrahigh dimension of SNPs with a high dimension of developmental trajectories measured across time and space. The model is implemented with a dual dimension-reduction procedure for both predictors and responses to select a sparse but full set of significant loci from an extremely large pool of SNPs and estimate their net time-varying effects on trait development. The model can not only help geneticists to precisely identify an entire set of genes underlying complex traits but also allow them to elucidate a global picture of how genes control developmental and dynamic processes of trait formation. We investigated the statistical properties of the model via extensive simulation studies. With the increasing availability of GWAS in various organisms, 2HiGWAS will have important implications for genetic studies of developmental complex traits.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 6
    Publication Date: 2015-11-20
    Description: Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations, which assumes the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable to have a curation set with high sensitivity than high precision. Software to simulate transcript sets, expression values and sequence reads under a wider range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online ( https://github.com/boboppie/RSSS ) and can be expanded by interested parties to include methods other than the exemplars presented in this article.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 7
    Publication Date: 2015-11-20
    Description: Significant efforts have been made recently to improve data throughput and data quality in screening technologies related to drug design. The modern pharmaceutical industry relies heavily on high-throughput screening (HTS) and high-content screening (HCS) technologies, which include small molecule, complementary DNA (cDNA) and RNA interference (RNAi) types of screening. Data generated by these screening technologies are subject to several environmental and procedural systematic biases, which introduce errors into the hit identification process. We first review systematic biases typical of HTS and HCS screens. We highlight that study design issues and the way in which data are generated are crucial for providing unbiased screening results. Considering various data sets, including the publicly available ChemBank data, we assess the rates of systematic bias in experimental HTS by using plate-specific and assay-specific error detection tests. We describe main data normalization and correction techniques and introduce a general data preprocessing protocol. This protocol can be recommended for academic and industrial researchers involved in the analysis of current or next-generation HTS data.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 8
    Publication Date: 2015-11-20
    Description: De novo motif discovery is a difficult computational task. Historically, dedicated algorithms always reported a high percentage of false positives. Their performance did not improve considerably even after they adapted to handle large amounts of chromatin immunoprecipitation sequencing (ChIP-Seq) data. Several studies have advocated aggregating complementary algorithms, combining their predictions to increase the accuracy of the results. This led to the development of ensemble methods. To form a better view on modern ensembles, we review all compound tools designed for ChIP-Seq. After a brief introduction to basic algorithms and early ensembles, we describe the most recent tools. We highlight their limitations and strengths by presenting their architecture, the input options and their output. To provide guidance for next-generation sequencing practitioners, we observe the differences and similarities between them. Last but not least, we identify and recommend several features to be implemented by any novel ensemble algorithm.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 9
    Publication Date: 2015-05-19
    Description: The combination of DNA bisulfite treatment with high-throughput sequencing technologies has enabled investigation of genome-wide DNA methylation beyond CpG sites and CpG islands. These technologies have opened new avenues to understand the interplay between epigenetic events, chromatin plasticity and gene regulation. However, the processing, managing and mining of this huge volume of data require specialized computational tools and statistical methods that are yet to be standardized. Here, we describe a complete bisulfite sequencing analysis workflow, including recently developed programs, highlighting each of the crucial analysis steps required, i.e. sequencing quality control, reads alignment, methylation scoring, methylation heterogeneity assessment, genomic features annotation, data visualization and determination of differentially methylated cytosines. Moreover, we discuss the limitations of these technologies and considerations to perform suitable analyses.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 10
    Publication Date: 2015-05-19
    Description: Understanding the genetic basis of human traits/diseases and the underlying mechanisms of how these traits/diseases are affected by genetic variations is critical for public health. Current genome-wide functional genomics data uncovered a large number of functional elements in the noncoding regions of human genome, providing new opportunities to study regulatory variants (RVs). RVs play important roles in transcription factor bindings, chromatin states and epigenetic modifications. Here, we systematically review an array of methods currently used to map RVs as well as the computational approaches in annotating and interpreting their regulatory effects, with emphasis on regulatory single-nucleotide polymorphism. We also briefly introduce experimental methods to validate these functional RVs.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 11
    Publication Date: 2015-05-19
    Description: The detection of parent-of-origin effects aims to identify whether the functionality of alleles, and in turn associated phenotypic traits, depends on the parental origin of the alleles. Different parent-of-origin effects have been identified through a variety of mechanisms and a number of statistical methodologies for their detection have been proposed, in particular for genome-wide association studies (GWAS). GWAS have had limited success in explaining the heritability of many complex disorders and traits, but successful identification of parent-of-origin effects using trio (mother, father and offspring) GWAS may help shed light on this missing heritability. However, it is important to choose the most appropriate parent-of-origin test or methodology, given knowledge of the phenotype, amount of available data and the type of parent-of-origin effect(s) being considered. This review brings together the parent-of-origin detection methodologies available, comparing them in terms of power and type I error for a number of different simulated data scenarios, and finally offering guidance as to the most appropriate choice for the different scenarios.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 12
    Publication Date: 2015-05-19
    Description: With the increasing recognition of its role in trait and disease development, it is crucial to account for genetic imprinting to illustrate the genetic architecture of complex traits. Genetic mapping can be innovated to test and estimate effects of genetic imprinting in a segregating population derived from experimental crosses. Here, we describe and assess a design for imprinting detection in natural plant populations. This design is to sample maternal plants at random from a natural population and collect open-pollinated (OP) seeds randomly from each maternal plant and germinate them into seedlings. A two-stage hierarchical platform is constructed to jointly analyze maternal and OP progeny markers. Through tracing the segregation and transmission of alleles from the parental to progeny generation, this platform allows parent-of-origin-dependent gene expression to be discerned, providing an avenue to estimate the effect of imprinting genes on a quantitative trait. The design is derived to estimate imprinting effects expressed at the haplotype level. Its usefulness and utilization were validated through computer simulation. This OP-based design provides a tool to detect the genomic distribution and pattern of imprinting genes as an important component of heritable variation that is neglected in traditional genetic studies of complex traits.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 13
    Publication Date: 2015-05-19
    Description: Breast cancer was traditionally perceived as a single disease; however, recent advances in gene expression and genomic profiling have revealed that breast cancer is in fact a collection of diseases exhibiting distinct anatomical features, responses to treatment and survival outcomes. Consequently, a number of schemes have been proposed for subtyping of breast cancer to bring out the biological and clinically relevant characteristics of the subtypes. Although some of these schemes capture underlying molecular differences, others predict variations in response to treatment and survival patterns. However, despite this diversity in the approaches, it is clear that molecular mechanisms drive clinical outcomes, and therefore an effective scheme should integrate molecular as well as clinical parameters to enable deeper understanding of cancer mechanisms and allow better decision making in the clinic. Here, using a large cohort of ~550 breast tumours from The Cancer Genome Atlas, we systematically evaluate a number of expression-based schemes including at least eight molecular pathways implicated in breast cancer and three prognostic signatures, across a variety of classification scenarios covering molecular characteristics, biomarker status, tumour stages and survival patterns. We observe that a careful combination of these schemes yields better classification results compared with using them individually, thus confirming that molecular mechanisms and clinical outcomes are related and that an effective scheme should therefore integrate both these parameters to enable a deeper understanding of the cancer.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 14
    Publication Date: 2015-05-19
    Description: microRNAs (miRNAs) are important gene regulators. They control a wide range of biological processes and are involved in several types of cancers. Thus, exploring miRNA functions is important for diagnostics and therapeutics. To date, there are few feasible experimental techniques for discovering miRNA regulatory mechanisms. Alternatively, predictions of miRNA–mRNA regulatory relationships by computational methods have increasingly achieved promising results. Computational approaches are proving their ability as effective tools in reducing the number of biological experiments that must be conducted and to assist with the design of the experiments. In this review, we categorize and review different computational approaches to identify miRNA activities and functions, including the co-regulation of miRNAs and transcription factors. Our main focuses are on the recent approaches that use multiple data types for exploring miRNA functions. We discuss the remaining challenges in the evaluation and selection of models based on the results from a case study. Finally, we analyse the remaining challenges of each computational approach and suggest some future research directions.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 15
    Publication Date: 2015-05-19
    Description: Technological advances in next-generation sequencing have uncovered a wide spectrum of aberrations in cancer genomes. The extreme diversity in cancer mutations necessitates computational approaches to differentiate between the ‘drivers’ with vital function in cancer progression and those nonfunctional ‘passengers’. Although individual driver mutations are routinely identified, mutational profiles of different tumors are highly heterogeneous. There is growing consensus that pathways rather than single genes are the primary target of mutations. Here we review extant bioinformatics approaches to identifying oncogenic drivers at different mutational levels, highlighting the strategies for discovering driver pathways and networks from cancer mutation data. These approaches will help reduce the mutation complexity, thus providing a simplified picture of cancer.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 16
    Publication Date: 2015-05-19
    Description: Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 17
    Publication Date: 2016-07-16
    Description: The detailed, atomistic-level understanding of molecular signaling along the tumor-suppressive Hippo signaling pathway that controls tissue homeostasis by balancing cell proliferation and death through apoptosis is a promising avenue for the discovery of novel anticancer drug targets. The activation of kinases such as Mammalian STE20-Like Protein Kinases 1 and 2 (MST1 and MST2)—modulated through both homo- and heterodimerization (e.g. interactions with Ras association domain family, RASSF, enzymes)—is a key upstream event in this pathway and remains poorly understood. On the other hand, RASSFs (such as RASSF1A or RASSF5) act as important apoptosis activators and tumor suppressors, although their exact regulatory roles are also unclear. We present recent molecular studies of signaling along the Ras-RASSF-MST pathway, which controls growth and apoptosis in eukaryotic cells, including a variety of modern molecular modeling and simulation techniques. Using recently available structural information, we discuss the complex regulatory scenario according to which RASSFs perform dual signaling functions, either preventing or promoting MST2 activation, and thus control cell apoptosis. Here, we focus on recent studies highlighting the special role being played by the specific interactions between the helical Salvador/RASSF/Hippo (SARAH) domains of MST2 and RASSF1a or RASSF5 enzymes. These studies are crucial for integrating atomistic-level mechanistic information about the structures and conformational dynamics of interacting proteins, with information available on their system-level functions in cellular signaling.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 18
    Publication Date: 2016-07-16
    Description: Big-data-based edge biomarker is a new concept to characterize disease features based on biomedical big data in a dynamical and network manner, which also provides alternative strategies to indicate disease status in single samples. This article gives a comprehensive review on big-data-based edge biomarkers for complex diseases in an individual patient, which are defined as biomarkers based on network information and high-dimensional data. Specifically, we firstly introduce the sources and structures of biomedical big data accessible in public for edge biomarker and disease study. We show that biomedical big data are typically ‘small-sample size in high-dimension space’, i.e. small samples but with high dimensions on features (e.g. omics data) for each individual, in contrast to traditional big data in many other fields characterized as ‘large-sample size in low-dimension space’, i.e. big samples but with low dimensions on features. Then, we demonstrate the concept, model and algorithm for edge biomarkers and further big-data-based edge biomarkers. Dissimilar to conventional biomarkers, edge biomarkers, e.g. module biomarkers in module network rewiring-analysis, are able to predict the disease state by learning differential associations between molecules rather than differential expressions of molecules during disease progression or treatment in individual patients. In particular, in contrast to using the information of the common molecules or edges (i.e. molecule-pairs) across a population in traditional biomarkers including network and edge biomarkers, big-data-based edge biomarkers are specific for each individual and thus can accurately evaluate the disease state by considering the individual heterogeneity. Therefore, the measurement of big data in a high-dimensional space is required not only in the learning process but also in the diagnosing or predicting process of the tested individual. Finally, we provide a case study on analyzing the temporal expression data from a malaria vaccine trial by big-data-based edge biomarkers from module network rewiring-analysis. The illustrative results show that the identified module biomarkers can accurately distinguish vaccines with or without protection and outperformed previously reported gene signatures in terms of effectiveness and efficiency.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 19
    Publication Date: 2016-07-16
    Description: Molecular interrogation of a biological sample through DNA sequencing, RNA and microRNA profiling, proteomics and other assays, has the potential to provide a systems level approach to predicting treatment response and disease progression, and to developing precision therapies. Large publicly funded projects have generated extensive and freely available multi-assay data resources; however, bioinformatic and statistical methods for the analysis of such experiments are still nascent. We review multi-assay genomic data resources in the areas of clinical oncology, pharmacogenomics and other perturbation experiments, population genomics and regulatory genomics and other areas, and tools for data acquisition. Finally, we review bioinformatic tools that are explicitly geared toward integrative genomic data visualization and analysis. This review provides starting points for accessing publicly available data and tools to support development of needed integrative methods.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 20
    Publication Date: 2016-07-16
    Description: One of the major challenges in biology concerns the integration of data across length and time scales into a consistent framework: how do macroscopic properties and functionalities arise from the molecular regulatory networks—and how can they change as a result of mutations? Morphogenesis provides an excellent model system to study how simple molecular networks robustly control complex processes on the macroscopic scale despite molecular noise, and how important functional variants can emerge from small genetic changes. Recent advancements in three-dimensional imaging technologies, computer algorithms and computer power now allow us to develop and analyse increasingly realistic models of biological control. Here, we present our pipeline for image-based modelling that includes the segmentation of images, the determination of displacement fields and the solution of systems of partial differential equations on the growing, embryonic domains. The development of suitable mathematical models, the data-based inference of parameter sets and the evaluation of competing models are still challenging, and current approaches are discussed.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 21
    Publication Date: 2016-07-16
    Description: Owing greatly to the advancement of next-generation sequencing (NGS), the amount of NGS data is increasing rapidly. Although there are many NGS applications, one of the most commonly used techniques ‘RNA sequencing (RNA-seq)’ is rapidly replacing microarray-based techniques in laboratories around the world. As more and more of such techniques are standardized, allowing technicians to perform these experiments with minimal hands-on time and reduced experimental/operator-dependent biases, the bottleneck of such techniques is clearly visible; that is, data analysis. Further complicating the matter, increasing evidence suggests most of the genome is transcribed into RNA; however, the majority of these RNAs are not translated into proteins. These RNAs that do not become proteins are called ‘noncoding RNAs (ncRNAs)’. Although some time has passed since the discovery of ncRNAs, their annotations remain poor, making analysis of RNA-seq data challenging. Here, we examine the current limitations of RNA-seq analysis using case studies focused on the detection of novel transcripts and examination of their characteristics. Finally, we validate the presence of novel transcripts using biological experiments, showing novel transcripts can be accurately identified when a series of filters is applied. In conclusion, novel transcripts that are identified from RNA-seq must be examined carefully before proceeding to biological experiments.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 22
    Publication Date: 2016-07-16
    Description: Functional genomics has enormous potential to facilitate our understanding of normal and disease-specific physiology. In the past decade, intensive research efforts have been focused on modeling functional relationship networks, which summarize the probability of gene co-functionality relationships. Such modeling can be based on either expression data only or heterogeneous data integration. Numerous methods have been deployed to infer the functional relationship networks, while most of them target the global (non-context-specific) functional relationship networks. However, it is expected that functional relationships consistently reprogram under different tissues or biological processes. Thus, advanced methods have been developed targeting tissue-specific or developmental stage-specific networks. This article brings together the state-of-the-art functional relationship network modeling methods, emphasizes the need for heterogeneous genomic data integration and context-specific network modeling and outlines future directions for functional relationship networks.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 23
    Publication Date: 2016-07-16
    Description: Identification of drug–target interactions is an important process in drug discovery. Although high-throughput screening and other biological assays are becoming available, experimental methods for drug–target interaction identification remain extremely costly, time-consuming and challenging. Therefore, various computational models have been developed to predict potential drug–target associations on a large scale. In this review, databases and web servers involved in drug–target identification and drug discovery are summarized. In addition, we introduce state-of-the-art computational models for drug–target interaction prediction, including network-based and machine learning-based methods. In particular, for the machine learning-based methods, we focus on supervised and semi-supervised models, which differ essentially in how they handle negative samples. Although many effective computational models have significantly improved drug–target interaction prediction, both network-based and machine learning-based methods have their disadvantages. Furthermore, we discuss the future directions of network-based drug discovery and of network approaches for personalized drug discovery based on personalized medicine, genome sequencing, tumor clone-based networks and cancer hallmark-based networks. Finally, we discuss the new evaluation and validation framework and the reformulation of the drug–target interaction prediction problem as a more realistic regression problem based on quantitative bioactivity data.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 24
    Publication Date: 2016-07-16
    Description: Atherosclerosis is one of the principal pathologies of cardiovascular disease, with blood cholesterol a significant risk factor. The World Health Organization estimates that approximately 2.5 million deaths occur annually because of the risk from elevated cholesterol, with 39% of adults worldwide at future risk. Atherosclerosis emerges from the combination of many dynamical factors, including haemodynamics, endothelial damage, innate immunity and sterol biochemistry. Despite its significance to public health, the dynamics that drive atherosclerosis remain poorly understood. As a disease that depends on multiple factors operating on different length scales, the natural framework to apply to atherosclerosis is mathematical and computational modelling. A computational model provides an integrated description of the disease and serves as an in silico experimental system from which we can learn about the disease and develop therapeutic hypotheses. Although the work completed in this area to date has been limited, there are clear signs that interest is growing and that a nascent field is establishing itself. This article discusses the current state of modelling in this area, bringing together many recent results for the first time. We review the work that has been done, discuss its scope and highlight the gaps in our understanding that could yield future opportunities.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 25
    Publication Date: 2016-07-16
    Description: We present Bioinformatics Autodiscovery of Training Materials (BATMat), an open-source, Google-based, targeted, automatic search tool for training materials related to bioinformatics. BATMat helps gain access with one click to filtered and portable information containing links to existing materials (when present). It also offers functionality to sort results according to source site or title. Availability: http://imbatmat.com Contact: piar301@gmail.com
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 26
    Publication Date: 2016-07-16
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 27
    Publication Date: 2016-07-16
    Description: Cancer is often driven by the accumulation of genetic alterations, including single nucleotide variants, small insertions or deletions, gene fusions, copy-number variations, and large chromosomal rearrangements. Recent advances in next-generation sequencing technologies have helped investigators generate massive amounts of cancer genomic data and catalog somatic mutations in both common and rare cancer types. So far, the somatic mutation landscapes and signatures of >10 major cancer types have been reported; however, pinpointing driver mutations and cancer genes from millions of available cancer somatic mutations remains a monumental challenge. To tackle this important task, many methods and computational tools have been developed during the past several years and, thus, a review of its advances is urgently needed. Here, we first summarize the main features of these methods and tools for whole-exome, whole-genome and whole-transcriptome sequencing data. Then, we discuss major challenges like tumor intra-heterogeneity, tumor sample saturation and functionality of synonymous mutations in cancer, all of which may result in false-positive discoveries. Finally, we highlight new directions in studying regulatory roles of noncoding somatic mutations and quantitatively measuring circulating tumor DNA in cancer. This review may help investigators find an appropriate tool for detecting potential driver or actionable mutations in rapidly emerging precision cancer medicine.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 28
    Publication Date: 2016-07-16
    Description: Recent literature has highlighted the advantages of haplotype association methods for detecting rare variants associated with common diseases. As several new haplotype association methods have been proposed in the past few years, a comparison of new and standard methods is important and timely for guidance to the practitioners. We consider nine methods—Haplo.score, Haplo.glm, Hapassoc, Bayesian hierarchical Generalized Linear Model (BhGLM), Logistic Bayesian LASSO (LBL), regularized GLM (rGLM), Haplotype Kernel Association Test, wei-SIMc-matching and Weighted Haplotype and Imputation-based Tests. These can be divided into two types—individual haplotype-specific tests and global tests depending on whether there is just one overall test for a haplotype region (global) or there is an individual test for each haplotype in the region. Haplo.score is the only method that tests for both; Haplo.glm, Hapassoc, BhGLM and LBL are individual haplotype-specific, while the rest are global tests. For comparison, we also apply a popular collapsing method—Sequence Kernel Association Test (SKAT) and its two variants—SKAT-O (Optimal) and SKAT-C (Combined). We carry out an extensive comparison on our simulated data sets as well as on the Genetic Analysis Workshop (GAW) 18 simulated data. Further, we apply the methods to GAW18 real hypertension data and Dallas Heart Study sequence data. We find that LBL, Haplo.score (global test) and rGLM perform well over the scenarios considered here. Also, haplotype methods are more powerful (albeit more computationally intensive) than SKAT and its variants in scenarios where multiple causal variants act interactively to produce haplotype effects.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 29
    Publication Date: 2016-07-16
    Description: The purpose of this article is to inform readers about technical challenges that we encountered when assembling exome sequencing data from the ‘Simplifying Complex Exomes’ (SIMPLEXO) consortium—whose mandate is the discovery of novel genes predisposing to breast and ovarian cancers. Our motivation is to share these obstacles—and our solutions to them—as a means of communicating important technical details that should be discussed early in projects involving massively parallel sequencing.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 30
    Publication Date: 2016-07-16
    Description: Predictive, preventive, personalized and participatory (P4) medicine is an emerging medical model that is based on the customization of all medical aspects (i.e. practices, drugs, decisions) of the individual patient. P4 medicine presupposes the elucidation of the so-called omic world, under the assumption that this knowledge may explain differences of patients with respect to disease prevention, diagnosis and therapies. Here, we elucidate the role of some selected omics sciences for different aspects of disease management, such as early diagnosis of diseases, prevention of diseases, selection of personalized appropriate and optimal therapies based on molecular profiling of patients. After introducing basic concepts of P4 medicine and omics sciences, we review some computational tools and approaches for analysing selected omics data, with a special focus on microarray and mass spectrometry data, which may be used to support P4 medicine. Some applications of biomarker discovery and pharmacogenomics and some experiences on the study of drug reactions are also described.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 31
    Publication Date: 2016-07-16
    Description: A wide variety of large-scale data have been produced in bioinformatics. In response, the need for efficient handling of biomedical big data has been partly met by parallel computing. However, the time demand of many bioinformatics programs still remains high for large-scale practical uses because of factors that hinder acceleration by parallelization. Recently, new generations of storage devices have emerged, such as NAND flash-based solid-state drives (SSDs), and with the renewed interest in near-data processing, they are increasingly becoming acceleration methods that can accompany parallel processing. In certain cases, a simple drop-in replacement of hard disk drives by SSDs results in dramatic speedup. Despite the various advantages and continuous cost reduction of SSDs, there has been little review of SSD-based profiling and performance exploration of important but time-consuming bioinformatics programs. For an informative review, we perform in-depth profiling and analysis of 23 key bioinformatics programs using multiple types of devices. Based on the insight we obtain from this research, we further discuss issues related to design and optimize bioinformatics algorithms and pipelines to fully exploit SSDs. The programs we profile cover traditional and emerging areas of importance, such as alignment, assembly, mapping, expression analysis, variant calling and metagenomics. We explain how acceleration by parallelization can be combined with SSDs for improved performance and also how using SSDs can expedite important bioinformatics pipelines, such as variant calling by the Genome Analysis Toolkit and transcriptome analysis using RNA sequencing. We hope that this review can provide useful directions and tips to accompany future bioinformatics algorithm design procedures that properly consider new generations of powerful storage devices.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 32
    Publication Date: 2016-07-16
    Description: State-of-the-art next-generation sequencing, transcriptomics, proteomics and other high-throughput ‘omics’ technologies enable the efficient generation of large experimental data sets. These data may yield unprecedented knowledge about molecular pathways in cells and their role in disease. Dimension reduction approaches have been widely used in exploratory analysis of single omics data sets. This review will focus on dimension reduction approaches for simultaneous exploratory analyses of multiple data sets. These methods extract the linear relationships that best explain the correlated structure across data sets, the variability both within and between variables (or observations) and may highlight data issues such as batch effects or outliers. We explore dimension reduction techniques as one of the emerging approaches for data integration, and how these can be applied to increase our understanding of biological systems in normal physiological function and disease.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 33
    Publication Date: 2016-09-17
    Description: In the biology of tissue development and diseases, DNA methylation plays an important role. For a deeper understanding, it is crucial to accurately compare DNA methylation patterns between groups of samples representing different conditions. A widely used method to investigate DNA methylation in the CpG context is bisulfite sequencing, which produces data on the single-nucleotide scale. While there are benefits to analyzing CpG sites on a basepair level, there are both biological and statistical reasons to test entire genomic regions for differential methylation. However, the analysis of DNA methylation is hampered by the lack of best practice standards. Here, we compared multiple approaches for testing predefined genomic regions for differential DNA methylation in bisulfite sequencing data. Nine methods were evaluated: BiSeq, COHCAP, Goeman's Global Test, Limma, methylKit/eDMR, RADMeth and three log-linear regression approaches with different distribution assumptions. We applied these methods to simulated data and determined their sensitivity and specificity. This revealed performance differences, which were also seen when applied to real data. Methods that first test single CpG sites and then test regions based on transformed CpG-wise P-values performed better than methods that summarize methylation levels or raw reads. Interestingly, smoothing of methylation levels had a negligible impact. In particular, Global Test, BiSeq and RADMeth/z-test outperformed the other methods we evaluated, providing valuable guidance for more accurate analysis of DNA methylation.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 34
    Publication Date: 2016-09-17
    Description: Gene expression measurements represent the most important source of biological data used to unveil the interaction and functionality of genes. In this regard, several data mining and machine learning algorithms have been proposed that require, in a number of cases, some kind of data discretization to perform the inference. Selection of an appropriate discretization process has a major impact on the design and outcome of the inference algorithms, as there are a number of relevant issues that need to be considered. This study presents a revision of the current state-of-the-art discretization techniques, together with the key subjects that need to be considered when designing or selecting a discretization approach for gene expression data.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 35
    Publication Date: 2016-09-17
    Description: Pathway Tools is a bioinformatics software environment with a broad set of capabilities. The software provides genome-informatics tools such as a genome browser, sequence alignments, a genome-variant analyzer and comparative-genomics operations. It offers metabolic-informatics tools, such as metabolic reconstruction, quantitative metabolic modeling, prediction of reaction atom mappings and metabolic route search. Pathway Tools also provides regulatory-informatics tools, such as the ability to represent and visualize a wide range of regulatory interactions. This article outlines the advances in Pathway Tools in the past 5 years. Major additions include components for metabolic modeling, metabolic route search, computation of atom mappings and estimation of compound Gibbs free energies of formation; addition of editors for signaling pathways, for genome sequences and for cellular architecture; storage of gene essentiality data and phenotype data; display of multiple alignments, and of signaling and electron-transport pathways; and development of Python and web-services application programming interfaces. Scientists around the world have created more than 9800 Pathway/Genome Databases by using Pathway Tools, many of which are curated databases for important model organisms.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 36
    Publication Date: 2016-09-17
    Description: Recent advances in next-generation sequencing technology have yielded increasing cost-effectiveness and higher throughput produced per run, in turn, greatly influencing the analysis of DNA sequences. Among the various sequencing technologies, Illumina is by far the most widely used platform. However, the Illumina sequencing platform suffers from several imperfections that can be attributed to the chemical processes inherent to the sequencing-by-synthesis technology. With the enormous amounts of reads produced, statistical methodologies and computationally efficient algorithms are required to improve the accuracy and speed of base-calling. Over the past few years, several papers have proposed methods to model the various imperfections, giving rise to accurate and/or efficient base-calling algorithms. In this article, we provide a comprehensive comparison of the performance of recently developed base-callers and we present a general statistical model that unifies a large majority of these base-callers.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 37
    Publication Date: 2016-09-17
    Description: Phenotypes have gained increased notoriety in the clinical and biological domain owing to their application in numerous areas such as the discovery of disease genes and drug targets, phylogenetics and pharmacogenomics. Phenotypes, defined as observable characteristics of organisms, can be seen as one of the bridges that lead to a translation of experimental findings into clinical applications and thereby support ‘bench to bedside’ efforts. However, to build this translational bridge, a common and universal understanding of phenotypes is required that goes beyond domain-specific definitions. To achieve this ambitious goal, a digital revolution is ongoing that enables the encoding of data in computer-readable formats and the data storage in specialized repositories, ready for integration, enabling translational research. While phenome research is an ongoing endeavor, the true potential hidden in the currently available data still needs to be unlocked, offering exciting opportunities for the forthcoming years. Here, we provide insights into the state-of-the-art in digital phenotyping, by means of representing, acquiring and analyzing phenotype data. In addition, we provide visions of this field for future research work that could enable better applications of phenotype data.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 38
    Publication Date: 2016-09-17
    Description: Many studies now produce parallel data sets from different omics technologies; however, the task of interpreting the acquired data in an integrated fashion is not trivial. This review covers those methods that have been used over the past decade to statistically integrate and interpret metabolomics and transcriptomic data sets. It defines four categories of approaches, correlation-based integration, concatenation-based integration, multivariate-based integration and pathway-based integration, into which all existing statistical methods fit. It also explores the choices in study design for generating samples for analysis by these omics technologies and the impact that these technical decisions have on the subsequent data analysis options.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 39
    Publication Date: 2016-09-17
    Description: Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 40
    Publication Date: 2015-05-19
    Description: Network motif detection is the search for statistically overrepresented subgraphs present in a larger target network. They are thought to represent key structure and control mechanisms. Although the problem is exponential in nature, several algorithms and tools have been developed for efficiently detecting network motifs. This work analyzes 11 network motif detection tools and algorithms. Detailed comparisons and insightful directions for using these tools and algorithms are discussed. Key aspects of network motif detection are investigated. Network motif types and common network motifs as well as their biological functions are discussed. Applications of network motifs are also presented. Finally, the challenges, future improvements and future research directions for network motif detection are also discussed.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 41
    Publication Date: 2015-05-19
    Description: Unlike annuals, all perennial plants undergo seasonal transitions during ontogeny. As an adaptive response to seasonal changes in climate, the seasonal pattern of growth is likely to be under genetic control, although its underlying genetic basis remains unknown. Here, we develop a computational model that can map specific quantitative trait loci (QTLs) responsible for seasonal transitions of growth in perennials. The model is founded on functional mapping, a statistical framework to map developmental dynamics, which is reformed to integrate a seasonally adjusted growth function. The new model is equipped with a capacity to characterize the genetic effects of QTLs on seasonal alternation at different ages and then to better elucidate the genetic architecture of development. The model is implemented with a series of testing procedures, including (i) how a QTL controls an overall ontogenetic growth curve, (ii) how the QTL determines seasonal trajectories of growth within years and (iii) how it determines the dynamic nature of age-specific season response. The model was validated through computer simulation. The extension of season adjustment to other types of biological curves is statistically straightforward, facilitating a wider variety of genetic studies into ontogenetic growth and development in perennial plants.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 42
    Publication Date: 2015-05-19
    Description: Phylogenetic analysis is used to recover the evolutionary history of species, genes or proteins. Understanding phylogenetic relationships between organisms is a prerequisite of almost any evolutionary study, as contemporary species all share a common history through their ancestry. Moreover, it is important because of its wide applications that include understanding genome organization, epidemiological investigations, predicting protein functions, and deciding the genes to be analyzed in comparative studies. Despite immense progress in recent years, phylogenetic reconstruction involves many challenges that create uncertainty with respect to the true evolutionary relationships of the species or genes analyzed. One of the most notable difficulties is the widespread occurrence of incongruence among methods and also among individual genes or different genomic regions. Presence of widespread incongruence inhibits successful revealing of evolutionary relationships and applications of phylogenetic analysis. In this article, I concisely review the effect of various factors that cause incongruence in molecular phylogenies, the advances in the field that resolved some factors, and explore unresolved factors that cause incongruence along with possible ways for tackling them.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 43
    Publication Date: 2015-01-15
    Description: As an important mechanism for adaptation to heterogeneous environments, plastic responses of correlated traits to environmental alteration may also be genetically correlated, but less is known about the underlying genetic basis. We describe a statistical model for mapping specific quantitative trait loci (QTLs) that control the interrelationship of phenotypic plasticity between different traits. The model is constructed by a bivariate mixture setting, implemented with the EM algorithm to estimate the genetic effects of QTLs on correlative plastic response. We provide a series of procedures that test (1) how a QTL controls the phenotypic plasticity of a single trait; and (2) how the QTL determines the correlation of environment-induced changes of different traits. The model is readily extended to test how epistatic interactions among QTLs play a part in the correlations of different plastic traits. The model was validated through computer simulation and used to analyse multi-environment data of genetic mapping in winter wheat, showing its utilization in practice.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 44
    Publication Date: 2015-07-15
    Description: Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ε-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq- and position weight matrix- (PWM)-derived in silico TF-binding. The two algorithms are evaluated both in terms of their modelling prediction accuracy and ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF-binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of algorithm or cell type selection and considering both ChIP-seq and PWM-derived TF-binding. As we encourage other researchers to explore and develop these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 45
    Publication Date: 2015-07-15
    Description: Dysregulation or inhibition of apoptosis favors cancer and many other diseases. Understanding the network interactions of the genes involved in the apoptotic pathway is therefore essential when looking for targets of therapeutic intervention. Here we applied network theory methods to 25 experimentally validated apoptosis regulatory proteins and identified important genes for apoptosis regulation, which demonstrated a hierarchical scale-free fractal protein–protein interaction network. TP53, BRCA1, UBIQ and CASP3 were recognized as four key regulators. BRCA1 and UBIQ were also individually found to control highly clustered modules and play an important role in the stability of the overall network. The connection among the BRCA1, UBIQ and TP53 proteins was found to be important for regulation, which controlled their own respective communities and the overall network topology. The feedback loop regulation motif was identified among NPM1, BRCA1 and TP53, and these crucial motif topologies were also reflected in high frequency. The propagation of the perturbed signal from hubs was found to be active up to some distance, after which propagation started decreasing, and TP53 was the most efficient signal propagator. From the functional enrichment analysis, most of the apoptosis regulatory genes associated with cardiovascular diseases and highly expressed in brain tissues were identified. Apart from TP53, BRCA1 was observed to regulate apoptosis by influencing motif, propagation of signals and module regulation, reflecting their biological significance. In future, biochemical investigation of the observed hub-interacting partners could provide further understanding about their role in the pathophysiology of cancer.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 46
    Publication Date: 2015-07-15
    Description: With the establishment and development of Internet facilities and computational infrastructure, an overview of bio/chemoinformatics is presented along with its multidisciplinary facts, promises and challenges. The Government of India has paved the way for more profound research in the biological field through computational facilities and schemes/projects for collaboration with scientists from different disciplines. Simultaneously, the growth of available biomedical data has provided fresh insight into the nature of redundant and compensatory data. Today, bioinformatics research in India is characterized by powerful grid computing systems, a great variety of biological questions addressed and close collaborations between scientists and clinicians, with a full spectrum of focuses ranging from database building and methods development to biological discoveries. This outlook provides a resourceful platform highlighting the funding agencies, institutes and industries working in this direction, which would certainly be of great help to students seeking a career in bioinformatics. In short, this review highlights the current bio/chemoinformatics trends, education, status, diverse applicability and demands for further development.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 47
    Publication Date: 2015-07-15
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 48
    Publication Date: 2015-07-15
    Description: Genotype imputation has been widely adopted in the post-genome-wide association study (GWAS) era. Owing to its ability to accurately predict the genotypes of untyped variants, imputation greatly boosts variant density, allowing fine-mapping studies of GWAS loci and large-scale meta-analysis across different genotyping arrays. By leveraging genotype data from 90 whole-genome deeply sequenced individuals as the evaluation benchmark and the 1000 Genomes Project data as reference panels, we systematically examined four important issues related to genotype imputation practice. First, in a study of imputation accuracy, we found that IMPUTE2 and minimac have the best imputation performance among the three popular imputation programs evaluated and that using a multi-population reference panel is beneficial. Second, the optimal imputation quality cutoff for removing poorly imputed variants varies according to the software used. Third, the major contributing factors to consistently poor imputation are low variant heterozygosity, high sequence similarity to other genomic regions, high GC content, segmental duplication and being far from genotyping markers. Lastly, in an evaluation of the imputability of all known GWAS regions, we found that GWAS loci associated with hematological measurements and immune system diseases are harder to impute, as compared with other human traits. Recommendations made based on the above findings may provide practical guidance for imputation exercises in future genetic studies.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 49
    Oxford University Press
    Publication Date: 2015-07-15
    Description: Next-generation sequencing technologies revolutionized the ways in which genetic information is obtained and have opened the door for many essential applications in biomedical sciences. Hundreds of gigabytes of data are being produced, and all applications are affected by the errors in the data. Many programs have been designed to correct these errors, most of them targeting the data produced by the dominant technology of Illumina. We present a thorough comparison of these programs. Both HiSeq and MiSeq types of Illumina data are analyzed, and correcting performance is evaluated as the gain in depth and breadth of coverage, as given by correct reads and k-mers. Time and memory requirements, scalability and parallelism are considered as well. Practical guidelines are provided for the effective use of these tools. We also evaluate the efficiency of the current state-of-the-art programs for correcting Illumina data and provide research directions for further improvement.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 50
    Publication Date: 2015-07-15
    Description: Protein–protein interaction is of primary importance to understand protein functions. In recent years, the high-throughput AP-MS experiments have generated a large amount of bait–prey data, posing great challenges on the computational analysis of such data for inferring true interactions and protein complexes. To date, many research efforts have been devoted to developing novel computational methods to analyze these AP-MS data sets. In this article, we review and classify the key computational methods developed for the inference of protein–protein interactions and the detection of protein complexes from the AP-MS experiments. We hope that our review as well as the challenges highlighted in the article will provide valuable insights into driving future research for further advancing the state-of-the-art technologies in computational prediction, characterization and analysis of protein–protein interactions and protein complexes from the AP-MS data.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 51
    Publication Date: 2015-07-15
    Description: Advancements in high-throughput nucleotide sequencing techniques have brought with them state-of-the-art bioinformatics programs and software packages. Given the importance of molecular sequence data in contemporary life science research, these software suites are becoming an essential component of many labs and classrooms, and as such are frequently designed for non-computer specialists and marketed as one-stop bioinformatics toolkits. Although beautifully designed and powerful, user-friendly bioinformatics packages can be expensive and, as more arrive on the market each year, it can be difficult for researchers, teachers and students to choose the right software for their needs, especially if they do not have a bioinformatics background. This review highlights some of the currently available and most popular commercial bioinformatics packages, discussing their prices, usability, features and suitability for teaching. Although several commercial bioinformatics programs are arguably overpriced and overhyped, many are well designed, sophisticated and, in my opinion, worth the investment. If you are just beginning your foray into molecular sequence analysis or an experienced genomicist, I encourage you to explore proprietary software bundles. They have the potential to streamline your research, increase your productivity, energize your classroom and, if anything, add a bit of zest to the often dry detached world of bioinformatics.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 52
    Publication Date: 2015-07-15
    Description: A number of bioinformatic or biostatistical methods are available for analyzing DNA copy number profiles measured from microarray or sequencing technologies. In the absence of rich enough gold standard data sets, the performance of these methods is generally assessed using unrealistic simulation studies, or based on small real data analyses. To make an objective and reproducible performance assessment, we have designed and implemented a framework to generate realistic DNA copy number profiles of cancer samples with known truth. These profiles are generated by resampling publicly available SNP microarray data from genomic regions with known copy-number state. The original data have been extracted from dilution series of tumor cell lines with matched blood samples at several concentrations. Therefore, the signal-to-noise ratio of the generated profiles can be controlled through the (known) percentage of tumor cells in the sample. This article describes this framework and its application to a comparison study between methods for segmenting DNA copy number profiles from SNP microarrays. This study indicates that no single method is uniformly better than all others. It also helps identify pros and cons of the compared methods as a function of biologically informative parameters, such as the fraction of tumor cells in the sample and the proportion of heterozygous markers. This comparison study may be reproduced using the open source and cross-platform R package jointseg, which implements the proposed data generation and evaluation framework: http://r-forge.r-project.org/R/?group_id=1562.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 53
    Publication Date: 2015-07-15
    Description: Protein ubiquitination is one of the most important reversible post-translational modifications (PTMs). In many biochemical, pathological and pharmaceutical studies on understanding the function of proteins in biological processes, identification of ubiquitination sites is an important first step. However, experimental approaches for identifying ubiquitination sites are often expensive, labor-intensive and time-consuming, partly due to the dynamics and reversibility of ubiquitination. In silico prediction of ubiquitination sites is potentially a useful strategy for whole proteome annotation. A number of bioinformatics approaches and tools have recently been developed for predicting protein ubiquitination sites. However, these tools have different methodologies, prediction algorithms, functionality and features, which complicate their utility and application. The purpose of this review is to aid users in selecting appropriate tools for specific analyses and circumstances. We first compared five popular webservers and standalone software options, assessing their performance on four up-to-date ubiquitination benchmark datasets from Saccharomyces cerevisiae, Homo sapiens, Mus musculus and Arabidopsis thaliana. We then discussed and summarized these tools to guide users in choosing among the tools efficiently and rapidly. Finally, we assessed the importance of features of existing tools for ubiquitination site prediction, ranking them by performance. We also discussed the features that make noticeable contributions to species-specific ubiquitination site prediction.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 54
    Publication Date: 2015-07-15
    Description: Nucleosomes contribute to compacting the genome into the nucleus and regulate the physical access of regulatory proteins to DNA either directly or through the epigenetic modifications of the histone tails. Precise mapping of nucleosome positioning across the genome is, therefore, essential to understanding the genome regulation. In recent years, several experimental protocols have been developed for this purpose that include the enzymatic digestion, chemical cleavage or immunoprecipitation of chromatin followed by next-generation sequencing of the resulting DNA fragments. Here, we compare the performance and resolution of these methods from the initial biochemical steps through the alignment of the millions of short-sequence reads to a reference genome to the final computational analysis to generate genome-wide maps of nucleosome occupancy. Because of the lack of a unified protocol to process data sets obtained through the different approaches, we have developed a new computational tool (NUCwave), which facilitates their analysis, comparison and assessment and will enable researchers to choose the most suitable method for any particular purpose. NUCwave is freely available at http://nucleosome.usal.es/nucwave along with a step-by-step protocol for its use.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 55
    Publication Date: 2015-07-15
    Description: It is common and advised practice in biomedical research to validate experimental or observational findings in a population different from the one where the findings were initially assessed. This practice increases the generalizability of the results and decreases the likelihood of reporting false-positive findings. Validation becomes critical when dealing with high-throughput experiments, where the large number of tests increases the chance to observe false-positive results. In this article, we review common approaches to determine statistical thresholds for validation and describe the factors influencing the proportion of significant findings from a ‘training’ sample that are replicated in a ‘validation’ sample. We refer to this proportion as rediscovery rate (RDR). In high-throughput studies, the RDR is a function of false-positive rate and power in both the training and validation samples. We illustrate the application of the RDR using simulated data and real data examples from metabolomics experiments. We further describe an online tool to calculate the RDR using t-statistics. We foresee two main applications. First, if the validation study has not yet been collected, the RDR can be used to decide the optimal combination between the proportion of findings taken to validation and the size of the validation study. Secondly, if a validation study has already been done, the RDR estimated using the training data can be compared with the observed RDR from the validation data; hence, the success of the validation study can be assessed.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 56
    Publication Date: 2015-07-15
    Description: Recent advances in RNA library preparation methods, platform accessibility and cost efficiency have allowed high-throughput RNA sequencing (RNAseq) to replace conventional hybridization microarray platforms as the method of choice for mRNA profiling and transcriptome analyses. RNAseq is a powerful technique to profile both long and short RNA expression, and the depth of information gained from distinct RNAseq methods is striking and facilitates discovery. In addition to expression analysis, distinct RNAseq approaches also allow investigators the ability to assess transcriptional elongation, DNA variance and exogenous RNA content. Here we review the current state of the art in transcriptome sequencing and address epigenetic regulation, quantification of transcription activation, RNAseq output and a diverse set of applications for RNAseq data. We detail how RNAseq can be used to identify allele-specific expression, single-nucleotide polymorphisms and somatic mutations and discuss the benefits and limitations of using RNAseq to monitor DNA characteristics. Moreover, we highlight the power of combining RNA- and DNAseq methods for genomic analysis. In summary, RNAseq provides the opportunity to gain greater insight into transcriptional regulation and output than simply miRNA and mRNA profiling.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 57
    Publication Date: 2015-09-16
    Description: Non-synonymous single nucleotide variants (nsSNVs) in coding DNA regions can result in phenotypic differences between individuals; however, only some nsSNVs are causative for a certain disease. As just a fraction of respective nsSNVs is annotated in databases, computational biology tools are applied to predict the pathogenicity in silico. In addition to applications in oncology, novel molecular diagnostic tests have been developed for cardiovascular disorders as a leading cause of morbidity and mortality in industrialized nations. We explored the concordance and performance of 13 nsSNV pathogenicity prediction tools on panel sequencing results of dilated cardiomyopathy. The analyzed data set from the INHERITANCE study contained 842 nsSNVs discovered in 639 patients, screened for the full sequence of 76 genes related to cardiomyopathies. The predictions of the single tools revealed a surprisingly high heterogeneity and discordance based on the implemented prediction method. Known disease associations were not reported by the tools, limiting usability in clinics. Because different tools have different advantages, we combined their results. By clustering of correlated methods using similar prediction strategies and calculating a majority vote-based consensus, we found that the prediction accuracy and sensitivity can be further improved. Although challenges remain, different in silico tools bear the potential to predict the malignancy of nsSNVs, especially if different algorithms are combined. Most tools rely mainly on sequence features; beyond these, structural information is important to analyze the relationship of nsSNVs with disease phenotypes. Likewise, current tools consider single nsSNVs, which may, however, show a cumulative effect and turn neutral mutations in an ensemble into pathogenic variants.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 58
    Publication Date: 2015-09-16
    Description: DNA-based taxonomic and functional profiling is widely used for the characterization of organismal communities across a rapidly increasing array of research areas that include the role of microbiomes in health and disease, biomonitoring, and estimation of both microbial and metazoan species richness. Two principal approaches are currently used to assign taxonomy to DNA sequences: DNA metabarcoding and metagenomics. When initially developed, each of these approaches mandated their own particular methods for data analysis; however, with the development of high-throughput sequencing (HTS) techniques they have begun to share many aspects in data set generation and processing. In this review we aim to define the current characteristics, goals and boundaries of each field, and describe the different software used for their analysis. We argue that an appreciation of the potential and limitations of each method can help underscore the improvements required by each field so as to better exploit the richness of current HTS-based data sets.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 59
    Publication Date: 2015-09-16
    Description: Long noncoding RNAs (lncRNAs) represent a large category of noncoding RNA molecules, and a growing number of studies have shown that they play important roles in various critical biological processes. They show a diversity of functions through diverse mechanisms, among which regulating RNA molecules is one of the most common. Given the large number of lncRNAs, it has become urgent and important to predict the RNA targets of lncRNAs on a large scale for a comprehensive understanding of lncRNA functions and action mechanisms. Although several methods have been developed to predict RNA–RNA interactions, none of them can be used to predict the RNA targets of lncRNAs on a large scale. Here we present a tool, LncTar, which can efficiently predict the RNA targets of lncRNAs on a large scale. To test the accuracy of LncTar, we applied it to 10 experimentally supported lncRNA–mRNA interactions. LncTar successfully predicted 8 (80%) of the 10 lncRNA–mRNA pairs, suggesting that LncTar has a reliable accuracy. Finally, we believe that LncTar could be an efficient tool for the fast identification of the RNA targets of lncRNAs. LncTar is freely available at http://www.cuilab.cn/lnctar.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 60
    Publication Date: 2015-09-16
    Description: In recent years, a myriad of new statistical methods have been proposed for detecting associations of rare single-nucleotide variants (SNVs) with common diseases. These methods can be generally classified as ‘collapsing’ or ‘haplotyping’ based. The former is the predominant class, composed of most of the rare variant association methods proposed to date. However, recent works have suggested that haplotyping-based methods may offer advantages and can even be more powerful than collapsing methods in certain situations. In this article, we review and compare collapsing- versus haplotyping-based methods/software in terms of both power and type I error. For collapsing methods, we consider three approaches: Combined Multivariate and Collapsing, Sequence Kernel Association Test and Family-Based Association Test (FBAT): the first two are population based and are among the most popular; the last test is family based, a modification from the popular FBAT to accommodate rare SNVs. For haplotyping-based methods, we include Logistic Bayesian Lasso (LBL) for population data and family-based LBL (famLBL) for family (trio) data. These two methods are selected, as they can be used to test association for specific rare and common haplotypes. Our results show that haplotype methods can be more powerful than collapsing methods if there are interacting SNVs leading to larger haplotype effects. Even if only common SNVs are genotyped, haplotype methods can still detect specific rare haplotypes that tag rare causal SNVs. As expected, family-based methods are robust, whereas population-based methods are susceptible, to population substructure. However, the population-based haplotype approach appears to have smaller inflation of type I error than its collapsing counterparts.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 61
    Publication Date: 2015-09-16
    Description: The computational or in silico approaches for analysing the HIV-1-human protein–protein interaction (PPI) network, predicting different host cellular factors and PPIs and discovering several pathways are gaining popularity in the field of HIV research. Although there exist quite a few studies in this regard, no previous effort has been made to review these works in a comprehensive manner. Here we review the computational approaches that are devoted to the analysis and prediction of HIV-1-human PPIs. We have broadly categorized these studies into two fields: computational analysis of HIV-1-human PPI network and prediction of novel PPIs. We have also presented a comparative assessment of these studies and proposed some methodologies for discussing the implication of their results. We have also reviewed different computational techniques for predicting HIV-1-human PPIs and provided a comparative study of their applicability. We believe that our effort will provide helpful insights to the HIV research community.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 62
    Publication Date: 2015-09-16
    Description: From prokaryotes to eukaryotes, phenotypic variation, adaptation and speciation have been associated with structural variation between genomes of individuals within the same species. Many computer algorithms detecting such variations (callers) have recently been developed, spurred by the advent of next-generation sequencing technology. Such callers mainly exploit split-read mapping or paired-end read mapping. However, as different callers are geared towards different types of structural variation, there is still no single caller that can be considered a community standard; instead, increasingly the various callers are combined in integrated pipelines. In this article, we review a wide range of callers, discuss challenges in the integration step and present a survey of pipelines used in population genomics studies. Based on our findings, we provide general recommendations on how to set up such pipelines. Finally, we present an outlook on future challenges in structural variation detection.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 63
    Publication Date: 2015-09-16
    Description: MicroRNAs (miRNAs) are short endogenous noncoding RNAs that bind to target mRNAs, usually resulting in degradation and translational repression. Identification of miRNA targets is crucial for deciphering functional roles of the numerous miRNAs that are rapidly generated by sequencing efforts. Computational prediction methods are widely used for high-throughput generation of putative miRNA targets. We review a comprehensive collection of 38 miRNA sequence-based computational target predictors in animals that were developed over the past decade. Our in-depth analysis considers all significant perspectives including the underlying predictive methodologies with focus on how they draw from the mechanistic basis of the miRNA–mRNA interaction. We also discuss ease of use, availability, impact of the considered predictors and the evaluation protocols that were used to assess them. We are the first to comparatively and comprehensively evaluate seven representative methods when predicting miRNA targets at the duplex and gene levels. The gene-level evaluation is based on three benchmark data sets that rely on different ways to annotate targets including biochemical assays, microarrays and pSILAC. We offer practical advice on selection of appropriate predictors according to certain properties of miRNA sequences, characteristics of a specific application and desired levels of predictive quality. We also discuss future work related to the design of new models, data quality, improved usability, need for standardized evaluation and ability to predict mRNA expression changes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 64
    Publication Date: 2015-09-16
    Description: ‘Reproducible research’ has received increasing attention over the past few years as bioinformatics and computational biology methodologies become more complex. Although reproducible research is progressing in several valuable ways, we suggest that recent increases in internet bandwidth and disk space, along with the availability of open-source and free-software licences for tools, enable another simple step to make research reproducible. In this article, we urge the creation of minimal virtual reference environments implementing all the tools necessary to reproduce a result, as a standard part of publication. We address potential problems with this approach, and show an example environment from our own work.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 65
    Publication Date: 2015-09-16
    Description: A vast amount of data has been and is being generated in bioinformatics studies. In the analysis of such data, the standard modeling approaches can be challenged by the heavy-tailed errors and outliers in response variables, the contamination in predictors (which may be caused by, for instance, technical problems in microarray gene expression studies), model mis-specification and others. Robust methods are needed to tackle these challenges. When there are a large number of predictors, variable selection can be as important as estimation. As a generic variable selection and regularization tool, penalization has been extensively adopted. In this article, we provide a selective review of robust penalized variable selection approaches especially designed for high-dimensional data from bioinformatics and biomedical studies. We discuss the robust loss functions, penalty functions and computational algorithms. The theoretical properties and implementation are also briefly examined. Application examples of the robust penalization approaches in representative bioinformatics and biomedical studies are also illustrated.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 66
    Publication Date: 2015-09-16
    Description: The majority of scientific resources are devoted to studying a relatively small number of model species, meaning that the ability to translate knowledge across species is of considerable importance. Obtaining species-specific knowledge enables targeted investigations of the biology and pathobiology of a particular species, and facilitates comparative analyses. Phosphorylation is the most widespread posttranslational modification in eukaryotes, and although many phosphorylation sites have been experimentally identified for some species, little or no data are available for others. Using the honeybee as a test organism, this case study illustrates the process of using protein sequence homology to identify putative phosphorylation sites in a species of interest using experimentally determined sites from other species. A number of issues associated with this process are examined and discussed. Several databases of experimentally determined phosphorylation sites exist; however, it can be difficult for the nonspecialist to ascertain how their contents compare. Thus, this case study assesses the content and comparability of several phosphorylation site databases. Additional issues examined include the efficacy of homology-based phosphorylation site prediction, the impact of the level of evolutionary relatedness between species in making these predictions, the ability to translate knowledge of phosphorylation sites across large evolutionary distances and the criteria that should be used in selecting probable phosphorylation sites in the species of interest. Although focusing on phosphorylation, the issues discussed here also apply to the homology-based cross-species prediction of other posttranslational modifications, as well as to sequence motifs in general.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 67
    Publication Date: 2015-09-16
    Description: The number of samples needed to identify significant effects is a key question in biomedical studies, with consequences for experimental designs, costs and potential discoveries. In metabolic phenotyping studies, sample size determination remains a complex step. This is due particularly to the multiple hypothesis-testing framework and the top-down hypothesis-free approach, with no a priori known metabolic target. Until now, there was no standard procedure available to address this purpose. In this review, we discuss sample size estimation procedures for metabolic phenotyping studies. We release an automated implementation of the Data-driven Sample size Determination (DSD) algorithm for MATLAB and GNU Octave. Original research concerning DSD was published elsewhere. DSD allows the determination of an optimized sample size in metabolic phenotyping studies. The procedure uses analytical data only from a small pilot cohort to generate an expanded data set. The statistical recoupling of variables procedure is used to identify metabolic variables, and their intensity distributions are estimated by kernel smoothing or log-normal density fitting. Statistically significant metabolic variations are evaluated using the Benjamini–Yekutieli correction and processed for data sets of various sizes. Optimal sample size determination is achieved in a context of biomarker discovery (at least one statistically significant variation) or metabolic exploration (a maximum of statistically significant variations). The DSD toolbox is implemented in MATLAB R2008A (Mathworks, Natick, MA) for kernel and log-normal estimates, and in GNU Octave for log-normal estimates (kernel density estimates are not robust enough in GNU Octave). It is available at http://www.prabi.fr/redmine/projects/dsd/repository, with a tutorial at http://www.prabi.fr/redmine/projects/dsd/wiki.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 68
    Publication Date: 2015-09-16
    Description: Discriminative pattern mining is one of the most important techniques in data mining. This challenging task is concerned with finding a set of patterns that occur with disproportionate frequency in data sets with various class labels. Such patterns are of great value for group difference detection and classifier construction. Research on finding interesting discriminative patterns in class-labeled data evolves rapidly and lots of algorithms have been proposed to specifically address this problem. Discriminative pattern mining techniques have proven their considerable value in biological data analysis. The archetypical applications in bioinformatics include phosphorylation motif discovery, differentially expressed gene identification, discriminative genotype pattern detection, etc. In this article, we present an overview of discriminative pattern mining and the corresponding effective methods, and subsequently we illustrate their applications to tackling the bioinformatics problems. In the end, we give a general discussion of potential challenges and future work for this task.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 69
    Publication Date: 2015-09-16
    Description: The revolution in high-throughput sequencing technologies has enabled the acquisition of gigabytes of RNA sequences in many different conditions and has highlighted an unexpected number of small RNAs (sRNAs) in bacteria. Ongoing exploitation of these data enables numerous applications for investigating bacterial trans-acting sRNA-mediated regulation networks. Focusing on sRNAs that regulate mRNA translation in trans, recent works have noted several sRNA-based regulatory pathways that are essential for key cellular processes. Although the number of known bacterial sRNAs is increasing, the experimental validation of their interactions with mRNA targets remains challenging and involves expensive and time-consuming experimental strategies. Hence, bioinformatics is crucial for selecting and prioritizing candidates before designing any experimental work. However, current software for target prediction produces a prohibitive number of candidates because of the lack of biological knowledge regarding the rules governing sRNA–mRNA interactions. Therefore, there is a real need to develop new approaches to help biologists focus on the most promising predicted sRNA–mRNA interactions. From this perspective, this review aims to present the advantages of combining bioinformatics and visualization approaches for analyzing predicted sRNA-mediated regulatory bacterial networks.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 70
    Publication Date: 2015-09-16
    Description: Transport systems comprise roughly 10% of all proteins in a cell, playing critical roles in many processes. Improving and expanding their classification is an important goal that can affect studies ranging from comparative genomics to potential drug target searches. It is not surprising that different classification systems for transport proteins have arisen, be it within a specialized database, focused on this functional class of proteins, or as part of a broader classification system for all proteins. Two such databases are the Transporter Classification Database (TCDB) and the Protein family (Pfam) database. As part of a long-term endeavor to improve consistency between the two classification systems, we have compared transporter annotations in the two databases to understand the rationale for differences and to improve both systems. Differences sometimes reflect the fact that one database has a particular transporter family while the other does not. Differing family definitions and hierarchical organizations were reconciled, resulting in recognition of 69 Pfam ‘Domains of Unknown Function’, which proved to be transport protein families to be renamed using TCDB annotations. Of over 400 potential new Pfam families identified from TCDB, 10% have already been added to Pfam, and TCDB has created 60 new entries based on Pfam data. This work, for the first time, reveals the benefits of comprehensive database comparisons and explains the differences between Pfam and TCDB.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 71
    Oxford University Press
    Publication Date: 2016-01-21
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 72
    Publication Date: 2016-01-21
    Description: Motivated by the pressing need to characterize protein–DNA and protein–RNA interactions on a large scale, we review a comprehensive set of 30 computational methods for high-throughput prediction of RNA- or DNA-binding residues from protein sequences. We summarize these predictors from several significant perspectives including their design, outputs and availability. We perform an empirical assessment of methods that offer web servers using a new benchmark data set characterized by a more complete annotation that includes binding residues transferred from the same or similar proteins. We show that predictors of DNA-binding (RNA-binding) residues offer relatively strong predictive performance but they are unable to properly separate DNA- from RNA-binding residues. We design and empirically assess several types of consensuses and demonstrate that machine learning (ML)-based approaches provide improved predictive performance when compared with the individual predictors of DNA-binding residues or RNA-binding residues. We also formulate and execute a first-of-its-kind study that targets combined prediction of DNA- and RNA-binding residues. We design and test three types of consensuses for this prediction and conclude that this novel approach that relies on ML design provides better predictive quality than individual predictors when tested on prediction of DNA- and RNA-binding residues individually. It also substantially improves discrimination between these two types of nucleic acids. Our results suggest that development of a new generation of predictors would benefit from using training data sets that combine both RNA- and DNA-binding proteins, designing new inputs that specifically target either DNA- or RNA-binding residues and pursuing combined prediction of DNA- and RNA-binding residues.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 73
    Publication Date: 2016-01-21
    Description: The impact of a single genetic locus on multiple phenotypes, or pleiotropy, is an important area of research. Biological systems are dynamic complex networks, and these networks exist within and between cells. In humans, the consideration of multiple phenotypes such as physiological traits, clinical outcomes and drug response, in the context of genetic variation, can provide ways of developing a more complete understanding of the complex relationships between genetic architecture and how biological systems function in health and disease. In this article, we describe recent studies exploring the relationships between genetic loci and more than one phenotype. We also cover methodological developments incorporating pleiotropy applied to model organisms as well as humans, and discuss how stepping beyond the analysis of a single phenotype leads to a deeper understanding of complex genetic architecture.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 74
    Publication Date: 2016-01-21
    Description: The majority of biological processes are mediated via protein–protein interactions. Determination of residues participating in such interactions improves our understanding of molecular mechanisms and facilitates the development of therapeutics. Experimental approaches to identifying interacting residues, such as mutagenesis, are costly and time-consuming and thus, computational methods for this purpose could streamline conventional pipelines. Here we review the field of computational protein interface prediction. We make a distinction between methods which address proteins in general and those targeted at antibodies, owing to the radically different binding mechanism of antibodies. We organize the multitude of currently available methods hierarchically based on required input and prediction principles to provide an overview of the field.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 75
    Publication Date: 2016-01-21
    Description: Computational drug repositioning or repurposing is a promising and efficient tool for discovering new uses for existing drugs and holds great potential for precision medicine in the age of big data. The explosive growth of large-scale genomic and phenotypic data, as well as data on small molecular compounds with granted regulatory approval, is enabling new developments for computational repositioning. To achieve the shortest path toward new drug indications, advanced data processing and analysis strategies are critical for making sense of these heterogeneous molecular measurements. In this review, we show recent advancements in the critical areas of computational drug repositioning from multiple aspects. First, we summarize available data sources and the corresponding computational repositioning strategies. Second, we characterize the commonly used computational techniques. Third, we discuss validation strategies for repositioning studies, including both computational and experimental methods. Finally, we highlight potential opportunities and use-cases, including a few target areas such as cancers. We conclude with a brief discussion of the remaining challenges in computational drug repositioning.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 76
    Publication Date: 2015-11-20
    Description: Ontologies are widely used in biological and biomedical research. Their success lies in their combination of four main features present in almost all ontologies: provision of standard identifiers for classes and relations that represent the phenomena within a domain; provision of a vocabulary for a domain; provision of metadata that describes the intended meaning of the classes and relations in ontologies; and the provision of machine-readable axioms and definitions that enable computational access to some aspects of the meaning of classes and relations. While each of these features enables applications that facilitate data integration, data access and analysis, a great potential lies in the possibility of combining these four features to support integrative analysis and interpretation of multimodal data. Here, we provide a functional perspective on ontologies in biology and biomedicine, focusing on what ontologies can do and describing how they can be used in support of integrative research. We also outline perspectives for using ontologies in data-driven science, in particular their application in structured data mining and machine learning applications.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 77
    Publication Date: 2015-11-20
    Description: There is a growing interest in the mechanisms and the prediction of how flexible peptides bind proteins, often in a highly selective and conserved manner. While both existing small-molecule docking methods and custom protocols can be used, even short peptides make difficult targets owing to their high torsional flexibility. Any benchmarking should therefore start with those. We compiled a meta-data set of 47 complexes with peptides up to five residues, based on 11 related studies from the past decade. Although their highly varying strategies and constraints preclude direct, quantitative comparisons, we still provide a comprehensive overview of the reported results, using a simple yet stringent measure: the quality of the top-scoring peptide pose. Using the entire data set, this is augmented by our own benchmark of AutoDock Vina, a freely available, fast and widely used docking tool. It particularly addresses non-expert users and was therefore implemented in a highly integrated manner. Guidelines addressing important issues such as the amount of sampling required for result reproducibility are so far lacking. Using peptide docking as an example, this is the first study to address these issues in detail. Finally, to encourage further, standardized benchmarking efforts, the compiled data set is made available in an accessible, transparent and extendable manner.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 78
    Publication Date: 2015-11-20
    Description: The interaction between T-cell receptors (TCRs) and major histocompatibility complex (MHC)-bound epitopes is one of the most important processes in the adaptive human immune response. Several hypotheses on TCR triggering have been proposed. Many of them involve structural and dynamical adjustments in the TCR/peptide/MHC interface. Molecular Dynamics (MD) simulations are a computational technique that is used to investigate structural dynamics at atomic resolution. Such simulations are used to improve understanding of signalling on a structural level. Here we review how MD simulations of the TCR/peptide/MHC complex have given insight into immune system reactions not achievable with current experimental methods. Firstly, we summarize methods of TCR/peptide/MHC complex modelling and TCR/peptide/MHC MD trajectory analysis methods. Then we classify recently published simulations into categories and give an overview of approaches and results. We show that current studies do not come to the same conclusions about TCR/peptide/MHC interactions. This discrepancy might be caused by sample sizes that are too small or by intrinsic differences between each interaction process. As computational power increases, future studies will be able to, and should, have larger sample sizes, longer runtimes and additional parts of the immunological synapse included.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 79
    Publication Date: 2015-11-20
    Description: Genome-scale metabolic networks have been reconstructed for several organisms. These metabolic networks provide detailed information about the metabolism inside the cells, coupled with the genomic, proteomic and thermodynamic information. These networks are widely simulated using ‘constraint-based’ modelling techniques and find applications ranging from strain improvement for metabolic engineering to prediction of drug targets in pathogenic organisms. Components of these metabolic networks are represented in multiple file formats and also using different markup languages, with varying levels of annotations; this leads to inconsistencies and increases the complexities in comparing and analysing reconstructions on multiple platforms. In this work, we critically examine nearly 100 published genome-scale metabolic networks and their corresponding constraint-based models and discuss various issues with respect to model quality. One of the major concerns is the lack of annotations using standard identifiers that can uniquely describe several components such as metabolites, genes, proteins and reactions. We also find that many models do not have complete information regarding constraints on reactions fluxes and objective functions for carrying out simulations. Overall, our analysis highlights the need for a widely acceptable standard for representing constraint-based models. A rigorous standard can help in streamlining the process of reconstruction and improve the quality of reconstructed metabolic models.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 80
    Publication Date: 2015-11-20
    Description: Systems biology, which can be defined as integrative biology, comprises multistage processes that can be used to understand components of complex biological systems of living organisms and provides hierarchical information for decoding life. Using systems biology approaches such as genomics, transcriptomics and proteomics, it is now possible to delineate more complicated interactions between circadian control systems and diseases. The circadian rhythm is a multiscale phenomenon existing within the body that influences numerous physiological activities such as changes in gene expression, protein turnover, metabolism and human behavior. In this review, we describe the relationships between the circadian control system and its related genes or proteins, and circadian rhythm disorders in systems biology studies. To maintain and modulate circadian oscillation, cells possess elaborate feedback loops composed of circadian core proteins that regulate the expression of other genes through their transcriptional activities. The disruption of these rhythms has been reported to be associated with diseases such as arrhythmia, obesity, insulin resistance, carcinogenesis and disruptions in natural oscillations in the control of cell growth. This review demonstrates that lifestyle is considered a fundamental factor that modifies circadian rhythm, and the development of dysfunctions and diseases could be regulated by an underlying expression network with multiple circadian-associated signals.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 81
    Publication Date: 2015-11-20
    Description: It has been more than a decade since the completion of the Human Genome Project that provided us with a complete list of human proteins. The next obvious task is to figure out how various parts interact with each other. On that account, we review 10 methods for protein interface prediction, which are freely available as web servers. In addition, we comparatively evaluate their performance on a common data set comprising different quality target structures. We find that using experimental structures and high-quality homology models, structure-based methods outperform those using only protein sequences, with global template-based approaches providing the best performance. For moderate-quality models, sequence-based methods often perform better than those structure-based techniques that rely on fine atomic details. We note that post-processing protocols implemented in several methods quantitatively improve the results only for experimental structures, suggesting that these procedures should be tuned up for computer-generated models. Finally, we anticipate that advanced meta-prediction protocols are likely to enhance interface residue prediction. Notwithstanding further improvements, easily accessible web servers already provide the scientific community with convenient resources for the identification of protein–protein interaction sites.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 82
    Publication Date: 2015-11-20
    Description: We are in the era of abundant ‘big’ or ‘high-dimensional’ data. These data afford us the opportunity to discover predictors of an event of interest, and to estimate occurrence of the event based on values of these predictors. For example, ‘genome-wide association studies’ examine millions of single-nucleotide polymorphisms (SNPs), along with disease status. We can learn SNPs that affect disease status from these data sets, and use the knowledge learned to predict disease likelihood. Owing to the large number of features, it is difficult for many prediction methods to use all the features directly. The ReliefF algorithm ranks a set of features in terms of how well they predict a target. It can be used to identify good predictors, which can then be provided to a prediction method. We compared the performance of eight prediction methods when predicting binary outcomes using high-dimensional discrete data sets. We performed two-stage prediction, where ReliefF is used in the first stage to identify good predictors. Bayesian network (BN)-based methods performed best overall. Furthermore, ReliefF did not improve their performance. The BN-based methods use the Bayesian Dirichlet Equivalent Uniform score to evaluate candidate models, and use BN inference algorithms to perform prediction. This score and these algorithms were developed for discrete variables. This perhaps explains why they perform better in this domain. Many prediction methods are available, and researchers have little reason for choosing one over the other in the domain of binary prediction using high-dimensional data sets. Our results indicate that the best choices overall are BN-based methods.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 83
    Publication Date: 2015-11-20
    Description: Sequencing-based gene expression methods like RNA-sequencing (RNA-seq) have become increasingly common, but it is often claimed that results obtained in different studies are not comparable owing to the influence of laboratory batch effects, differences in RNA extraction and sequencing library preparation methods and bioinformatics processing pipelines. It would be unfortunate if different experiments were in fact incomparable, as there is great promise in data fusion and meta-analysis applied to sequencing data sets. We therefore compared reported gene expression measurements for ostensibly similar samples (specifically, human brain, heart and kidney samples) in several different RNA-seq studies to assess their overall consistency and to examine the factors contributing most to systematic differences. The same comparisons were also performed after preprocessing all data in a consistent way, eliminating potential bias from bioinformatics pipelines. We conclude that published human tissue RNA-seq expression measurements appear relatively consistent in the sense that samples cluster by tissue rather than laboratory of origin given simple preprocessing transformations. The article is supplemented by a detailed walkthrough with embedded R code and figures.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 84
    Publication Date: 2015-11-20
    Description: Cancer results from dysregulation of multiple steps of gene expression programs. We review how transcriptome profiling has been widely explored for cancer classification and biomarker discovery but resulted in limited clinical impact. Therefore, we discuss alternative and complementary omics approaches.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 85
    Publication Date: 2016-01-21
    Description: Modern technologies are capable of generating enormous amounts of data that measure complex biological systems. Computational biologists and bioinformatics scientists are increasingly being asked to use these data to reveal key systems-level properties. We review the extent to which curricula are changing in the era of big data. We identify key competencies that scientists dealing with big data are expected to possess across fields, and we use this information to propose courses to meet these growing needs. While bioinformatics programs have traditionally trained students in data-intensive science, we identify areas of particular biological, computational and statistical emphasis important for this era that can be incorporated into existing curricula. For each area, we propose a course structured around these topics, which can be adapted in whole or in parts into existing curricula. In summary, specific challenges associated with big data provide an important opportunity to update existing curricula, but we do not foresee a wholesale redesign of bioinformatics training programs.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 86
    Publication Date: 2016-01-21
    Description: Structural variation (SV) plays an important role in genetic diversity among the population in general and specifically in diseases such as cancer. Modern next-generation sequencing (NGS) technologies provide paired-end sequencing data at high depth with increasing read lengths. This development enabled the analysis of split-reads to detect SV breakpoints with single-nucleotide resolution. But ambiguous mappings and breakpoint sequences with further co-occurring mutations hamper split-read alignments against a reference sequence. The trade-off between high sensitivity and low false-positive rate is problematic and often requires a lot of fine-tuning of the analysis method based on knowledge about its algorithm and the characteristics of the data set. We present SoftSV, a method for exact breakpoint detection for small and large deletions, inversions, tandem duplications and inter-chromosomal translocations, which relies solely on the mutual alignment of soft-clipped reads within the neighborhood of discordantly mapped paired-end reads. Unlike other SV detection algorithms, our approach does not require thresholds regarding sequencing coverage or mapping quality. We evaluate SoftSV together with eight approaches (Breakdancer, Clever, CREST, Delly, GASVPro, Pindel, Socrates and SoftSearch) on simulated and real data sets. Our results show that sensitive and reliable SV detection is subject to many different factors like read length, sequence coverage and SV type. While most programs have their individual drawbacks, our greedy approach turns out to be the most robust and sensitive on many experimental setups. Sensitivities above 85% and positive predictive values between 80 and 100% could be achieved consistently for all SV types on simulated data sets starting at relatively short 75 bp reads and low 10–15× sequence coverage.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 87
    Publication Date: 2016-01-21
    Description: RNA structure plays a crucial role in gene maturation, regulation and function. Determining the form and frequency of RNA folds is essential for a better understanding of how RNA exerts its functions. Low-throughput studies have focused on RNA primary sequences and expression levels, but with an emphasis on relatively small numbers of transcripts. However, with the recent advent of high-throughput technologies, it is realistic to begin analyzing RNA secondary structures on a genome-wide scale. Here, we review genome-wide RNA secondary structure profiles as well as advances in computational structure predictions. We further discuss the novel characteristics of RNA secondary structure across messenger RNAs. Probing RNA secondary structure by high-throughput sequencing will enable us to build atlases of RNA secondary structures, an important step in helping us to understand the versatility of RNA functions in diverse cellular processes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 88
    Publication Date: 2016-01-21
    Description: Current pathway analysis approaches are primarily dedicated to capturing deregulated pathways at the population level and cannot provide patient-specific pathway deregulation information. In this article, the authors present a simple approach, called individPath, to detect pathways with significantly disrupted intra-pathway relative expression orderings for each disease sample compared with the stable, normal intra-pathway relative expression orderings pre-determined in previously accumulated normal samples. Through the analysis of multiple microarray data sets for lung and breast cancer, the authors demonstrate individPath's effectiveness for detecting cancer-associated pathways with disrupted relative expression orderings at the individual level and dissecting the heterogeneity of pathway deregulation among different patients. The portable use of this simple approach in clinical contexts is exemplified by the identification of prognostic intra-pathway gene pair signatures to predict overall survival of resected early-stage lung adenocarcinoma patients and signatures to predict relapse-free survival of estrogen receptor-positive breast cancer patients after tamoxifen treatment.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
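
The idea of comparing one disease sample against relative expression orderings that are stable in normal samples can be illustrated with a minimal Python sketch. This is not the individPath implementation: the function names, the `min_fraction` threshold and the toy expression values are all assumptions made for illustration, and the significance test that decides whether a pathway is disrupted is left out.

```python
from itertools import combinations

def stable_pairs(normal_samples, genes, min_fraction=0.99):
    """Return ordered pairs (hi, lo) where hi > lo in at least
    min_fraction of the normal samples (threshold is an assumption)."""
    pairs = []
    n = len(normal_samples)
    for a, b in combinations(genes, 2):
        a_higher = sum(1 for s in normal_samples if s[a] > s[b])
        b_higher = sum(1 for s in normal_samples if s[b] > s[a])
        if a_higher / n >= min_fraction:
            pairs.append((a, b))
        elif b_higher / n >= min_fraction:
            pairs.append((b, a))
    return pairs

def reversed_fraction(sample, pairs):
    """Fraction of stable pairs whose ordering is reversed in one sample."""
    if not pairs:
        return 0.0
    reversed_count = sum(1 for hi, lo in pairs if sample[hi] < sample[lo])
    return reversed_count / len(pairs)

# Toy expression values for three genes of one hypothetical pathway.
pathway_genes = ["G1", "G2", "G3"]
normals = [
    {"G1": 9.0, "G2": 5.0, "G3": 2.0},
    {"G1": 8.5, "G2": 5.5, "G3": 1.0},
    {"G1": 9.2, "G2": 4.8, "G3": 5.0},
]
tumor = {"G1": 3.0, "G2": 6.0, "G3": 1.5}

pairs = stable_pairs(normals, pathway_genes, min_fraction=1.0)
score = reversed_fraction(tumor, pairs)
```

Because only within-sample orderings are compared, a score of this kind can be computed for a single patient without normalizing against a cohort; a real analysis would still need a statistical test to judge whether the reversed fraction is larger than expected.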
  • 89
    Publication Date: 2016-01-21
    Description: Long non-coding RNAs (lncRNAs) are associated with a plethora of cellular functions, most of which require the interaction with one or more RNA-binding proteins (RBPs); similarly, RBPs are often able to bind a large number of different RNAs. The currently available knowledge is already drawing an intricate network of interactions, whose deregulation is frequently associated with pathological states. Several different techniques were developed in the past years to obtain protein–RNA binding data in a high-throughput fashion. In parallel, in silico inference methods were developed for the accurate computational prediction of the interaction of RBP–lncRNA pairs. The field is growing rapidly, and it is foreseeable that in the near future, the protein–lncRNA interaction network will expand, offering essential clues for a better understanding of lncRNA cellular mechanisms and their disease-associated perturbations.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 90
    Publication Date: 2016-01-21
    Description: One effective way to improve the state of the art is through competitions. Following the success of the Critical Assessment of protein Structure Prediction (CASP) in bioinformatics research, a number of challenge evaluations have been organized by the text-mining research community to assess and advance natural language processing (NLP) research for biomedicine. In this article, we review the different community challenge evaluations held from 2002 to 2014 and their respective tasks. Furthermore, we examine these challenge tasks through their targeted problems in NLP research and biomedical applications, respectively. Next, we describe the general workflow of organizing a Biomedical NLP (BioNLP) challenge and involved stakeholders (task organizers, task data producers, task participants and end users). Finally, we summarize the impact and contributions by taking into account different BioNLP challenges as a whole, followed by a discussion of their limitations and difficulties. We conclude with future trends in BioNLP challenge evaluations.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 91
    Publication Date: 2016-01-21
    Description: In this review, we describe key components of a computational infrastructure for a precision medicine program that is based on clinical-grade genomic sequencing. Specific aspects covered in this review include software components and hardware infrastructure, reporting, integration into Electronic Health Records for routine clinical use and regulatory aspects. We emphasize informatics components related to reproducibility and reliability in genomic testing, regulatory compliance, traceability and documentation of processes, integration into clinical workflows, privacy requirements, prioritization and interpretation of results to report based on clinical needs, rapidly evolving knowledge base of genomic alterations and clinical treatments and return of results in a timely and predictable fashion. We also seek to differentiate between the use of precision medicine in germline and cancer.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 92
    Publication Date: 2016-01-21
    Description: The reproducibility of research in bioinformatics refers to the notion that new methodologies/algorithms and scientific claims have to be published together with their data and source code, in a way that other researchers may verify the findings to further build more knowledge on them. The replication and corroboration of research results are key to the scientific process, and many journals are discussing the matter nowadays, taking concrete steps in this direction. In this journal itself, a recent opinion note has appeared highlighting the increasing importance of this topic in bioinformatics and computational biology, inviting the community to further discuss the matter. In agreement with that article, we would like to propose here another step in that direction with a tool that allows the automatic generation of a web interface, named web-demo, directly from source code in a simple and straightforward way. We believe this contribution can help make research not only reproducible but also more easily accessible. A web-demo associated with a published paper can accelerate an algorithm's validation with real data, spreading its use with just a few clicks.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 93
    Publication Date: 2016-01-21
    Description: Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance on which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 94
    Publication Date: 2016-01-21
    Description: Precision medicine will revolutionize the way we treat and prevent disease. A major barrier to the implementation of precision medicine that clinicians and translational scientists face is understanding the underlying mechanisms of disease. We are starting to address this challenge through automatic approaches for information extraction, representation and analysis. Recent advances in text and data mining have been applied to a broad spectrum of key biomedical questions in genomics, pharmacogenomics and other fields. We present an overview of the fundamental methods for text and data mining, as well as recent advances and emerging applications toward precision medicine.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 95
    Publication Date: 2016-01-21
    Description: The use of crowdsourcing to solve important but complex problems in biomedical and clinical sciences is growing and encompasses a wide variety of approaches. The crowd is diverse and includes online marketplace workers, health information seekers, science enthusiasts and domain experts. In this article, we review and highlight recent studies that use crowdsourcing to advance biomedicine. We classify these studies into two broad categories: (i) mining big data generated from a crowd (e.g. search logs) and (ii) active crowdsourcing via specific technical platforms, e.g. labor markets, wikis, scientific games and community challenges. Through describing each study in detail, we demonstrate the applicability of different methods in a variety of domains in biomedical research, including genomics, biocuration and clinical research. Furthermore, we discuss and highlight the strengths and limitations of different crowdsourcing platforms. Finally, we identify important emerging trends, opportunities and remaining challenges for future crowdsourcing research in biomedicine.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 96
    Publication Date: 2015-03-19
    Description: Short tandem repeats are highly polymorphic and associated with a wide range of phenotypic variation, some of which cause neurodegenerative disease in humans. With advances in high-throughput sequencing technologies, there are novel opportunities to study genetic variation. While available sequencing technologies and bioinformatics tools provide options for mining high-throughput sequencing data, their suitability for analysis of repeat variation is an open question, with tools for quantifying variability in repetitive sequence still in their infancy. We present here a comprehensive survey and empirical evaluation of current sequencing technologies and bioinformatics tools in all stages of an analysis pipeline. While there is not one optimal pipeline to suit all circumstances, we find that the choice of alignment and repeat genotyping tools greatly impacts the accuracy and efficiency by which short tandem repeat variation can be detected. We further note that to detect variation relevant to many repeat diseases, it is essential to choose technologies that offer either long read-lengths or paired-end sequencing, coupled with specific genotyping tools.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 97
    Publication Date: 2015-03-19
    Description: Machine learning, particularly kernel methods, has been demonstrated as a promising new tool to tackle the challenges imposed by today’s explosive data growth in genomics. Kernel methods provide a practical and principled approach to learning how a large number of genetic variants are associated with complex phenotypes, to help reveal the complexity in the relationship between the genetic markers and the outcome of interest. In this review, we highlight the potential key role machine learning will have in modern genomic data processing, especially with regard to integration with classical methods for gene prioritizing, prediction and data fusion.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 98
    Publication Date: 2015-03-19
    Description: Somatic copy-number alterations (SCNAs) are an important type of structural variation affecting tumor pathogenesis. Accurate detection of genomic regions with SCNAs is crucial for cancer genomics as these regions contain likely drivers of cancer development. Deep sequencing technology provides single-nucleotide resolution genomic data and is considered one of the best measurement technologies to detect SCNAs. Although several algorithms have been developed to detect SCNAs from whole-genome and whole-exome sequencing data, their relative performance has not been studied. Here, we have compared ten SCNA detection algorithms in both simulated and primary tumor deep sequencing data. In addition, we have evaluated the applicability of exome sequencing data for SCNA detection. Our results show that (i) clear differences exist in sensitivity and specificity between the algorithms, (ii) SCNA detection algorithms are able to identify most of the complex chromosomal alterations and (iii) exome sequencing data are suitable for SCNA detection.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 99
    Publication Date: 2015-03-19
    Description: High-throughput DNA sequencing has become a mainstay for the discovery of genomic variants that may cause disease or affect phenotype. A next-generation sequencing pipeline typically identifies thousands of variants in each sample. A particular challenge is the annotation of each variant in a way that is useful to downstream consumers of the data, such as clinical sequencing centers or researchers. These users may require that all data storage and analysis remain on secure local servers to protect patient confidentiality or intellectual property, may have unique and changing needs to draw on a variety of annotation data sets and may prefer not to rely on closed-source applications beyond their control. Here we describe scalable methods for using the plugin capability of the Ensembl Variant Effect Predictor to enrich its basic set of variant annotations with additional data on genes, function, conservation, expression, diseases, pathways and protein structure, and describe an extensible framework for easily adding additional custom data sets.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 100
    Publication Date: 2015-03-19
    Description: Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
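
The shopping-basket example in the abstract above can be made concrete with a minimal Python sketch. It enumerates itemsets by brute force rather than using Apriori or any other algorithm from the primer; the function name, the `min_support` and `max_size` parameters and the toy baskets are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Count every itemset of up to max_size items and keep those that
    occur in at least min_support transactions (brute-force sketch)."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

# Toy supermarket baskets; in a biological setting each transaction
# could instead be, for example, the set of annotations of one gene.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

result = frequent_itemsets(baskets, min_support=2)
```

Enumerating all combinations grows exponentially with the number of items per transaction, which is why dedicated algorithms prune candidates instead; the sketch is only meant to show what "frequent" means for a given support threshold.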