ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filters
  • Collection: Articles (1,557)
  • Publisher: BioMed Central (1,157), Wiley (400)
  • Publication period: 2010-2014 (1,557)
  • Year: 2012 (1,557)
  • Subject: Computer Science (1,557)
  • 1
    Publication date: 2012-12-29
    Description: Background: RNA interference (RNAi) is becoming an increasingly important and effective genetic tool for studying the function of target genes by suppressing specific genes of interest. This systems approach helps identify signaling pathways and cellular phase types by tracking intensity and/or morphological changes of cells. The traditional RNAi screening scheme, in which one siRNA is designed to knock down one specific mRNA target, needs a large library of siRNAs and turns out to be time-consuming and expensive. Results: In this paper, we propose a conceptual model, called compressed sensing RNAi (csRNAi), which employs unique combinations of groups of small interfering RNAs (siRNAs) to knock down a much larger set of genes. This strategy is based on the fact that one gene can be partially bound by several siRNAs and, conversely, one siRNA can bind to a few genes with distinct binding affinities. This model constructs a many-to-many correspondence between siRNAs and their targets, with far fewer siRNAs than mRNA targets, compared with the conventional scheme. Mathematically, this problem involves an underdetermined system of equations (linear or nonlinear), which is ill-posed in general. However, the recently developed compressed sensing (CS) theory can solve this problem. We present a mathematical model to describe the csRNAi system based on both CS theory and biological concerns. To build this model, we first search for nucleotide motifs in a target gene set. Then we propose a machine-learning-based method to find effective siRNAs, using novel features such as image features and speech features to describe an siRNA sequence. Numerical simulations show that we can reduce the siRNA library to one third of that in the conventional scheme. In addition, the proposed siRNA features substantially outperform existing ones. Conclusions: This csRNAi system is very promising for saving both time and cost in large-scale RNAi screening experiments, which may benefit biological research into cellular processes and pathways.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
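    A minimal Python sketch of the compressed-sensing step at the heart of this abstract: recovering a sparse vector from an underdetermined linear system by L1-regularized regression. This illustrates the general technique, not the authors' csRNAi pipeline; the dimensions, the simulated binding-affinity matrix, and the use of scikit-learn's Lasso are assumptions for the example.

        # Recover sparse knockdown levels x from y = A @ x with fewer
        # measurements (siRNA pools) than unknowns (genes). Illustrative only.
        import numpy as np
        from sklearn.linear_model import Lasso

        rng = np.random.default_rng(0)
        n_sirnas, n_genes, n_active = 100, 300, 10      # 100 equations, 300 unknowns

        A = rng.normal(size=(n_sirnas, n_genes))        # simulated affinity matrix
        x_true = np.zeros(n_genes)
        x_true[rng.choice(n_genes, n_active, replace=False)] = rng.uniform(0.5, 1.0, n_active)
        y = A @ x_true                                  # observed combined signal

        lasso = Lasso(alpha=0.01, max_iter=50_000).fit(A, y)
        print("true support:     ", np.flatnonzero(x_true))
        print("recovered support:", np.flatnonzero(np.abs(lasso.coef_) > 0.1))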
  • 2
    Publication date: 2012-12-29
    Description: Background: Copy number variations (CNVs) are genomic structural variants that are found in healthy populations and have been observed to be associated with disease susceptibility. Existing methods for CNV detection are often performed on a sample-by-sample basis, which is not ideal for large datasets where common CNVs must be estimated by comparing the frequency of CNVs in the individual samples. Here we describe a simple and novel approach to locate genome-wide CNVs common to a specific population, using human ancestry as the phenotype. Results: We utilized our previously published Genome Alteration Detection Analysis (GADA) algorithm to identify common ancestry CNVs (caCNVs) and built a caCNV model to predict population structure. We identified a 73-caCNV signature using a training set of 225 healthy individuals of European, Asian, and African ancestry. The signature was validated on an independent test set of 300 individuals with a similar ancestral background. The error rate in predicting ancestry in this test set was 2% using the 73-caCNV signature. Among the caCNVs identified, several had previously been confirmed experimentally to vary by ancestry. Our signature also contains a caCNV region with a single microRNA (MIR270), which represents the first reported variation of a microRNA by ancestry. Conclusions: We developed a new methodology to identify common CNVs and demonstrated its performance by building a caCNV signature to predict human ancestry with high accuracy. The utility of our approach could be extended to large case-control studies to identify CNV signatures for other phenotypes such as disease susceptibility and drug response.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 3
    Publication date: 2012-12-19
    Description: Background: For the selection and evaluation of potential biomarkers, inclusion of already published information is of utmost importance. In spite of significant advancements in text- and data-mining techniques, the vast knowledge space of biomarkers in biomedical text has remained unexplored. Existing named entity recognition approaches are not sufficiently selective for the retrieval of biomarker information from the literature. The purpose of this study was to identify textual features that enhance the effectiveness of biomarker information retrieval for different indication areas and diverse end-user perspectives. Methods: A biomarker terminology was created and further organized into six concept classes. Performance of this terminology was optimized towards balanced selectivity and specificity. The information retrieval performance using the biomarker terminology was evaluated based on various combinations of the terminology's six classes. Further validation of these results was performed on two independent corpora representing two different neurodegenerative diseases. Results: The current state of the biomarker terminology contains 119 entity classes supported by 1890 different synonyms. The results show an improved retrieval rate of informative abstracts, which is achieved by including clinical management terms and evidence of gene/protein alterations (e.g. gene/protein expression status or certain polymorphisms) in combination with disease and gene name recognition. When additional filtering through other classes (e.g. diagnostic or prognostic methods) is applied, the typically high number of unspecific search results is significantly reduced. The evaluation results suggest that this approach enables the automated identification of biomarker information in the literature. A demo version of the search engine SCAIView, including the biomarker retrieval, is made available to the public through http://www.scaiview.com/scaiview-academia.html. Conclusions: The approach presented in this paper demonstrates that using a dedicated biomarker terminology for automated analysis of the scientific literature may be helpful as an aid to finding biomarker information in text. Successful extraction of candidate biomarker information from published resources can be considered the first step towards developing novel hypotheses. These hypotheses will be valuable for early decision-making in the drug discovery and development process.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 4
    Publication date: 2012-11-09
    Description: Background: The robust identification of isotope patterns originating from peptides analyzed through mass spectrometry (MS) is often significantly hampered by noise artifacts and the interference of overlapping patterns arising, e.g., from post-translational modifications. As the classification of the recorded data points into either 'noise' or 'signal' lies at the very root of essentially every proteomic application, the quality of the automated processing of mass spectra can significantly influence the way the data are interpreted within a given biological context. Results: We propose non-negative least squares/non-negative least absolute deviation regression to fit a raw spectrum by templates imitating isotope patterns. In a carefully designed validation scheme, we show that the method exhibits excellent performance in pattern picking. It is demonstrated that the method is able to disentangle complicated overlaps of patterns. Conclusions: We find that regularization is not necessary to prevent overfitting and that thresholding is an effective and user-friendly way to perform feature selection. The proposed method avoids problems inherent in regularization-based approaches, comes with a set of well-interpretable parameters whose default configuration is shown to generalize well without the need for fine-tuning, and is applicable to spectra of different platforms. The R package IPPD implements the method and is available from the Bioconductor platform (http://bioconductor.fhcrc.org/help/bioc-views/devel/bioc/html/IPPD.html).
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
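    The fitting step described above is ordinary non-negative least squares. A minimal Python sketch, using toy Gaussian peaks in place of real isotope-pattern templates (the grid, peak shapes, and amplitudes are invented for illustration):

        # Fit a noisy spectrum as a non-negative combination of templates;
        # thresholding the fitted coefficients then acts as feature selection.
        import numpy as np
        from scipy.optimize import nnls

        mz = np.linspace(0.0, 10.0, 500)                 # m/z grid (arbitrary units)

        def peak(center, width=0.15):
            return np.exp(-0.5 * ((mz - center) / width) ** 2)

        templates = np.column_stack([peak(c) for c in (2.0, 4.0, 6.0, 8.0)])
        rng = np.random.default_rng(1)
        spectrum = 3.0 * peak(4.0) + 1.5 * peak(8.0) + rng.normal(0, 0.05, mz.size)

        coef, _ = nnls(templates, spectrum)              # non-negative amplitudes
        print(np.round(coef, 2))                         # approximately [0, 3, 0, 1.5]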
  • 5
    Publication date: 2012-11-10
    Description: Background: The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred. Results: For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce Affine-DO, a new algorithm that accommodates the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data, and present experiments with it. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement. Conclusions: Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 6
    Publication date: 2012-11-10
    Description: Background: Patient-reported outcomes (PROs), capturing e.g. quality of life, fatigue, depression, medication side effects or disease symptoms, have become important outcome parameters in medical research and daily clinical practice. Electronic PRO data capture (ePRO), with software packages to administer questionnaires, store data, and present results, has facilitated PRO assessment in hospital settings. Compared to conventional paper-pencil versions of PRO instruments, ePRO is more economical with regard to staff resources and time, and allows immediate presentation of results to the medical staff. The objective of our project was to develop software (CHES, Computer-based Health Evaluation System) for ePRO in hospital settings and at home, with a special focus on the presentation of individual patients' results. Methods: Following the Extreme Programming development approach, the architecture was not fixed up front but was developed in close, continuous collaboration with software end users (medical staff, researchers and patients) to meet their specific demands. Developed features include sophisticated, longitudinal charts linking patients' PRO data to clinical characteristics and to PRO scores from reference populations, a web interface for questionnaire administration, and a tool for convenient creation and editing of questionnaires. Results: By 2012, CHES had been implemented at various institutions in Austria, Germany, Switzerland, and the UK, and about 5000 patients had participated in ePRO (with around 15000 assessments in total). Data entry is done by the patients themselves via tablet PCs, with a study nurse or an intern approaching patients and supervising questionnaire completion. Discussion: During the last decade several software packages for ePRO have emerged for different purposes. Whereas commercial products are available primarily for ePRO in clinical trials, academic projects have focused on data collection and presentation in daily clinical practice and on extending cancer registries with PRO data. CHES includes several features facilitating the use of PRO data for individualized medical decision making. With its web interface it also allows ePRO when patients are at home. Thus, it provides complete monitoring of patients' physical and psychosocial symptom burden.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 7
    Publication date: 2012-11-12
    Description: Background: The problem list is a key part of the electronic health record (EHR) that allows practitioners to see a patient's diagnoses and health issues. Yet, as the content of the problem list largely represents the subjective decisions of those who edit it, patients' problem lists are often unreliable when shared across practitioners. The lack of standards for how the problem list is compiled in the EHR limits its effectiveness in improving patient care, particularly as a resource for clinical decision support and population management tools. The purpose of this study is to discover practitioners' opinions towards the problem list and the logic behind their decisions in clinical situations. Materials and methods: An observational cross-sectional study was conducted at two major Boston teaching hospitals. Practitioners' opinions about the problem list were collected through both in-person interviews and an online questionnaire. Questions were framed using vignettes of clinical scenarios asking practitioners about their preferred actions towards the problem list. Results: These data confirmed prior research that practitioners differ in their opinions over managing the problem list, but for most questionnaire responses there was a common approach among the relative majority of respondents. Further, basic demographic characteristics of providers (age, medical experience, etc.) did not appear to strongly affect attitudes towards the problem list. Conclusion: The results supported the premise that policies and EHR tools are needed to bring about a common approach. Further, the findings helped identify which issues might benefit the most from a defined policy, and the level of restriction a problem list policy should place on the addition of different types of information.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 8
    Publication date: 2012-11-14
    Description: Background: Professional societies recommend shared decision making (SDM) for prostate cancer screening; however, most efforts have promoted informed rather than shared decision making. The objectives of this study were to 1) examine the effects of a prostate cancer screening intervention to promote SDM and 2) determine whether framing prostate cancer information in the context of other clearly beneficial men's health services affects decisions. Methods: We conducted two separate randomized controlled trials of the same prostate cancer intervention (with or without additional information on more clearly beneficial men's health services). For each trial, we enrolled a convenience sample of two internal medicine practices, and their interested physicians and male patients with no prior history of prostate cancer (for a total of 4 practices, 28 physicians, and 128 men across trials). Within each practice site, we randomized men to either 1) a video-based decision aid and researcher-led coaching session or 2) a highway safety video. Physicians at each site received a 1-hour educational session on prostate cancer and SDM. To assess intervention effects, we measured key components of SDM, intent to be screened, and actual screening. After finding that results did not vary by trial, we combined data across sites, adjusting for the random effects of both practice and physician. Results: Compared to an attention control, our prostate cancer screening intervention increased men's perceptions that screening is a decision (absolute difference +41%; 95% CI 25% to 57%) and men's knowledge about prostate cancer screening (absolute difference +34%; 95% CI 19% to 50%), but had no effect on men's self-reported participation in shared decisions or their participation at their preferred level. Overall, the intervention decreased screening intent (absolute difference -34%; 95% CI -50% to -18%) and actual screening rates (absolute difference -22%; 95% CI -38% to -7%), with no difference in effect by frame. Conclusions: SDM interventions can increase men's knowledge, alter their perceptions of prostate cancer screening, and reduce actual screening. However, they may not guarantee an increase in shared decisions. Trial registration: NCT00630188
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 9
    Publication date: 2012-11-13
    Description: Background: Efficient rule authoring tools are critical to allow clinical Knowledge Engineers (KEs), Software Engineers (SEs), and Subject Matter Experts (SMEs) to convert medical knowledge into machine-executable clinical decision support rules. The goal of this analysis was to identify the critical success factors and challenges of a fully functioning Rule Authoring Environment (RAE) in order to define requirements for a scalable, comprehensive tool to manage enterprise-level rules. Methods: The authors evaluated RAEs in active use across Partners Healthcare, including enterprise-wide, ambulatory-only, and system-specific tools, with a focus on rule editors for reminder and medication rules. We conducted meetings with users of these RAEs to discuss their general experience and the perceived advantages and limitations of these tools. Results: While the overall rule authoring process is similar across the 10 separate RAEs, the system capabilities and architecture vary widely. Most current RAEs limit the ability of clinical decision support (CDS) interventions to be standardized, sharable, interoperable, and extensible. No existing system meets all requirements defined by knowledge management users. Conclusions: A successful, scalable, integrated rule authoring environment will need to support a number of key requirements and functions in the areas of knowledge representation, metadata, terminology, authoring collaboration, user interface, integration with electronic health record (EHR) systems, testing, and reporting.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 10
    Publication date: 2012-11-14
    Description: Background: Each omics platform is now able to generate a large amount of data. Genomics, proteomics, metabolomics, and interactomics data are compiled at an ever-increasing pace and now form a core part of the fundamental systems biology framework. Recently, several integrative approaches have been proposed to extract meaningful information. However, these approaches lack visualisation outputs to fully unravel the complex associations between different biological entities. Results: The multivariate statistical approaches 'regularized Canonical Correlation Analysis' and 'sparse Partial Least Squares regression' were recently developed to integrate two types of highly dimensional 'omics' data and to select relevant information. Using the results of these methods, we propose to revisit a few graphical outputs to better understand the relationships between two 'omics' data sets and to better visualise the correlation structure between the different biological entities. These graphical outputs include Correlation Circle plots, Relevance Networks and Clustered Image Maps. We demonstrate the usefulness of such graphical outputs on several biological data sets and further assess their biological relevance using gene ontology analysis. Conclusions: Such graphical outputs are undoubtedly useful to aid the interpretation of these promising integrative analysis tools and will certainly help in addressing fundamental biological questions and understanding systems as a whole. Availability: The graphical tools described in this paper are implemented in the freely available R package mixOmics and in its associated web application.
    Digital ISSN: 1756-0381
    Subject: Biology, Computer Science
    Published by BioMed Central
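    mixOmics itself is an R package; as a rough Python analogue of the Correlation Circle idea, one can correlate each variable with the first two canonical variates, so that variables driving the shared structure sit near the circle's edge. Everything below (toy data, plain CCA instead of the regularized variant) is an assumption for illustration:

        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(2)
        latent = rng.normal(size=(50, 2))                          # shared structure
        X = latent @ rng.normal(size=(2, 10)) + 0.3 * rng.normal(size=(50, 10))
        Y = latent @ rng.normal(size=(2, 8)) + 0.3 * rng.normal(size=(50, 8))

        xs, ys = CCA(n_components=2).fit(X, Y).transform(X, Y)
        scores = (xs + ys) / 2                                     # consensus variates

        # correlation-circle coordinates: corr(variable, variate 1 and 2)
        coords = np.array([[np.corrcoef(X[:, j], scores[:, k])[0, 1]
                            for k in range(2)] for j in range(X.shape[1])])
        print(coords.round(2))                                     # one point per X variable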
  • 11
    Publication date: 2012-11-15
    Description: Background: Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, the currently used Fourier series approximations of periodic gene expression have been found not to be sufficiently adequate to model the complexity of time-course data, partly because they ignore the dependence between expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of models available in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. Results: We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models in providing more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of co-regulated genes are obtained, which are supported by gene-function annotation databases. Conclusions: Our new model, an extension of the EMMIX-WIRE procedure, is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an auto-correlation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters, which enables improved modelling and consequent clustering of time-course data.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
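    The modeling ingredient highlighted in the conclusions is a gene-specific random effect with first-order autoregressive correlation across time points. A small sketch of just that ingredient (not the EMMIX-WIRE fitting procedure), assuming the standard AR(1) covariance form:

        # AR(1) random-effect covariance: Cov[b_t, b_s] = sigma^2 * rho**|t-s|,
        # which induces correlation between measurements at different time points.
        import numpy as np

        def ar1_cov(T, rho, sigma2=1.0):
            idx = np.arange(T)
            return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

        T, rho = 8, 0.7
        Sigma = ar1_cov(T, rho)
        rng = np.random.default_rng(3)
        cluster_mean = np.sin(np.linspace(0.0, np.pi, T))    # shared time profile
        profiles = rng.multivariate_normal(cluster_mean, Sigma, size=5)  # 5 correlated genes
        print(Sigma.round(2))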
  • 12
    Publication date: 2012-11-16
    Description: Background: 454 pyrosequencing is a commonly used massively parallel DNA sequencing technology with a wide variety of application fields such as epigenetics, metagenomics and transcriptomics. A well-known problem of this platform is its sensitivity to base-calling insertion and deletion errors, particularly in the presence of long homopolymers. In addition, the base-call quality scores are not informative with respect to whether an insertion or a deletion error is more likely. Surprisingly, not much effort has been devoted to the development of improved base-calling methods and more intuitive quality scores for this platform. Results: We present HPCall, a 454 base-calling method based on a weighted Hurdle Poisson model. HPCall uses a probabilistic framework to call the homopolymer lengths in the sequence by modeling well-known 454 noise predictors. Base-calling quality is assessed based on estimated probabilities for each homopolymer length, which are easily transformed into useful quality scores. Conclusions: Using a reference data set of the Escherichia coli K-12 strain, we show that HPCall produces superior quality scores that are very informative with respect to possible insertion and deletion errors, while maintaining base-calling accuracy that is better than that of the current base-caller. Given the generality of the framework, HPCall has the potential to adapt to other homopolymer-sensitive sequencing technologies as well.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
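    The probability model named in the abstract is a hurdle Poisson: a separate probability for length zero, plus a zero-truncated Poisson for lengths of one and above. The sketch below shows that plain (unweighted) form and calls the most probable homopolymer length; HPCall's actual weighted model and its intensity-based parameters are more elaborate, and the parameter values here are invented:

        import math

        def hurdle_poisson_pmf(length, pi0, lam):
            if length == 0:
                return pi0                                     # hurdle: point mass at zero
            poisson = math.exp(-lam) * lam ** length / math.factorial(length)
            return (1 - pi0) * poisson / (1 - math.exp(-lam))  # zero-truncated part

        pi0, lam = 0.2, 3.1                                    # illustrative parameters
        pmf = {l: hurdle_poisson_pmf(l, pi0, lam) for l in range(10)}
        call = max(pmf, key=pmf.get)                           # most probable length
        print(call, round(pmf[call], 3))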
  • 13
    Publication date: 2012-12-08
    Description: Background: Hyperbilirubinemia is emerging as an increasingly common problem in newborns due to decreasing hospital lengths of stay after birth. Jaundice is the most common disease of the newborn, and although benign in most cases, it can lead to severe neurological consequences if poorly evaluated. In different areas of medicine, data mining has contributed to improving the results obtained with other methodologies. Hence, the aim of this study was to improve the diagnosis of neonatal jaundice with the application of data mining techniques. Methods: This study followed the phases of the Cross Industry Standard Process for Data Mining model as its methodology. This observational study was performed at the Obstetrics Department of a central hospital (Centro Hospitalar Tamega e Sousa, EPE) from February to March 2011. A total of 227 healthy newborn infants with 35 or more weeks of gestation were enrolled in the study. Over 70 variables were collected and analyzed. In addition, transcutaneous bilirubin levels were measured from birth to hospital discharge, with maximum intervals of 8 hours between measurements, using a noninvasive bilirubinometer. Different attribute subsets were used to train and test classification models using algorithms included in the Weka data mining software, such as decision trees (J48) and neural networks (multilayer perceptron). The accuracy results were compared with those of traditional methods for predicting hyperbilirubinemia. Results: The application of different classification algorithms to the collected data allowed predicting subsequent hyperbilirubinemia with high accuracy. In particular, at 24 hours of life, the accuracy of hyperbilirubinemia prediction was 89%. The best results were obtained using the following algorithms: naive Bayes, multilayer perceptron, and simple logistic. Conclusions: The findings of our study support the view that new approaches, such as data mining, may aid medical decision making, contributing to improved diagnosis of neonatal jaundice.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
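    The study trained Weka classifiers such as J48 (a C4.5 decision tree). A rough scikit-learn counterpart of that train-and-evaluate loop is sketched below; the synthetic data, feature choices, and outcome rule are placeholders, not the study's variables:

        import numpy as np
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.metrics import accuracy_score

        rng = np.random.default_rng(4)
        n = 227                                      # cohort size from the abstract
        X = np.column_stack([
            rng.normal(38.5, 1.5, n),                # gestational age (weeks), synthetic
            rng.normal(8.0, 2.5, n),                 # bilirubin at 24 h, synthetic
            rng.integers(0, 2, n),                   # hypothetical binary risk factor
        ])
        y = (X[:, 1] + rng.normal(0, 1, n) > 9.5).astype(int)   # synthetic outcome

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
        clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)
        print("held-out accuracy:", round(accuracy_score(y_te, clf.predict(X_te)), 2))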
  • 14
    Publication date: 2012-12-09
    Description: Background: Multivariate approaches have been successfully applied to genome-wide association studies. Recently, a Partial Least Squares (PLS) based approach was introduced for mapping yeast genotype-phenotype relations, where background information such as gene function classification, gene dispensability, recent or ancient gene copy number variations, and the presence of premature stop codons or frameshift mutations in reading frames was used post hoc to explain selected genes. One of the latest advancements in PLS, named L-Partial Least Squares (L-PLS), where 'L' denotes the shape of the data structure used, enables the use of background information at the modeling level. Here, a modification of L-PLS with variable importance on projection (VIP) was implemented using a stepwise regularized procedure for the selection of genes and background information. Results were compared to those of PLS-based procedures where no background information was used. Results: Applying the proposed methodology to yeast Saccharomyces cerevisiae data, we found the genotype-phenotype relationship to be easier to interpret. Phenotypic variations were explained by the variations of relatively stable genes and stable background variations. The suggested procedure provides an automatic way of performing genotype-phenotype mapping. The selected phenotype-influencing genes were evolving 29% faster than non-influential genes, and the current results are supported by a recently conducted study. Further power analysis on simulated data verified that the proposed methodology selects relevant variables. Conclusions: A modification of L-PLS with VIP in a stepwise regularized elimination procedure can improve the interpretability and stability of selected genes and background information. The approach is recommended for genome-wide association studies where background information is available.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 15
    Publication date: 2012-12-09
    Description: Background: Biomarker panels derived separately from genomic and proteomic data, and with a variety of computational methods, have demonstrated promising classification performance in various diseases. An open question is how to create effective proteo-genomic panels. The framework of ensemble classifiers has been applied successfully in various analytical domains to combine classifiers so that the performance of the ensemble exceeds the performance of the individual classifiers. Using blood-based diagnosis of acute renal allograft rejection as a case study, we address the following question in this paper: Can acute rejection classification performance be improved by combining individual genomic and proteomic classifiers in an ensemble? Results: The first part of the paper presents a computational biomarker development pipeline for genomic and proteomic data. The pipeline begins with data acquisition (e.g., from bio-samples to microarray data), followed by quality control, statistical analysis and mining of the data, and finally various forms of validation. The pipeline ensures that the various classifiers to be combined later in an ensemble are diverse and adequate for clinical use. Five mRNA genomic and five proteomic classifiers were developed independently using single time-point blood samples from 11 acute-rejection and 22 non-rejection renal transplant patients. The second part of the paper examines five ensembles ranging in size from two to ten individual classifiers. Performance of the ensembles is characterized by the area under the curve (AUC), sensitivity, and specificity, as derived from the probability of acute rejection for the individual classifiers in the ensemble in combination with one of two aggregation methods: (1) Average Probability or (2) Vote Threshold. One ensemble demonstrated superior performance and was able to improve sensitivity and AUC beyond the best values observed for any of the individual classifiers in the ensemble, while staying within the range of observed specificity. The Vote Threshold aggregation method achieved improved sensitivity for all five ensembles, but typically at the cost of decreased specificity. Conclusion: Proteo-genomic biomarker ensemble classifiers show promise in the diagnosis of acute renal allograft rejection and can improve classification performance beyond that of individual genomic or proteomic classifiers alone. Validation of our results in an international multicenter study is currently underway.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
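    The two aggregation rules named in the abstract are simple to state. A sketch, assuming each member classifier outputs a probability of acute rejection per sample (the matrix below is invented):

        import numpy as np

        def average_probability(probs, cutoff=0.5):
            """Call rejection where the mean member probability exceeds the cutoff."""
            return np.mean(probs, axis=0) > cutoff

        def vote_threshold(probs, cutoff=0.5, min_votes=2):
            """Call rejection where at least min_votes members exceed the cutoff."""
            return np.sum(np.asarray(probs) > cutoff, axis=0) >= min_votes

        # rows = 4 member classifiers (genomic and proteomic), columns = 3 samples
        probs = np.array([[0.9, 0.4, 0.2],
                          [0.7, 0.6, 0.1],
                          [0.4, 0.8, 0.3],
                          [0.6, 0.3, 0.2]])
        print(average_probability(probs))    # [ True  True False]
        print(vote_threshold(probs))         # [ True  True False]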
  • 16
    Publication date: 2012-12-10
    Description: Background: Co-expression measures are often used to define networks among genes. Mutual information (MI) is often used as a generalized correlation measure. It is not clear how much MI adds beyond standard (robust) correlation measures or regression-model-based association measures. Further, it is important to assess what transformations of these and other co-expression measures lead to biologically meaningful modules (clusters of genes). Results: We provide a comprehensive comparison between mutual information and several correlation measures in eight empirical data sets and in simulations. We also study different approaches for transforming an adjacency matrix, e.g. using the topological overlap measure. Overall, we confirm close relationships between MI and correlation in all data sets, which reflects the fact that most gene pairs satisfy linear or monotonic relationships. We discuss rare situations where the two measures disagree. We also compare correlation- and MI-based approaches when it comes to defining co-expression network modules. We show that a robust measure of correlation (the biweight midcorrelation transformed via the topological overlap transformation) leads to modules that are superior to MI-based modules and maximal information coefficient (MIC) based modules in terms of gene ontology enrichment. We present a function that relates correlation to mutual information and can be used to approximate the mutual information from the corresponding correlation coefficient. We propose the use of polynomial or spline regression models as an alternative to MI for capturing non-linear relationships between quantitative variables. Conclusions: The biweight midcorrelation outperforms MI in terms of elucidating gene pairwise relationships. Coupled with the topological overlap matrix transformation, it often leads to more significantly enriched co-expression modules. Spline and polynomial networks form attractive alternatives to MI in the case of non-linear relationships. Our results indicate that MI networks can safely be replaced by correlation networks when it comes to measuring co-expression relationships in stationary data.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
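    A sketch of the topological overlap transformation applied to a soft-thresholded adjacency matrix, as in WGCNA-style co-expression networks. Plain Pearson correlation stands in for the biweight midcorrelation used in the paper, and the data are toy values:

        import numpy as np

        def topological_overlap(adj):
            np.fill_diagonal(adj, 0.0)
            shared = adj @ adj                        # strength of shared neighbors
            k = adj.sum(axis=1)                       # node connectivity
            tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
            np.fill_diagonal(tom, 1.0)
            return tom

        rng = np.random.default_rng(5)
        expr = rng.normal(size=(30, 6))               # 30 samples x 6 genes (toy)
        adj = np.abs(np.corrcoef(expr, rowvar=False)) ** 6   # soft threshold, beta = 6
        print(topological_overlap(adj).round(2))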
  • 17
    Publication date: 2012-12-11
    Description: Background: The quantum increases in home Internet access and available online health information, with limited control over information quality, highlight the necessity of exploring decision-making processes in accessing and using online information, specifically in relation to children, who do not make their own health decisions. The aim of this study was to understand the processes explaining parents' decisions to use online health information for child health care. Methods: Parents (N = 391) completed an initial questionnaire assessing the theory of planned behaviour constructs of attitude, subjective norm, and perceived behavioural control, as well as perceived risk, group norm, and additional demographic factors. Two months later, 187 parents completed a follow-up questionnaire assessing their decisions to use online information for their child's health care, specifically to 1) diagnose and/or treat their child's suspected medical condition/illness and 2) increase understanding about a diagnosis or treatment recommended by a health professional. Results: Hierarchical multiple regression showed that, for both behaviours, attitude, subjective norm, perceived behavioural control, (lower) perceived risk, group norm, and (non-)medical background were the significant predictors of intention. For parents' use of online child health information, for both behaviours, intention was the sole significant predictor of behaviour. The findings explain 77% of the variance in parents' intention to treat/diagnose a child health problem and 74% of the variance in their intention to increase their understanding about child health concerns. Conclusions: Understanding the socio-cognitive processes that guide parents' use of online information for child health care is important given the increase in Internet usage and the sometimes questionable quality of health information provided online. The findings highlight parents' thirst for information; there is an urgent need for health professionals to provide parents with evidence-based child health websites, in addition to educating the general population on how to evaluate the quality of online health information.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 18
    Publication date: 2012-12-12
    Description: Background: Illumina BeadArray technology includes non-specific negative control features that allow a precise estimation of the background noise. As an alternative to the background subtraction proposed in BeadStudio, which leads to an important loss of information by generating negative values, a background correction method modeling the observed intensities as the sum of an exponentially distributed signal and normally distributed noise has been developed. Nevertheless, Wang and Ye (2012) display a kernel-based estimator of the signal distribution on Illumina BeadArrays and suggest that a gamma distribution would better model the signal density. Hence, the normal-exponential modeling may not be appropriate for Illumina data, and background corrections derived from this model may lead to erroneous estimates. Results: We propose a more flexible modeling based on a gamma-distributed signal and normally distributed background noise, and develop the associated background correction, implemented in the R package NormalGamma. Our model proves to be markedly more accurate for Illumina BeadArrays: on the one hand, it is shown on two types of Illumina BeadChips that this model offers a more correct fit of the observed intensities; on the other hand, the comparison of the operating characteristics of several background correction procedures on spike-in and on normal-gamma simulated data shows high similarities, reinforcing the validation of the normal-gamma modeling. The performance of the background corrections based on the normal-gamma and normal-exponential models is compared on two dilution data sets, through testing procedures representing various experimental designs. Surprisingly, we observe that implementing a more accurate parametrisation in the model-based background correction does not increase the sensitivity. These results may be explained by the operating characteristics of the estimators: the normal-gamma background correction offers an improvement in terms of bias, but at the cost of a loss in precision. Conclusions: This paper addresses the lack of fit of the usual normal-exponential model by proposing a more flexible parametrisation of the signal distribution, as well as the associated background correction. This new model proves to be considerably more accurate for Illumina microarrays, but the improvement in terms of modeling does not lead to higher sensitivity in differential analysis. Nevertheless, this realistic modeling opens the way for future investigations, in particular to examine the characteristics of pre-processing strategies.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
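    The model's core assumption is that each observed intensity is a gamma-distributed signal plus normally distributed background noise. The simulation below illustrates that convolution together with a crude mean-subtraction correction; it is not the estimator implemented in the R package NormalGamma, and all parameter values are invented:

        import numpy as np

        rng = np.random.default_rng(6)
        n = 20_000
        signal = rng.gamma(shape=1.5, scale=80.0, size=n)    # gamma-distributed signal
        noise = rng.normal(loc=100.0, scale=15.0, size=n)    # normal background
        observed = signal + noise

        # negative-control probes estimate the noise distribution
        controls = rng.normal(loc=100.0, scale=15.0, size=2_000)
        corrected = np.maximum(observed - controls.mean(), 0.0)  # no negative values
        print(round(signal.mean(), 1), round(corrected.mean(), 1))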
  • 19
    Publication date: 2012-09-25
    Description: Background: Biologists are elucidating complex collections of genetic regulatory data for multiple organisms, and software is needed to manage such regulatory network data. Results: The Pathway Tools software provides a comprehensive environment for manipulating molecular regulatory interactions that integrates regulatory data with an organism's genome and metabolic network. The Pathway Tools regulation ontology captures transcriptional and translational regulation, substrate-level regulation of enzyme activity, post-translational modifications, and regulatory pathways. Curated collections of regulatory data are available for Escherichia coli, Bacillus subtilis, and Shewanella oneidensis. Regulatory visualizations include a novel diagram that summarizes all regulatory influences on a gene; a transcription-unit diagram; and an interactive visualization of a full transcriptional regulatory network that can be painted with gene expression data to probe correlations between gene expression and regulatory mechanisms. We introduce a novel type of enrichment analysis that asks whether a gene-expression dataset is over-represented for known regulators. We present algorithms for ranking the degree of regulatory influence of genes, and for computing the net positive and negative regulatory influences on a gene.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 20
    Publication date: 2012-09-26
    Description: Background: Inverted repeat genes encode precursor RNAs characterized by hairpin structures. These RNA hairpins are then metabolized by biosynthetic pathways to produce functional small RNAs. In eukaryotic genomes, short non-autonomous transposable elements can have similar sizes and hairpin structures as non-coding precursor RNAs. This resemblance leads to problems in annotating small RNAs. Methods: We mapped all microRNA precursors from miRBase to several genomes and studied the repetition and dispersion of the corresponding loci. We then searched for repetitive elements overlapping these loci. Results: We developed an automatic method called ncRNAclassifier to classify pre-ncRNAs according to their relationship with transposable elements (TEs). We show that the number of scattered occurrences of ncRNA precursor candidates is correlated with the presence of TEs. We applied ncRNAclassifier to six chordate genomes and report our findings. Among the 1,426 human and 721 mouse pre-miRNAs of miRBase, we identified 235 and 68 mis-annotated pre-miRNAs, respectively, corresponding completely to TEs. Conclusions: We provide a tool enabling the identification of repetitive elements in precursor ncRNA sequences. ncRNAclassifier is available at http://EvryRNA.ibisc.univ-evry.fr
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
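    The classification described above hinges on testing whether precursor loci overlap annotated transposable elements. A minimal sketch of that interval-overlap test; the coordinates, names, and TE families below are invented examples, not miRBase data:

        def overlaps(a_start, a_end, b_start, b_end):
            return a_start < b_end and b_start < a_end   # half-open interval overlap

        te_annotations = {                               # chrom -> (start, end, family)
            "chr1": [(1000, 1300, "Alu"), (5000, 5600, "LINE-1")],
        }
        pre_mirnas = [("mir-ex1", "chr1", 1250, 1330),   # hypothetical precursors
                      ("mir-ex2", "chr1", 2000, 2080)]

        for name, chrom, start, end in pre_mirnas:
            hits = [fam for s, e, fam in te_annotations.get(chrom, [])
                    if overlaps(start, end, s, e)]
            print(name, "->", hits or "no TE overlap")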
  • 21
    Publication date: 2012-09-27
    Description: Background: Algorithms designed to detect complex genetic disease associations are initially evaluated using simulated datasets. Typical evaluations vary constraints that influence the correct detection of underlying models (i.e. number of loci, heritability, and minor allele frequency). Such studies neglect to account for model architecture (i.e. the unique specification and arrangement of penetrance values comprising the genetic model), which alone can influence the detectability of a model. In order to design a simulation study which efficiently takes architecture into account, a reliable metric is needed for model selection. Results: We evaluate three metrics as predictors of relative model detection difficulty, derived from previous works: (1) Penetrance table variance (PTV), (2) customized odds ratio (COR), and (3) our own Ease of Detection Measure (EDM), calculated from the penetrance values and respective genotype frequencies of each simulated genetic model. We evaluate the reliability of these metrics across three very different data search algorithms, each with the capacity to detect epistatic interactions. We find that a model's EDM and COR are each stronger predictors of model detection success than heritability. Conclusions: This study formally identifies and evaluates metrics which quantify model detection difficulty. We utilize these metrics to intelligently select models from a population of potential architectures. This allows for an improved simulation study design which accounts for differences in detection difficulty attributed to model architecture. We implement the calculation and utilization of EDM and COR into GAMETES, an algorithm which rapidly and precisely generates pure, strict, n-locus epistatic models.
    Digital ISSN: 1756-0381
    Subject: Biology, Computer Science
    Published by BioMed Central
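    Of the three metrics, penetrance table variance is the simplest to sketch. Below is one plausible reading for a two-locus model: the genotype-frequency-weighted variance of the table's penetrance values under Hardy-Weinberg proportions. The penetrance values and allele frequencies are invented, and EDM and COR are defined in the paper itself:

        import numpy as np

        def genotype_freqs(maf):
            p, q = 1.0 - maf, maf
            return np.array([p * p, 2 * p * q, q * q])   # AA, Aa, aa

        penetrance = np.array([[0.10, 0.20, 0.10],       # 3x3 two-locus table
                               [0.20, 0.05, 0.20],
                               [0.10, 0.20, 0.10]])
        w = np.outer(genotype_freqs(0.3), genotype_freqs(0.2))  # joint genotype freqs

        mean_pen = (w * penetrance).sum()                # equals population prevalence
        ptv = (w * (penetrance - mean_pen) ** 2).sum()   # weighted table variance
        print(round(mean_pen, 4), round(ptv, 5))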
  • 22
    Publication date: 2012-10-02
    Description: Background: Geneticists who look beyond single-locus disease associations require additional strategies for the detection of complex multi-locus effects. Epistasis, a multi-locus masking effect, presents a particular challenge and has been the target of bioinformatic development. Thorough evaluation of new algorithms calls for simulation studies in which known disease models are sought. To date, the best methods for generating simulated multi-locus epistatic models rely on genetic algorithms. However, such methods are computationally expensive, difficult to adapt to multiple objectives, and unlikely to yield models with a precise form of epistasis that we refer to as pure and strict. Purely and strictly epistatic models constitute the worst case in terms of detecting disease associations, since such associations may only be observed if all n loci are included in the disease model. This makes them an attractive gold standard for simulation studies considering complex multi-locus effects. Results: We introduce GAMETES, a user-friendly software package and algorithm which generates complex biallelic single nucleotide polymorphism (SNP) disease models for simulation studies. GAMETES rapidly and precisely generates random, pure, strict n-locus models with specified genetic constraints. These constraints include heritability, minor allele frequencies of the SNPs, and population prevalence. GAMETES also includes a simple dataset simulation strategy which may be utilized to rapidly generate an archive of simulated datasets for given genetic models. We highlight the utility and limitations of GAMETES with an example simulation study using MDR, an algorithm designed to detect epistasis. Conclusions: GAMETES is a fast, flexible, and precise tool for generating complex n-locus models with random architectures. While GAMETES has a limited ability to generate models with higher heritabilities, it is proficient at generating the lower-heritability models typically used in simulation studies evaluating new algorithms. In addition, the GAMETES modeling strategy may be flexibly combined with any dataset simulation strategy. Beyond dataset simulation, GAMETES could be employed to pursue theoretical characterization of genetic models and epistasis.
    Digital ISSN: 1756-0381
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 23
    Publication date: 2012-10-03
    Description: Background: One possible approach towards avoiding alert overload and alert fatigue in Computerized Physician Order Entry (CPOE) systems is to tailor their drug safety alerts to the context of the clinical situation. Our objective was to identify physicians' perceptions of the usefulness of clinical context information for prioritizing and presenting drug safety alerts. Methods: We performed a questionnaire survey, asking CPOE-using physicians from four hospitals in four European countries to estimate the usefulness of 20 possible context factors. Results: The 223 participants identified the 'severity of the effect' and the 'clinical status of the patient' as the most useful context factors. Further important factors are the 'complexity of the case' and the 'risk factors of the patient'. Conclusions: Our findings confirm the results of a prior, comparable survey of CPOE researchers. Further research should focus on implementing these context factors in CPOE systems and subsequently evaluating their impact.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 24
    Publication date: 2012-10-05
    Description: Background: Information is essential in healthcare. Recording, handling, and sharing healthcare information is important in order to ensure high quality of delivered healthcare. Information and communication technology (ICT) may be a valuable tool for handling these challenges. One way of enhancing the exchange of information could be to establish a link between patient-specific and general information sent to the general practitioner (GP). The aim of the present paper is to study GPs' use of a hyperlink inserted into electronic test result communication. Methods: We inserted a hyperlink into the electronic test result communication sent to the GPs of patients who participated in a regional, systematic breast cancer screening program. The hyperlink target was a website with information on the breast cancer screening program and breast cancer in general. Different strategies were used to increase the GPs' use of this hyperlink. The outcome measure was the GPs' self-reported use of the link. Data were collected by means of a one-page paper-based questionnaire. Results: The response rate was 73% (n=242). In total, 108 (45%) of the GPs reported having used the link. In all, 22% (n=53) of the GPs used the web address from a paper letter and 37% (n=89) used the hyperlink in the electronic test result communication (difference = 15%, 95% confidence interval (CI) 8-22%, P < 0.001). We found no statistically significant associations between use of the web address/hyperlink and the GP's gender, age, or attitude towards mammography screening. Conclusions: The results suggest that hyperlinks in electronic test result communication could be a feasible strategy for combining and sharing different types of healthcare information.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 25
    Publication date: 2012-10-14
    Description: Background: New computational resources are needed to manage the increasing volume of biological data from genome sequencing projects. One fundamental challenge is the ability to maintain a complete and current catalog of protein diversity. We developed a new approach for the identification of protein families that focuses on the rapid discovery of homologous protein sequences. Results: We implemented fully automated and high-throughput procedures to de novo cluster proteins into families based upon global alignment similarity. Our approach employs an iterative clustering strategy in which homologs of known families are sifted out of the search for new families. The resulting reduction in computational complexity enables us to rapidly identify novel protein families found in new genomes and to perform efficient, automated updates that keep pace with genome sequencing. We refer to protein families identified through this approach as "Sifting Families," or SFams. Our analysis of ~10.5 million protein sequences from 2,928 genomes identified 436,360 SFams, many of which are not represented in other protein family databases. We validated the quality of SFam clustering through statistical as well as network topology-based analyses. Conclusions: We describe the rapid identification of SFams and demonstrate how they can be used to annotate genomes and metagenomes. The SFam database catalogs protein-family quality metrics, multiple sequence alignments, hidden Markov models, and phylogenetic trees. Our source code and database are publicly available and will be subject to frequent updates (http://edhar.genomecenter.ucdavis.edu/sifting_families/).
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
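    The "sifting" idea, greedily assigning each sequence to an existing family it resembles and seeding a new family otherwise, can be sketched in a few lines. The real pipeline clusters on global alignment similarity and HMM searches; difflib's ratio and the toy sequences below are stand-ins:

        from difflib import SequenceMatcher

        def similarity(a, b):
            return SequenceMatcher(None, a, b).ratio()

        def sift_into_families(sequences, threshold=0.7):
            families = []                                # each family: member list
            for seq in sequences:
                for fam in families:
                    if similarity(seq, fam[0]) >= threshold:   # match representative
                        fam.append(seq)
                        break
                else:
                    families.append([seq])               # no hit: seed a new family
            return families

        seqs = ["MKVLAAGTT", "MKVLAAGTA", "GGHHEELLK", "MKVLSAGTT", "GGHHEELIK"]
        print(sift_into_families(seqs))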
  • 26
    Publication date: 2012-09-22
    Description: Background: Chromosome conformation capture experiments result in pairwise proximity measurements between chromosome locations in a genome, and they have been used to construct three-dimensional models of genomic regions, chromosomes, and entire genomes. These models can be used to understand long-range gene regulation, chromosome rearrangements, and the relationships between sequence and spatial location. However, it is unclear whether these pairwise distance constraints provide sufficient information to embed chromatin in three dimensions. A priori, it is possible that an infinite number of embeddings are consistent with the measurements due to a lack of constraints between some regions. It is therefore necessary to separate regions of the chromatin structure that are sufficiently constrained from regions with measurements that do not provide enough information to reconstruct the embedding. Results: We present a new method based on graph rigidity to assess the suitability of experiments for constructing plausible three-dimensional models of chromatin structure. Underlying this analysis is a new, efficient, and accurate algorithm for finding sufficiently constrained (rigid) collections of constraints in three dimensions, a problem for which there is no known efficient algorithm. Applying the method to four recent chromosome conformation experiments, we find that, for even stringently filtered constraints, a large rigid component spans most of the measured region. Filtering highlights higher-confidence regions, and we find that the organization of these regions depends crucially on short-range interactions. Conclusions: Without performing an embedding or creating a frequency-to-distance mapping, our proposed approach establishes which substructures are supported by a sufficient framework of interactions. It also establishes that interactions from recent highly filtered genome-wide chromosome conformation experiments provide an adequate set of constraints for embedding. Pre-processing experimentally observed interactions with this method before relating chromatin structure to biological phenomena will ensure that hypothesized correlations are not driven by the arbitrary choice of a particular unconstrained embedding. The software for identifying rigid components is GPL-licensed and available for download at http://cbcb.umd.edu/kingsford-group/starfish.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 27
    Publikationsdatum: 2012-09-22
    Beschreibung: Background: Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology. Results: GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests. Conclusions: The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 28
    Publication date: 2012-09-22
    Description: Background: The University of California, Santa Cruz (UCSC) genome database is among the most used sources of genomic annotation in human and other organisms. The database offers an excellent web-based graphical user interface (the UCSC genome browser) and several means for programmatic queries. However, a simple application programming interface (API) in a scripting language aimed at biologists was not yet available. Here, we present the Ruby UCSC API, a library for accessing the UCSC genome database from Ruby. Results: The API is designed as a BioRuby plug-in and built on the ActiveRecord 3 framework for the object-relational mapping, making writing SQL statements unnecessary. The current version of the API supports databases of all organisms in the UCSC genome database, including human, mammals, vertebrates, deuterostomes, insects, nematodes, and yeast. The API uses the bin index, if available, when querying for genomic intervals. The API also supports genomic sequence queries using locally downloaded *.2bit files that are not stored in the official MySQL database. The API is implemented in pure Ruby and is therefore available in different environments and with different Ruby interpreters (including JRuby). Conclusions: Assisted by the straightforward object-oriented design of Ruby and ActiveRecord, the Ruby UCSC API will help biologists query the UCSC genome database programmatically. The API is available through the RubyGem system. Source code and documentation are available at https://github.com/misshie/bioruby-ucsc-api/ under the Ruby license. Feedback and help are provided via the website at http://rubyucscapi.userecho.com/.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 29
    Publication date: 2012-09-25
    Description: Background: Sporadic Amyotrophic Lateral Sclerosis (sALS) is a devastating, complex disease of unknown etiology. We studied this disease with microarray technology to capture as much biological complexity as possible. The Affymetrix-focused BaFL pipeline takes into account problems with probes that arise from physical and biological properties, so we adapted it to handle the long-oligonucleotide probes on our arrays (hence LO-BaFL). The revised method was tested against a validated array experiment and then used in a meta-analysis of peripheral white blood cells from healthy control samples in two experiments. We predicted differentially expressed (DE) genes in our sALS data, combining the results obtained using the TM4 suite of tools with those from the LO-BaFL method. Those predictions were tested using qRT-PCR assays. Results: LO-BaFL filtering and DE testing accurately predicted previously validated DE genes in a published experiment on coronary artery disease (CAD). Filtering healthy control data from the sALS and CAD studies with LO-BaFL resulted in highly correlated expression levels across many genes. After bioinformatics analysis, twelve genes from the sALS DE gene list were selected for independent testing using qRT-PCR assays. High-quality RNA from six healthy control and six sALS samples yielded the predicted differential expression for seven genes: TARDBP, SKIV2L2, C12orf35, DYNLT1, ACTG1, B2M, and ILKAP. Four of the seven have been previously described in sALS studies, while ACTG1, B2M and ILKAP appear in the context of this disease for the first time. Supplementary material can be accessed at: http://webpages.uncc.edu/~cbaciu/LO-BaFL/supplementary_data.html Conclusion: LO-BaFL predicts DE results that are broadly similar to those of other methods. The small healthy control cohort in the sALS study is a reasonable foundation for predicting DE genes. Modifying the BaFL pipeline allowed us to remove noise and systematic errors, improving the power of this study, which had a small sample size. Each bioinformatics approach revealed DE genes not predicted by the other; subsequent PCR assays confirmed seven of twelve candidates, a relatively high success rate.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 30
    Publication date: 2012-09-27
    Description: Background: With the advent of next-generation sequencing (NGS) technologies, full cDNA shotgun sequencing has become a major approach in the study of transcriptomes, and several different protocols for 454 sequencing have been devised. As each protocol uses its own short DNA tags or adapters attached to the ends of cDNA fragments for labeling or sequencing, different contaminants may lead to mis-assembly and inaccurate sequence products. Results: We have designed and implemented a new program for raw sequence cleaning, available both through a graphical user interface and as a batch script. The cleaning process consists of several modules, including barcode trimming, sequencing adapter trimming, amplification primer trimming, poly-A tail trimming, vector screening and low-quality region trimming. These modules can be combined to suit various sequencing applications. Conclusions: ESTclean is a software package not only for cleaning cDNA sequences, but also for helping to develop sequencing protocols by providing summary tables and figures for sequencing quality control in a graphical user interface. It excels at cleaning read sequences from complicated sequencing protocols that use barcodes and multiple amplification primers.
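    As a flavor of what such cleaning modules do, here is a minimal sketch of barcode, adapter and poly-A trimming; the function names and the toy read are hypothetical, and this is not the ESTclean implementation.

    # Minimal sketch of three cleaning modules described above: barcode trimming,
    # adapter trimming, and poly-A tail trimming. Hypothetical, not ESTclean itself.
    import re

    def trim_barcode(seq, barcodes):
        """Remove a known 5' barcode if the read starts with one."""
        for bc in barcodes:
            if seq.startswith(bc):
                return seq[len(bc):]
        return seq

    def trim_adapter(seq, adapter):
        """Cut the read at the first occurrence of the sequencing adapter."""
        idx = seq.find(adapter)
        return seq[:idx] if idx >= 0 else seq

    def trim_polya(seq, min_len=8):
        """Strip a terminal poly-A run of at least min_len bases."""
        return re.sub(r"A{%d,}$" % min_len, "", seq)

    raw = "ACGTACGT" + "TTGCAGGACCTT" + "AAAAAAAAAA" + "CTGCTGAA"  # toy read
    read = trim_barcode(raw, ["ACGTACGT"])
    read = trim_adapter(read, "CTGCTGAA")
    read = trim_polya(read)
    print(read)  # -> TTGCAGGACCTT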
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 31
    Publication date: 2012-09-27
    Description: Background: While the genetics of diploid inheritance is well studied and software for linkage mapping, haplotyping and QTL analysis is available, the tools available for tetraploids are limited. In order to develop such tools, it would be helpful if simulated populations based on a variety of models of tetraploid meiosis were available. Results: Here we present PedigreeSim, a software package that simulates meiosis in both diploid and tetraploid species and uses this to simulate pedigrees and cross populations. For tetraploids a variety of models can be used, including both bivalent and quadrivalent formation, varying degrees of preferential pairing of hom(oe)ologous chromosomes, different quadrivalent configurations and more. Simulation of quadrivalent meiosis results, as expected, in double reduction and recombination among more than two hom(oe)ologous chromosomes. The results are shown to match theoretical predictions. Conclusions: This is the first simulation software that implements all features of meiosis in tetraploids. It allows users to generate data for tetraploid and diploid populations, and to investigate different models of tetraploid meiosis. The software and manual are available from http://www.plantbreeding.wur.nl/UK/software_pedigreeSim.html and as Additional files 1, 2, 3 and 4 with this publication.
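    At its simplest, a bivalent meiosis simulator of this kind draws a crossover position and returns one of the four resulting chromatids. The sketch below is a deliberately reduced toy model (diploid, one obligate crossover at a uniformly random position) and does not reproduce PedigreeSim's far richer models.

    # Minimal sketch of a diploid bivalent meiosis for one linear chromosome:
    # one obligate crossover at a uniformly random position, then random
    # segregation. A toy model only; PedigreeSim's models are far richer.
    import random

    def meiosis(maternal, paternal, length=100.0):
        """Return one gamete as a list of (start, end, parental_origin) segments."""
        xo = random.uniform(0.0, length)            # crossover position (cM)
        recombinant_a = [(0.0, xo, maternal), (xo, length, paternal)]
        recombinant_b = [(0.0, xo, paternal), (xo, length, maternal)]
        # Each gamete receives one of the four chromatids with equal probability.
        chromatids = [[(0.0, length, maternal)], [(0.0, length, paternal)],
                      recombinant_a, recombinant_b]
        return random.choice(chromatids)

    random.seed(1)
    print(meiosis("M", "P"))  # e.g. [(0.0, 13.4, 'M'), (13.4, 100.0, 'P')]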
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 32
    Publication date: 2012-09-28
    Description: Background: Recently, there has been considerable effort to promote the use of health information technology (HIT) in order to improve health care quality. However, relatively little is known about the extent to which HIT implementation is associated with hospital patient care quality. We undertook this study to determine the association of various HITs with: hospital quality improvement (QI) practices and strategies; adherence to process-of-care measures; risk-adjusted inpatient mortality; patient satisfaction; and assessment of patient care quality by hospital quality managers and front-line clinicians. Methods: We conducted surveys of quality managers and front-line clinicians (physicians and nurses) in 470 short-term, general hospitals to obtain data on hospitals' extent of HIT implementation, QI practices and strategies, assessments of quality performance, commitment to quality, and sufficiency of resources for QI. Of the 470 hospitals, 401 submitted complete data necessary for analysis. We also developed measures of hospital performance from several publicly available data sources: Hospital Compare adherence to process-of-care measures; the Medicare Provider Analysis and Review (MEDPAR) file; and the Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) survey. We used Poisson regression analysis to examine the association between HIT implementation and QI practices and strategies, and general linear models to examine the relationship between HIT implementation and hospital performance measures. Results: Controlling for potential confounders, we found that hospitals with high levels of HIT implementation engaged in a significantly greater number of QI practices and strategies, and had significantly better performance on mortality rates, patient satisfaction measures, and assessments of patient care quality by hospital quality managers; there was weaker evidence of higher assessments of patient care quality by front-line clinicians. Conclusions: Hospital implementation of HIT was positively associated with activities intended to improve patient care quality and with higher performance on four of six performance measures.
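    The Poisson regression step, relating counts of QI practices to HIT implementation while controlling for confounders, can be sketched with statsmodels. The variable names and data below are simulated stand-ins, not the study's data.

    # Minimal sketch of the Poisson regression step: counts of QI practices
    # modeled against a hospital's HIT implementation level. Illustrative
    # variable names and simulated data only.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    n = 401                                   # hospitals with complete data
    hit_level = rng.uniform(0, 1, n)          # hypothetical HIT implementation score
    beds = rng.integers(50, 600, n)           # a confounder, e.g. hospital size
    rate = np.exp(0.5 + 1.2 * hit_level + 0.0005 * beds)
    qi_count = rng.poisson(rate)              # simulated QI practice counts

    X = sm.add_constant(np.column_stack([hit_level, beds]))
    model = sm.GLM(qi_count, X, family=sm.families.Poisson()).fit()
    print(model.summary())                    # coefficient on hit_level ~ 1.2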
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 33
    Publication date: 2012-10-04
    Description: Background: Influenza is a well-known and common human respiratory infection, causing significant morbidity and mortality every year. Despite influenza's variability, fast and reliable outbreak detection is required for health resource planning. Clinical health records, as published by the Diagnosticat database in Catalonia, host useful data for probabilistic detection of influenza outbreaks. Methods: This paper proposes a statistical method to detect influenza epidemic activity. Non-epidemic incidence rates are modeled against the exponential distribution, and the maximum likelihood estimate of the decay factor λ is calculated. The sequential detection algorithm updates the parameter as new data become available. Binary epidemic detection of weekly incidence rates is assessed by a Kolmogorov-Smirnov test on the absolute difference between the empirical distribution function and the cumulative distribution function of the estimated exponential distribution, with significance level 0 ≤ α ≤ 1. Results: The main advantage with respect to other approaches is the adoption of a statistically meaningful test, which provides an indicator of epidemic activity with an associated probability. The detection algorithm was initiated with the parameter λ0 = 3.8617 estimated from the training sequence (corresponding to non-epidemic rates of the 2008-2009 influenza season) and sequentially updated. The Kolmogorov-Smirnov test detected the following weeks as epidemic for each influenza season: weeks 50-10 (2008-2009 season), weeks 38-50 (2009-2010 season), weeks 50-9 (2010-2011 season) and weeks 3-12 for the current 2011-2012 season. Conclusions: Real medical data were used to assess the validity of the approach, as well as to construct a realistic statistical model of weekly influenza incidence rates in non-epidemic periods. For the tested data, the results confirmed the ability of the algorithm to detect the start and the end of epidemic periods. In general, the proposed test could be applied to other data sets to quickly detect influenza outbreaks. The sequential structure of the test makes it suitable for implementation on many platforms at a low computational cost, without requiring large data sets to be stored.
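    In outline, the detection rule can be reproduced with scipy: estimate the exponential rate by maximum likelihood on non-epidemic weeks (the MLE of the rate is the reciprocal of the mean) and apply a one-sample Kolmogorov-Smirnov test to each new window of weekly rates. The incidence values below are made up for illustration.

    # Minimal sketch of the detection scheme: fit an exponential model to
    # non-epidemic weekly incidence rates (rate MLE = 1/mean), then flag a
    # window of new observations whose empirical distribution departs from it
    # by a one-sample Kolmogorov-Smirnov test. Made-up incidence values.
    import numpy as np
    from scipy import stats

    training = np.array([0.21, 0.35, 0.18, 0.27, 0.30, 0.24, 0.22, 0.29])
    lam = 1.0 / training.mean()               # MLE of the exponential rate

    def is_epidemic(window, alpha=0.05):
        """KS test of the window against Exponential(lam); True if rejected."""
        stat, pvalue = stats.kstest(window, "expon", args=(0, 1.0 / lam))
        return pvalue < alpha, pvalue

    print(is_epidemic(np.array([0.2, 0.3, 0.25, 0.28])))   # quiet weeks
    print(is_epidemic(np.array([1.9, 2.4, 3.1, 2.8])))     # epidemic-like surge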
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 34
    Publication date: 2012-10-04
    Description: MicroRNAs (miRNAs), a class of endogenous small noncoding RNAs, mediate posttranscriptional regulation of protein-coding genes by binding chiefly to the 3' untranslated region of target mRNAs, leading to translational inhibition, mRNA destabilization or degradation. A single miRNA concurrently downregulates hundreds of target mRNAs, designated its "targetome", and thereby fine-tunes gene expression involved in diverse cellular functions, such as development, differentiation, proliferation, apoptosis and metabolism. Recently, we characterized the molecular network of the whole human miRNA targetome by using bioinformatics tools for analyzing molecular interactions on a comprehensive knowledgebase. We found that the miRNA targetome regulated by an individual miRNA generally constitutes a biological network of functionally associated molecules in human cells, closely linked to pathological events involved in cancers and neurodegenerative diseases. We also identified collaborative regulation of gene expression by transcription factors and miRNAs in cancer-associated miRNA targetome networks. This review focuses on the workflow of molecular network analysis of the miRNA targetome in silico. We applied the workflow to two representative datasets, composed of miRNA expression profiles of adult T cell leukemia (ATL) and Alzheimer's disease (AD), retrieved from the Gene Expression Omnibus (GEO) repository. The results supported the view that miRNAs act as central regulators of both oncogenesis and neurodegeneration.
    Digital ISSN: 1756-0381
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 35
    Publication date: 2012-10-04
    Description: Background: Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement. Results: The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification to one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), e.g. the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open-source web applications building on shared data. A Java implementation toolkit makes VarioML easy to integrate into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but it can also be used as a standard variation data format for JSON and XML document databases and user interface components. Conclusions: VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease, clarity, and without ambiguity.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 36
    Publication date: 2012-10-04
    Description: Background: We consider the problem of finding the maximum frequent agreement subtrees (MFASTs) in a collection of phylogenetic trees. Existing methods for this problem often do not scale beyond datasets with around 100 taxa. Our goal is to address this problem for datasets with over a thousand taxa and hundreds of trees. Results: We develop a heuristic solution that aims to find MFASTs in sets of many, large phylogenetic trees. Our method works in multiple phases. In the first phase, it identifies small candidate subtrees from the set of input trees, which serve as the seeds of larger subtrees. In the second phase, it combines these small seeds to build larger candidate MFASTs. In the final phase, it performs a post-processing step that ensures that we find a frequent agreement subtree that is not contained in a larger frequent agreement subtree. We demonstrate that this heuristic can easily handle data sets with 1000 taxa, greatly extending the estimation of MFASTs beyond current methods. Conclusions: Although this heuristic is not guaranteed to find all MFASTs or the largest MFAST, it found the MFAST in all of our synthetic datasets where we could verify the correctness of the result. It also performed well on large empirical data sets. Its performance is robust to the number and size of the input trees. Overall, this method provides a simple and fast way to identify strongly supported subtrees within large phylogenetic hypotheses.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 37
    Publication date: 2012-10-04
    Description: Background: Currently, there is no open-source, cross-platform and scalable framework for coalescent analysis in population genetics, nor is there a scalable GUI-based user application. Such a framework and application would not only drive the creation of more complex and realistic models but also make them truly accessible. Results: As a first attempt, we built a framework and user application for the domain of exact calculations in coalescent analysis. The framework provides an API with the concepts of model, data, statistic, phylogeny, gene tree and recursion. Infinite-alleles and infinite-sites models are considered. It defines pluggable computations such as counting and listing all the ancestral configurations and genealogies and computing the exact probability of data. It can visualize a gene tree, trace and visualize the internals of the recursion algorithm for further improvement, and dynamically attach a number of output processors. The user application defines jobs in a plug-in-like manner so that they can be activated, deactivated, installed or uninstalled on demand. Multiple jobs can be run and their inputs edited. Job inputs are persisted across restarts, and running jobs can be cancelled where applicable. Conclusions: Coalescent theory plays an increasingly important role in analysing molecular population genetic data. The models involved are mathematically difficult and computationally challenging. An open-source, scalable framework that lets users immediately take advantage of the progress made by others will enable exploration of yet more difficult and realistic models. As models become more complex and mathematically less tractable, the need for an integrated computational approach is obvious. Object-oriented designs, though they have upfront costs, are practical now and can provide such an integrated approach.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 38
    Publication date: 2012-10-04
    Description: Background: Misplaced or poorly calibrated confidence in healthcare professionals' judgments compromises the quality of health care. Using higher-fidelity clinical simulations to elicit clinicians' confidence 'calibration' (i.e. overconfidence or underconfidence) in more realistic settings is a promising but underutilized tactic. In this study we examine nurses' calibration of confidence with judgment accuracy for critical event risk assessment judgments in a high-fidelity simulated clinical environment. The study also explores the effects of clinical experience, task difficulty and time pressure on the relationship between confidence and accuracy. Methods: 63 student and 34 experienced nurses made dichotomous risk assessments on 25 scenarios simulated in a high-fidelity clinical environment. Each nurse also assigned a score (0-100) reflecting the level of confidence in their judgments. Scenarios were derived from real patient cases and classified as easy or difficult judgment tasks. Nurses made half of their judgments under time pressure. Confidence calibration statistics were calculated and calibration curves generated. Results: Student nurses were underconfident (mean over/underconfidence score -1.05) and experienced nurses overconfident (mean over/underconfidence score 6.56), P = 0.01. No significant differences in calibration and resolution were found between the two groups (P = 0.80 and P = 0.51, respectively). There was a significant interaction between time pressure and task difficulty on confidence (P = 0.008); time pressure increased confidence in easy cases but reduced confidence in difficult cases. Time pressure had no overall effect on confidence or accuracy. Judgment task difficulty impacted significantly on nurses' judgmental accuracy and confidence. A 'hard-easy' effect was observed: nurses were overconfident in difficult judgments and underconfident in easy judgments. Conclusion: Nurses were poorly calibrated when making risk assessment judgments in a high-fidelity simulated setting. Nurses with more experience tended toward overconfidence. Whilst time pressure had little effect on calibration, nurses' over/underconfidence varied significantly with the degree of task difficulty. More research is required to identify strategies to minimize such cognitive biases.
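    The over/underconfidence statistic reported here has a standard form: mean confidence minus mean accuracy, expressed on the same scale. A minimal sketch, assuming dichotomous judgments scored 0/1 and confidence reported on a 0-100 scale as in the study:

    # Minimal sketch of confidence calibration: over/underconfidence is mean
    # confidence minus mean accuracy (here both on a 0-100 scale). Positive
    # values indicate overconfidence, negative values underconfidence.
    import numpy as np

    def over_underconfidence(confidence, correct):
        """confidence: 0-100 scores; correct: 0/1 judgment outcomes."""
        return np.mean(confidence) - 100.0 * np.mean(correct)

    # Toy data: a judge who is right 60% of the time but reports ~75% confidence.
    conf = np.array([80, 70, 75, 90, 60, 85, 70, 75, 65, 80])
    acc = np.array([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
    print(over_underconfidence(conf, acc))   # 75.0 - 60.0 = 15.0 (overconfident)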
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 39
    Publication date: 2012-10-06
    Description: Background: Clinical bioinformatics is currently growing and is based on the integration of clinical and omics data, aiming at the development of personalized medicine. Thus, the introduction of novel technologies able to investigate the relationship between clinical states and biological machinery may help the development of this field. For instance, the Affymetrix DMET (drug metabolism enzymes and transporters) platform is able to study the relationship between variation in patient genomes and drug metabolism, detecting SNPs (Single Nucleotide Polymorphisms) on genes related to drug metabolism. This may allow, for instance, finding genetic variants in patients who present different drug responses, in pharmacogenomics and clinical studies. Despite this, there is currently a lack of open-source algorithms and tools for the analysis of DMET data. Existing software tools for DMET data generally allow only the preprocessing of binary data (e.g. the DMET-Console provided by Affymetrix) and simple data analysis operations, but do not allow testing the association of the presence of SNPs with the response to drugs. Results: We developed DMET-Analyzer, a tool for the automatic association analysis between variation in patient genomes and the clinical conditions of patients, i.e. the different responses to drugs. The proposed system allows: (i) automation of the analysis workflow for DMET-SNP data, avoiding the use of multiple tools; (ii) automatic annotation of DMET-SNP data and search in existing SNP databases (e.g. dbSNP); (iii) association of SNPs with pathways through a search in PharmGKB, a major knowledge base for pharmacogenomic studies. DMET-Analyzer has a simple graphical user interface that allows users (doctors/biologists) to upload and analyse DMET files produced by the Affymetrix DMET-Console in an interactive way. The effectiveness and ease of use of DMET-Analyzer are demonstrated through different case studies regarding the analysis of clinical datasets produced in the University Hospital of Catanzaro, Italy. Conclusion: DMET-Analyzer is a novel tool able to automatically analyse data produced by the DMET platform in case-control association studies. Using such a tool, users may avoid the manual execution of multiple statistical tests, avoiding possible errors and reducing the time needed for a whole experiment. Moreover, annotations and the direct link to external databases may increase the biological knowledge extracted. The system is freely available for academic purposes at: https://sourceforge.net/projects/dmetanalyzer/files/
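    The core operation such a tool automates, testing one SNP for association with a case/control split of drug responders, reduces to a contingency-table test. A minimal sketch with a 2x2 Fisher exact test on hypothetical counts (the tool's own statistical choices may differ):

    # Minimal sketch of one case-control association test on a single DMET SNP:
    # a 2x2 Fisher exact test of allele carriage versus drug response.
    # Hypothetical counts; the tool also handles annotation and batch testing.
    from scipy.stats import fisher_exact

    #                 carrier  non-carrier
    table = [[18, 7],    # responders
             [6, 19]]    # non-responders
    odds_ratio, pvalue = fisher_exact(table)
    print(f"OR = {odds_ratio:.2f}, p = {pvalue:.4f}")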
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 40
    Publication date: 2012-08-01
    Description: Background: The Hedgehog Signaling Pathway is one of the signaling pathways that are very important to embryonic development. The participation of inhibitors in the Hedgehog Signaling Pathway can control cell growth and death, and the search for novel inhibitors of the pathway's functioning is in great demand. In fact, effective inhibitors could provide efficient therapies for a wide range of malignancies, and targeting this pathway in cells represents a promising new paradigm for cell growth and death control. Current research mainly focuses on the synthesis of inhibitors that are cyclopamine derivatives, which bind specifically to the Smo protein and can be used for cancer therapy. While quantitative structure-activity relationship (QSAR) studies have been performed for these compounds among different cell lines, none of them have achieved acceptable results in the prediction of activity values of new compounds. In this study, we propose a novel collaborative QSAR model for inhibitors of the Hedgehog Signaling Pathway by integrating information from multiple cell lines. Such a model is expected to substantially improve on QSAR models built from single cell lines, and to provide useful clues for developing clinically effective inhibitors and modifications of parent lead compounds targeting the Hedgehog Signaling Pathway. Results: In this study, we have presented: (1) a collaborative QSAR model, which integrates information among multiple cell lines to boost the QSAR results, rather than modeling only a single cell line. Our experiments have shown that the performance of our model is significantly better than that of single-cell-line QSAR methods; and (2) an efficient feature selection strategy under such a collaborative environment, which can derive the commonly important features related to the entire set of given cell lines, while simultaneously showing their specific contributions to a particular cell line. Based on the feature selection results, we have proposed several possible chemical modifications to improve inhibitor affinity towards multiple targets in the Hedgehog Signaling Pathway. Conclusions: Our model with the feature selection strategy presented here is efficient, robust, and flexible, and can be easily extended to model large-scale multiple cell line QSAR data. The data and scripts for collaborative QSAR modeling are available in Additional file 1.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 41
    Publication date: 2012-08-01
    Description: Background: Functional brain images such as Single-Photon Emission Computed Tomography (SPECT) and Positron Emission Tomography (PET) have been widely used to guide clinicians in Alzheimer's Disease (AD) diagnosis. However, the subjectivity involved in their evaluation has favoured the development of Computer Aided Diagnosis (CAD) systems. Methods: We propose a novel combination of feature extraction techniques to improve the diagnosis of AD. Firstly, Regions of Interest (ROIs) are selected by means of a t-test carried out on 3D Normalised Mean Square Error (NMSE) features restricted to be located within a predefined brain activation mask. In order to address the small sample-size problem, the dimension of the feature space was further reduced by: Large Margin Nearest Neighbours using a rectangular matrix (LMNN-RECT), Principal Component Analysis (PCA) or Partial Least Squares (PLS) (the latter two also analysed with an LMNN transformation). Regarding the classifiers, kernel Support Vector Machines (SVMs) and LMNN using Euclidean, Mahalanobis and energy-based metrics were compared. Results: Several experiments were conducted in order to evaluate the proposed LMNN-based feature extraction algorithms and their benefits as: i) a linear transformation of the PLS- or PCA-reduced data, ii) a feature reduction technique, and iii) a classifier (with Euclidean, Mahalanobis or energy-based methodology). The system was evaluated by means of k-fold cross-validation, yielding accuracy, sensitivity and specificity values of 92.78%, 91.07% and 95.12% (for SPECT) and 90.67%, 88% and 93.33% (for PET), respectively, when an NMSE-PLS-LMNN feature extraction method was used in combination with an SVM classifier, thus outperforming recently reported baseline methods. Conclusions: All of the proposed methods turned out to be valid solutions for the presented problem. One of the advances is the robustness of the LMNN algorithm, which not only provides a higher separation rate between the classes but also (in combination with NMSE and PLS) makes the variation of this rate more stable. In addition, their generalization ability is another advance, since the experiments were performed on two image modalities (SPECT and PET).
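    The classification stage of a CAD system of this kind can be outlined with scikit-learn: feature reduction followed by a kernel SVM under k-fold cross-validation. The sketch below substitutes PCA for the paper's NMSE/PLS/LMNN chain and random data for SPECT/PET features, so it shows only the evaluation scaffolding, not the paper's method.

    # Minimal sketch of the classification stage: feature reduction (PCA here,
    # standing in for the paper's NMSE/PLS/LMNN chain) followed by a kernel SVM,
    # scored with k-fold cross-validation. Random data replaces SPECT/PET features.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.decomposition import PCA
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(97, 500))            # 97 subjects, 500 voxel features
    y = rng.integers(0, 2, size=97)           # 0 = control, 1 = AD (toy labels)

    clf = make_pipeline(PCA(n_components=20), SVC(kernel="rbf", C=1.0))
    scores = cross_val_score(clf, X, y, cv=5) # 5-fold cross-validated accuracy
    print(scores.mean())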
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 42
    Publication date: 2012-08-01
    Description: Background: Semantic Web technology can considerably catalyze translational genetics and genomics research in medicine, where the interchange of information between basic research and clinical levels becomes crucial. This exchange involves mapping abstract phenotype descriptions from research resources, such as knowledge databases and catalogs, to unstructured datasets produced through experimental methods and clinical practice. This is especially true for the construction of mutation databases. This paper presents a way of harmonizing abstract phenotype descriptions with patient data from clinical practice, and of querying this dataset about relationships between phenotypes and genetic variants at different levels of abstraction. Methods: Given the current availability of ontological and terminological resources that have already reached some consensus in biomedicine, a reuse-based ontology engineering approach was followed. The proposed approach uses the Web Ontology Language (OWL) to represent the phenotype ontology and the patient model, the Semantic Web Rule Language (SWRL) to bridge the gap between phenotype descriptions and clinical data, and the Semantic Query-Enhanced Web Rule Language (SQWRL) to query relevant bidirectional phenotype-genotype relationships. The work tests the use of Semantic Web technology in the biomedical research domain of cerebrotendinous xanthomatosis (CTX), using a real dataset and ontologies. Results: A framework to query relevant bidirectional phenotype-genotype relationships is provided. Phenotype descriptions and patient data were harmonized by defining 28 Horn-like rules in terms of the OWL concepts. In total, 24 patterns of SQWRL queries were designed following the initial list of competency questions. As the approach is based on OWL, the semantics of the framework adopt the standard logical model of an open world assumption. Conclusions: This work demonstrates how Semantic Web technologies can be used to support the flexible representation and computational inference mechanisms required to query patient datasets at different levels of abstraction. The open world assumption is especially well suited to describing only partially known phenotype-genotype relationships, in a way that is easily extensible. In future, this type of approach could offer researchers a valuable resource for inferring new data from patient data for statistical analysis in translational research. In conclusion, phenotype description formalization and mapping to clinical data are two key elements for interchanging knowledge between basic and clinical research.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 43
    Publication date: 2012-08-02
    Description: Background: Accurate gene structure annotation is a fundamental but somewhat elusive goal of genome projects, as witnessed by the fact that (model) genomes typically undergo several cycles of re-annotation. In many cases, it is not only different versions of annotations that need to be compared but also different sources of annotation of the same genome, derived from distinct gene prediction workflows. Such comparisons are of interest to annotation providers, prediction software developers, and end-users, who all need to assess what is common and what is different among distinct annotation sources. We developed ParsEval, a software application for pairwise comparison of sets of gene structure annotations. ParsEval calculates several statistics that highlight the similarities and differences between the two sets of annotations provided. These statistics are presented in an aggregate summary report, with additional details provided as individual reports specific to non-overlapping, gene-model-centric genomic loci. Genome-browser-styled graphics embedded in these reports help visualize the genomic context of the annotations. Output from ParsEval is both easily read and easily parsed, enabling systematic identification of problematic gene models for subsequent focused analysis. Results: ParsEval is capable of analyzing annotations for large eukaryotic genomes on typical desktop or laptop hardware. In comparison to existing methods, ParsEval exhibits a considerable performance improvement, both in terms of runtime and memory consumption. Reports from ParsEval can provide relevant biological insights into the gene structure annotations being compared. Conclusions: Implemented in C, ParsEval provides the quickest and most feature-rich solution for genome annotation comparison to date. The source code is freely available (under an ISC license) at http://parseval.sourceforge.net/.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 44
    Publication date: 2012-08-02
    Description: Background: Lifestyle-related diseases, represented by metabolic syndrome, develop as a result of complex interactions. Using health check-up data from two large studies collected during a long-term follow-up, we searched for risk factors associated with the development of metabolic syndrome. Methods: In our original study, we selected 77 case subjects who developed metabolic syndrome during the follow-up and 152 healthy control subjects who were free of lifestyle-related risk components from among 1,803 Japanese male employees. In a replication study, we selected 2,196 case subjects and 2,196 control subjects from among 31,343 other Japanese male employees. By means of a bioinformatics approach using a fuzzy neural network (FNN), we searched for significant combinations of risk factors associated with MetS. To ensure that the risk combination selected by the FNN analysis was statistically reliable, we performed logistic regression analysis, including adjustment. Results: We selected the combination of an elevated level of gamma-glutamyltranspeptidase (gamma-GTP, GGTP) and an elevated white blood cell (WBC) count as the most significant combination of risk factors for the development of metabolic syndrome. The FNN also identified the same tendency in the replication study. The clinical characteristics of gamma-GTP level and WBC count were statistically significant even after adjustment, confirming that the results obtained from the fuzzy neural network are reasonable. Correlation ratios showed that an elevated level of gamma-GTP is associated with habitual drinking of alcohol, and that a high WBC count is associated with habitual smoking. Conclusions: This result, obtained by fuzzy neural network analysis of health check-up data from large long-term studies, can be useful in providing a personalized novel diagnostic and therapeutic method involving the gamma-GTP level and the WBC count.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 45
    Publication date: 2012-08-02
    Description: Background: Primary health care in industrialized countries faces major challenges due to demographic changes, an increasing prevalence of chronic diseases and a shortage of primary care physicians. One approach to counteract these developments might be to reduce primary care physicians' workload with the support of health information technology (HIT) and non-physician practice staff. In 2009, the U.S. Commonwealth Fund (CWF) conducted an international survey of primary care physicians, on which the present secondary descriptive analysis is based. The aim of this analysis was twofold: first, to explore to what extent German primary care physicians already receive support from HIT and non-physician practice staff, and second, to show possible future perspectives. Methods: The CWF questionnaire was sent to a representative random sample of 1,500 primary care physicians all over Germany. The data were descriptively analyzed. Group comparisons regarding differences in gender and age groups were made by means of chi-square tests for categorical variables. An alpha level of p < 0.05 was used for statistical significance. Results: Altogether, 715 primary care physicians answered the questionnaire (response rate 49%). Seventy percent of the physicians use electronic medical records. Technical features such as electronic ordering and access to laboratory parameters are mainly used. However, the majority do not routinely use technical functions for drug prescribing, reminder systems for guideline-based interventions or recall of patients. Six percent of the surveyed physicians are able to transfer prescriptions electronically to a pharmacy; 1% use email communication with patients regularly. Seventy-two percent of primary care physicians receive support from non-physician practice staff in patient care, mostly in administrative tasks or routine preventive services. One fourth of the physicians are supported in telephone calls to the patient or in patient education and counseling. Conclusion: Within this sample, the majority of primary care physicians receive support from HIT and non-physician practice staff in their daily work. However, the potential has not yet been fully realized. Supportive technical functions such as electronic alarm functions for medication or electronic prescribing should be improved technically and better adapted to physicians' needs. To support proactive health care, recall and reminder systems should be refined to encourage their use. Adequately qualified non-physician practice staff could play a more active role in patient care. Reimbursement should be linked not only to physicians' services but also to those of non-physician practice staff.
    Digital ISSN: 1472-6947
    Subject: Computer Science, Medicine
    Published by BioMed Central
  • 46
    Publication date: 2012-08-03
    Description: Background: Web-based synteny visualization tools are important for sharing data and revealing patterns of complicated genome conservation and rearrangements. Such tools should allow biologists to upload genomic data for their own analysis. This requirement is critical because individual biologists are generating large amounts of genomic sequences that quickly overwhelm the capacity of any centralized web resource to collect and display all of those data. Recently, we published a web-based synteny viewer, GSV, which was designed to satisfy the above requirement. However, GSV can compare only two genomes at a given time. Extending the functionality of GSV to visualize multiple genomes is important for meeting the increasing demand of the research community. Results: We have developed a multi-Genome Synteny Viewer (mGSV). Similar to GSV, mGSV is a web-based tool that allows users to upload their own genomic data files for visualization. Multiple genomes can be presented in a single integrated view with an enhanced user interface. Users can navigate through all the selected genomes in either pairwise or multiple viewing mode to examine conserved genomic regions as well as the accompanying genome annotations. Besides serving users who manually interact with the web server, mGSV also provides Web Services for machine-to-machine communication, accepting data sent by other remote resources. The entire mGSV package can also be downloaded for easy local installation. Conclusions: mGSV significantly enhances the original functionalities of GSV. A web server hosting mGSV is provided at http://cas-bioinfo.cas.unt.edu/mgsv.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 47
    Publication date: 2012-08-03
    Description: Background: Increasingly, biologists and biochemists use computational tools to design experiments to probe the function of proteins and/or to engineer them for a variety of different purposes. The most effective strategies rely on knowledge of the three-dimensional structure of the protein of interest. However, it is often the case that an experimental structure is not available and that models of different quality are used instead. On the other hand, the relationship between the quality of a model and its appropriate use is not easy to derive in general, and so far it has been analyzed in detail only for specific applications. Results: This paper describes a database and related software tools that allow testing of a given structure-based method on models of a protein representing different levels of accuracy. Comparing the results of a computational experiment on the experimental structure and on a set of its decoy models will allow developers and users to assess the specific threshold of accuracy required to perform the task effectively. Conclusions: The ModelDB server automatically builds decoy models of different accuracy for a given protein of known structure and provides a set of useful tools for their analysis. Pre-computed data for a non-redundant set of deposited protein structures are available for analysis and download in the ModelDB database.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 48
    Publication date: 2012-07-03
    Description: Background: Protein-protein, cell-signaling, metabolic, and transcriptional interaction networks are useful for identifying connections between lists of experimentally identified genes/proteins. However, besides physical or co-expression interactions there are many ways in which pairs of genes, or their protein products, can be associated. By systematically incorporating knowledge on shared properties of genes from diverse sources to build functional association networks (FANs), researchers may be able to identify additional functional interactions between groups of genes that are not readily apparent. Results: Genes2FANs is a web-based tool and database that utilizes 14 carefully constructed FANs and a large-scale protein-protein interaction (PPI) network to build subnetworks that connect input lists of human and mouse genes. The FANs are created from mammalian gene set libraries where mouse genes are converted to their human orthologs. The tool takes as input a list of human or mouse Entrez gene symbols and produces a subnetwork and a ranked list of intermediate genes that are used to connect the query input list. In addition, users can enter any PubMed search term; the system then automatically converts the returned results to gene lists using GeneRIF. This gene list is then used as input to generate a subnetwork from the user's PubMed query. As a case study, we applied Genes2FANs to connect disease genes from 90 well-studied disorders. We find an inverse correlation between the number of links connecting disease genes through PPIs and the number of links connecting disease genes through FANs, separating diseases into two categories. Conclusions: Genes2FANs is a useful tool for interpreting the relationships between gene/protein lists in the context of their various functions and networks. Combining functional association interactions with physical PPIs can be useful for revealing new biology and can help form hypotheses for further experimentation. Our finding that disease genes in many cancers are mostly connected through PPIs, whereas other complex diseases, such as autism and type-2 diabetes, are mostly connected through FANs without PPIs, can guide better strategies for disease gene discovery. Genes2FANs is available at: http://actin.pharm.mssm.edu/genes2FANs.
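    The subnetwork-building idea, connecting a query gene list through intermediate nodes of a background network and ranking those intermediates, can be sketched with networkx. The toy network, gene names, and shortest-path heuristic below are illustrative assumptions, not the Genes2FANs algorithm itself.

    # Minimal sketch of subnetwork construction: connect every pair of query
    # genes by shortest paths through a background network and rank the
    # intermediate genes by how many connecting paths they appear on.
    # Toy network and gene names; not the Genes2FANs ranking scheme itself.
    from collections import Counter
    from itertools import combinations
    import networkx as nx

    background = nx.Graph([("A", "X"), ("X", "B"), ("B", "Y"), ("Y", "C"),
                           ("A", "Z"), ("Z", "C"), ("X", "Y")])
    query = ["A", "B", "C"]

    intermediates = Counter()
    edges = set()
    for s, t in combinations(query, 2):
        path = nx.shortest_path(background, s, t)
        edges.update(zip(path, path[1:]))
        intermediates.update(n for n in path if n not in query)

    subnet = nx.Graph(list(edges))
    print(sorted(subnet.edges()))
    print(intermediates.most_common())   # intermediate genes ranked by usage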
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 49
    Publication date: 2012-07-03
    Description: Background: Due to hybridization events in evolution, studying two different genes of a set of species may yield two related but different phylogenetic trees for the set of species. In this case, we want to combine the two phylogenetic trees into a hybridization network with the fewest hybridization events. This leads to three computational problems, namely, the problem of computing the minimum size of a hybridization network, the problem of constructing one minimum hybridization network, and the problem of enumerating a representative set of minimum hybridization networks. The best previous software tools for these problems (namely, Chen and Wang's HybridNet and Albrecht et al.'s Dendroscope 3) run very slowly for large instances that cannot be reduced to relatively small instances. Indeed, when the minimum size of a hybridization network of two given trees is larger than 23 and the problem for the trees cannot be reduced to relatively smaller independent subproblems, HybridNet almost always takes longer than 1 day and Dendroscope 3 often fails to complete. Thus, a faster software tool for these problems is needed. Results: We develop a software tool in ANSI C, named FastHN, for the following problems: computing the minimum size of a hybridization network, constructing one minimum hybridization network, and enumerating a representative set of minimum hybridization networks. We obtain FastHN by refining HybridNet with three ideas. The first idea is to preprocess the input trees so that the trees become smaller or the problem reduces to two or more relatively smaller independent subproblems. The second idea is to use a fast algorithm for computing the rSPR distance of two given phylogenetic trees to cut more branches of the search tree in the exhaustive-search stage of the algorithm. The third idea is that, during the exhaustive-search stage of the algorithm, we find two sibling leaves in one of the two forests (obtained from the given trees by cutting some edges) such that they are as far apart as possible in the other forest. As a result, FastHN always runs much faster than HybridNet. Unlike Dendroscope 3, FastHN is a single-threaded program. Despite this disadvantage, our experimental data shows that FastHN runs substantially faster than the multi-threaded Dendroscope 3 on a PC with multiple cores. Indeed, FastHN can finish within 16 minutes (on average, on a Windows 7 (x64) desktop PC with an i7-2600 CPU) even if the minimum size of a hybridization network of the two given trees is about 25, the trees each have 100 leaves, and the problem for the input trees cannot be reduced to two or more independent subproblems via cluster reductions. It is also worth mentioning that, like HybridNet, FastHN does not use much memory (indeed, the amount of memory is at most quadratic in the input size). In contrast, Dendroscope 3 uses a huge amount of memory. Executables of FastHN for Windows XP (x86), Windows 7 (x64), Linux, and Mac OS are available. Conclusions: For both biological datasets and simulated datasets, our experimental results show that FastHN runs substantially faster than HybridNet and Dendroscope 3. The superiority of FastHN in speed over the previous tools becomes more significant as the hybridization number becomes larger. In addition, FastHN uses much less memory than Dendroscope 3 and the same amount of memory as HybridNet.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 50
    Publication date: 2012-08-22
    Description: Background: Ongoing innovation in phylogenetics and evolutionary biology has been accompanied by a proliferation of software tools, data formats, analytical techniques and web servers. This brings with it the challenge of integrating phylogenetic and other related biological data found in a wide variety of formats, and underlines the need for reusable software that can read, manipulate and transform this information into the various forms required to build computational pipelines. Results: We built a Python software library for working with phylogenetic data that is tightly integrated with Biopython, a broad-ranging toolkit for computational biology. Our library, Bio.Phylo, is highly interoperable with existing libraries, tools and standards, and is capable of parsing common file formats for phylogenetic trees, performing basic transformations and manipulations, attaching rich annotations, and visualizing trees. We unified the modules for working with the standard file formats Newick, NEXUS and phyloXML behind a consistent and simple API, providing a common set of functionality independent of the data source. Conclusions: Bio.Phylo meets a growing need in bioinformatics for working with heterogeneous types of phylogenetic data. By supporting interoperability with multiple file formats and leveraging existing Biopython features, this library simplifies the construction of phylogenetic workflows. We also provide examples of the benefits of building a community around a shared open-source project. Bio.Phylo is included with Biopython, available through the Biopython website, http://biopython.org.
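    Because Bio.Phylo ships with Biopython, its unified read/transform/write cycle can be shown directly. The short example below parses a Newick string, counts leaves, draws an ASCII rendering, and re-serializes the tree as phyloXML; "newick", "nexus" and "phyloxml" are the format names the unified API accepts.

    # Short example of the unified Bio.Phylo API: parse a Newick tree from a
    # string, inspect it, draw it as ASCII art, and convert it to phyloXML.
    from io import StringIO
    from Bio import Phylo

    newick = "(((A:0.2,B:0.3):0.1,C:0.4):0.05,D:0.5);"
    tree = Phylo.read(StringIO(newick), "newick")

    print(tree.count_terminals())      # 4 leaves
    Phylo.draw_ascii(tree)             # quick text rendering

    out = StringIO()
    Phylo.write(tree, out, "phyloxml") # same API, different format
    print(out.getvalue()[:80], "...")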
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 51
    Publication date: 2012-08-22
    Description: Background: The increased use of multi-locus data sets for phylogenetic reconstruction has increased the need to determine whether a set of gene trees significantly deviates from the phylogenetic patterns of other genes. Such unusual gene trees may have been influenced by other evolutionary processes such as selection, gene duplication, or horizontal gene transfer. Results: Motivated by this problem, we propose a nonparametric goodness-of-fit test for two empirical distributions of gene trees, and we developed the software GeneOut to estimate a p-value for the test. Our approach maps trees into a multi-dimensional vector space and then applies support vector machines (SVMs) to measure the separation between two sets of pre-defined trees. We use a permutation test to assess the significance of the SVM separation. To demonstrate the performance of GeneOut, we applied it to the comparison of gene trees simulated within different species trees across a range of species tree depths. Applied directly to sets of simulated gene trees with large sample sizes, GeneOut was able to detect very small differences between two sets of gene trees generated under different species trees. Our statistical test can also incorporate tree reconstruction into its test framework through a variety of phylogenetic optimality criteria. When applied to DNA sequence data simulated from different sets of gene trees, results in the form of receiver operating characteristic (ROC) curves indicated that GeneOut performed well in the detection of differences between sets of trees with different distributions in a multi-dimensional space. Furthermore, it controlled false positive and false negative rates very well, indicating a high degree of accuracy. Conclusions: The non-parametric nature of our statistical test provides fast and efficient analyses, and makes it an applicable test for any scenario where evolutionary or other factors can lead to trees with different multi-dimensional distributions. The software GeneOut is freely available under the GNU public license.
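    The test statistic, how well an SVM separates two sets of tree vectors, together with its permutation null, can be outlined as follows. Random vectors stand in for trees mapped into vector space, since GeneOut's tree-to-vector mapping is not reproduced here.

    # Minimal sketch of the GeneOut-style test: measure how well an SVM separates
    # two sets of vectors (standing in for trees mapped to a vector space), then
    # compare against a permutation null of shuffled labels. Random data only.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def separation(X, y):
        """Cross-validated SVM accuracy as the separation statistic."""
        return cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0.0, 1, (40, 10)),     # gene trees from species tree 1
                   rng.normal(0.8, 1, (40, 10))])    # gene trees from species tree 2
    y = np.array([0] * 40 + [1] * 40)

    observed = separation(X, y)
    null = [separation(X, rng.permutation(y)) for _ in range(200)]
    pvalue = (1 + sum(s >= observed for s in null)) / (1 + len(null))
    print(observed, pvalue)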
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 52
    Publication date: 2012-08-23
    Description: Background: Histone deacetylase (HDAC) is a novel target for the treatment of cancer, and HDACs can be classified into three classes, i.e., classes I, II, and IV. Inhibitors that selectively target individual HDACs have proved to be better candidate antitumor drugs. To screen for selective HDAC inhibitors, several proteochemometric (PCM) models based on different combinations of three kinds of protein descriptors, two kinds of ligand descriptors and multiplication cross-terms were constructed in our study. Results: The results show that structure similarity descriptors are better than sequence similarity descriptors and geometry descriptors at characterizing HDACs. Furthermore, the predictive ability was not improved by introducing the cross-terms into our models. Finally, the best PCM model, based on protein structure similarity descriptors and 32-dimensional general descriptors, was derived (R2 = 0.9897, Q2test = 0.7542), showing a powerful ability to screen selective HDAC inhibitors. Conclusions: Our best model not only predicts the activities of inhibitors for each HDAC isoform, but also screens and distinguishes class-selective inhibitors and even more isoform-selective inhibitors, thus providing a potential way to discover or design novel candidate antitumor drugs with reduced side effects.
    Digital ISSN: 1471-2105
    Subject: Biology, Computer Science
    Published by BioMed Central
  • 53
    Publication date: 2012-08-23
    Description: Background: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks aiming to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information. Results: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for the recognition of scientific names, including the discovery of new species names from text, that also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify names based on structural features of candidate names and features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) than a popular dictionary-based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked up by an annotator. On a small set of PubMed Central full-text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2%, respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when used on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages. Additionally, we present comparison results for various machine learning algorithms on our annotated corpus. Naive Bayes and Maximum Entropy with Generalized Iterative Scaling (GIS) parameter estimation are the top two performing algorithms. Conclusions: We present NetiNeti, a machine learning based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
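    As a hedged illustration of the two-stage idea described above (these are not NetiNeti's actual rules, features or training data), the Python sketch below pairs a crude regular-expression candidate rule with a small Naive Bayes classifier over structural features.

import re
import numpy as np
from sklearn.naive_bayes import BernoulliNB

def candidates(text):
    # crude stand-in rule: capitalised word followed by a lowercase word
    return re.findall(r"\b[A-Z][a-z]+ [a-z]{3,}\b", text)

LATIN_END = ("us", "a", "um", "is", "o", "ns", "ra", "or")

def features(name):
    genus, epithet = name.split()
    return [genus.lower().endswith(LATIN_END),   # latinate genus ending
            epithet.endswith(LATIN_END),         # latinate epithet ending
            len(genus) > 4]

# tiny invented training set: 1 = scientific name, 0 = other capitalised phrase
train = [("Homo sapiens", 1), ("Vitis vinifera", 1),
         ("New York", 0), ("Results section", 0)]
X = np.array([features(n) for n, _ in train])
y = np.array([label for _, label in train])
clf = BernoulliNB().fit(X, y)

text = "Specimens of Puma concolor were collected near Main Street."
for name in candidates(text):
    p_species = clf.predict_proba([features(name)])[0, 1]
    print(name, round(p_species, 2))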
  • 54
    Publikationsdatum: 2012-10-16
    Beschreibung: Background: Plants are important as foods, pharmaceuticals, biorenewable chemicals, fuel resources, bioremediation tools and general tools for recombinant technology. The study of plant biological pathways is advanced by easy access to integrated data sources. Today, various plant data sources are scattered throughout the web, making it increasingly complicated to build comprehensive datasets. Results: MetNet Online is a web-based portal that provides access to a regulatory and metabolic plant pathway database. The database and portal integrate Arabidopsis, soybean (Glycine max) and grapevine (Vitis vinifera) data. Pathways are enriched with known or predicted information on subcellular location. MetNet Online enables pathways, interactions and entities to be browsed or searched by multiple categories such as subcellular compartment, pathway ontology, and GO term. In addition, the "My MetNet" feature allows registered users to bookmark content and track, import and export customized lists of entities. Users can also construct custom networks using existing pathways and/or interactions as building blocks. Conclusion: The site can be reached at http://www.metnetonline.org. Extensive video tutorials on how to use the site are available through http://www.metnetonline.org/tutorial/.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 55
    Publikationsdatum: 2012-10-20
    Beschreibung: Background: Dysregulation of imprinted genes, which are expressed in a parent-of-origin-specific manner, plays an important role in various human diseases, such as cancer and behavioral disorders. To date, however, fewer than 100 imprinted genes have been identified in the human genome. The recent availability of high-throughput technology makes large-scale prediction of imprinted genes possible. Here we propose a Bayesian model (dsPIG) to predict imprinted genes on the basis of allelic expression observed in mRNA-Seq data of independent human tissues. Results: According to simulations, dsPIG was capable of identifying imprinted genes with high sensitivity and specificity and a low false discovery rate when the number of sequenced tissue samples was fairly large. By applying dsPIG to the mRNA-Seq data, we predicted 94 imprinted genes in 20 cerebellum samples and 57 imprinted genes in 9 diverse tissue samples with expected low false discovery rates. We also assessed dsPIG using previously validated imprinted and non-imprinted genes. With simulations, we further analyzed how imbalanced allelic expression of non-imprinted genes or different minor allele frequencies affected the predictions of dsPIG. Interestingly, we found that, among biallelically expressed genes, at least 18 genes expressed significantly more transcripts from one allele than the other among different individuals and tissues. Conclusion: With the prevalence of the mRNA-Seq technology, dsPIG has become a useful tool for analysis of allelic expression and large-scale prediction of imprinted genes. For ease of use, we have set up a web service and also provide an R package for dsPIG at http://www.shoudanliang.com/dsPIG/. A minimal sketch of the underlying allelic-imbalance signal follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
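    dsPIG itself is a Bayesian model; the Python sketch below only illustrates the raw signal it builds on, namely deviation of allele-specific read counts from the 50:50 expression expected of a non-imprinted gene. The counts are invented for illustration.

from scipy.stats import binomtest   # SciPy >= 1.7

# (reference-allele reads, total reads) at one heterozygous SNP in three
# tissue samples; strongly skewed counts suggest monoallelic expression
samples = [(2, 40), (1, 35), (3, 50)]

for ref, total in samples:
    p = binomtest(ref, total, p=0.5).pvalue
    print(f"{ref}/{total} reference reads -> p = {p:.2e}")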
  • 56
    Publikationsdatum: 2012-10-20
    Beschreibung: The development of genomic tests is one of the most significant technological advances in medical testing in recent decades. As these tests become increasingly available, so does the need for a pragmatic framework to evaluate the evidence base and evidence gaps in order to facilitate informed decision-making. In this article we describe such a framework that can provide a common language and benchmarks for different stakeholders of genomic testing. Each stakeholder can use this framework to specify their respective thresholds for decision-making, depending on their perspective and particular needs. This framework is applicable across a broad range of test applications and can be helpful in the application and communication of a regulatory science for genomic testing. Our framework builds upon existing work and incorporates principles familiar to researchers involved in medical testing (both diagnostic and prognostic) generally, as well as those involved in genomic testing. This framework is organized around six phases in the development of genomic tests beginning with marker identification and ending with population impact, and highlights the important knowledge gaps that need to be filled in establishing the clinical relevance of a test. Our framework focuses on the clinical appropriateness of the four main dimensions of test research questions (population/setting, intervention/index test, comparators/reference test, and outcomes) rather than prescribing a hierarchy of study designs that should be used to address each phase.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 57
    Publikationsdatum: 2012-12-02
    Beschreibung: Background: Few educational resources have been developed to inform patients' renal replacement therapy (RRT) selection decisions. Patients progressing toward end stage renal disease (ESRD) must decide among multiple treatment options with varying characteristics. Complex information about treatments must be adequately conveyed to patients with different educational backgrounds and informational needs. Decisions about treatment options also require family input, as families often participate in patients' treatment and support patients' decisions. We describe the development, design, and preliminary evaluation of an informational, evidence-based, and patient- and family-centered decision aid for patients with ESRD and varying levels of health literacy, health numeracy, and cognitive function. Methods: We designed a decision aid comprising a complementary video and informational handbook. We based our development process on data previously obtained from qualitative focus groups and systematic literature reviews. We developed the video and handbook simultaneously in "stages." For the video, the stages included (1) directed interviews with culturally appropriate patients and families and preliminary script development, (2) video production, and (3) screening the video with patients and their families. For the handbook, the stages comprised (1) preliminary content design, (2) a mixed-methods pilot study among diverse patients to assess comprehension of the handbook material, and (3) screening the handbook with patients and their families. Results: The video and handbook both addressed potential benefits and trade-offs of treatment selections. The 50-minute video featured demographically diverse patients and their families describing their positive and negative experiences with selecting a treatment option, and incorporated health professionals' testimonials regarding various considerations that might influence patients' and families' treatment selections. The handbook comprised written text, pictures of patients and health care providers, and diagrams describing the findings and quality of scientific studies comparing treatments. The handbook text was written at a 4th to 6th grade reading level. Pilot study results demonstrated that a majority of patients could understand the information presented in the handbook. Patients and families screening the nearly completed video and handbook reviewed the materials favorably. Conclusions: This rigorously designed decision aid may help patients and families make informed decisions about their treatment options for RRT that are well aligned with their values.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 58
    Publikationsdatum: 2012-08-28
    Beschreibung: Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a laborious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow for successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis, as well as systematic changes in recorded intensities. Many different algorithms for the retention time alignment of GC-MS and LC-MS data have been proposed and published, but all of them focus either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features. Results: In this paper we introduce two algorithms for retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BiPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CeMAPP-DTW). We show how the similarity-based peak group matching method BiPACE may be used for multiple alignment calculation individually and how it can be used as a preprocessing step for the pairwise alignments performed by CeMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum). Conclusions: We have shown that BiPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CeMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BiPACE is used to constrain its search space. The source code of both algorithms is included in the open-source software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source. A generic dynamic time warping sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
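    The building block behind CeMAPP-DTW is pairwise dynamic time warping. The Python sketch below is the generic textbook DTW recursion applied to two toy chromatogram traces, not the authors' partitioned implementation.

import numpy as np

def dtw(a, b):
    # classic O(n*m) dynamic time warping distance between two 1-D signals
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# two toy total-ion-current traces, the second shifted in retention time
t = np.linspace(0, 10, 200)
trace1 = np.exp(-(t - 4) ** 2) + 0.5 * np.exp(-(t - 7) ** 2 / 0.5)
trace2 = np.exp(-(t - 4.6) ** 2) + 0.5 * np.exp(-(t - 7.6) ** 2 / 0.5)
print("DTW distance:", dtw(trace1, trace2))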
  • 59
    Publikationsdatum: 2012-08-30
    Beschreibung: Background: In biomedicine, exploratory studies and hypothesis generation often begin with researching the existing literature to identify a set of factors and their associations with diseases, phenotypes, or biological processes. Many scientists are overwhelmed by the sheer volume of literature on a disease when they plan to generate a new hypothesis or study a biological phenomenon. The situation is even worse for junior investigators, who often find it difficult to formulate new hypotheses or, more importantly, to corroborate whether their hypothesis is consistent with the existing literature. Keeping abreast of so much being published, and remembering all combinations of direct and indirect associations, is a daunting task. Fortunately, there is a growing trend of using literature mining and knowledge discovery tools in biomedical research. However, there is still a large gap between the huge amount of effort and resources invested in disease research and the little effort invested in harvesting the published knowledge. The proposed hypothesis generation framework (HGF) finds "crisp semantic associations" among entities of interest, which is a step towards bridging such gaps. Methodology: The proposed HGF shares end goals similar to those of SWAN [1] but is more holistic in nature; it was designed and implemented using scalable and efficient computational models of disease-disease interaction. The integration of mapping ontologies with latent semantic analysis is critical in capturing domain-specific direct and indirect "crisp" associations, and in making assertions about entities (such as disease X is associated with a set of factors Z). Results: Pilot studies were performed using two diseases. A comparative analysis of the computed "associations" and "assertions" with curated expert knowledge was performed to validate the results. It was observed that the HGF is able to capture "crisp" direct and indirect associations and to provide knowledge discovery on demand. Conclusions: The proposed framework is fast, efficient, and robust in generating new hypotheses to identify factors associated with a disease. A fully integrated Web service application is being developed for wide dissemination of the HGF. A large-scale study by domain experts and associated researchers is underway to validate the associations and assertions computed by the HGF. A toy latent-semantic-association sketch follows this record.
    Digitale ISSN: 1756-0381
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
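    The sketch below shows, on an invented four-document corpus, how latent semantic analysis can surface indirect associations between entities that rarely co-occur directly; it is a generic LSA pipeline, not the HGF's own scoring.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "diabetes is associated with obesity and insulin resistance",
    "obesity increases the risk of hypertension",
    "insulin resistance links diabetes and hypertension",
    "this sentence is unrelated filler text",
]
X = TfidfVectorizer().fit_transform(docs)                    # term-document matrix
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(cosine_similarity(Z[:1], Z[1:]))   # doc 0 vs the others in latent space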
  • 60
    Publikationsdatum: 2012-09-04
    Beschreibung: Background: The U.S. Centers for Medicare and Medicaid Services established the Electronic Health Record (EHR) Incentive Program in 2009 to stimulate the adoption of EHRs. One component of the program requires eligible providers to implement clinical decision support (CDS) interventions that can improve performance on one or more quality measures pre-selected for each specialty. Because the unique decision-making challenges and existing HIT capabilities vary widely across specialties, the development of meaningful objectives for CDS within such programs must be supported by deliberative analysis. Design: We developed a conceptual framework and protocol that combines evidence review with expert opinion to elicit clinically meaningful objectives for CDS directly from specialists. The framework links objectives for CDS to specialty-specific performance gaps while ensuring that a workable set of CDS opportunities is available to providers to address each performance gap. Performance gaps may include those with well-established quality measures but also priorities identified by specialists based on their clinical experience. Moreover, objectives are not constrained to performance gaps with existing CDS technologies, but rather may include those for which CDS tools might reasonably be expected to be developed in the near term, for example, by the beginning of Stage 3 of the EHR Incentive Program. The protocol uses a modified Delphi expert panel process to elicit and prioritize CDS meaningful use objectives. Experts first rate the importance of performance gaps, beginning with a candidate list generated through an environmental scan and supplemented through nominations by panelists. For the highest-priority performance gaps, panelists then rate the extent to which existing or future CDS interventions, characterized jointly as "CDS opportunities", might impact each performance gap and the extent to which each CDS opportunity is compatible with specialists' clinical workflows. The protocol was tested by expert panels representing four clinical specialties: oncology, orthopedic surgery, interventional cardiology, and pediatrics.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 61
    Publikationsdatum: 2012-08-29
    Beschreibung: Background: Most current proteomic research focuses on proteome alterations due to pathological disorders (e.g., colorectal cancer) rather than on the normal healthy state of the colon. As a result, information on the normal whole-tissue colon proteome is lacking. Results: We report here a detailed murine (mouse) whole-tissue colon protein reference dataset composed of 1237 confidently identified proteins (FDR < 2), with comprehensive insight into peptide properties, cellular and subcellular localization, functional network GO annotation analysis, and relative abundances. The presented dataset covers wide spectra of pI and Mw, ranging from 3-12 and 4-600 kDa, respectively. GRAVY index scoring predicted 19.5% membranous and 80.5% globular proteins. GO hierarchies and functional network analysis illustrated how the proteins function together, as well as the implication of several candidates in malignancy, such as mitogen-activated protein kinases (Mapk8, 9) in colorectal cancer, fibroblast growth factor receptor (Fgfr2) and glutathione S-transferase (Gstp1) in prostate cancer, and cell division control protein (Cdc42) and Ras-related proteins (Rac1, 2) in pancreatic cancer. Protein abundances calculated with 3 different algorithms (NSAF, PAF and emPAI) provide a relative quantification under normal conditions as guidance. Conclusions: This high-confidence colon proteome catalogue will not only serve as a useful reference for further experiments characterizing differentially expressed proteins induced by diseased conditions, but will also aid in better understanding the ontology and the functional absorptive mechanism of the colon. A minimal NSAF computation sketch follows this record.
    Digitale ISSN: 1756-0381
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
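    Of the three abundance measures named above, NSAF has the simplest closed form: NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the spectral count and L the protein length. The Python sketch below applies it to invented counts for three of the proteins mentioned.

# invented spectral counts and protein lengths, for illustration only
spectral_counts = {"Mapk8": 12, "Fgfr2": 7, "Gstp1": 30}
lengths = {"Mapk8": 427, "Fgfr2": 821, "Gstp1": 210}

saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
total = sum(saf.values())
nsaf = {p: v / total for p, v in saf.items()}   # normalised to sum to 1
print(nsaf)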
  • 62
    Publikationsdatum: 2012-08-29
    Beschreibung: Background: Biomedical processes can provide essential information about the (mal-) functioning of an organism and are thus frequently represented in biomedical terminologies and ontologies, including the GO Biological Process branch. These processes often need to be described and categorised in terms of their attributes, such as rates or regularities. The adequate representation of such process attributes has been a contentious issue in bio-ontologies recently; and domain ontologies have correspondingly developed ad hoc workarounds that compromise interoperability and logical consistency. Results: We present a design pattern for the representation of process attributes that is compatible with upper ontology frameworks such as BFO and BioTop. Our solution rests on two key tenets: firstly, that many of the sorts of process attributes which are biomedically interesting can be characterised by the ways that repeated parts of such processes constitute, in combination, an overall process; secondly, that entities for which a full logical definition can be assigned do not need to be treated as primitive within a formal ontology framework. We apply this approach to the challenge of modelling and automatically classifying examples of normal and abnormal rates and patterns of heart beating processes, and discuss the expressivity required in the underlying ontology representation language. We provide full definitions for process attributes at increasing levels of domain complexity. Conclusions: We show that a logical definition of process attributes is feasible, though limited by the expressivity of DL languages so that the creation of primitives is still necessary. This finding may endorse current formal upper-ontology frameworks as a way of ensuring consistency, interoperability and clarity.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 63
    Publikationsdatum: 2012-08-28
    Beschreibung: No description available
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 64
    Publikationsdatum: 2012-08-31
    Beschreibung: Background: The epidermal growth factor receptor (EGFR) signaling pathway and angiogenesis in brain cancer act as an engine for tumor initiation, expansion and response to therapy. Since the existing literature lacks models that investigate the impact of both angiogenesis and molecular signaling pathways on treatment, we propose a novel multi-scale, agent-based computational model that includes both an angiogenesis and an EGFR module to study the response of brain cancer to tyrosine kinase inhibitor (TKI) treatment. Results: The novel angiogenesis module integrated into the agent-based tumor model is based on a set of reaction-diffusion equations that describe the spatio-temporal evolution of the distributions of micro-environmental factors such as glucose, oxygen, TGFalpha, VEGF and fibronectin. These molecular species regulate tumor growth during angiogenesis. Each tumor cell is equipped with an EGFR signaling pathway linked to a cell-cycle pathway to determine its phenotype. EGFR TKIs are delivered through the blood vessels of the tumor microvasculature and the response to treatment is studied. Conclusions: Our simulations demonstrated that the entire tumor growth profile is a collective behaviour of cells regulated by the EGFR signaling pathway and the cell cycle. We also found that angiogenesis has a dual effect under TKI treatment: on the one hand, TKIs are delivered through the neo-vasculature to decrease tumor invasion; on the other hand, the neo-vasculature transports glucose and oxygen to tumor cells to maintain their metabolism, which results in an increase of the cell survival rate in the late simulation stages. A one-species reaction-diffusion sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
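    As a minimal sketch of the kind of equation such an angiogenesis module integrates, the Python snippet below takes explicit finite-difference steps of dc/dt = D * laplacian(c) - k*c for a single species (say oxygen) in one dimension; all parameter values are arbitrary and periodic boundaries are used for brevity.

import numpy as np

nx, dx, dt = 100, 1.0, 0.1
D, k = 1.0, 0.05                 # diffusion coefficient, uptake rate (arbitrary)
c = np.ones(nx)                  # initial oxygen concentration
c[40:60] = 0.2                   # region depleted by tumour cells

for _ in range(500):
    # discrete Laplacian with periodic boundaries; dt*D/dx^2 <= 0.5 for stability
    lap = (np.roll(c, 1) - 2 * c + np.roll(c, -1)) / dx**2
    c += dt * (D * lap - k * c)

print("min/max concentration:", c.min(), c.max())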
  • 65
    Publikationsdatum: 2012-09-01
    Beschreibung: Background: Venous thromboembolism (VTE) causes morbidity and mortality in hospitalized patients, and regulators and payors are encouraging the use of systems to prevent them. Here, we examine the effect of a computerized clinical decision support (CDS) intervention implemented across a multi-hospital academic health system on VTE prophylaxis and events. Methods: The study included 223,062 inpatients admitted between April 2007 and May 2010, and used administrative and clinical data. The intervention was integrated into a commercial electronic health record (EHR) in an admission orderset used for all admissions. Three time periods were examined: baseline (period 1), and the time after implementation of the first CDS intervention (period 2) and a second iteration (period 3). Providers were prompted to accept or decline prophylaxis based on patient risk. Time series analyses examined the impact of the intervention on VTE prophylaxis during time periods two and three compared to baseline, and a simple pre-post design examined impact on VTE events and bleeds secondary to anticoagulation. VTE prophylaxis and events were also examined in a prespecified surgical subset of our population meeting the public reporting criteria defined by the Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicator (PSI). Results: Unadjusted analyses suggested that "recommended", "any", and "pharmacologic" prophylaxis increased from baseline to the last study period (27.1% to 51.9%, 56.7% to 78.1%, and 42.0% to 54.4% respectively; p < 0.01 for all comparisons). Results were significant across all hospitals and the health system overall. Interrupted time series analyses suggested that our intervention increased the use of "recommended" and "any" prophylaxis by 7.9% and 9.6% respectively from baseline to time period 2 (p < 0.01 for both comparisons); and 6.6% and 9.6% respectively from baseline to the combined time periods 2 and 3 (p < 0.01 for both comparisons). There were no significant changes in "pharmacologic" prophylaxis in the adjusted model. The overall percent of patients with VTE increased from baseline to the last study period (2.0% to 2.2%; p = 0.03), but an analysis excluding patients with VTE "present on admission" (POA) demonstrated no difference in events (1.3% to 1.3%; p = 0.80). Overall bleeds did not significantly change. An analysis examining VTE prophylaxis and events in a surgical subset of patients defined by the AHRQ PSI demonstrated increased "recommended", "any", and "pharmacologic" prophylaxis from baseline to the last study period (32.3% to 60.0%, 62.8% to 85.7%, and 47.9% to 63.3% respectively; p < 0.01 for all comparisons) as well as reduced VTE events (2.2% to 1.7%; p < 0.01). Conclusions: The CDS intervention was associated with an increase in "recommended" and "any" VTE prophylaxis across the multi-hospital academic health system. The intervention was also associated with increased VTE rates in the overall study population, but a subanalysis using only admissions with appropriate POA documentation suggested no change in VTE rates, and a prespecified analysis of a surgical subset of our sample as defined by the AHRQ PSI for public reporting purposes suggested reduced VTE. This intervention was created in a commonly used commercial EHR and is scalable across institutions with similar systems.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 66
    Publikationsdatum: 2012-09-05
    Beschreibung: Background: Next-generation sequencing technologies have become important tools for genome-wide studies. However, the quality scores that are assigned to each base have been shown to be inaccurate. If the quality scores are used in downstream analyses, these inaccuracies can have a significant impact on the results. Results: Here we present ReQON, a tool that recalibrates the base quality scores from an input BAM file of aligned sequencing data using logistic regression. ReQON also generates diagnostic plots showing the effectiveness of the recalibration. We show that ReQON produces quality scores that are both more accurate, in the sense that they more closely correspond to the probability of a sequencing error, and better at discriminating between sequencing errors and non-errors than the original quality scores. We also compare ReQON to other available recalibration tools and show that ReQON is less biased and performs favorably in terms of quality score accuracy. Conclusion: ReQON is an open source software package, written in R and available through Bioconductor, for recalibrating base quality scores for next-generation sequencing data. ReQON produces a new BAM file with more accurate quality scores, which can improve the results of downstream analysis, and several diagnostic plots showing the effectiveness of the recalibration. A toy logistic-regression recalibration sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
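    ReQON itself is an R/Bioconductor package; the Python sketch below only illustrates the general approach of recalibrating Phred scores with logistic regression, on simulated rather than real BAM-derived data. The feature set and error model are invented.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5000
reported_q = rng.integers(10, 41, size=n)       # reported Phred scores
pos = rng.integers(0, 100, size=n)              # position within the read
true_perr = 10 ** (-(reported_q * 0.7) / 10)    # reported scores are inflated
is_error = rng.random(n) < true_perr            # simulated mismatch flags

X = np.column_stack([reported_q, pos])
clf = LogisticRegression().fit(X, is_error)

p_err = clf.predict_proba(X)[:, 1]
recalibrated_q = -10 * np.log10(p_err)          # back to the Phred scale
print("mean reported vs recalibrated:", reported_q.mean(), recalibrated_q.mean())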
  • 67
    Publikationsdatum: 2012-09-05
    Beschreibung: Background: Meningitis is an inflammation of the meninges, the membranes around the brain and spinal cord. It may be caused by a number of factors, but infectious meningitis is due to multiplication of fungal, viral or bacterial organisms. A number of studies have shown that the diagnosis and treatment management of meningitis is a complex and common problem that demands special attention. In this work, we focus on the process of making decisions in the medical domain, using a Fuzzy Cognitive Map (FCM) to model expert physicians' behavior in the decision task related to this life-threatening disease. Methods: Fuzzy cognitive mapping is a method for analysing and depicting human perception of a given system. The method produces a conceptual model which is not limited to exact values and measurements, and is thus well suited to representing relatively unstructured knowledge and causalities expressed in imprecise forms. A team of four paediatricians was formed to define the number and types of signs/symptoms and other lifestyle-related factors used in deciding the presence or absence of meningitis. The FCM model, consisting of 20 concept nodes, was designed by the team of paediatricians after thorough deliberations; 19 of the concepts are the symptoms and risk factors taken into consideration for the decision on the disease. Results: The paediatricians were supplied with a form containing the various input parameters to be filled in at the time of diagnosing meningitis in infants and children. The paediatricians returned the cases of forty different children, with ages ranging from 2 months to 7 years. The physicians' decision on whether to treat for meningitis was available for each case, and their opinions were used as the "gold standard" for model evaluation. The system predicted the outcome in all forty cases with an accuracy of 95%, demonstrating its functionality and showing that the use of FCMs as dynamic models is reliable. Conclusions: This work elaborates the development of a knowledge-based system, using the formalization of FCMs, for meningitis diagnosis in children and infants. The results show that the suggested FCM-based tool gives a front-end decision on diagnosing meningitis and could be considered a helpful reference for physicians and patients. A toy FCM inference sketch follows this record.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
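    The Python sketch below runs one common FCM update rule (repeatedly propagate activations through a signed weight matrix and squash with a sigmoid) on a three-concept toy map; it is not the 20-node meningitis FCM, and the weights are invented.

import numpy as np

def sigmoid(x, lam=1.0):
    return 1.0 / (1.0 + np.exp(-lam * x))

# weights[i, j]: causal influence of concept i on concept j
W = np.array([[0.0, 0.0, 0.7],    # fever         -> diagnosis
              [0.0, 0.0, 0.9],    # neck rigidity -> diagnosis
              [0.0, 0.0, 0.0]])   # diagnosis (output node)

x = np.array([0.8, 0.6, 0.0])     # initial concept activations (observations)
for _ in range(20):
    x = sigmoid(x + x @ W)        # one common FCM update rule
print("diagnosis activation:", round(x[2], 3))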
  • 68
    Publikationsdatum: 2012-08-26
    Beschreibung: Background: Quantitative analysis of changes in dendritic spine morphology has become an interesting issue in contemporary neuroscience. However, the diversity within dendritic spine populations might seriously influence the results of measurements in which their morphology is studied, and the detection of differences in spine morphology between control and test groups is often compromised by the number of dendritic spines taken for analysis. In order to estimate how severe such an impact is, we performed Monte Carlo simulations examining various experimental setups and statistical approaches. Confocal images of dendritic spines from hippocampal dissociated cultures were used to create the set of variables exploited as the simulation resources. Results: The tabulated results of the simulations are given, providing the number of dendritic spines required for the detection of hidden morphological differences between control and test groups in spine head-width, length and area. Among these three variables, changes in head-width turn out to be the most easily detected. Simulation of changes occurring in a subpopulation of spines reveals a strong dependence of detectability on the statistical approach applied: analysis based on comparing the percentage of spines in subclasses is less sensitive than the direct comparison of the relevant variables describing spine morphology. Conclusions: We evaluated the sampling aspect and the effect of systematic morphological variation on detecting differences in spine morphology. The results provided may serve as a guideline in selecting the number of samples to be studied in a planned experiment. Our simulations might be a step towards a standardized method for the quantitative comparison of dendritic spine morphology in which different sources of error are considered. A miniature power-simulation sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
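    The Python sketch below shows the general shape of such a Monte Carlo power analysis: how often a t-test detects a small shift in mean head-width at a given sample size. The effect size, variability and sample size are invented, not the paper's tabulated values.

import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n_spines, shift, n_sim = 200, 0.05, 2000
detected = 0
for _ in range(n_sim):
    control = rng.normal(0.5, 0.15, n_spines)       # head-width, microns (toy)
    test = rng.normal(0.5 + shift, 0.15, n_spines)  # hidden difference
    if ttest_ind(control, test).pvalue < 0.05:
        detected += 1
print("empirical power:", detected / n_sim)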
  • 69
    Publikationsdatum: 2012-09-04
    Beschreibung: Background: Food security is an issue that has come under renewed scrutiny amidst concerns that substantial yield increases in cereal crops are required to feed the world's booming population. Wheat is of fundamental importance in this regard, being one of the three most important crops for both human consumption and livestock feed; however, increases in crop yield have not kept pace with the demands of a growing world population. In order to address this issue, plant breeders require new molecular tools to help them identify genes for important agronomic traits that can be introduced into elite varieties. Studies of the genome using next-generation sequencing enable the identification of molecular markers such as single nucleotide polymorphisms (SNPs) that may be used by breeders to identify and follow genes when breeding new varieties. The development and application of next-generation sequencing technologies has made the characterisation of SNP markers in wheat relatively cheap and straightforward, and there is a growing need for the widespread dissemination of this information to plant breeders. Description: CerealsDB is an online resource containing a range of genomic datasets for wheat (Triticum aestivum) that will assist plant breeders and scientists to select the most appropriate markers for marker-assisted selection. CerealsDB includes a database which currently contains in excess of 100,000 putative varietal SNPs, of which several thousand have been experimentally validated. In addition, CerealsDB contains databases for DArT markers and EST sequences, and links to a draft genome sequence for the wheat variety Chinese Spring. Conclusion: CerealsDB is an open access website that is rapidly becoming an invaluable resource within the wheat research and plant breeding communities.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 70
    Publikationsdatum: 2012-09-04
    Beschreibung: Background: One of the crucial steps in the regulation of gene expression is the binding of transcription factor(s) to specific DNA sequences. Knowledge of the binding affinity and specificity at a structural level between transcription factors and their target sites has important implications for our understanding of the mechanism of gene regulation. Due to their unique functions and binding specificity, there is a need for a transcription factor-specific, structure-based database and corresponding web service to facilitate structural bioinformatics studies of transcription factor-DNA interactions, such as the development of knowledge-based interaction potentials, transcription factor-DNA docking, binding-induced conformational changes, and the thermodynamics of protein-DNA interactions. Description: TFinDit is a relational database and a web search tool for studying transcription factor-DNA interactions. The database contains annotated transcription factor-DNA complex structures and related data, such as unbound protein structures, thermodynamic data, and binding sequences for the corresponding transcription factors in the complex structures. TFinDit also provides a user-friendly interface and allows users to either query individual entries or generate datasets through culling the database based on one or more search criteria. Conclusions: TFinDit is a specialized structural database with annotated transcription factor-DNA complex structures and other preprocessed data. We believe that this database/web service can facilitate the development and testing of TF-DNA interaction potentials and TF-DNA docking algorithms, and the study of protein-DNA recognition mechanisms.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 71
    Publikationsdatum: 2012-09-04
    Beschreibung: Background: Dual processing theory of human cognition postulates that reasoning and decision-making can be described as a function of an intuitive, experiential, affective system (system I) and/or an analytical, deliberative processing system (system II). To date, no formal descriptive model of medical decision-making based on dual processing theory has been developed. Here we postulate such a model and apply it to a common clinical situation: whether treatment should be administered to a patient who may or may not have a disease. Methods: We developed a mathematical model in which we linked a recently proposed descriptive psychological model of cognition with the threshold model of medical decision-making, and we show how this approach can be used to better understand decision-making at the bedside and to explain the widespread variation in treatments observed in clinical practice. Results: We show that a physician's inclination to treat at probability levels higher (or lower) than the prescriptive therapeutic threshold obtained via system II processing is moderated by system I and by the ratio of benefits and harms as evaluated by both systems. Under some conditions, the system I decision maker's threshold may drop dramatically below the expected utility threshold derived by system II; this can explain the overtreatment often seen in contemporary practice. The opposite can also occur, as in situations where empirical evidence is considered unreliable or where the cognitive processes of decision-makers are biased through recent experience: the threshold then increases relative to the normative value derived via system II using the expected utility threshold. This inclination toward higher diagnostic certainty may, in turn, explain the undertreatment that is also documented in current medical practice. Conclusions: We have developed the first dual processing model of medical decision-making, which has the potential to enrich the medical decision-making field, still to a large extent dominated by expected utility theory. The model also provides a platform for reconciling two groups of competing dual processing theories (parallel competitive versus default-interventionist theories). The classical system II threshold referred to above is recalled after this record.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
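    For background, the prescriptive system II threshold the abstract refers to is the classical expected-utility treatment threshold (Pauker and Kassirer): treat when the probability of disease p exceeds p*, where B is the net benefit of treating the diseased and H the net harm of treating the disease-free.

\[
  p^{*} = \frac{H}{B + H}
\]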
  • 72
    Publikationsdatum: 2012-09-06
    Beschreibung: Background: High-density oligonucleotide microarrays are an appropriate technology for genomic analysis and are particularly useful for the generation of transcriptional maps, ChIP-on-chip studies and re-sequencing of the genome. Transcriptome analysis of tiling microarray data facilitates the discovery of novel transcripts and the assessment of differential expression in diverse experimental conditions. Although new technologies such as next-generation sequencing have appeared, microarrays might still be useful for the study of small genomes, or for the analysis of genomic regions with custom microarrays, due to their lower price and good accuracy in expression quantification. Results: Here, we propose a novel wavelet-based method, named ZCL (zero-crossing lines), for the combined denoising and segmentation of tiling signals. The denoising is performed with the classical SUREshrink method, and the detection of transcriptionally active regions is based on the computation of the Continuous Wavelet Transform (CWT). In particular, the detection of the transitions is implemented as the thresholding of the zero-crossing lines. The algorithm has been applied to the public Saccharomyces cerevisiae dataset and compared with two well-known algorithms: the pseudo-median sliding window (PMSW) and the structural change model (SCM). As a proof of principle, we applied the ZCL algorithm to the analysis of custom tiling microarray hybridization results for an S. aureus mutant deficient in the sigma B transcription factor; the challenge was to identify those transcripts whose expression decreases in the absence of sigma B. Conclusions: The proposed method achieves the best performance in terms of positive predictive value (PPV), while its sensitivity is similar to that of the other algorithms used for comparison. The computation time needed to process the transcriptional signals is low compared with model-based methods and in the same range as filter-based ones. Automatic parameter selection has been incorporated, and the method can easily be adapted to a parallel implementation. We conclude that the proposed method is well suited for the analysis of tiling signals, in which transcriptional activity is often hidden in the noise. Finally, the quantification and differential expression analysis of the S. aureus dataset demonstrated the utility of this novel method for the biological analysis of the S. aureus transcriptome. A toy zero-crossing sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
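    The Python sketch below mimics a single CWT scale with a Gaussian kernel rather than reimplementing SUREshrink plus the full transform: the second derivative of a Gaussian-smoothed signal approximates one Mexican-hat wavelet scale, and its zero crossings mark candidate transcript boundaries (the real method thresholds zero-crossing lines across scales to discard the noise-induced ones).

import numpy as np
from scipy.ndimage import gaussian_filter1d

rng = np.random.default_rng(3)
signal = np.concatenate([np.zeros(100), 2.0 * np.ones(80), np.zeros(120)])
noisy = signal + rng.normal(0, 0.4, signal.size)

# zero crossings of the smoothed second derivative sit at the boxcar edges
# (near positions 100 and 180), plus spurious ones from noise
d2 = gaussian_filter1d(noisy, sigma=8, order=2)
crossings = np.where(np.diff(np.sign(d2)) != 0)[0]
print("candidate transition points:", crossings)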
  • 73
    Publikationsdatum: 2012-09-05
    Beschreibung: Background: In today's short-stay hospital settings, the contact time for patients is reduced. However, it seems to be more important for patients that the healthcare professionals are easy to get in contact with during the whole course of treatment, and that they have the opportunity to exchange information as a basis for obtaining individualized information and support. The aim was therefore to explore the ability of a dialogue-based application to contribute to the accessibility of the healthcare professionals and the exchangeability of information. Method: An application for online written and asynchronous contacts was developed, implemented in clinical practice, and evaluated. The qualitative effect of the online contact was explored using a Web-based survey comprised of open-ended questions. Results: Patients valued the online contacts and experienced feelings of partnership in dialogue, in a flexible and calm environment, which supported their ability to be active partners and their feelings of freedom and security. Conclusion: The online asynchronous written environment can contribute to accessibility and exchangeability, and add new possibilities for dialogues from which patients can benefit. The individualized information obtained via online contact empowers patients. Internet-based contacts are a way to differentiate and expand the possibilities for contact outside the few scheduled face-to-face hospital contacts.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 74
    Publikationsdatum: 2012-09-05
    Beschreibung: Background: Decisions concerning drug safety and efficacy are generally based on pivotal evidence provided by clinical trials. Unfortunately, finding the relevant clinical trials is difficult and their results are only available in text-based reports. Systematic reviews aim to provide a comprehensive overview of the evidence in a specific area, but may not provide the data required for decision making. Methods: We review and analyze the existing information systems and standards for aggregate level clinical trials information from the perspective of systematic review and evidence-based decision making. Results: The technology currently used has major shortcomings, which cause deficiencies in the transfer, traceability and availability of clinical trials information. Specifically, data available to decision makers is insufficiently structured, and consequently the decisions cannot be properly traced back to the underlying evidence. Regulatory submission, trial publication, trial registration, and systematic review produce unstructured datasets that are insufficient for supporting evidence-based decision making. Conclusions: The current situation is a hindrance to policy decision makers as it prevents fully transparent decision making and the development of more advanced decision support systems. Addressing the identified deficiencies would enable more efficient, informed, and transparent evidence-based medical decision making.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 75
    Publikationsdatum: 2012-08-29
    Beschreibung: Background: A number of software packages are available to generate DNA multiple sequence alignments (MSAs) evolved under continuous-time Markov processes on phylogenetic trees. On the other hand, methods for simulating DNA MSAs directly from transition matrices do not exist. Moreover, existing software is restricted to time-reversible models and is not optimized to generate nonhomogeneous data (i.e. data with distinct substitution rates at different lineages). Results: We present the first package designed to generate MSAs evolving under discrete-time Markov processes on phylogenetic trees, directly from probability substitution matrices. Based on the input model and a phylogenetic tree in the Newick format (with branch lengths measured as the expected number of substitutions per site), the algorithm produces DNA alignments of the desired length. GenNon-h is publicly available for download. Conclusion: The software presented here is an efficient tool to generate DNA MSAs on a given phylogenetic tree. GenNon-h provides the user with nonstationary or nonhomogeneous phylogenetic data that is well suited for testing complex biological hypotheses, exploring the limits of reconstruction algorithms and their robustness to such models. A minimal per-branch simulation sketch follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
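    The core operation such a simulator performs on every branch is sampling a child sequence from a substitution probability matrix. The Python sketch below shows that single step; the matrix is arbitrary (and deliberately not time-reversible), and this is not GenNon-h's code.

import numpy as np

rng = np.random.default_rng(4)
bases = np.array(list("ACGT"))
P = np.array([[0.91, 0.03, 0.04, 0.02],   # rows: from A, C, G, T
              [0.02, 0.90, 0.03, 0.05],
              [0.05, 0.02, 0.91, 0.02],
              [0.03, 0.04, 0.02, 0.91]])  # each row sums to 1

parent = rng.integers(0, 4, size=30)                      # root sequence
child = np.array([rng.choice(4, p=P[b]) for b in parent]) # one branch step
print("parent:", "".join(bases[parent]))
print("child :", "".join(bases[child]))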
  • 76
    Publikationsdatum: 2012-08-30
    Beschreibung: Background: Since processes in well-known model organisms have specific features different from those in Bos taurus, the organism under study, a good way to describe gene regulation in ruminant embryos would be a species-specific consideration of species closely related to cattle, sheep and pig. However, as highlighted by a recent report, gene dictionaries in pig are smaller than in cattle, bringing a risk of reducing the gene resources to be mined (and likewise for sheep dictionaries). Bioinformatics approaches that allow an integration of the available information on gene function in model organisms, taking their specificity into account, are thus needed. Besides these closely related and biologically relevant species, there is indeed much more knowledge of (i) trophoblast proliferation and differentiation and (ii) embryogenesis in the human and mouse species, which provides opportunities for reconstructing proliferation and/or differentiation processes in other mammalian embryos, including ruminants. The necessary knowledge can be obtained partly from (i) stem cell or cancer research, to supply useful information on molecular agents or molecular interactions at work in cell proliferation, and (ii) mouse embryogenesis, to supply useful information on embryo differentiation. However, the total number of publications for all these topics and species is large, and their manual processing would be tedious and time consuming. This is why we used text mining for automated text analysis and automated knowledge extraction. To evaluate the quality of this "mining", we took advantage of studies that reported gene expression profiles during the elongation of bovine embryos and defined a list of transcription factors (TF, n = 64) that we used as the biological "gold standard". When successful, the "mining" approach would identify them all, as well as novel ones. Methods: To gain knowledge on molecular-genetic regulations in a non-model organism, we offer an approach based on literature mining and score-based arrangement of data from model organisms. This approach was applied to identify novel transcription factors during bovine blastocyst elongation, a process that is not observed in rodents and primates. As a result, searching through human and mouse corpora, we identified numerous bovine homologs, among which 11 to 14% were transcription factors, including the gold standard TF as well as novel TF potentially important to gene regulation in ruminant embryo development. The scripts of the workflow are written in Perl and available on demand. They require data input coming from various databases for any kind of biological issue, once the data have been prepared according to keywords for the studied topic and species; we can provide a data sample to illustrate the use and functionality of the workflow. Results: To do so, we created a workflow that allowed the pipeline processing of literature data and biological data extracted from Web of Science (WoS) or PubMed, but also from Gene Expression Omnibus (GEO), Gene Ontology (GO), Uniprot, HomoloGene, TcoF-DB and TFe (TF encyclopedia). First, the human and mouse homologs of the bovine proteins were selected, filtered by text corpora and ranked by score functions. The score functions were based on gene name frequencies in the corpora (see the toy sketch after this record). Then, transcription factors were identified using TcoF-DB and double-checked using TFe to characterise TF groups and families. Thus, among a search space of 18,670 bovine homologs, 489 were identified as transcription factors. Among them, 243 were absent from the high-throughput data available at the time of the study. They thus stand so far as putative TF acting during bovine embryo elongation, but might be retrieved from a recent RNA sequencing dataset (2012). Beyond the 246 TF that appeared expressed in bovine elongating tissues, we restricted our interpretation to those occurring within a list of 50 top-ranked genes. Among the transcription factors identified therein, half belonged to the gold standard (ASCL2, c-FOS, ETS2, GATA3, HAND1, TP53) and half did not (ESR1, HES1, ID2, NANOG, PHB2, TP53). Conclusions: A workflow providing a search for transcription factors acting in bovine elongation was developed. The model assumed that proteins sharing the same protein domains in closely related species have the same protein functionalities, even if they are differently regulated among species or involved in somewhat different pathways. Under this assumption, we merged the information on different mammalian species from different databases (literature and biology) and proposed 489 TF as potential participants in embryo proliferation and differentiation, with (i) a recall of 95% with regard to a biological gold standard defined in 2011 and (ii) an extension of more than 3 times the gold standard of TF detected so far in elongating tissues. The working capacity of the workflow was supported by the manual expertise of the biologists on the results. The workflow can serve as a new kind of bioinformatics tool for working on fused data sources and can thus be useful in studies of a wide range of biological processes.
    Digitale ISSN: 1756-0381
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
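    The Python sketch below shows the general idea of a frequency-based score function over topic corpora; the corpus snippets are invented, and the paper's exact scoring is not reproduced here.

# toy corpora: concatenated abstracts per topic (invented snippets)
corpora = {
    "trophoblast":   "GATA3 regulates trophoblast development; HAND1 and GATA3 interact",
    "embryogenesis": "NANOG maintains pluripotency; GATA3 is detected in embryos",
}
genes = ["GATA3", "HAND1", "NANOG", "ETS2"]

# score = total number of mentions across all corpora
scores = {g: sum(text.count(g) for text in corpora.values()) for g in genes}
for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(gene, score)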
  • 77
    Publikationsdatum: 2012-09-05
    Beschreibung: Background: In myocardial perfusion scintigraphy (MPS), typically a stress and a rest study are performed. If the stress study is considered normal, there is no need for a subsequent rest study. The aim of this study was to determine whether nuclear medicine technologists are able to assess the necessity of a rest study. Methods: Gated MPS studies using a 2-day 99mTc protocol for 121 consecutive patients were analyzed. Visual interpretation by 3 physicians was used as the gold standard for determining the need for a rest study based on the stress images. All nuclear medicine technologists performing MPS had to review 82 training cases of stress MPS images with comments regarding the need for rest studies, and thereafter pass a test consisting of 20 stress MPS images. After passing this test, the nuclear medicine technologist in charge of a stress MPS study assessed whether a rest study was needed or not, or indicated that he/she was uncertain and wanted to consult a physician. After that, the physician in charge interpreted the images and decided whether a rest study was required. Results: The nuclear medicine technologists and the physicians in clinical routine agreed in 103 of the 107 cases (96%) in which the technologists felt certain regarding the need for a rest study. In the remaining 14 cases the technologists were uncertain, i.e. wanted to consult a physician. The agreement between the technologists and the physicians in clinical routine was very good, resulting in a kappa value of 0.92; a miniature kappa computation follows this record. There was no statistically significant difference between the evaluations made by technologists and physicians (P = 0.617). Conclusions: The nuclear medicine technologists were able to accurately determine whether a rest study was necessary, and there was very good agreement between technologists and physicians in this assessment. If the technologists can make this decision, the effectiveness of the nuclear medicine department will improve.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 78
    Publikationsdatum: 2012-08-17
    Beschreibung: Background: Many biological processes are context-dependent or temporally specific. As a result, relationships between molecular constituents evolve across time and environments. While cutting-edge machine learning techniques can recover these networks, exploring and interpreting the rewiring behavior is challenging. Information visualization shines in this type of exploratory analysis, motivating the development of TVNViewer (http://sailing.cs.cmu.edu/tvnviewer), a visualization tool for dynamic network analysis. Results: In this paper, we demonstrate visualization techniques for dynamic network analysis by using TVNViewer to analyze yeast cell cycle and breast cancer progression datasets. Conclusions: TVNViewer is a powerful new visualization tool for the analysis of biological networks that change across time or space.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 79
    Publikationsdatum: 2012-08-17
    Beschreibung: Background: Numerous models for use in interpreting quantitative PCR (qPCR) data are present in recent literature. The most commonly used models assume the amplification in qPCR is exponential and fit an exponential model with a constant rate of increase to a select part of the curve. Kinetic theory may be used to model the annealing phase and does not assume constant efficiency of amplification. Mechanistic models describing the annealing phase with kinetic theory offer the most potential for accurate interpretation of qPCR data. Even so, they have not been thoroughly investigated and are rarely used for interpretation of qPCR data. New results for kinetic modeling of qPCR are presented. Results: Two models are presented in which the efficiency of amplification is based on equilibrium solutions for the annealing phase of the qPCR process. Model 1 assumes that annealing of complementary target strands and annealing of target and primers are both reversible reactions that reach a dynamic equilibrium. Model 2 assumes all annealing reactions are nonreversible and the equilibrium is static. Both models include the effect of primer concentration during the annealing phase. Analytic formulae are given for the equilibrium values of all single- and double-stranded molecules at the end of the annealing step. The equilibrium values are then used in a stepwise method to describe the whole qPCR process. Rate constants of kinetic models are the same for solutions that are identical except for possibly having different initial target concentrations. qPCR curves from such solutions are thus analyzed by simultaneous non-linear curve fitting, with the same rate constant values applying to all curves and each curve having a unique value for the initial target concentration. The models were fit to two data sets for which the true initial target concentrations are known. Both models give a better fit to observed qPCR data than other kinetic models in the literature, and they also give better estimates of initial target concentration. Model 1 was found to be slightly more robust than model 2, giving better estimates of initial target concentration when parameters were estimated from qPCR curves with very different initial target concentrations. Both models may be used to estimate the initial absolute concentration of target sequence when a standard curve is not available. Conclusions: It is argued that the kinetic approach to modeling and interpreting quantitative PCR data has the potential to give more precise estimates of the true initial target concentrations than other methods currently used for analysis of qPCR data. The two models presented here give a unified model of the qPCR process in that they explain the shape of the qPCR curve for a wide variety of initial target concentrations. A toy falling-efficiency simulation follows this record.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 80
    Publikationsdatum: 2012-08-17
    Beschreibung: Background: Variations in DNA copy number carry information on the modalities of genome evolution and the mis-regulation of DNA replication in cancer cells. Their study can help localize tumor suppressor genes, distinguish different populations of cancerous cells, and identify genomic variations responsible for disease phenotypes. A number of high-throughput technologies can be used to identify copy number variable sites, and the literature documents multiple effective algorithms. We focus here on the specific problem of detecting regions where variation in copy number is relatively common in the sample at hand. This problem encompasses the cases of copy number polymorphisms, related samples, technical replicates, and cancerous sub-populations from the same individual. Results: We present a segmentation method named generalized fused lasso (GFL), based on penalized estimation and capable of processing multiple signals jointly, to reconstruct copy number variant regions. Our approach is computationally very attractive and leads to sensitivity and specificity levels comparable to those of state-of-the-art specialized methodologies. We illustrate its applicability with simulated and real data sets. Conclusions: The flexibility of our framework makes it applicable to data obtained with a wide range of technologies. Its versatility and speed make GFL particularly useful in the initial screening stages of large data sets.
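    A generic fused-lasso sketch of the penalized-estimation idea, for a single signal only (the paper's GFL generalizes this to multiple signals jointly). The penalty weights are arbitrary, and cvxpy is used purely for illustration.

```python
# Recover a piecewise-constant copy-number profile from one noisy signal
# via a fused-lasso objective (illustrative, not the paper's implementation).
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
truth = np.concatenate([np.zeros(50), 0.8 * np.ones(30), np.zeros(40)])  # one gained region
y = truth + rng.normal(scale=0.3, size=truth.size)

beta = cp.Variable(y.size)
lam_sparse, lam_fuse = 0.5, 2.0          # illustrative penalty weights
objective = cp.Minimize(0.5 * cp.sum_squares(y - beta)
                        + lam_sparse * cp.norm1(beta)            # shrink to baseline 0
                        + lam_fuse * cp.norm1(cp.diff(beta)))    # encourage flat segments
cp.Problem(objective).solve()
print("estimated breakpoints:", np.where(np.abs(np.diff(beta.value)) > 0.1)[0])
```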
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 81
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: Although genome-scale expression experiments are performed routinely in biomedical research, methods of analysis remain simplistic and their interpretation challenging. The conventional approach is to compare the expression of each gene, one at a time, between treatment groups. This implicitly treats the gene expression levels as independent, but they are in fact highly interdependent, and exploiting this enables substantial power gains to be realized. Results: We assume that information on the dependence structure between the expression levels of a set of genes is available in the form of a Bayesian network (directed acyclic graph), derived from external resources. We show how to analyze gene expression data conditional on this network. Genes whose expression is directly affected by treatment may be identified using tests for the independence of each gene and treatment, conditional on the parents of the gene in the network. We apply this approach to two datasets: one from a hepatotoxicity study in rats using a PPAR pathway, and the other from a study of the effects of smoking on the epithelial transcriptome, using a global transcription factor network. Conclusions: The proposed method is straightforward, simple to implement, gives rise to substantial power gains, and may assist in relating the experimental results to the underlying biology.
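    A minimal sketch of the conditional-independence idea described above, assuming a simple linear model: test each gene against treatment while conditioning on its parents in a given network. Function and variable names are illustrative, not the authors' code.

```python
# Test a gene for a direct treatment effect, adjusting for its network parents.
import numpy as np
import statsmodels.api as sm

def test_gene(expr, treatment, parents):
    """expr: (n,) expression of one gene; treatment: (n,) 0/1 group labels;
    parents: (n,) or (n, p) expression of the gene's parents in the network.
    Returns the p-value for treatment after adjusting for the parents."""
    X = sm.add_constant(np.column_stack([treatment, parents]))
    fit = sm.OLS(expr, X).fit()
    return fit.pvalues[1]   # coefficient on treatment

rng = np.random.default_rng(1)
n = 60
treatment = np.repeat([0, 1], n // 2)
parent = rng.normal(size=n)                                       # parent gene, unaffected
child = 2.0 * parent + 0.5 * treatment + rng.normal(scale=0.5, size=n)
print("p-value for direct treatment effect:", test_gene(child, treatment, parent))
```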
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 82
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: Refugees experience multiple health and social needs, which requires an integrated approach to care in the countries of resettlement, including Canada. Interactive eHealth tools could potentially build bridges between medical and social care in a timely manner. The authors developed and piloted a multi-risk Computer-assisted Psychosocial Risk Assessment (CaPRA) tool for Afghan refugees visiting a community health center. The iPad-based CaPRA survey was completed by patients in their own language before seeing the medical practitioner. The computer then generated individualized feedback for the patient and provider, with suggestions about available services. Methods: A pilot randomized trial was conducted with adult Afghan refugees who could read Dari/Farsi or English. Consenting patients were randomly assigned to the CaPRA (intervention) or usual care (control) group. All patients completed a paper-and-pencil exit survey. The primary outcome was patient intention to see a psychosocial counselor. The secondary outcomes were patient acceptance of the tool and visit satisfaction. Results: Of 199 approached patients, 64 were eligible, 50 consented, and one withdrew (CaPRA = 25; usual care = 24). On average, participants were 37.6 years of age and had lived 3.4 years in Canada. Seventy-two percent of participants in the CaPRA group intended to visit a psychosocial counselor, compared to 46% in the usual care group [χ²(1) = 3.47, p = 0.06]. On a 5-point scale, CaPRA group participants agreed with the benefits of the tool (mean = 4) and were 'unsure' about possible barriers to interacting with the clinicians (mean = 2.8) or to privacy of information (mean = 2.8) in CaPRA-mediated visits. On a 5-point scale, the two groups were alike in patient satisfaction (mean = 4.3). Conclusion: The studied eHealth tool offers a promising model for integrating medical and social care to address the health and settlement needs of refugees. The tool's potential is discussed in relation to implications for healthcare practice. The study should be replicated with a larger sample to generalize the results while controlling for potential confounders.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 83
    Publikationsdatum: 2012-07-16
    Beschreibung: Background: Identifying variants associated with complex human traits in high-dimensional data is a central goal of genome-wide association studies. However, complicated etiologies such as gene-gene interactions are ignored by the univariate analysis usually applied in these studies. Random Forests (RF) are a popular data-mining technique that can accommodate a large number of predictor variables and allow for complex models with interactions. RF analysis produces measures of variable importance that can be used to rank the predictor variables. Thus, single nucleotide polymorphism (SNP) analysis using RFs is gaining popularity as a potential filter approach that considers interactions in high-dimensional data. However, the impact of data dimensionality on the power of RF to identify interactions has not been thoroughly explored. We investigate the ability of rankings from variable importance measures to detect gene-gene interaction effects, and their potential effectiveness as filters compared to p-values from univariate logistic regression, particularly as the data become increasingly high-dimensional. Results: RF effectively identifies interactions in low-dimensional data. As the total number of predictor variables increases, the probability of detection declines more rapidly for interacting SNPs than for non-interacting SNPs, indicating that in high-dimensional data the RF variable importance measures capture marginal effects rather than the effects of interactions. Conclusions: While RF remains a promising data-mining technique that extends univariate methods to condition on multiple variables simultaneously, RF variable importance measures fail to detect interaction effects in high-dimensional data in the absence of a strong marginal component, and therefore may not be useful as a filter technique that allows for interaction effects in genome-wide data.
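    A toy sketch of the setup studied: rank simulated SNPs by RF variable importance and check whether a purely epistatic pair (no marginal effect) is recovered. The data-generating model is an assumption for illustration only.

```python
# Rank simulated SNPs by Random Forest variable importance and report the
# ranks of a pair that acts only through an interaction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)
n, p = 500, 100                      # samples, SNPs coded 0/1/2 (allele counts)
X = rng.integers(0, 3, size=(n, p))
# Interaction-only phenotype: risk depends on an XOR-like coupling of SNP0, SNP1.
logit = 1.5 * ((X[:, 0] > 0) ^ (X[:, 1] > 0)) - 0.75
y = rng.random(n) < 1 / (1 + np.exp(-logit))

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
ranks = np.argsort(rf.feature_importances_)[::-1]
print("importance ranks of interacting SNPs 0 and 1:",
      np.where(np.isin(ranks, [0, 1]))[0] + 1)
```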
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 84
    Publikationsdatum: 2012-07-18
    Beschreibung: Background: Alpha-helical transmembrane channel and transporter proteins play vital roles in a diverse range of essential biological processes and are crucial in facilitating the passage of ions and molecules across the lipid bilayer. However, the experimental difficulties associated with obtaining high-quality crystals have led to their significant under-representation in structural databases; therefore, computational methods that can identify structural features from sequence alone are of high importance. Results: We present a method capable of automatically identifying pore-lining regions in transmembrane proteins from sequence information alone, which can then be used to determine the pore stoichiometry. By labelling pore-lining residues in crystal structures using geometric criteria, we have trained a support vector machine classifier to predict the likelihood of a transmembrane helix being involved in pore formation. Results from testing this approach under stringent cross-validation indicate that a prediction accuracy of 72% is possible, while a support vector regression model is able to predict the number of subunits participating in the pore with 62% accuracy. Conclusion: To our knowledge, this is the first tool capable of identifying such regions, and we present the results of applying it to a data set of sequences with available crystal structures. Our method provides a way to characterise pores in transmembrane proteins and may provide valuable insight into routes of therapeutic intervention in a number of important diseases. The software is freely available as source code from: http://bioinfadmin.cs.ucl.ac.uk/downloads/memsat-svm/
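    A toy sketch of the classification step, assuming crude composition features (the published method uses richer inputs such as sequence profiles); the helix sequences below are synthetic, and the feature definitions are assumptions.

```python
# SVM over simple per-helix features: pore-lining helices tend to present a
# polar face, buried helices are mostly hydrophobic.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

HYDROPHOBIC, POLAR = set("AILMFWV"), set("STNQHKRDE")
rng = np.random.default_rng(3)

def random_helix(alphabet, length=20):
    return "".join(rng.choice(list(alphabet), size=length))

def helix_features(seq):
    """Fraction hydrophobic, fraction polar, and polar content of one helix
    face (roughly every 3-4 residues along an alpha-helix)."""
    face = seq[::4] + seq[2::4]
    return [sum(c in HYDROPHOBIC for c in seq) / len(seq),
            sum(c in POLAR for c in seq) / len(seq),
            sum(c in POLAR for c in face) / len(face)]

pore   = [random_helix("AILVFSTNQ") for _ in range(40)]   # polar residues mixed in
buried = [random_helix("AILVFWM") for _ in range(40)]     # mostly hydrophobic
X = np.array([helix_features(s) for s in pore + buried])
y = np.array([1] * 40 + [0] * 40)
print("cross-validated accuracy:", cross_val_score(SVC(), X, y, cv=5).mean())
```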
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 85
    Publikationsdatum: 2012-07-19
    Beschreibung: Background: A UK Register of people with Multiple Sclerosis has been developed to address the need for an increased knowledge base about MS. The Register is being populated via a web-based portal, NHS neurology clinical systems, and administrative data sources. The data are de-identified and linked at the individual level. At the outset, it was not known whether people with MS would wish to participate in the UK MS Register by personally contributing their data via a web-based system. Therefore, the research aim of this work was to build an internet-mounted recruitment and consenting technology for people with Multiple Sclerosis, and to assess its feasibility as a questionnaire delivery platform for contributing data to the UK MS Register, by determining whether the information provided could be used to describe a cohort of people with MS. Methods: The web portal was developed using VB.net and JQuery with a Microsoft SQL 2008 database. UK adults with MS can self-register and enter data about themselves by completing validated questionnaires. Descriptive statistics were used to characterise the respondents. Results: The web portal was launched in May 2011, and in the first three months 7,279 individuals registered on the portal. The ratio of men to women was 1:2.4 (n = 5,899); the mean self-reported age at first symptoms was 33.8 (SD 10.5) years, and at diagnosis 39.6 (SD 10.3) years (n = 4,401). The reported types of MS were: 15% primary progressive, 63% relapsing remitting, 8% secondary progressive, and 14% unknown (n = 5,400). These characteristics are similar to those of the prevalent MS population. Employment rates, sickness/disability rates, ethnicity and educational qualifications were compared with the general UK population. Information about the respondents' experience of early symptoms and the process of diagnosis, plus living arrangements, is also reported. Conclusions: These initial findings from the MS Register portal demonstrate the feasibility of collecting data about people with MS via a web platform, and show that sufficient information can be gathered to characterise a cohort of people with MS. The innovative design of the UK MS Register, bringing together three disparate sources of data, is creating a rich resource for research into this condition.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 86
    Publikationsdatum: 2012-07-19
    Beschreibung: Background: The k-mer hash length is a key factor affecting the output of de novo transcriptome assembly packages using de Bruijn graph algorithms. Assemblies constructed with varying single k-mer choices may lose unique contiguous sequences (contigs) and relevant biological information. A common solution to this problem is the clustering of single k-mer assemblies. Even though annotation is one of the primary goals of a transcriptome assembly, evaluations of assembly strategies typically do not consider the impact of k-mer selection on the annotation output. This study provides an in-depth k-mer selection analysis focused on the degree of functional annotation achieved for a non-model organism where no reference genome information is available. Individual k-mers and clustered assemblies (CA) were considered using three representative software packages. Pair-wise comparison analyses (between individual k-mers and CAs) were produced to reveal missing Kyoto Encyclopedia of Genes and Genomes (KEGG) ortholog identifiers (KOIs), and to determine a strategy that maximizes the recovery of biological information in a de novo transcriptome assembly. Results: Analyses of single k-mer assemblies resulted in the generation of various quantities of contigs and functional annotations within the selection window of k-mers (k = 19 to k = 63). For each k-mer in this window, the generated assemblies contained certain unique contigs and KOIs that were not present in the other k-mer assemblies. Producing a non-redundant CA of k-mers 19 to 63 resulted in a more complete functional annotation than any single k-mer assembly. However, a fraction of unique annotations (~0.19 to 0.27% of total KOIs) remained in the assemblies of individual k-mers that were not present in the non-redundant CA. A workflow to recover these unique annotations is presented. Conclusions: This study demonstrates that different k-mer choices result in different quantities of unique contigs per single k-mer assembly, which affects the biological information retrievable from the transcriptome. This undesirable effect can be minimized, but not eliminated, by clustering multi-k assemblies with redundancy removal. The complete extraction of biological information in de novo transcriptomics studies requires both the production of a CA and efforts to identify unique contigs that are present in individual k-mer assemblies but not in the CA.
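    A small sketch of the comparison logic described above: simple set operations reveal KOIs recovered by individual k-mer assemblies but absent from the clustered assembly. The KEGG identifiers shown are placeholders.

```python
# Find KEGG ortholog IDs (KOIs) that single-k assemblies recover but a
# clustered assembly (CA) misses. IDs and annotation sets are illustrative.
kois = {
    19: {"K00001", "K00845", "K01810"},
    31: {"K00001", "K01810", "K03841"},
    63: {"K00001", "K02446", "K11532"},
}
ca = {"K00001", "K00845", "K01810", "K03841"}   # non-redundant clustered assembly

union_single_k = set().union(*kois.values())
missing_from_ca = union_single_k - ca
print("KOIs unique to individual k-mer assemblies:", sorted(missing_from_ca))
for k, ids in kois.items():
    print(f"k={k}: contributes {sorted(ids - ca)}")
```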
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 87
    Publikationsdatum: 2012-07-19
    Beschreibung: Background: Online psychiatric texts are natural language texts expressing depressive problems, published by Internet users via community-based web services such as web forums, message boards and blogs. Understanding the cause-effect relations embedded in these psychiatric texts can provide insight into the authors' problems, thus increasing the effectiveness of online psychiatric services. Methods: Previous studies have proposed the use of word pairs extracted from a set of sentence pairs to identify cause-effect relations between sentences. A word pair is made up of two words, with one coming from the cause text span and the other from the effect text span. Analysis of the relationship between these words can be used to capture individual word associations between cause and effect sentences. For instance, (broke up, life) and (boyfriend, meaningless) are two word pairs extracted from the sentence pair: "I broke up with my boyfriend. Life is now meaningless to me". The major limitation of word pairs is that individual words in sentences usually cannot reflect the exact meaning of the cause and effect events, and thus may produce semantically incomplete word pairs, as the previous examples show. Therefore, this study proposes the use of inter-sentential language patterns, which span both the cause and the effect sentence, to detect causality between sentences. The inter-sentential language patterns can capture associations among multiple words within and between sentences, and thus can provide more precise information than word pairs. To acquire inter-sentential language patterns, we develop a text mining framework by extending the classical association rule mining algorithm such that it can discover frequently co-occurring patterns across the sentence boundary. Results: Performance was evaluated on a corpus of texts collected from PsychPark (http://www.psychpark.org), a virtual psychiatric clinic maintained by a group of volunteer professionals from the Taiwan Association of Mental Health Informatics. Experimental results show that the use of inter-sentential language patterns outperformed the use of word pairs proposed in previous studies. Conclusions: This study demonstrates the acquisition of inter-sentential language patterns for causality detection from online psychiatric texts. Such semantically more complete and precise features can improve causality detection performance.
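    A toy stand-in for the extended association-rule miner: count word combinations that co-occur across cause/effect sentence pairs and keep the frequent ones. The pattern shape (two words per sentence) and the support threshold are simplifying assumptions.

```python
# Mine frequent inter-sentential patterns from cause/effect sentence pairs.
from collections import Counter
from itertools import combinations, product

pairs = [
    ("i broke up with my boyfriend", "life is now meaningless to me"),
    ("we broke up last month", "everything feels meaningless"),
    ("i failed my exam", "life is meaningless"),
]

def patterns(cause, effect, size=2):
    """Yield inter-sentential patterns: 'size' words from the cause sentence
    combined with 'size' words from the effect sentence."""
    for cw, ew in product(combinations(cause.split(), size),
                          combinations(effect.split(), size)):
        yield cw + ew

counts = Counter(p for c, e in pairs for p in patterns(c, e))
support = {p: n / len(pairs) for p, n in counts.items() if n >= 2}  # min support
print(sorted(support.items(), key=lambda kv: -kv[1])[:5])
```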
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 88
    Publikationsdatum: 2012-07-24
    Beschreibung: Background: Today, recognition and classification of sequence motifs and protein folds is a mature field, thanks to the availability of numerous comprehensive and easy-to-use software packages and web-based services. Recognition of structural motifs, by comparison, is less well developed and much less frequently used, possibly due to a lack of easily accessible and easy-to-use software. Results: In this paper, we describe an extension of DeepView/Swiss-PdbViewer through which structural motifs may be defined and searched for in large protein structure databases. We show that common structural motifs involved in stabilizing protein folds are present in evolutionarily and structurally unrelated proteins, including in deeply buried locations that are not obviously related to protein function. Conclusions: The possibility to define custom motifs and search for their occurrence in other proteins permits the identification of recurrent arrangements of residues that could have structural implications. The possibility to do so without having to maintain a complex software/hardware installation on site brings this technology to experts and non-experts alike.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 89
    Publikationsdatum: 2012-07-24
    Beschreibung: Background: Clustering DNA sequences into functional groups is an important problem in bioinformatics. We propose a new alignment-free algorithm, mBKM, based on a new distance measure, DMk, for clustering gene sequences. The method transforms DNA sequences into feature vectors that encode the occurrence, location and order relations of k-tuples in a DNA sequence. A hierarchical procedure is then applied to cluster the DNA sequences based on these feature vectors. Results: The proposed distance measure and clustering method are evaluated by clustering functionally related genes and by phylogenetic analysis. The method is also compared with BlastClust and CD-HIT-EST. The experimental results show our method is effective in classifying DNA sequences with similar biological characteristics and in discovering the underlying relationships among the sequences. Conclusions: We introduce a novel clustering algorithm based on a new sequence similarity measure. It is effective in classifying DNA sequences with similar biological characteristics and in discovering relationships among the sequences.
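    A generic k-tuple feature-vector sketch (plain occurrence frequencies only; the paper's DMk additionally encodes location and order information, which is omitted here).

```python
# Alignment-free comparison: map each DNA sequence to a vector of k-tuple
# frequencies, then compare vectors with a standard distance.
from itertools import product
import numpy as np

def ktuple_vector(seq, k=3):
    """Occurrence counts of all 4^k DNA k-tuples, normalized to frequencies."""
    tuples = ["".join(t) for t in product("ACGT", repeat=k)]
    index = {t: i for i, t in enumerate(tuples)}
    v = np.zeros(len(tuples))
    for i in range(len(seq) - k + 1):
        v[index[seq[i:i + k]]] += 1
    return v / max(v.sum(), 1)

a, b = "ACGTACGTACGGTA", "TTGCATGCAAGCTT"
dist = np.linalg.norm(ktuple_vector(a) - ktuple_vector(b))  # Euclidean stand-in for DMk
print("distance:", round(dist, 4))
```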
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 90
    Publikationsdatum: 2012-07-24
    Beschreibung: Background: Biological text mining research is increasingly focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, metabolic pathways, has been largely neglected. Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. The method extends an approach that has proved effective in the context of extracting protein-protein interactions. Results: When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein-protein interaction extraction task. Conclusions: We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein-protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. It is hoped that these results will provide an impetus to further research and act as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
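    A toy sketch of the permutation-scoring idea, with made-up cue-keyword lists and weights: candidate substrate/product role assignments are scored by the presence and position of stemmed keywords, and the best-scoring assignment wins.

```python
# Score role assignments for tagged metabolites by stemmed-keyword position.
import itertools

CUES = ("catalyz", "convert", "to")   # stemmed cue keywords, illustrative

def assignment_score(tokens, substrate, product):
    """Reward orderings where a cue keyword sits between substrate and product."""
    s, p = tokens.index(substrate), tokens.index(product)
    lo, hi = sorted((s, p))
    between = tokens[lo + 1:hi]
    bonus = sum(any(t.startswith(c) for c in CUES) for t in between)
    return bonus + (1 if s < p else 0)   # slight preference for substrate-first order

sentence = "hexokinase catalyzes the conversion of glucose to glucose-6-phosphate"
tokens = sentence.split()
metabolites = ["glucose", "glucose-6-phosphate"]
best = max(itertools.permutations(metabolites, 2),
           key=lambda sp: assignment_score(tokens, *sp))
print("substrate, product =", best)
```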
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 91
    Publikationsdatum: 2012-06-16
    Beschreibung: Background: The distance matrix computed from multiple alignments of homologous sequences is widely used by distance-based phylogenetic methods to provide information on the evolution of protein families. This matrix can also be visualized in a low-dimensional space by metric multidimensional scaling (MDS). Applied to protein families, MDS provides information complementary to that derived from tree-based methods. Moreover, MDS gives a unique opportunity to compare orthologous sequence sets because it can add supplementary elements to a reference space. Results: The R package bios2mds (from BIOlogical Sequences to MultiDimensional Scaling) has been designed to analyze multiple sequence alignments by MDS. Bios2mds starts with a sequence alignment, builds a matrix of distances between the aligned sequences, and represents this matrix by MDS to visualize a sequence space. The package also offers the possibility of performing K-means clustering in the MDS-derived sequence space. Most importantly, bios2mds includes a function that projects supplementary elements (a.k.a. "out of sample" elements) onto the space defined by reference or "active" elements. Orthologous sequence sets can thus be compared in a straightforward way. The data analysis and visualization tools have been specifically designed for easy monitoring of the evolutionary drift of protein sub-families. Conclusions: The bios2mds package provides the tools for a complete integrated pipeline aimed at the MDS analysis of multiple sets of orthologous sequences in the R statistical environment. In addition, as the analysis can be carried out from user-provided matrices, the projection function can be widely used on any kind of data.
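    bios2mds itself is an R package; the Python sketch below merely illustrates the underlying metric-MDS-on-a-distance-matrix step, not the package's API. The distance matrix is a toy placeholder.

```python
# Embed sequences in 2-D from a precomputed pairwise distance matrix via MDS.
import numpy as np
from sklearn.manifold import MDS

# Toy pairwise distance matrix for 4 aligned sequences (symmetric, zero diagonal).
D = np.array([[0.0, 0.2, 0.7, 0.8],
              [0.2, 0.0, 0.6, 0.7],
              [0.7, 0.6, 0.0, 0.3],
              [0.8, 0.7, 0.3, 0.0]])
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(D)
print(coords)   # 2-D sequence-space coordinates for visualization or K-means
```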
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 92
    Publikationsdatum: 2012-06-20
    Beschreibung: Background: Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is commonly used to identify differentially expressed proteins under two or more experimental or observational conditions. Wu et al (2009) developed a univariate probabilistic model which was used to identify differential expression between Case and Control groups, by applying a Likelihood Ratio Test (LRT) to each protein on a 2D PAGE. In contrast to commonly used statistical approaches, this model takes into account the two possible causes of missing values in 2D PAGE: either (1) the non-expression of a protein; or (2) a level of expression that falls below the limit of detection. Results: We develop a global Bayesian model which extends the previously described model. Unlike the univariate approach, the model reported here is able to treat all differentially expressed proteins simultaneously. Whereas each protein is modelled by the univariate likelihood function previously described, several global distributions are used to model the underlying relationship between the parameters associated with individual proteins. These global distributions are able to combine information from each protein to give more accurate estimates of the true parameters. In our implementation of the procedure, all parameters are recovered by Markov chain Monte Carlo (MCMC) integration. The 95% highest posterior density (HPD) intervals for the marginal posterior distributions are used to determine whether differences in protein expression are due to differences in mean expression intensities, and/or differences in the probabilities of expression. Conclusions: Simulation analyses showed that the global model is able to accurately recover the underlying global distributions, and to identify more differentially expressed proteins than the simple application of an LRT. Additionally, simulations indicate that the probability of incorrectly identifying a protein as differentially expressed (i.e., the False Discovery Rate) is very low. The source code is available at https://github.com/stevenhwu/BIDE-2D.
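    A sketch of the two-cause missing-value likelihood underlying the univariate model that the paper extends: a spot is absent either because the protein is not expressed, or because its expression fell below the detection limit. Parameter names are illustrative, and truncation of observed values at the detection limit is ignored for brevity.

```python
# Log-likelihood with two causes of missingness (simplified illustration).
import numpy as np
from scipy.stats import norm

def log_lik(y, pi, mu, sigma, L):
    """y: log-intensities with np.nan for missing spots; pi: P(expressed);
    mu, sigma: expression distribution; L: detection limit."""
    obs = y[~np.isnan(y)]
    n_miss = np.isnan(y).sum()
    ll_obs = np.log(pi) + norm.logpdf(obs, mu, sigma)   # expressed and detected
    p_miss = (1 - pi) + pi * norm.cdf(L, mu, sigma)     # not expressed, or censored
    return ll_obs.sum() + n_miss * np.log(p_miss)

y = np.array([7.1, 6.8, np.nan, 7.4, np.nan])
print(log_lik(y, pi=0.8, mu=7.0, sigma=0.4, L=6.0))
```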
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 93
    Publikationsdatum: 2012-06-20
    Beschreibung: Background: The identification of gene sets that are significantly impacted in a given condition based on microarray data is a crucial step in current life science research. Most gene set analysis methods treat genes equally, regardless of how specific they are to a given gene set. Results: In this work we propose a new gene set analysis method that computes a gene set score as the mean of the absolute values of weighted moderated gene t-scores. The gene weights are designed to emphasize genes that appear in few gene sets, versus genes that appear in many gene sets. We demonstrate the usefulness of the method when analyzing gene sets that correspond to the KEGG pathways, and hence we call our method Pathway Analysis with Down-weighting of Overlapping Genes (PADOG). Unlike most gene set analysis methods, which are validated through the analysis of 2-3 data sets followed by a human interpretation of the results, the validation employed here uses 24 different data sets and a completely objective assessment scheme that makes minimal assumptions and eliminates the need for possibly biased human assessments of the analysis results. Conclusions: PADOG significantly improves gene set ranking and boosts the sensitivity of analysis using information already available in the gene expression profiles and the collection of gene sets to be analyzed. The advantages of PADOG over other existing approaches are shown to be stable to changes in the database of gene sets to be analyzed. PADOG is implemented as an R package available at http://bioinformaticsprb.med.wayne.edu/PADOG/ or www.bioconductor.org.
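    A sketch of the down-weighting idea with an assumed weighting function of the same flavor (the exact PADOG weights differ): genes appearing in many gene sets receive weights near 1, set-specific genes near 2, and the set score is the mean of weighted absolute t-scores.

```python
# PADOG-style scoring with illustrative gene sets, t-scores, and weights.
from collections import Counter
import numpy as np

gene_sets = {
    "pathway_a": ["TP53", "MYC", "EGFR"],
    "pathway_b": ["TP53", "BRCA1"],
    "pathway_c": ["TP53", "MYC", "KRAS"],
}

# Count gene-set memberships, then weight each gene inversely to its frequency
# (assumes at least one gene belongs to more than one set).
membership = Counter(g for genes in gene_sets.values() for g in genes)
max_m = max(membership.values())
weight = {g: 1 + np.sqrt((max_m - m) / (max_m - 1)) for g, m in membership.items()}

def set_score(gene_t_scores, genes):
    """Gene-set score: mean of |moderated t| weighted by set-specificity."""
    return np.mean([abs(gene_t_scores[g]) * weight[g] for g in genes])

t = {"TP53": 2.1, "MYC": 0.4, "EGFR": 3.0, "BRCA1": 1.2, "KRAS": 0.9}
print({s: round(set_score(t, genes), 3) for s, genes in gene_sets.items()})
```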
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 94
    Publikationsdatum: 2012-06-20
    Beschreibung: Background: Microarray data enable the high-throughput survey of mRNA expression profiles at the genomic level; however, the data present a challenging statistical problem because of the large number of transcripts with small sample sizes. To reduce the dimensionality, various Bayesian or empirical Bayes hierarchical models have been developed. However, because of the complexity of microarray data, no model can explain the data fully. It is generally difficult to scrutinize irregular patterns of expression that are not expected by the usual gene-by-gene statistical models. Results: As an extension of empirical Bayes (EB) procedures, we have developed the beta-empirical Bayes (beta-EB) approach based on a beta-likelihood measure, which can be regarded as an 'evidence-based' weighted (quasi-)likelihood inference. The weight of a transcript t is described as a power function of its likelihood, f^beta(y_t | theta). Genes with low likelihoods have unexpected expression patterns and receive low weights. By assigning low weights to outliers, the inference becomes robust. The value of beta, which controls the balance between robustness and efficiency, is selected by maximizing the predictive beta_0-likelihood by cross-validation. The proposed beta-EB approach identified six significant (p < 10^-5) contaminated transcripts as differentially expressed (DE) in normal/tumor tissues from the head and neck of cancer patients. These six genes were all confirmed to be related to cancer; they were not identified as DE genes by the classical EB approach. When applied to the eQTL analysis of Arabidopsis thaliana, the proposed beta-EB approach identified some potential master regulators that were missed by the EB approach. Conclusions: The simulated data and real gene expression data showed that the proposed beta-EB method was robust against outliers. The distribution of the weights is used to scrutinize irregular patterns of expression and diagnose the model statistically. When beta-weights outside the range of the predicted distribution are observed, a detailed inspection of the data is carried out. The beta-weights described here can be applied to other likelihood-based statistical models for diagnosis, and may serve as a useful tool for transcriptome and proteome studies.
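    A minimal sketch of the beta-weighting principle: each observation's weight is a power of its likelihood, so low-likelihood outliers are down-weighted and the resulting estimates become robust. This illustrates the weighting alone, not the full hierarchical beta-EB model; all parameter values are assumptions.

```python
# Down-weight outliers with likelihood-power weights w_t ~ f(y_t | theta)**beta.
import numpy as np
from scipy.stats import norm

def beta_weights(y, mu, sigma, beta=0.3):
    """Weights proportional to f(y_t | mu, sigma)**beta, normalized to mean 1."""
    w = norm.pdf(y, mu, sigma) ** beta
    return w * len(w) / w.sum()

y = np.array([0.1, -0.2, 0.05, 0.3, 5.0])   # last value is an outlier
w = beta_weights(y, mu=0.0, sigma=0.5)
print(np.round(w, 3))                        # outlier receives near-zero weight
mu_robust = np.sum(w * y) / np.sum(w)        # weighted (robust) mean estimate
print("robust mean:", round(mu_robust, 3))
```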
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 95
    Publikationsdatum: 2012-06-21
    Beschreibung: Background: Artificial neural networks (ANNs) are widely studied for evaluating diseases. This paper discusses the use of an ANN to grade the diagnosis of liver fibrosis from duplex ultrasonography. Methods: 239 patients who were confirmed as having liver fibrosis or cirrhosis by ultrasound-guided liver biopsy were investigated in this study. We selected significant ultrasonographic parameters using a data optimization procedure applied to an ANN. 179 patients were randomly assigned to the training group; the remaining 60 patients were enrolled as the validation group. Performance of the ANN was evaluated according to accuracy, sensitivity, specificity, Youden's index and receiver operating characteristic (ROC) analysis. Results: Five ultrasonographic parameters, i.e. the liver parenchyma, thickness of spleen, hepatic vein (HV) waveform, hepatic artery pulsatile index (HAPI) and HV damping index (HVDI), were enrolled as the input neurons in the ANN model. The sensitivity, specificity and accuracy of the ANN model for quantitative diagnosis of liver fibrosis were 95.0%, 85.0% and 88.3%, respectively. The Youden's index (YI) was 0.80. Conclusions: The established ANN model had good sensitivity and specificity in the quantitative diagnosis of hepatic fibrosis or liver cirrhosis. Our study suggests that an ANN model based on duplex ultrasound may support non-invasive grading of liver fibrosis in clinical practice.
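    A toy sketch of the modelling step with synthetic data: a small neural network over five inputs standing in for the ultrasonographic parameters named above, using the same 179/60 training/validation split. Architecture and data generation are assumptions.

```python
# Small classifier over five synthetic "ultrasonographic" inputs.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 239
X = rng.normal(size=(n, 5))          # stand-ins for parenchyma score, spleen
                                     # thickness, HV waveform, HAPI, HVDI
y = (X @ np.array([0.9, 0.7, 0.8, 0.5, 0.6]) + rng.normal(scale=1.0, size=n)) > 0

model = make_pipeline(StandardScaler(),
                      MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                                    random_state=0))
model.fit(X[:179], y[:179])                            # training group
print("validation accuracy:", model.score(X[179:], y[179:]))
```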
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 96
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: The emergence of Next Generation Sequencing technologies has made it possible for individual investigators to generate gigabases of sequencing data per week. Effective analysis and manipulation of these data are limited by large file sizes, so even simple tasks such as data filtration and quality assessment have to be performed in several steps. This requires (potentially problematic) interaction between the investigator and a bioinformatics/computational service provider, and such services are often performed using specialized computational facilities. Results: We present a Windows-based application, Slim-Filter, designed to interactively examine the statistical properties of sequencing reads produced by the Illumina Genome Analyzer and to perform a broad spectrum of data manipulation tasks, including: filtration of low-quality and low-complexity reads; filtration of reads containing undesired subsequences (such as parts of adapters and PCR primers used during the sample and sequencing library preparation steps); exclusion of duplicated reads (while keeping each read's copy number information in a specialized data format); and sorting of reads by copy number, allowing for easy access and manual editing of the resulting files. Slim-Filter is organized as a sequence of windows summarizing the statistical properties of the reads. Each data manipulation step has roll-back capability, allowing a return to previous steps of the data analysis process. Slim-Filter is written in C++ and is compatible with the fasta, fastq, and specialized AS file formats presented in this manuscript. Setup files and a user's manual are available for download at the supplementary web site (https://www.bioinfo.uh.edu/Slim_Filter/). Conclusion: The presented Windows-based application has been developed with the goal of providing individual investigators with integrated sequencing read analysis, curation, and manipulation capabilities.
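    One of the listed tasks, duplicate collapsing with copy-number retention, sketched in generic Python (the tool itself is a C++ Windows application with its own AS format; the reads below are made up).

```python
# Collapse duplicate reads while keeping each read's copy number.
from collections import Counter

reads = ["ACGTACGT", "ACGTACGT", "TTGCAGGA", "ACGTACGT", "TTGCAGGA", "CCAATTGG"]
copies = Counter(reads)

# Sort unique reads by copy number, descending, for easy manual inspection.
for seq, n in copies.most_common():
    print(f"{seq}\tcopies={n}")
```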
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 97
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: In high-throughput cancer genomic studies, results from the analysis of single datasets often suffer from a lack of reproducibility because of small sample sizes. Integrative analysis can effectively pool and analyze multiple datasets, and provides a cost-effective way to improve reproducibility. In integrative analysis, simultaneously analyzing all profiled genes may incur high computational cost. A computationally affordable remedy is prescreening, which fits marginal models, can be conducted in parallel, and has low computational cost. Results: An integrative prescreening approach is developed for the analysis of multiple cancer genomic datasets. Simulation shows that the proposed integrative prescreening outperforms alternatives, including prescreening with individual datasets, an intensity approach, and meta-analysis. We also analyze multiple microarray gene profiling studies on liver and pancreatic cancers using the proposed approach. Conclusions: The proposed integrative prescreening provides an effective way to reduce dimensionality in cancer genomic studies. It can be coupled with existing analysis methods to identify cancer markers.
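    A sketch of marginal prescreening pooled across datasets, using Fisher's method as an assumed combination rule (the paper's actual statistic may differ): fit a one-gene model per dataset, combine the per-gene evidence, and keep the top-ranked genes.

```python
# Marginal per-gene tests in each dataset, pooled with Fisher's method.
import numpy as np
from scipy import stats

def marginal_pvals(X, y):
    """Per-gene two-sample t-test p-values for one dataset."""
    return np.array([stats.ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue
                     for j in range(X.shape[1])])

rng = np.random.default_rng(5)
datasets = []
for n in (40, 60, 50):                       # three studies, same 200 genes
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 200))
    X[y == 1, :5] += 1.0                     # genes 0-4 carry signal in every study
    datasets.append((X, y))

P = np.column_stack([marginal_pvals(X, y) for X, y in datasets])
fisher = -2 * np.log(P).sum(axis=1)          # larger = stronger pooled evidence
keep = np.argsort(fisher)[::-1][:10]
print("genes passing integrative prescreen:", sorted(keep))
```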
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 98
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: The AthaMap database generates a genome-wide map of putative transcription factor binding sites in A. thaliana. When analyzing transcriptional regulation using AthaMap, it may be important to learn which genes are also post-transcriptionally regulated by inhibitory RNAs. Therefore, a unified database for transcriptional and post-transcriptional regulation will be highly useful for the analysis of gene expression regulation. Methods: To identify putative microRNA target sites in the genome of A. thaliana, processed mature miRNAs from 243 annotated miRNA genes were used for screening with the psRNATarget web server. Positional information, target genes and the psRNATarget score for each target site were annotated to the AthaMap database. Furthermore, putative target sites for small RNAs from seven small RNA transcriptome datasets were used to determine small RNA target sites within the A. thaliana genome. Results: A total of 41,965 putative genome-wide miRNA target sites and 10,442 miRNA target genes were identified in the A. thaliana genome. Taken together with genes targeted by small RNAs from the small RNA transcriptome datasets, a total of 16,600 A. thaliana genes are putatively regulated by inhibitory RNAs. A novel web tool, 'MicroRNA Targets', was integrated into AthaMap which permits the identification of genes predicted to be regulated by selected miRNAs. The predicted target genes are displayed with positional information and the psRNATarget score of the target site. Furthermore, putative target sites of small RNAs from selected tissue datasets can be identified with the new 'Small RNA Targets' web tool. Conclusions: The integration of predicted miRNA and small RNA target sites with transcription factor binding sites will be useful for AthaMap-assisted gene expression analysis. URL: http://www.athamap.de/
    Digitale ISSN: 1756-0381
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 99
    Publikationsdatum: 2012-07-17
    Beschreibung: Background: Precise DNA-protein interactions play a vital role in maintaining the normal physiological functioning of the cell, as they control many high-fidelity cellular processes. Detailed study of the nature of these interactions has paved the way for understanding the mechanisms behind the biological processes in which they are involved. In 2000, a systematic classification of DNA-protein complexes based on the structural analysis of the proteins was proposed at two tiers, namely groups and families. With the advancement in the number and resolution of structures of DNA-protein complexes deposited in the Protein Data Bank, it is important to revisit the existing classification. Results: On the basis of the sequence analysis of DNA binding proteins, we have built upon the protein-centric, two-tier classification of DNA-protein complexes by adding new members to existing families and by creating new families and groups. While classifying the new complexes, we observed the emergence of new groups and families. The new group comprises complexes in which a beta-propeller fold interacts with DNA. There were 34 SCOP folds observed in the complexes of both the old and new classifications, whereas 28 folds are present exclusively in the new complexes. New families include the NarL transcription factor, Z-alpha DNA binding proteins, the Forkhead transcription factor, the AP2 protein, and the methyl-CpG binding protein. Conclusions: Our results suggest that with the increasing number of DNA-protein complexes available in the Protein Data Bank, the number of families in the classification has increased approximately threefold. The folds present exclusively in the newly classified complexes suggest the inclusion of proteins with new functions, the most populated of which are folds responsible for DNA damage repair. The proposed revisited classification can be used to perform genome-wide surveys in genomes of interest for the presence of DNA-binding proteins. Further analysis of these complexes can aid in developing algorithms for identifying DNA-binding proteins and their family members from sequence information alone.
    Digitale ISSN: 1471-2105
    Thema: Biologie , Informatik
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...
  • 100
    Publikationsdatum: 2012-07-21
    Beschreibung: Background: The treatment of cancer associated thrombosis (CAT) is well established, with level 1A evidence to support the recommendation of a low molecular weight heparin (LMWH) by daily injection for 3-6 months. However, registry data suggest compliance with clinical guidelines is poor. Clinicians face particular challenges in treating CAT in advanced cancer patients due to shorter life expectancy, increased bleeding risk and concerns that self-injection may be too burdensome. For these reasons, decision making around the diagnosis and management of CAT in people with advanced cancer can be complex, and should focus on its likely net benefit for the patient. We explored factors that influence doctors' decision making in this situation and sought to gain an understanding of the barriers and facilitators to the application of best practice. Methods: Think-aloud exercises using standardised case scenarios, and individual in-depth interviews, were conducted. All were transcribed. The think-aloud exercises were analysed using Protocol Analysis and the interviews using Framework Analysis. Participants: 46 participants took part in the think-aloud exercises and 45 participants were interviewed in depth. Each group included oncologists, palliative physicians and general practitioners, both senior doctors and those in training. Setting: Two Strategic Health Authority regions, one in the north of England and one in Wales. Results: The following key issues arose from the data synthesis: the importance of patient prognosis; the concept of "appropriateness"; the "benefits and burdens" of diagnosis and treatment; LMWH or warfarin for treatment; and sources of information which changed practice. Although interlinked, these describe distinct aspects of the factors that influence doctors' decisions in this area. Conclusions: The above factors are issues doctors take into account when deciding whether to send a patient to hospital for investigation or to anticoagulate a patient with confirmed or suspected VTE. Many factors interweave and are themselves influenced by and dependent on each other; it is only after all are taken into account that the doctor arrives at the point of referring the patient for investigation. Some factors, including logistic and organisational issues, appeared to influence whether a patient would be investigated or treated with LMWH for a confirmed VTE. It is important that services are optimised to ensure that these do not hinder the appropriate investigation and management of individual patients.
    Digitale ISSN: 1472-6947
    Thema: Informatik , Medizin
    Publiziert von BioMed Central
    Standort Signatur Erwartet Verfügbarkeit
    BibTip Andere fanden auch interessant ...