ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (16,714)
  • BMC Bioinformatics  (2,126)
  • IEEE Transactions on Pattern Analysis and Machine Intelligence  (1,266)
  • BMC Medical Informatics and Decision Making  (757)
  • Computer Science  (16,714)
Collection
  • Articles  (16,714)
  • 1
    Publication Date: 2021-08-21
    Description: Background Significant investments have been made towards the implementation of mHealth applications and eRecord systems globally. However, fragmentation of these technologies remains a big challenge, often unresolved in developing countries. In particular, evidence shows little consideration for linking mHealth applications and eRecord systems. Botswana is a typical developing country in sub-Saharan Africa that has explored mHealth applications, but the solutions are not interoperable with existing eRecord systems. This paper describes Botswana’s eRecord systems interoperability landscape and provides guidance for linking mHealth applications to eRecord systems, both for Botswana and for developing countries using Botswana as an exemplar. Methods A survey and interviews of health ICT workers and a review of the Botswana National eHealth Strategy were completed. Perceived interoperability benefits, opportunities and challenges were charted and analysed, and future guidance derived. Results Survey and interview responses showed the need for interoperable mHealth applications and eRecord systems within the health sector of Botswana and within the context of the National eHealth Strategy. However, the current Strategy does not address linking mHealth applications to eRecord systems. Across Botswana’s health sectors, global interoperability standards and Application Programming Interfaces are widely used, with some level of interoperability within, but not between, public and private facilities. Further, a mix of open source and commercial eRecord systems utilising relational database systems and similar data formats are supported. Challenges for linking mHealth applications and eRecord systems in Botswana were identified and categorised into themes which led to development of guidance to enhance the National eHealth Strategy. Conclusion Interoperability between mHealth applications and eRecord systems is needed and is feasible. Opportunities and challenges for linking mHealth applications to eRecord systems were identified, and future guidance stemming from this insight presented. Findings will aid Botswana, and other developing countries, in resolving the pervasive disconnect between mHealth applications and eRecord systems.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 2
    Publication Date: 2021-08-21
    Description: Background In most flowering plants, the plastid genome exhibits a quadripartite genome structure, comprising a large and a small single copy as well as two inverted repeat regions. Thousands of plastid genomes have been sequenced and submitted to public sequence repositories in recent years. The quality of sequence annotations in many of these submissions is known to be problematic, especially regarding annotations that specify the length and location of the inverted repeats: such annotations are either missing or portray the length or location of the repeats incorrectly. However, many biological investigations employ publicly available plastid genomes at face value and implicitly assume the correctness of their sequence annotations. Results We introduce airpg, a Python package that automatically assesses the frequency of incomplete or incorrect annotations of the inverted repeats among publicly available plastid genomes. Specifically, the tool automatically retrieves plastid genomes from NCBI Nucleotide under variable search parameters, surveys them for length and location specifications of inverted repeats, and confirms any inverted repeat annotations through self-comparisons of the genome sequences. The package also includes functionality for automatic identification and removal of duplicate genome records and accounts for taxa that genuinely lack inverted repeats. A survey of the presence of inverted repeat annotations among all plastid genomes of flowering plants submitted to NCBI Nucleotide until the end of 2020 using airpg, followed by a statistical analysis of potential associations with record metadata, highlights that release year and publication status of the genome records have a significant effect on the frequency of complete and equal-length inverted repeat annotations. Conclusion The number of plastid genomes on NCBI Nucleotide has increased dramatically in recent years, and many more genomes will likely be submitted over the next decade. airpg enables researchers to automatically access and evaluate the inverted repeats of these plastid genomes as well as their sequence annotations and, thus, contributes to increasing the reliability of publicly available plastid genomes. The software is freely available via the Python package index at http://pypi.python.org/pypi/airpg.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
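    Code sketch: A minimal Python illustration (not the airpg implementation; sequences and coordinates are invented) of the self-comparison idea described in the abstract above: the two annotated inverted repeats of a plastid genome should be reverse complements of each other, so an annotation can be confirmed by comparing the two regions directly.

      COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

      def reverse_complement(seq: str) -> str:
          return seq.translate(COMPLEMENT)[::-1]

      def confirm_ir_annotation(genome: str, ira: tuple, irb: tuple) -> bool:
          """ira/irb are 0-based half-open (start, end) coordinates of the
          two annotated inverted repeats; True if they mirror each other."""
          region_a = genome[ira[0]:ira[1]]
          region_b = genome[irb[0]:irb[1]]
          return region_a == reverse_complement(region_b)

      # Toy genome: IRb (positions 12-20) mirrors IRa (positions 0-8).
      genome = "ATGCCGTA" + "TTTT" + reverse_complement("ATGCCGTA")
      print(confirm_ir_annotation(genome, (0, 8), (12, 20)))  # True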
  • 3
    Publication Date: 2021-08-21
    Description: Background Drug–drug interactions (DDIs) refer to processes triggered by the administration of two or more drugs leading to side effects beyond those observed when drugs are administered by themselves. Due to the massive number of possible drug pairs, it is nearly impossible to experimentally test all combinations and discover previously unobserved side effects. Therefore, machine learning based methods are being used to address this issue. Methods We propose a Siamese self-attention multi-modal neural network for DDI prediction that integrates multiple drug similarity measures that have been derived from a comparison of drug characteristics including drug targets, pathways and gene expression profiles. Results Our proposed DDI prediction model provides multiple advantages: (1) It is trained end-to-end, overcoming limitations of models composed of multiple separate steps, (2) it offers model explainability via an Attention mechanism for identifying salient input features and (3) it achieves similar or better prediction performance (AUPR scores ranging from 0.77 to 0.92) compared to state-of-the-art DDI models when tested on various benchmark datasets. Novel DDI predictions are further validated using independent data resources. Conclusions We find that a Siamese multi-modal neural network is able to accurately predict DDIs and that an Attention mechanism, typically used in the Natural Language Processing domain, can be beneficially applied to aid in DDI model explainability.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
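    Code sketch: A hypothetical PyTorch sketch of the architecture type named in the abstract above (a Siamese network with self-attention over multi-modal drug-similarity features). Layer sizes, feature dimensions, and the pooling scheme are invented; this is not the authors' code.

      import torch
      import torch.nn as nn

      class SiameseDDI(nn.Module):
          def __init__(self, n_feats=100, dim=64):
              super().__init__()
              self.encoder = nn.Linear(n_feats, dim)  # shared drug encoder
              self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
              self.classifier = nn.Linear(2 * dim, 1)

          def encode(self, x):
              # x: (batch, n_modalities, n_feats), one row per similarity measure
              h = torch.relu(self.encoder(x))
              h, weights = self.attn(h, h, h)  # self-attention across modalities
              return h.mean(dim=1), weights    # pooled embedding + salience map

          def forward(self, drug_a, drug_b):
              za, _ = self.encode(drug_a)      # both drugs pass through the same
              zb, _ = self.encode(drug_b)      # weights (the "Siamese" property)
              return torch.sigmoid(self.classifier(torch.cat([za, zb], dim=1)))

      model = SiameseDDI()
      a, b = torch.randn(8, 3, 100), torch.randn(8, 3, 100)
      print(model(a, b).shape)  # torch.Size([8, 1]) -> interaction probabilities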
  • 4
    Publication Date: 2021-08-21
    Description: Background To enhance teleconsultation management, demands can be classified into different patterns, and the service for each demand pattern can be improved. Methods For effective teleconsultation demand classification, a novel ensemble hierarchical clustering method is proposed in this study. In the proposed method, individual clustering results are first obtained by different hierarchical clustering methods and then ensembled through one-hot encoding, the calculation and thresholding of cosine similarity, and network graph representation. In the resulting network graph, built from pairs with high cosine similarity, connected demand series are categorized into one pattern. For verification, 43 teleconsultation demand series are used as sample data, and the efficiency and quality of teleconsultation services are analyzed before and after the demand classification. Results The teleconsultation demands are classified into three categories: erratic, lumpy, and slow. Under the fixed strategies, the service analysis after demand classification reveals the deficiencies of teleconsultation services, whereas the analysis before demand classification cannot. Conclusion The proposed ensemble hierarchical clustering method can effectively categorize teleconsultation demands, and effective demand categorization can enhance teleconsultation management.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
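    Code sketch: A hedged sketch, on synthetic data, of the ensemble scheme described in the abstract above: several hierarchical clusterings are one-hot encoded into co-assignment profiles, demand series with high cosine similarity are connected in a graph, and the connected components become the final patterns. The linkage methods, cluster count, and threshold are illustrative.

      import numpy as np
      import networkx as nx
      from scipy.cluster.hierarchy import linkage, fcluster
      from sklearn.metrics.pairwise import cosine_similarity

      rng = np.random.default_rng(0)
      demand = rng.poisson(5.0, size=(43, 52))  # 43 series, 52 weeks (synthetic)

      # 1) individual clusterings with different linkage methods
      labels = [fcluster(linkage(demand, method=m), t=3, criterion="maxclust")
                for m in ("ward", "average", "complete")]

      # 2) one-hot encode each label vector into a co-assignment profile
      profile = np.hstack([(l[:, None] == np.unique(l)[None, :]).astype(float)
                           for l in labels])

      # 3) connect pairs with high cosine similarity; components = patterns
      sim = cosine_similarity(profile)
      g = nx.Graph((i, j) for i in range(43) for j in range(i + 1, 43)
                   if sim[i, j] > 0.9)
      g.add_nodes_from(range(43))
      print([sorted(c) for c in nx.connected_components(g)])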
  • 5
    Publication Date: 2021-02-25
    Description: Background Matrix factorization methods are linear models, with limited capability to model complex relations. In our work, we use tropical semiring to introduce non-linearity into matrix factorization models. We propose a method called Sparse Tropical Matrix Factorization (STMF) for the estimation of missing (unknown) values in sparse data. Results We evaluate the efficiency of the method on both synthetic data and biological data in the form of gene expression measurements downloaded from The Cancer Genome Atlas (TCGA) database. Tests on unique synthetic data showed that the STMF approximation achieves a higher correlation than non-negative matrix factorization (NMF), which is unable to recover patterns effectively. On real data, STMF outperforms NMF on six out of nine gene expression datasets. While NMF assumes a normal distribution and tends toward the mean value, STMF can better fit extreme values and distributions. Conclusion STMF is the first work that uses a tropical semiring on sparse data. We show that in certain cases semirings are useful because they consider the structure, which is different and simpler to understand than it is with standard linear algebra.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
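    Code sketch: A worked example of the tropical (max-plus) semiring product that replaces the ordinary dot product in tropical matrix factorization: addition becomes max and multiplication becomes +, so entry (i, j) of U ⊗ V is the maximum over k of U[i, k] + V[k, j]. The matrices are toy data.

      import numpy as np

      def tropical_matmul(u: np.ndarray, v: np.ndarray) -> np.ndarray:
          # broadcast to (i, k, j), then reduce over k with max instead of sum
          return (u[:, :, None] + v[None, :, :]).max(axis=1)

      u = np.array([[1.0, 0.0],
                    [2.0, 3.0]])
      v = np.array([[0.0, 4.0],
                    [1.0, 2.0]])
      print(tropical_matmul(u, v))
      # [[1. 5.]
      #  [4. 6.]]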
  • 6
    Publication Date: 2021-02-01
    Description: Background Due to the complexity of biological systems, the prediction of potential DNA binding sites for transcription factors remains a difficult problem in computational biology. Genomic DNA sequences and experimental results from parallel sequencing provide available information about the affinity and accessibility of the genome and are commonly used features in binding site prediction. The attention mechanism in deep learning has shown its capability to learn long-range dependencies from sequential data, such as sentences and speech. Until now, no study has applied this approach to binding site inference from massively parallel sequencing data. The successful applications of the attention mechanism in similar input contexts motivate us to build and test new methods that can accurately determine the binding sites of transcription factors. Results In this study, we propose a novel tool (named DeepGRN) for transcription factor binding site prediction based on the combination of two components: a single attention module and a pairwise attention module. The performance of our methods is evaluated on the ENCODE-DREAM in vivo Transcription Factor Binding Site Prediction Challenge datasets. The results show that DeepGRN achieves higher unified scores in 6 of 13 targets than any of the top four methods in the DREAM challenge. We also demonstrate that the attention weights learned by the model are correlated with potentially informative inputs, such as DNase-Seq coverage and motifs, which provide possible explanations for the predictive improvements in DeepGRN. Conclusions DeepGRN can automatically and effectively predict transcription factor binding sites from DNA sequences and DNase-Seq coverage. Furthermore, the visualization techniques we developed for the attention modules help to interpret how critical patterns from different types of input features are recognized by our model.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 7
    Publication Date: 2021-02-01
    Description: Background IsomiRs are miRNA variants that differ from their canonical forms in length and/or sequence, including additions or deletions of one or more nucleotides (nts) at the 5′ and/or 3′ end, internal editing events, or untemplated 3′ end additions. Most available tools for small RNA-seq data analysis do not allow the identification of isomiRs and often require advanced knowledge of bioinformatics. To overcome this, we have developed IsomiR Window, a platform that supports the systematic identification, quantification and functional exploration of isomiR expression in small RNA-seq datasets, accessible to users with no computational skills. Methods IsomiR Window enables the discovery of isomiRs and identification of all annotated non-coding RNAs in RNA-seq datasets from animals and plants. It comprises two main components: the IsomiR Window pipeline for data processing and the IsomiR Window Browser interface. It integrates over ten third-party software tools for the analysis of small RNA-seq data and includes a new algorithm that allows the detection of all possible types of isomiRs. These include 3′ and 5′ end isomiRs, 3′ end tailings, and isomiRs with single nucleotide polymorphisms (SNPs) or potential RNA editing events, as well as all possible fuzzy combinations. IsomiR Window includes all required databases for analysis and annotation, and is freely distributed as a Linux virtual machine, including all required software. Results IsomiR Window processes several datasets in an automated manner, without restrictions on input file size. It generates high-quality interactive figures and tables, which can be exported into different formats. The performance of isomiR detection and quantification was assessed using simulated small RNA-seq data. For correctly mapped reads, it identified different types of isomiRs with high confidence and 100% accuracy. The analysis of small RNA-seq data from basal cell carcinomas (BCCs) using IsomiR Window confirmed that miR-183-5p is up-regulated in nodular BCCs, but revealed that this effect was predominantly due to a novel 5′ end variant. This variant displays a different seed region motif and 1756 isoform-exclusive mRNA targets that are significantly associated with disease pathways, underscoring the biological relevance of isomiR-focused analysis. IsomiR Window is available at https://isomir.fc.ul.pt/.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 8
    Publication Date: 2021-02-01
    Description: Background and objectives Internet-based technologies play an increasingly important role in the management and outcome of patients with chronic kidney disease (CKD). The healthcare system is currently flooded with digital innovations and internet-based technologies as a consequence of the coronavirus disease 2019 (COVID-19) pandemic. However, information about the attitude of German CKD patients with access to online tools towards the use of remote, internet-based interactions such as video conferencing, email, electronic medical records and apps in general and for health issues in particular, is missing. Design, setting, participants, and measurements To address the use, habits and willingness of CKD patients in handling internet-based technologies, we conducted a nationwide cross-sectional questionnaire survey in adults with CKD. Results We used 380 questionnaires from adult CKD patients (47.6% on dialysis, 43.7% transplanted and 8.7% CKD before renal replacement therapy) for analysis. Of these, 18.9% denied using the internet at all (nonusers). Nonusers were significantly older (74.4 years, SD 11.4) than users (54.5 years, SD 14.5, p
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 9
    Publication Date: 2021-03-29
    Description: Background Inguinal hernia repair, gallbladder removal, and knee and hip replacements are the most commonly performed surgical procedures, but all are subject to practice variation and variable patient-reported outcomes. Shared decision-making (SDM) has the potential to reduce surgery rates and increase patient satisfaction. This study aims to evaluate the effectiveness of an SDM strategy with online decision aids for surgical and orthopaedic practice in terms of impact on surgery rates, patient-reported outcomes, and cost-effectiveness. Methods The E-valuAID-study is designed as a multicentre, non-randomized stepped-wedge study in patients with an inguinal hernia, gallstones, knee or hip osteoarthritis in six surgical and six orthopaedic departments. The primary outcome is the surgery rate before and after implementation of the SDM strategy. Secondary outcomes are patient-reported outcomes and cost-effectiveness. Patients in the usual care cluster prior to implementation of the SDM strategy will be treated in accordance with the best available clinical evidence, the physician’s knowledge and preference, and the patient’s preference. The intervention consists of the implementation of the SDM strategy and provision of disease-specific online decision aids. Decision aids will be provided to the patients before the consultation in which the treatment decision is made. During this consultation, treatment preferences are discussed, and the final treatment decision is confirmed. Surgery rates will be extracted from hospital files. Secondary outcomes will be evaluated using questionnaires, at baseline, 3 and 6 months. Discussion The E-valuAID-study will examine the cost-effectiveness of an SDM strategy with online decision aids in patients with an inguinal hernia, gallstones, knee or hip osteoarthritis. This study will show whether decision aids reduce operation rates while improving patient-reported outcomes. We hypothesize that the SDM strategy will lead to lower surgery rates, better patient-reported outcomes, and be cost-effective. Trial registration: The Netherlands Trial Register, Trial NL8318, registered 22 January 2020. URL: https://www.trialregister.nl/trial/8318.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 10
    Publication Date: 2021-02-01
    Description: Background This study developed a diagnostic tool to automatically detect normal, unclear and tumor images from colonoscopy videos using artificial intelligence. Methods For the creation of training and validation sets, 47,555 images in JPEG format were extracted from colonoscopy videos of 24 patients at Korea University Anam Hospital. A gastroenterologist with 15 years of clinical experience divided the 47,555 images into three classes: Normal (25,895), Unclear (2038) and Tumor (19,622). A single shot detector, a deep learning framework designed for object detection, was trained using the 47,555 images and validated with two sets of 300 images—each validation set included 150 images (50 normal, 50 unclear and 50 tumor cases). Half of the 47,555 images were used for building the model and the other half were used for testing the model. The learning rate of the model was 0.0001 during 250 epochs (training cycles). Results The average accuracy, precision, recall, and F1 score over the categories were 0.9067, 0.9744, 0.9067 and 0.9393, respectively. These performance measures were unchanged across intersection-over-union thresholds (0.45, 0.50, and 0.55), suggesting the stability of the model. Conclusion Automated detection of normal, unclear and tumor images from colonoscopy videos is possible by using a deep learning framework. This is expected to provide an invaluable decision-support system for clinical experts.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 11
    Publication Date: 2021-02-01
    Description: Background Generating and analysing single-cell data has become a widespread approach to examine tissue heterogeneity, and numerous algorithms exist for clustering these datasets to identify putative cell types with shared transcriptomic signatures. However, many of these clustering workflows rely on user-tuned parameter values, tailored to each dataset, to identify a set of biologically relevant clusters. Whereas users often develop their own intuition as to the optimal range of parameters for clustering on each data set, the lack of systematic approaches to identify this range can be daunting to new users of any given workflow. In addition, an optimal parameter set does not guarantee that all clusters are equally well-resolved, given the heterogeneity in transcriptomic signatures in most biological systems. Results Here, we illustrate a subsampling-based approach (chooseR) that simultaneously guides parameter selection and characterizes cluster robustness. Through bootstrapped iterative clustering across a range of parameters, chooseR was used to select parameter values for two distinct clustering workflows (Seurat and scVI). In each case, chooseR identified parameters that produced biologically relevant clusters from both well-characterized (human PBMC) and complex (mouse spinal cord) datasets. Moreover, it provided a simple “robustness score” for each of these clusters, facilitating the assessment of cluster quality. Conclusion chooseR is a simple, conceptually understandable tool that can be used flexibly across clustering algorithms, workflows, and datasets to guide clustering parameter selection and characterize cluster robustness.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
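    Code sketch: A conceptual Python sketch of the subsampling idea behind chooseR described above (the actual tool was applied to Seurat and scVI workflows): cluster repeated random subsamples, record how often pairs of points co-cluster, and score each final cluster by the mean co-clustering frequency of its members. Data, the clustering algorithm, and scoring details are illustrative stand-ins.

      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.datasets import make_blobs

      X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
      n, k, n_boot = len(X), 4, 30
      co, seen = np.zeros((n, n)), np.zeros((n, n))

      rng = np.random.default_rng(0)
      for _ in range(n_boot):
          idx = rng.choice(n, size=int(0.8 * n), replace=False)
          lab = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X[idx])
          same = (lab[:, None] == lab[None, :]).astype(float)
          co[np.ix_(idx, idx)] += same       # times the pair co-clustered
          seen[np.ix_(idx, idx)] += 1.0      # times the pair was sampled together

      freq = np.divide(co, seen, out=np.zeros_like(co), where=seen > 0)
      final = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
      for c in range(k):
          members = np.where(final == c)[0]
          score = freq[np.ix_(members, members)].mean()  # "robustness score"
          print(f"cluster {c}: robustness = {score:.2f}")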
  • 12
    Publication Date: 2021-03-31
    Description: Background Diabetes is a medical and economic burden in the United States. In this study, a machine learning predictive model was developed to predict unplanned medical visits among patients with diabetes, and findings were used to design a clinical intervention in the sponsoring healthcare organization. This study presents a case study of how predictive analytics can inform clinical actions, and describes practical factors that must be incorporated in order to translate research into clinical practice. Methods Data were drawn from electronic medical records (EMRs) from a large healthcare organization in the Northern Plains region of the US, from adult (≥ 18 years old) patients with type 1 or type 2 diabetes who received care at least once during a 3-year period. A variety of machine-learning classification models were run using standard EMR variables as predictors (age, body mass index (BMI), systolic blood pressure (BP), diastolic BP, low-density lipoprotein, high-density lipoprotein (HDL), glycohemoglobin (A1C), smoking status, number of diagnoses and number of prescriptions). The best-performing model after cross-validation testing was analyzed to identify the strongest predictors. Results The best-performing model was a linear-basis support vector machine, which achieved a balanced accuracy (the average of sensitivity and specificity) of 65.7%. This model outperformed a conventional logistic regression by 0.4 percentage points. A sensitivity analysis identified BP and HDL as the strongest predictors, such that disrupting these variables with random noise decreased the model’s overall balanced accuracy by 1.3 and 1.4 percentage points, respectively. These findings, along with stakeholder engagement, behavioral economics strategies, and implementation science principles, helped to inform the design of a clinical intervention targeting behavioral changes. Conclusion Our machine-learning predictive model predicted unplanned medical visits among patients with diabetes more accurately than conventional models. Post-hoc analysis of the model was used for hypothesis generation, namely that HDL and BP are the strongest contributors to unplanned medical visits among patients with diabetes. These findings were translated into a clinical intervention now being piloted at the sponsoring healthcare organization. In this way, this predictive model can be used in moving from prediction to implementation and improved diabetes care management in clinical settings.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
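    Code sketch: A hedged reconstruction, on synthetic data, of the sensitivity analysis described above: fit a linear support vector machine, then measure the drop in balanced accuracy (the average of sensitivity and specificity) when each predictor in turn is disrupted with random noise. The dataset and noise scale are illustrative, not the study's EMR data.

      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.metrics import balanced_accuracy_score
      from sklearn.model_selection import train_test_split
      from sklearn.svm import LinearSVC

      X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
      model = LinearSVC(dual=False).fit(X_tr, y_tr)
      base = balanced_accuracy_score(y_te, model.predict(X_te))

      rng = np.random.default_rng(0)
      for j in range(X.shape[1]):            # disrupt one feature at a time
          X_noisy = X_te.copy()
          X_noisy[:, j] += rng.normal(0.0, X_te[:, j].std(), size=len(X_te))
          drop = base - balanced_accuracy_score(y_te, model.predict(X_noisy))
          print(f"feature {j}: balanced-accuracy drop = {drop:+.3f}")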
  • 13
    Publication Date: 2021-03-31
    Description: Background The most common measure of association between two continuous variables is the Pearson correlation (Maronna et al., Robust Statistics, 2019). When outliers are present, Pearson does not accurately measure association and robust measures are needed. This article introduces three new robust measures of correlation: Taba (T), TabWil (TW), and TabWil rank (TWR). The correlation estimators T and TW measure a linear association between two continuous or ordinal variables, whereas TWR measures a monotonic association. The robustness of these proposed measures in comparison with Pearson (P), Spearman (S), Quadrant (Q), Median (M), and Minimum Covariance Determinant (MCD) is examined through simulation. Taba distance is used to analyze genes, and statistical tests were used to identify those genes most significantly associated with Williams Syndrome (WS). Results Based on the root mean square error (RMSE) and bias, the three proposed correlation measures are highly competitive when compared to classical measures such as P and S as well as robust measures such as Q, M, and MCD. Our findings indicate TBL2 was the most significant gene among patients diagnosed with WS and had the most significant reduction in gene expression level when compared with control (P value = 6.37E-05). Conclusions Overall, when the distribution is bivariate Log-Normal or bivariate Weibull, TWR performs best in terms of bias and T performs best with respect to RMSE. Under the Normal distribution, MCD performs well with respect to bias and RMSE; but TW, TWR, T, S, and P correlations were in close proximity. The identification of TBL2 may serve as a diagnostic tool for WS patients. A Taba R package has been developed and is available for use to perform all necessary computations for the proposed methods.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
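    Code sketch: A small demonstration (using classical measures, not the Taba package) of why robust correlation matters: a single gross outlier sharply deflates Pearson's r, while Spearman's rank correlation and the quadrant (sign) correlation barely move.

      import numpy as np
      from scipy.stats import pearsonr, spearmanr

      def quadrant_corr(x, y):
          # quadrant correlation: mean product of signs about the medians
          return np.mean(np.sign(x - np.median(x)) * np.sign(y - np.median(y)))

      rng = np.random.default_rng(1)
      x = rng.normal(size=200)
      y = x + 0.3 * rng.normal(size=200)   # strong linear association
      x[0], y[0] = 10.0, -10.0             # inject one gross outlier

      print(f"Pearson : {pearsonr(x, y)[0]:.2f}")   # sharply deflated
      print(f"Spearman: {spearmanr(x, y)[0]:.2f}")  # nearly unchanged
      print(f"Quadrant: {quadrant_corr(x, y):.2f}") # nearly unchanged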
  • 14
    Publication Date: 2021-03-31
    Description: Background Protein post-translational modification (PTM) is a key issue in investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of the in-depth study and analysis of PTMs in proteins. Method We propose a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modification sites. Using eight different sequential and five structural construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, two influential deep generative methods, the Conditional Generative Adversarial Network (CGAN) and the Conditional Wasserstein Generative Adversarial Network (CWGAN), were leveraged and compared to generate new samples for the types with fewer samples. Finally, a random forest algorithm was used to predict the seven categories. Results In tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. Additionally, an accumulated feature-importance analysis reported that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. Conclusions The CWGAN greatly improved the predictive performance in all experiments. Features derived from the CKSAAP, PWM and structure schemes are the most informative and made the greatest contribution to the prediction of PTMs.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 15
    Publication Date: 2021-03-24
    Description: Background Given expression data, gene regulatory network (GRN) inference approaches try to determine regulatory relations. However, current inference methods ignore the inherent topological characteristics of GRNs to some extent, leading to structures that lack clear biological explanation. To increase the biophysical meaning of inferred networks, this study performed data-driven module detection before network inference. Gene modules were identified by decomposition-based methods. Results ICA-decomposition-based module detection methods have been used to detect functional modules directly from transcriptomic data. Experiments on time-series expression, curated, and scRNA-seq datasets demonstrated the advantages of the proposed ModularBoost method over established methods, especially in efficiency and accuracy. For scRNA-seq datasets, the ModularBoost method outperformed other candidate inference algorithms. Conclusions As a complicated task, GRN inference can be decomposed into several tasks of reduced complexity. Using identified gene modules as topological constraints, the initial inference problem can be accomplished by inferring intra-modular and inter-modular interactions respectively. Experimental outcomes suggest that the proposed ModularBoost method can improve the accuracy and efficiency of inference algorithms by introducing topological constraints.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 16
    Publication Date: 2021-03-27
    Description: Background Heritability is a central measure in genetics quantifying how much of the variability observed in a trait is attributable to genetic differences. Existing methods for estimating heritability are most often based on random-effect models, typically for computational reasons. The alternative of using a fixed-effect model has received much more limited attention in the literature. Results In this paper, we propose a generic strategy for heritability inference, termed “boosting heritability”, by combining the advantageous features of different recent methods to produce an estimate of the heritability with a high-dimensional linear model. Boosting heritability uses in particular a multiple sample splitting strategy, which leads in general to a stable and accurate estimate. We use both simulated data and real antibiotic resistance data from a major human pathogen, Streptococcus pneumoniae, to demonstrate the attractive features of our inference strategy. Conclusions Boosting is shown to offer a reliable and practically useful tool for inference about heritability.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 17
    Publication Date: 2021-03-25
    Description: Background A growing body of evidence has shown an association between tuberculosis (TB) infection and lung cancer. However, the possible effect of strain-specific behavior of the Mycobacterium tuberculosis (M.tb) population, the etiological agent of TB infection, on this association has been neglected. In this context, this study was conducted to investigate this association while taking into consideration the genetic background of strains in the M.tb population. Results We employed the elastic net penalized logistic regression model, as a statistical-learning algorithm for gene selection, to evaluate this association across 129 genes involved in the TLR and NF-κB signaling pathways in response to two different M.tb sub-lineage strains (L3-CAS1 and L4.5). Of the 129 genes, 21 were found to be associated with the two studied M.tb sub-lineages. In addition, the MAPK8IP3 gene was identified as a novel gene, which has not been reported in previous lung cancer studies and may have the potential to be recognized as a novel biomarker in lung cancer investigation. Conclusions This preliminary study provides new insights into the mechanistic association between TB infection and lung cancer. Further mechanistic investigations of this association, with a large number of M.tb strains, encompassing the other main M.tb lineages and using the whole transcriptome of the host cell, are needed.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 18
    Publication Date: 2021-03-22
    Description: Background Telomeres, nucleoprotein structures comprising short tandem repeats and delimiting the ends of linear eukaryotic chromosomes, play an important role in the maintenance of genome stability. Therefore, the determination of the length of telomeres is of high importance for many studies. Over the last years, new methods for the analysis of the length of telomeres have been developed, including those based on PCR or analysis of NGS data. Despite that, the terminal restriction fragment (TRF) method remains the gold standard to this day. However, this method has lacked a universally accepted and precise tool capable of analysing and statistically evaluating TRF results. Results To standardize the processing of TRF results, we have developed WALTER, an online toolset allowing rapid, reproducible, and user-friendly analysis, including statistical evaluation of the data. Given its web-based nature, it provides an easily accessible way to analyse TRF data without any need to install additional software. Conclusions WALTER represents a major upgrade from currently available tools for the image processing of TRF scans. This toolset enables rapid, highly reproducible, and user-friendly evaluation of almost any TRF scan, including in-house statistical evaluation of the data. The WALTER platform, together with a user manual describing the evaluation of TRF scans in detail and presenting tips and troubleshooting advice, as well as test data to demo the software, is available at https://www.ceitec.eu/chromatin-molecular-complexes-jiri-fajkus/rg51/tab?tabId=125#WALTER and the source code at https://github.com/mlyc93/WALTER.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 19
    Publication Date: 2021-03-22
    Description: Background Polyploidy is very common in plants and can be seen as one of the key drivers in the domestication of crops and the establishment of important agronomic traits. It can be the main source of genomic repatterning and introduces gene duplications, affecting gene expression and alternative splicing. Since fully sequenced genomes are not yet available for many plant species including crops, de novo transcriptome assembly is the basis for understanding molecular and functional mechanisms. However, in complex polyploid plants, de novo transcriptome assembly is challenging, leading to increased rates of fused or redundant transcripts. Since assemblers were developed mainly for diploid organisms, they may not be well suited for polyploids. Also, comparative evaluations of these tools on higher polyploid plants are extremely rare. Thus, our aim was to fill this gap and to provide a basic guideline for choosing the optimal de novo assembly strategy, focusing on autotetraploids, as the scientific interest in this type of polyploidy is steadily increasing. Results We present a comparison of two common assemblers (SOAPdenovo-Trans, Trinity) and one recently published transcriptome assembler (TransLiG) on diploid and autotetraploid species of the genera Acer and Vaccinium, using Arabidopsis thaliana as a reference. The number of assembled transcripts was up to 11 and 14 times higher, with an increased number of short transcripts, for Acer and Vaccinium, respectively, compared to A. thaliana. In diploid samples, Trinity and TransLiG performed similarly well, while in autotetraploids, TransLiG assembled the most complete transcriptomes, with an average of 1916 assembled BUSCOs vs. 1705 BUSCOs for Trinity. Of all three assemblers, SOAPdenovo-Trans performed worst (1133 complete BUSCOs). Conclusion All three assembly tools produced complete assemblies when dealing with the model organism A. thaliana, independently of its ploidy level, but their performances differed markedly for non-model autotetraploids, where specifically TransLiG and Trinity produced a high number of redundant transcripts. The recently published assembler TransLiG had not yet been tested on any plant organism but showed the highest completeness and most full-length transcriptomes, especially in autotetraploids. Including such species during the development and testing of new assembly tools is highly recommended, as many important crops are polyploid.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 20
    Publication Date: 2021-03-24
    Description: Background A number of predictive models for aquatic toxicity are available; however, the accuracy and ease of use of these in silico tools in risk assessment still need further study. This study evaluated the performance of seven in silico tools for daphnia and fish toxicity: ECOSAR, T.E.S.T., the Danish QSAR Database, VEGA, KATE, Read Across and Trent Analysis. 37 Priority Controlled Chemicals in China (PCCs) and 92 New Chemicals (NCs) were used as the validation dataset. Results In the quantitative evaluation on PCCs, with the criterion of a 10-fold difference between experimental and estimated values, the accuracy of VEGA is the highest among all of the models, in prediction of both daphnia and fish acute toxicity, with accuracies of 100% and 90% after considering the applicability domain (AD), respectively. The performance of KATE, ECOSAR and T.E.S.T. is similar, with accuracies slightly lower than VEGA's. The accuracy of the Danish QSAR Database is the lowest among the tools for which QSAR is the main mechanism. The performance of Read Across and Trent Analysis is the lowest among all of the tested in silico tools. The predictive ability of the models for NCs was lower than for PCCs, possibly because the NCs never appeared in the training sets of the models; ECOSAR performed best among the in silico tools. Conclusion QSAR-based in silico tools had greater prediction accuracy than the category approaches (Read Across and Trent Analysis) in predicting the acute toxicity of daphnia and fish. Category approaches (Read Across and Trent Analysis) require expert knowledge to be utilized effectively. ECOSAR performs well on both PCCs and NCs, and its application should be promoted in both risk assessment and priority-setting activities. We suggest that the distribution of multiple data and water solubility should be considered when developing in silico models. Both more intelligent in silico tools and testing are necessary to identify the hazards of chemicals.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 21
    Publication Date: 2021-02-02
    Description: Background Transposable elements (TEs) constitute a significant part of eukaryotic genomes. Short interspersed nuclear elements (SINEs) are non-autonomous TEs, which are widely represented in mammalian genomes and also found in plants. After insertion at a new position in the genome, TEs quickly accumulate mutations, which complicates their identification and annotation by modern bioinformatics methods. In this study, we searched for highly divergent SINE copies in the genome of rice (Oryza sativa subsp. japonica) using the Highly Divergent Repeat Search Method (HDRSM). Results The HDRSM considers correlations of neighboring symbols to construct a position weight matrix (PWM) for a SINE family, which is then used to perform a search for new copies. In order to evaluate the accuracy of the method and compare it with the RepeatMasker program, we generated a set of SINE copies containing nucleotide substitutions and indels and inserted them into an artificial chromosome for analysis. The HDRSM showed better results, both in terms of the number of identified inserted repeats and the accuracy of determining their boundaries. A search for copies of 39 SINE families in the rice genome produced 14,030 hits; among them, 5704 were not detected by RepeatMasker. Conclusions The HDRSM can find divergent SINE copies, correctly determine their boundaries, and offer a high level of statistical significance. We also found that RepeatMasker is able to find relatively short copies of SINE families with a higher level of similarity, while the HDRSM is able to find more diverged copies. To obtain a comprehensive profile of SINE distribution in the genome, combined application of the HDRSM and RepeatMasker is recommended.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
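    Code sketch: A simplified sketch of building a position weight matrix (PWM) from aligned SINE copies and scoring a candidate sequence against it. The HDRSM additionally models correlations of neighbouring symbols; shown here, with toy sequences, is only the basic single-nucleotide PWM for orientation.

      import numpy as np

      copies = ["ACGTAC", "ACGGAC", "ATGTAC", "ACGTCC"]  # toy aligned copies
      alphabet = "ACGT"
      counts = np.zeros((len(copies[0]), 4))
      for seq in copies:
          for pos, base in enumerate(seq):
              counts[pos, alphabet.index(base)] += 1

      freqs = (counts + 1) / (len(copies) + 4)  # add-one pseudocounts
      pwm = np.log2(freqs / 0.25)               # log-odds vs uniform background

      def score(seq: str) -> float:
          # sliding-window scores like this one flag putative new copies
          return sum(pwm[i, alphabet.index(b)] for i, b in enumerate(seq))

      print(round(score("ACGTAC"), 2))  # high score: matches the family
      print(round(score("TTTTTT"), 2))  # low score: unrelated sequence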
  • 22
    Publication Date: 2021-02-02
    Description: Background Due to the need for informatics competencies in the field of nursing, the present study was conducted to design a psychometric instrument to assess the informatics competencies of nurses employed in educational care centers. Methods The questionnaire was constructed by reviewing existing scientific resources and assessment tools. Two hundred nurses were selected using simple random sampling. Structural equation modeling was performed using the measurement model technique, and the average variance was calculated. Linear structural relations (LISREL) software was used to test the assumptions and correlations of the model. Results Findings showed a relatively good fit of the first-order measurement model. The informatics knowledge subscale, with a determining rate of 0.90, had the greatest explanatory effect among the subscales, followed by informatics skill with a determining rate of 0.67 and basic computer skill with a determining rate of 0.60. The fit indicators of the second-order measurement model showed that the three factors can explain the multidimensional construct of informatics competency well. Conclusions The designed tool can be used to develop educational strategies for nursing students in the field of informatics and prepare them for the rich environment of information technology, which can be helpful in training nursing instructors.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 23
    Publication Date: 2021-02-02
    Description: Background Questionnaires are commonly used tools in telemedicine services that can help to evaluate different aspects. Selecting the ideal questionnaire for this purpose may be challenging for researchers. This study aims to review which questionnaires are used to evaluate telemedicine services in the literature, which are most common, and what aspects of telemedicine evaluation they capture. Methods The PubMed database was searched in August 2020 to retrieve articles. Data extracted from the final list of articles included author/year of publication, journal of publication, type of evaluation, and evaluation questionnaire. Data were analyzed using descriptive statistics. Results Fifty-three articles were included in this study. Questionnaires were used for evaluating the satisfaction (49%), usability (34%), acceptance (11.5%), and implementation (2%) of telemedicine services. Among telemedicine-specific questionnaires, the Telehealth Usability Questionnaire (TUQ) (19%), Telemedicine Satisfaction Questionnaire (TSQ) (13%), and Service User Technology Acceptability Questionnaire (SUTAQ) (5.5%) were the most frequently used in the collected articles. Other frequently used questionnaires, generally used for evaluating users’ satisfaction, usability, and acceptance of technology, were the Client Satisfaction Questionnaire (CSQ) (5.5%), Questionnaire for User Interaction Satisfaction (QUIS) (5.5%), System Usability Scale (SUS) (5.5%), Patient Satisfaction Questionnaire (PSQ) (5.5%), and Technology Acceptance Model (TAM) (3.5%). Conclusion Employing specifically designed questionnaires, or designing a new questionnaire with fewer questions and more comprehensiveness in terms of the issues studied, provides a better evaluation. Attention to user needs, end-user acceptance, and implementation processes, along with users’ satisfaction and usability evaluation, may optimize telemedicine efforts in the future.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 24
    Publication Date: 2021-03-22
    Description: Background Recently, many computational methods have been proposed to predict cancer genes. One typical kind of method is to find the differentially expressed genes between tumour and normal samples. However, there are also some genes, for example, ‘dark’ genes, that play important roles at the network level but are difficult to find by traditional differential gene expression analysis. In addition, network controllability methods, such as the minimum feedback vertex set (MFVS) method, have been used frequently in cancer gene prediction. However, the weights of vertices (or genes) are ignored in the traditional MFVS methods, leading to difficulty in finding the optimal solution because of the existence of many possible MFVSs. Results Here, we introduce a novel method, called weighted MFVS (WMFVS), which integrates the gene differential expression value with MFVS to select the maximum-weighted MFVS from all possible MFVSs in a protein interaction network. Our experimental results show that WMFVS achieves better performance than using traditional bio-data or network-data analyses alone. Conclusion This method balances the advantage of differential gene expression analyses and network analyses, improves the low accuracy of differential gene expression analyses and decreases the instability of pure network analyses. Furthermore, WMFVS can be easily applied to various kinds of networks, providing a useful framework for data analysis and prediction.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
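    Code sketch: A greedy toy illustration of combining feedback-vertex-set reasoning with differential-expression weights, in the spirit of the abstract above (not the authors' exact WMFVS algorithm, which selects a maximum-weight minimum FVS): repeatedly find a cycle and remove the node on it with the largest weight, so strongly perturbed genes are preferred as candidates.

      import networkx as nx

      g = nx.DiGraph([("a", "b"), ("b", "c"), ("c", "a"),
                      ("c", "d"), ("d", "b")])
      weight = {"a": 0.2, "b": 1.5, "c": 0.7, "d": 0.4}  # |diff. expression|

      fvs = []
      while True:
          try:
              cycle_nodes = {u for u, v in nx.find_cycle(g)}
          except nx.NetworkXNoCycle:
              break
          pick = max(cycle_nodes, key=weight.get)  # heaviest node on the cycle
          fvs.append(pick)
          g.remove_node(pick)
      print(fvs)  # ['b'] -- removing b breaks both cycles in this toy graph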
  • 25
    Publication Date: 2021-03-22
    Description: Background Storage of genomic data is a major cost for the Life Sciences, effectively addressed via specialized data compression methods. For the same reasons of abundance in data production, the use of Big Data technologies is seen as the future for genomic data storage and processing, with MapReduce-Hadoop as leaders. Somewhat surprisingly, none of the specialized FASTA/Q compressors is available within Hadoop, and their deployment there is not exactly immediate. This state of the art is problematic. Results We provide major advances in two different directions. Methodologically, we propose two general methods, with the corresponding software, that make it very easy to deploy a specialized FASTA/Q compressor within MapReduce-Hadoop for processing files stored on the distributed Hadoop File System, with very little knowledge of Hadoop. Practically, we provide evidence that the deployment of those specialized compressors within Hadoop, not available so far, results in better space savings, and even in better execution times over compressed data, with respect to the use of generic compressors available in Hadoop, in particular for FASTQ files. Finally, we observe that these results hold also for the Apache Spark framework, when used to process FASTA/Q files stored on the Hadoop File System. Conclusions Our methods and the corresponding software substantially contribute to achieving space and time savings for the storage and processing of FASTA/Q files in Hadoop and Spark. Since our approach is general, it is very likely that it can also be applied to FASTA/Q compression methods that will appear in the future. Availability The software and the datasets are available at https://github.com/fpalini/fastdoopc
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 26
    Publication Date: 2021-03-23
    Description: Background A common approach for sequencing studies is to do joint-calling and store variants of all samples in a single file. If new samples are continually added or controls are re-used for several studies, the cost and time required to perform joint-calling for each analysis can become prohibitive. Results We present ATAV, an analysis platform for large-scale whole-exome and whole-genome sequencing projects. ATAV stores variant and per site coverage data for all samples in a centralized database, which is efficiently queried by ATAV to support diagnostic analyses for trios and singletons, as well as rare-variant collapsing analyses for finding disease associations in complex diseases. Runtime logs ensure full reproducibility and the modularized ATAV framework makes it extensible to continuous development. Besides helping with the identification of disease-causing variants for a range of diseases, ATAV has also enabled the discovery of disease-genes by rare-variant collapsing on datasets containing more than 20,000 samples. Analyses to date have been performed on data of more than 110,000 individuals demonstrating the scalability of the framework. To allow users to easily access variant-level data directly from the database, we provide a web-based interface, the ATAV data browser (http://atavdb.org/). Through this browser, summary-level data for more than 40,000 samples can be queried by the general public representing a mix of cases and controls of diverse ancestries. Users have access to phenotype categories of variant carriers, as well as predicted ancestry, gender, and quality metrics. In contrast to many other platforms, the data browser is able to show data of newly-added samples in real-time and therefore evolves rapidly as more and more samples are sequenced. Conclusions Through ATAV, users have public access to one of the largest variant databases for patients sequenced at a tertiary care center and can look up any genes or variants of interest. Additionally, since the entire code is freely available on GitHub, ATAV can easily be deployed by other groups that wish to build their own platform, database, and user interface.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 27
    Publication Date: 2021-03-23
    Description: Background Currently, no proven effective drugs for the novel coronavirus disease COVID-19 exist, and despite widespread vaccination campaigns, we are far short of herd immunity. The number of people who are still vulnerable to the virus is too high to prevent new outbreaks, leading to a compelling need to find new therapeutic options devoted to combating SARS-CoV-2 infection. Drug repurposing represents an effective drug discovery strategy from existing drugs that could shorten the time and reduce the cost compared to de novo drug discovery. Results We developed a network-based tool for drug repurposing, provided as freely available R code, called SAveRUNNER (Searching off-lAbel dRUg aNd NEtwoRk), with the aim of offering a promising framework to efficiently detect putative novel indications for currently marketed drugs against diseases of interest. SAveRUNNER predicts drug–disease associations by quantifying the interplay between the drug targets and the disease-associated proteins in the human interactome through the computation of a novel network-based similarity measure, which prioritizes associations between drugs and diseases located in the same network neighborhoods. Conclusions The algorithm was successfully applied to predict off-label drugs to be repositioned against the new human coronavirus (2019-nCoV/SARS-CoV-2), and it achieved a high accuracy in the identification of well-known drug indications, thus revealing itself as a powerful tool to rapidly detect potential novel medical indications for various drugs that are worthy of further investigation. The SAveRUNNER source code is freely available at https://github.com/giuliafiscon/SAveRUNNER.git, along with a comprehensive user guide.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
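    Code sketch: A hedged sketch of the network-proximity intuition behind SAveRUNNER: score a drug-disease pair by how close the drug's targets sit to the disease proteins in the interactome. The toy graph, node names, and the plain average-shortest-path measure are illustrative; the published similarity measure is more elaborate.

      import networkx as nx

      interactome = nx.Graph([("T1", "A"), ("A", "D1"), ("T2", "B"),
                              ("B", "C"), ("C", "D2"), ("D1", "D2")])

      def proximity(g, drug_targets, disease_proteins):
          # mean distance from each target to its nearest disease protein
          dists = []
          for t in drug_targets:
              lengths = nx.single_source_shortest_path_length(g, t)
              dists.append(min(lengths[d] for d in disease_proteins if d in lengths))
          return sum(dists) / len(dists)

      print(proximity(interactome, {"T1", "T2"}, {"D1", "D2"}))  # 2.5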
  • 28
    Publication Date: 2021-03-25
    Description: Background Synthetic long reads (SLR) with long-range co-barcoding information are now widely applied in genomics research. Although several tools have been developed for each specific SLR technique, a robust standalone scaffolder with high efficiency is warranted for hybrid genome assembly. Results In this work, we developed a standalone scaffolding tool, SLR-superscaffolder, to link together contigs in draft assemblies using co-barcoding and paired-end read information. Our top-to-bottom scheme first builds a global scaffold graph based on Jaccard similarity to determine the order and orientation of contigs, and then locally improves the scaffolds with the aid of paired-end information. We also exploited a screening algorithm to reduce the negative effect of misassembled contigs in the input assembly. We applied SLR-superscaffolder to a human single tube long fragment read sequencing dataset and increased the scaffold NG50 of its corresponding draft assembly 1349-fold. Moreover, benchmarking on different input contigs showed that this approach overall outperformed existing SLR scaffolders, providing longer contiguity and fewer misassemblies, especially for short contigs assembled from next-generation sequencing data. The open-source code of SLR-superscaffolder is available at https://github.com/BGI-Qingdao/SLR-superscaffolder. Conclusions SLR-superscaffolder can dramatically improve the contiguity of a draft assembly by integrating a hybrid assembly strategy.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
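    Code sketch: A minimal example of the co-barcoding signal the scaffolder builds on: contigs derived from the same long DNA fragments share barcodes, so a high Jaccard similarity between two contigs' barcode sets is evidence that they are adjacent. The barcode sets here are toy data.

      def jaccard(a: set, b: set) -> float:
          return len(a & b) / len(a | b) if a | b else 0.0

      barcodes = {
          "contig1": {"BX1", "BX2", "BX3", "BX4"},
          "contig2": {"BX2", "BX3", "BX4", "BX5"},  # likely neighbour of contig1
          "contig3": {"BX8", "BX9"},                # likely from elsewhere
      }
      for u, v in [("contig1", "contig2"), ("contig1", "contig3")]:
          print(u, v, round(jaccard(barcodes[u], barcodes[v]), 2))
      # contig1 contig2 0.6
      # contig1 contig3 0.0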
  • 29
    Publication Date: 2021-03-30
    Description: Background Hepatocellular carcinoma (HCC), derived from hepatocytes, is the main histological subtype of primary liver cancer and poses a serious threat to human health due to its high incidence and poor prognosis. This study aimed to establish a multigene prognostic model to predict the prognosis of patients with HCC. Results Gene expression datasets (GSE121248, GSE40873, GSE62232) were used to identify differentially expressed genes (DEGs) between tumor and adjacent or normal tissues, and hub genes were then screened using a protein–protein interaction (PPI) network and Cytoscape software. Seventeen of the hub genes were significantly associated with prognosis and were used to construct a prognostic model through Cox hazard regression analysis. The predictive performance of this model was evaluated with TCGA data and further validated with the independent dataset GSE14520. Six genes (CDKN3, ZWINT, KIF20A, NUSAP1, HMMR, DLGAP5) were involved in the prognostic model, which separated HCC patients from the TCGA dataset into high- and low-risk groups. Kaplan–Meier (KM) survival analysis and risk score analysis demonstrated that the low-risk group had a survival advantage. Univariate and multivariate regression analyses showed that the risk score could be an independent prognostic factor. The receiver operating characteristic (ROC) curve showed that the risk score had better predictive power than other clinical indicators. Finally, the results from GSE14520 demonstrated the reliability of this prognostic model to some extent. Conclusion This prognostic model is significant for the prognosis of HCC, and the risk score based on this model may be a better prognostic factor than other traditional clinical indicators.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
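A hedged sketch of the standard multigene risk-score construction the abstract describes: a weighted sum of expression values using Cox coefficients, dichotomized at the median. The gene names match the abstract; the coefficients and expression values are invented for illustration.

```python
import numpy as np

# Cox regression coefficients for the six model genes (illustrative values,
# not the published ones).
coefs = {"CDKN3": 0.21, "ZWINT": 0.15, "KIF20A": 0.09,
         "NUSAP1": 0.12, "HMMR": 0.18, "DLGAP5": 0.07}

# Rows = patients, columns = the six genes (e.g., log-normalized expression).
rng = np.random.default_rng(0)
expr = rng.normal(size=(8, len(coefs)))

# Risk score = sum_j beta_j * expression_j, per patient.
beta = np.array(list(coefs.values()))
risk = expr @ beta

# Median split into high- and low-risk groups, as in the abstract.
group = np.where(risk > np.median(risk), "high", "low")
print(list(zip(np.round(risk, 2), group)))
```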
  • 30
    Publication Date: 2021-03-25
    Description: Background As the use of nanopore sequencing for metagenomic analysis increases, tools capable of performing long-read taxonomic classification (i.e., determining the composition of a sample) in a fast and accurate manner are needed. Existing tools were either designed for short-read data (e.g., Centrifuge), take days to analyse modern sequencer outputs (e.g., MetaMaps) or suffer from suboptimal accuracy (e.g., CDKAM). Additionally, all of these tools require command-line expertise and do not scale in the cloud. Results We present BugSeq, a novel, highly accurate metagenomic classifier for nanopore reads. We evaluate BugSeq on simulated data, mock microbial communities and real clinical samples. On the ZymoBIOMICS Even and Log communities, BugSeq (F1 = 0.95 at species level) offers better read classification than MetaMaps (F1 = 0.89–0.94) in a fraction of the time. BugSeq significantly improves on the accuracy of Centrifuge (F1 = 0.79–0.93) and CDKAM (F1 = 0.91–0.94) while offering competitive run times. When applied to 41 samples from patients with lower respiratory tract infections, BugSeq produces greater concordance with microbiological culture and qPCR compared with “What’s In My Pot” analysis. Conclusion BugSeq is deployed to the cloud for easy and scalable long-read metagenomic analyses. BugSeq is freely available for non-commercial use at https://bugseq.com/free.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 31
    Publication Date: 2021-03-24
    Description: Background Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and is associated with multiple diseases. Wet-lab experiments are too costly and time-consuming for identifying disease-associated m7G sites. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches, and the information is publicly available in bioinformatics databases, which can be leveraged to predict potential disease-associated m7G sites from a computational perspective. Thus, computational methods for m7G-disease association prediction are urgently needed, but none are currently available. Results To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, the m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts potential disease-associated m7G sites by applying a matrix decomposition method on heterogeneous networks which integrate comprehensive similarity information of m7G sites and diseases (a toy factorization is sketched after this record). To evaluate the prediction performance, 10 runs of tenfold cross validation were first conducted, in which m7GDisAI achieved the highest AUC of 0.740 (± 0.0024). Global and local leave-one-out cross validation (LOOCV) experiments were then implemented to evaluate the model’s accuracy in global and local situations, respectively. An AUC of 0.769 was achieved in global LOOCV, and 0.635 in local LOOCV. A case study was finally conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the complex associations between the host genes of m7G sites and GO terms. The results showed that the disease-associated m7G sites identified by m7GDisAI, and their host genes, are consistently related to the pathogenesis of ovarian cancer, which may provide some clues for the pathogenesis of diseases. Conclusion The m7GDisAI web server can be accessed at http://180.208.58.66/m7GDisAI/, which provides a user-friendly interface to query disease-associated m7G sites. A list of the top 20 m7G sites predicted to be associated with 177 diseases can be retrieved. Furthermore, detailed information about specific m7G sites and diseases is also shown.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
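The abstract's core computation, matrix decomposition over a heterogeneous association network, reduces in its simplest form to low-rank factorization of the m7G-site-by-disease association matrix. Below is a hedged, minimal gradient-descent sketch of that idea, without the similarity-regularization terms the full model integrates; all values are toy data.

```python
import numpy as np

rng = np.random.default_rng(1)
A = (rng.random((6, 4)) < 0.3).astype(float)  # toy m7G-site x disease associations

k, lr, lam = 2, 0.05, 0.01                     # latent dim, step size, L2 penalty
U = 0.1 * rng.standard_normal((A.shape[0], k)) # site factors
V = 0.1 * rng.standard_normal((A.shape[1], k)) # disease factors

for _ in range(2000):
    E = A - U @ V.T                  # reconstruction error
    U += lr * (E @ V - lam * U)      # gradient steps on ||A - UV^T||^2 + L2
    V += lr * (E.T @ U - lam * V)

scores = U @ V.T                     # high entries where A == 0 are candidate
print(np.round(scores, 2))           # novel m7G-disease associations
```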
  • 32
    Publication Date: 2021-03-10
    Description: Background Correlation network analysis has become an integral tool for studying metabolite datasets. Networks are constructed by omitting correlations between metabolites that fail two thresholds: the correlation coefficient (r) and the associated p-value. While p-value threshold settings follow the rules of multiple hypothesis testing correction, guidelines for r-value threshold settings have not been defined. Results Here, we introduce a method that determines the r-value threshold with an iterative approach, in which different networks are constructed and their network topology is monitored (a simplified threshold scan is sketched after this record). Once the network topology changes significantly, the threshold is set to the corresponding correlation coefficient value. The approach was exemplified on (i) a metabolite and morphological trait dataset from a potato association panel, which was grown under normal irrigation and water recovery conditions, and validated on (ii) a metabolite dataset of hearts of fed and fasted mice. For the potato normal irrigation correlation network a threshold of Pearson’s |r| ≥ 0.23 was suggested, while for the water recovery correlation network a threshold of Pearson’s |r| ≥ 0.41 was estimated. For both mice networks the threshold was calculated as Pearson’s |r| ≥ 0.84. Conclusions Our analysis corrected the previously stated Pearson’s correlation coefficient threshold from 0.4 to 0.41 in the water recovery network and from 0.4 to 0.23 for the normal irrigation network. Furthermore, the proposed method suggested a correlation threshold of 0.84 for both mice networks rather than the threshold of 0.7 applied earlier. We demonstrate that the proposed approach is a valuable tool for constructing biologically meaningful networks.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
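A hedged sketch of the iterative idea the abstract describes: build a correlation network at each candidate r threshold and watch a topology statistic for an abrupt change. The change criterion here (largest single-step drop in edge count) is a simplification of the paper's topology monitoring, and the data are random stand-ins.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(2)
X = rng.standard_normal((50, 20))          # 50 samples x 20 metabolites
R = np.corrcoef(X, rowvar=False)           # metabolite correlation matrix

stats = []
thresholds = np.arange(0.05, 0.95, 0.05)
for t in thresholds:
    G = nx.Graph()
    G.add_nodes_from(range(R.shape[0]))
    idx = np.argwhere(np.triu(np.abs(R) >= t, k=1))  # surviving edges
    G.add_edges_from(map(tuple, idx))
    stats.append((t, G.number_of_edges(),
                  nx.number_connected_components(G)))

# Pick the threshold where the edge count changes most sharply
# (a stand-in for the paper's topology-change criterion).
edges = np.array([s[1] for s in stats])
best = thresholds[np.argmax(-np.diff(edges))]
print(f"suggested |r| threshold ~ {best:.2f}")
```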
  • 33
    Publication Date: 2021-03-10
    Description: Background Clinical decision support systems (CDSSs) for prescribing are one of the innovations designed to improve physician practice performance and patient outcomes by reducing prescription errors. This study was therefore conducted to examine the effects of various CDSSs on physician practice performance and patient outcomes. Methods This systematic review was carried out by searching PubMed, Embase, Web of Science, Scopus, and the Cochrane Library from 2005 to 2019. The studies were independently reviewed by two researchers, and any discrepancies in the eligibility of studies were resolved by consulting a third researcher. We then performed a meta-analysis based on medication subgroups, CDSS-type subgroups, and outcome categories, and we also report the findings narratively. A random-effects model was used to estimate the effects of CDSSs on patient outcomes and physician practice performance with a 95% confidence interval (a pooled-effect computation is sketched after this record); the Q statistic and I² were used to assess heterogeneity. Results On the basis of the inclusion criteria, 45 studies qualified for analysis. CDSSs for prescribing/CPOE have been used for various diseases such as cardiovascular diseases, hypertension, diabetes, gastrointestinal and respiratory diseases, AIDS, appendicitis, kidney disease, malaria, high blood potassium, and mental diseases, as well as for concurrent prescribing of multiple medications. The study shows that in some cases the use of a CDSS has beneficial effects on patient outcomes and physician practice performance (std diff in means = 0.084, 95% CI 0.067 to 0.102). The effect was also statistically significant for outcome categories demonstrating better results for physician practice performance, patient outcomes, or both. However, for some other cases there was no significant difference from traditional approaches; we assume this may be due to the disease type and to the quantity and type of CDSS criteria affecting the comparison. Overall, the results of this study show positive effects on performance for all forms of CDSSs. Conclusions Our results indicate that the positive effects of a CDSS can be due to factors such as user-friendliness, compliance with clinical guidelines, patient and physician cooperation, integration of electronic health records, CDSSs, and pharmaceutical systems, consideration of physicians' views in assessing the importance of CDSS alerts, and real-time alerts during prescribing.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
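For readers unfamiliar with the random-effects pooling used here, below is a self-contained DerSimonian–Laird sketch; the per-study effect sizes and variances are made up, not taken from the review.

```python
import numpy as np

# Toy per-study standardized mean differences and their variances.
y = np.array([0.10, 0.05, 0.15, 0.08])
v = np.array([0.002, 0.004, 0.003, 0.005])

w = 1 / v                                   # fixed-effect weights
y_fe = np.sum(w * y) / np.sum(w)
Q = np.sum(w * (y - y_fe) ** 2)             # Cochran's Q
df = len(y) - 1
C = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)               # between-study variance
I2 = max(0.0, (Q - df) / Q) * 100           # heterogeneity (%)

w_re = 1 / (v + tau2)                       # random-effects weights
est = np.sum(w_re * y) / np.sum(w_re)
se = np.sqrt(1 / np.sum(w_re))
print(f"pooled std diff in means = {est:.3f} "
      f"(95% CI {est - 1.96*se:.3f} to {est + 1.96*se:.3f}), I2 = {I2:.0f}%")
```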
  • 34
    Publication Date: 2021-03-18
    Description: Background Historical and updated information provided by time-course data collected during an entire treatment period proves to be more useful than information provided by single-point data. Accurate predictions made using time-course data on multiple biomarkers that indicate a patient’s response to therapy contribute positively to the decision-making process associated with designing effective treatment programs for various diseases. Therefore, the development of prediction methods incorporating time-course data on multiple markers is necessary. Results We propose new methods for prediction and gene selection via time-course gene expression profiles. Our prediction method consolidates multiple probabilities calculated using gene expression profiles collected over a series of time points to predict therapy response (one possible consolidation rule is sketched after this record). Using two datasets collected from patients with hepatitis C virus (HCV) infection and multiple sclerosis (MS), we performed numerical experiments predicting response to therapy and evaluated their accuracy. Our methods were more accurate than conventional methods and successfully selected genes whose functions are associated with the pathology of HCV infection and MS. Conclusions The proposed method accurately predicted response to therapy using data from multiple time points, and showed higher accuracy at early time points than conventional methods. Furthermore, the method successfully selected genes directly associated with the diseases.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
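The abstract does not specify the consolidation rule, so the sketch below shows one plausible, commonly used scheme, averaging per-time-point probabilities on the log-odds scale, purely as an illustration.

```python
import numpy as np

def consolidate(probs):
    """Combine per-time-point probabilities via mean log-odds."""
    p = np.clip(np.asarray(probs), 1e-6, 1 - 1e-6)
    logit = np.log(p / (1 - p)).mean()
    return 1 / (1 + np.exp(-logit))

# Probabilities of response from classifiers fit at weeks 1, 2, 4 (toy values).
print(round(consolidate([0.55, 0.70, 0.85]), 3))  # -> 0.717
```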
  • 35
    Publication Date: 2021-03-18
    Description: Background Patients with complex health care needs may suffer adverse outcomes from fragmented and delayed care, reducing well-being and increasing health care costs. Health reform efforts, especially those in primary care, attempt to mitigate the risk of adverse outcomes by better targeting resources to those most in need. However, predicting who is susceptible to adverse outcomes, such as unplanned hospitalizations, ED visits, or other potentially avoidable expenditures, can be difficult, and providing intensive levels of resources to all patients is neither wanted nor efficient. Our objective was to understand whether primary care teams can predict patient risk better than standard risk scores. Methods Six primary care practices risk stratified their entire patient population over a 2-year period and worked to mitigate risk for those at high risk through care management and coordination. Individual patient risk scores created by the practices were collected and compared to a common risk score (Hierarchical Condition Categories, HCC) in their ability to predict future expenditures, ED visits, and hospitalizations. Accuracy of predictions, sensitivity, positive predictive value (PPV), and c-statistics were calculated for each risk scoring type (a small confusion-matrix computation is sketched after this record). Analyses were stratified by whether the practice used intuition alone, an algorithm alone, or adjudicated an algorithmic risk score. Results In all, 40,342 patients were risk stratified. Practice scores had 38.6% agreement with HCC scores on identification of high-risk patients. For the 3,381 patients with reliable outcomes data, accuracy was high (0.71–0.88) but sensitivity and PPV were low (0.16–0.40). Practice-created scores had 0.02–0.14 lower sensitivity, specificity and PPV compared to HCC in predicting outcomes. Practices using adjudication had, on average, 0.16 higher sensitivity. Conclusions Practices using simple risk stratification techniques had slightly worse accuracy in predicting common outcomes than HCC, but adjudication improved prediction.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
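For reference, a tiny self-contained sketch of the metrics reported above (accuracy, sensitivity, PPV) computed from a 2×2 confusion matrix; the counts are invented.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Accuracy, sensitivity (recall) and PPV (precision) from 2x2 counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "ppv": tp / (tp + fp),
    }

# Toy counts: 60 true highs caught, 90 false alarms, 140 missed, 3091 true lows.
print(confusion_metrics(tp=60, fp=90, fn=140, tn=3091))
# -> sensitivity 0.30 and PPV 0.40, i.e., values in the ranges reported above.
```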
  • 36
    Publication Date: 2021-03-18
    Description: Background The Ministry of Health in Saudi Arabia is expanding the country’s telemedicine services by using advanced technology in health services. In doing so, an e-health application (app), Seha, was introduced in 2018 that allows individuals to have face-to-face visual medical consultations with their doctors on their smartphones. Objective This study evaluated the effectiveness of the app in improving healthcare delivery by ensuring patient satisfaction with the care given, increasing access to care, and improving efficiency in the healthcare system. Methods A cross-sectional study design was used to assess the perceptions of users of the Seha app and of non-users who continued with traditional health services. The data were collected using an online survey via Google Forms between June 2020 and September 2020. Independent t tests and chi-square (χ²) tests were conducted to answer the research questions. Results There was a significant difference between users and non-users in terms of ease of access to health services (t = −9.38, p
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 37
    Publication Date: 2021-03-13
    Description: Background Trauma-induced coagulopathy (TIC) is a disorder that occurs in one-third of severely injured trauma patients, manifesting as increased bleeding and a fourfold increased risk of mortality. Understanding the mechanisms driving TIC and its clinical risk factors is essential to mitigating this coagulopathic bleeding, and therefore to saving lives. In this retrospective, single-hospital study of 891 trauma patients, we investigate and quantify how two prominently described phenotypes of TIC, consumptive coagulopathy and hyperfibrinolysis, affect survival odds in the first 25 h, when deaths from TIC are most prevalent. Methods We employ a joint survival model to estimate the longitudinal trajectories of the protein Factor II (% activity) and the log of the protein fragment D-Dimer (μg/ml), representative biomarkers of consumptive coagulopathy and hyperfibrinolysis respectively, and tie them together with patient outcomes. Joint models have recently gained popularity in medical studies due to the need to simultaneously track continuously measured biomarkers as a disease evolves, and to associate them with patient outcomes. In this work, we estimate and analyze our joint model using Bayesian methods to obtain uncertainties and distributions over associations and trajectories. Results We find that a unit increase in log D-Dimer increases the risk of mortality 2.22-fold [1.57, 3.28], while a unit increase in Factor II only marginally decreases the risk of mortality (0.94-fold [0.91, 0.96]); a short hazard-ratio arithmetic sketch follows this record. This suggests that, while managing consumptive coagulopathy and hyperfibrinolysis both seem to affect survival odds, the effect of hyperfibrinolysis is much greater and more sensitive. Furthermore, we find that the longitudinal trajectories, controlling for many fixed covariates, trend differently for different patients. Thus, a more personalized approach is necessary when considering treatment and risk prediction under these phenotypes. Conclusion This study reinforces the finding that hyperfibrinolysis is linked with poor patient outcomes regardless of factor consumption levels. Furthermore, it quantifies the degree to which measured D-Dimer levels correlate with increased risk. The single-hospital, retrospective nature of the study means the results are specific to this hospital’s patients and protocol for treating trauma patients. Expanding to a multi-hospital setting would yield better estimates of the underlying relationship of consumptive coagulopathy and hyperfibrinolysis with survival, regardless of protocol. Individual trajectories obtained with these estimates can be used to provide personalized dynamic risk prediction when making decisions regarding management of blood factors.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
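A small worked sketch of how the reported associations translate into risk multipliers: in a proportional-hazards framing, a coefficient β multiplies the hazard by exp(β) per unit increase of the covariate. The numbers below simply re-derive multipliers from the hazard ratios quoted in the abstract; the covariate deltas are arbitrary examples.

```python
import math

hr_logddimer = 2.22   # hazard multiplier per unit increase in log D-Dimer
hr_factor2 = 0.94     # hazard multiplier per unit increase in Factor II activity

beta_dd = math.log(hr_logddimer)   # underlying Cox coefficient, ~0.798
beta_f2 = math.log(hr_factor2)     # ~ -0.062

# Multiplier for an arbitrary covariate change: exp(beta * delta).
print(f"+2 units log D-Dimer : x{math.exp(beta_dd * 2):.2f} hazard")   # ~4.93
print(f"+10 units Factor II  : x{math.exp(beta_f2 * 10):.2f} hazard")  # ~0.54
```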
  • 38
    Publication Date: 2021-02-17
    Description: Background Oocyte quality decreases with aging, thereby increasing errors in fertilization, chromosome segregation, and embryonic cleavage. Oocyte appearance also changes with aging, suggesting a functional relationship between oocyte quality and appearance. However, no methods are available to objectively quantify age-associated changes in oocyte appearance. Results We show that statistical image processing of Nomarski differential interference contrast (DIC) microscopy images can be used to quantify age-associated changes in oocyte appearance in the nematode Caenorhabditis elegans. The Max–min value (the mean difference between the maximum and minimum intensities within each moving window; sketched after this record) quantitatively characterized the difference in oocyte cytoplasmic texture between 1- and 3-day-old adults (Day 1 and Day 3 oocytes, respectively). With an appropriate parameter set, the gray-level co-occurrence matrix (GLCM)-based texture feature Correlation (COR) characterized this difference more sensitively than the Max–min value. Manipulating the smoothness of, and/or adding irregular structures to, the cytoplasmic texture of Day 1 oocyte images reproduced the difference in Max–min value but not in COR between Day 1 and Day 3 oocytes. Increasing the size of granules in synthetic images recapitulated the age-associated changes in COR. Manual measurements validated that the cytoplasmic granules in oocytes become larger with aging. Conclusions The Max–min value and COR objectively quantify age-related changes in C. elegans oocytes in Nomarski DIC microscopy images. Our methods provide new opportunities for understanding the mechanisms underlying oocyte aging.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
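A minimal sketch of the Max–min value feature as the abstract defines it, the mean over pixels of (window maximum − window minimum), using SciPy's sliding-window filters on random stand-in images; the window size is an assumption, not the paper's parameter.

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def max_min_value(image: np.ndarray, window: int = 5) -> float:
    """Mean difference between max and min intensity in each moving window."""
    local_range = maximum_filter(image, size=window) - \
                  minimum_filter(image, size=window)
    return float(local_range.mean())

rng = np.random.default_rng(3)
smooth = rng.normal(0.5, 0.01, size=(128, 128))          # fine, even texture
granular = smooth + (rng.random((128, 128)) < 0.02)      # sparse bright granules
print(max_min_value(smooth), max_min_value(granular))    # granular is larger
```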
  • 39
    Publication Date: 2021-03-25
    Description: Background Poor balance has been cited as one of the key causal factors of falls. Timely detection of balance impairment can help identify elderly people prone to falls and trigger early interventions to prevent them. The goal of this study was to develop a surrogate approach for assessing older adults' functional balance based on the Short Form Berg Balance Scale (SFBBS) score. Methods Data were collected from a waist-mounted tri-axial accelerometer while participants performed a timed up-and-go test. Clinically relevant variables were extracted from the segmented accelerometer signals for fitting SFBBS predictive models. Regularized regression together with random-shuffle-split cross-validation was used to develop the predictive models for automatic balance estimation (a small modelling sketch follows this record). Results Eighty-five community-dwelling older adults (72.12 ± 6.99 years) participated in our study. Our results demonstrated that combined clinical and sensor-based variables, together with regularized regression and cross-validation, achieved moderate-to-high predictive accuracy of SFBBS scores (mean MAE = 2.01 and mean RMSE = 2.55). Step length, gender, gait speed, and linear acceleration variables describing motor coordination were identified as significant contributors to balance estimation. The predictive model also showed moderate-to-high discrimination in classifying risk levels for three balance assessment motions, with AUC values of 0.72, 0.79 and 0.76, respectively. Conclusions The study presents a feasible option for quantitatively accurate, objectively measured, and unobtrusively collected functional balance assessment at the point of care or in the home environment. It also provides clinicians and older adults with stable and sensitive biomarkers for long-term monitoring of functional balance.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
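A hedged sketch of the modelling recipe named in the abstract: regularized regression (Lasso here, as one choice of regularizer) scored with repeated random shuffle-splits and MAE/RMSE, on synthetic stand-ins for the accelerometer features and SFBBS scores.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import ShuffleSplit, cross_validate

rng = np.random.default_rng(4)
X = rng.standard_normal((85, 12))                          # 85 participants x 12 features
y = X[:, :3] @ [2.0, -1.5, 1.0] + rng.normal(0, 1.5, 85)   # synthetic SFBBS scores

cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_validate(
    Lasso(alpha=0.1), X, y, cv=cv,
    scoring=("neg_mean_absolute_error", "neg_root_mean_squared_error"),
)
print(f"mean MAE  = {-scores['test_neg_mean_absolute_error'].mean():.2f}")
print(f"mean RMSE = {-scores['test_neg_root_mean_squared_error'].mean():.2f}")
```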
  • 40
    Publication Date: 2021-03-25
    Description: Background Cumulative evidence from biological experiments has confirmed that miRNAs play significant roles in the diagnosis and treatment of complex diseases. However, traditional wet-lab experiments are time-consuming and costly, and therefore fail to uncover many unconfirmed miRNA-disease interactions. Discovering potential miRNA-disease associations can thus contribute to understanding disease pathogenesis and benefit disease therapy. Although existing methods based on different computational algorithms perform favorably in searching for potential miRNA-disease interactions, there is still room for improvement. Results We present a novel combined embedding model to predict miRNA-disease associations (CEMDA) in this article. The combined embedding information of miRNA and disease is composed of pair embedding and node embedding. Compared with previous heterogeneous network methods, which are merely node-centric and simply compute the similarity of miRNA and disease, our method fuses pair embedding to pay more attention to capturing the features behind the relative information, modeling the fine-grained pairwise relationship better than the previous case, in which each node has only a single embedding. First, we construct the heterogeneous network from supported miRNA-disease pairs, disease semantic similarity and miRNA functional similarity. Given the above heterogeneous network, we find all the associated context paths of each confirmed miRNA and disease. Meta-paths are linked by nodes and then input to the gated recurrent unit (GRU) to directly learn more accurate similarity measures between miRNA and disease. Here, the multi-head attention mechanism is used to weight the hidden state of each meta-path, and the similarity information transmission mechanism in a meta-path of miRNA and disease is obtained through multiple network layers. Second, the pair embedding of miRNA and disease is fed to the multi-layer perceptron (MLP), which focuses on the more important segments in the pairwise relationship. Finally, we combine the meta-path-based node embedding and pair embedding with the cost function to learn and predict miRNA-disease associations. The source code and datasets that verify the results of our research are available at https://github.com/liubailong/CEMDA. Conclusions The performance of CEMDA in leave-one-out cross validation and fivefold cross validation is 93.16% and 92.03%, respectively. This indicates that, compared with other methods, CEMDA achieves superior performance. Case studies on lung, breast, prostate and pancreatic cancers show that 48, 50, 50 and 50 of the top 50 predicted miRNAs, respectively, are confirmed in HMDD v2.0. This further demonstrates the feasibility and effectiveness of our method.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 41
    Publication Date: 2021-02-17
    Description: Background The electronic health record (EHR) holds the prospect of providing more complete and timely access to clinical information for biomedical research, quality assessments, and quality improvement compared to other data sources, such as administrative claims. In this study, we sought to assess the completeness and timeliness of structured diagnoses in the EHR compared to computed diagnoses for hypertension (HTN), hyperlipidemia (HLD), and diabetes mellitus (DM). Methods We determined the amount of time for a structured diagnosis to be recorded in the EHR from when an equivalent diagnosis could be computed from other structured data elements, such as vital signs and laboratory results. We used EHR data for encounters from January 1, 2012 through February 10, 2019 from an academic health system. Diagnoses for HTN, HLD, and DM were computed for patients with at least two observations above threshold separated by at least 30 days (a pandas sketch of this rule follows this record), where the thresholds were outpatient blood pressure of ≥ 140/90 mmHg, any low-density lipoprotein ≥ 130 mg/dl, or any hemoglobin A1c ≥ 6.5%, respectively. The primary measure was the length of time between the computed diagnosis and the time at which a structured diagnosis could be identified within the EHR history or problem list. Results We found that 39.8% of those with HTN, 21.6% with HLD, and 5.2% with DM did not receive a corresponding structured diagnosis in the EHR. For those who did, a mean of 389, 198, and 166 days elapsed before the patient had the corresponding diagnosis of HTN, HLD, or DM, respectively, recorded in the EHR. Conclusions We found a marked temporal delay between when a diagnosis can be computed or inferred and when an equivalent structured diagnosis is recorded within the EHR. These findings demonstrate the continued need for additional study of the EHR to avoid bias when using observational data, and reinforce the need for computational approaches to identify clinical phenotypes.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
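A hedged pandas sketch of the computed-diagnosis rule described above, two above-threshold observations at least 30 days apart, and of the delay from that computed date to the first structured diagnosis. Column names and the example rows are invented.

```python
import pandas as pd

# Toy observation table: one patient's outpatient blood pressures.
obs = pd.DataFrame({
    "patient_id": [1, 1, 1, 1],
    "date": pd.to_datetime(["2015-01-05", "2015-01-20",
                            "2015-03-01", "2015-06-01"]),
    "sbp": [150, 138, 144, 152],
    "dbp": [95, 85, 91, 94],
})
first_structured_dx = pd.Timestamp("2016-04-01")  # from the problem list (toy)

above = obs[(obs.sbp >= 140) | (obs.dbp >= 90)].sort_values("date")

computed_date = None
for _, group in above.groupby("patient_id"):
    dates = group["date"].tolist()
    # Second qualifying observation at least 30 days after the first one.
    later = [d for d in dates[1:] if (d - dates[0]).days >= 30]
    if later:
        computed_date = later[0]

print("computed HTN diagnosis:", computed_date.date())
print("delay to structured dx:", (first_structured_dx - computed_date).days, "days")
```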
  • 42
    Publication Date: 2021-02-02
    Description: Background Data from clinical registries may be linked to gain additional insights into disease processes, risk factors and outcomes. Identifying information varies from full names, addresses and unique identification codes, to statistical linkage keys, to no direct identifying information at all. A number of databases in Australia contain the statistical linkage key 581 (SLK-581; its typical derivation is sketched after this record). Our aim was to investigate the ability to link data using SLK-581 between two national databases, and to compare this linkage to that achieved with direct identifiers or other non-identifying variables. Methods The Australian and New Zealand Society of Cardiothoracic Surgeons database (ANZSCTS-CSD) contains fully identified data. The Australian and New Zealand Intensive Care Society database (ANZICS-APD) contains non-identified data together with SLK-581; identifying data are removed at participating hospitals prior to central collation and storage. We used the local hospital ANZICS-APD data at a large single tertiary centre prior to de-identification and linked these to ANZSCTS-CSD data. We compared linkage using SLK-581 to linkage using non-identifying variables (dates of admission and discharge, age and sex) and linkage using a complete set of unique identifiers. We compared the rate of match, the rate of mismatch and the clinical characteristics of unmatched patients between the different methods. Results There were 1283 patients eligible for matching in the ANZSCTS-CSD, of whom 1242 were matched using unique identifiers. Using non-identifying variables, 1151/1242 (92.6%) patients were matched. Using SLK-581, 1202/1242 (96.7%) patients were matched. The addition of non-identifying data to SLK-581 provided few additional patients (1211/1242, 97.5%). Patients who did not match were younger, had a higher mortality risk and had more non-standard procedures than matched patients. The differences between unmatched patients across the matching strategies were small. Conclusion All strategies provided an acceptable linkage. SLK-581 improved the linkage compared to non-identifying variables, but was not as successful as direct identifiers. SLK-581 may be used to improve linkage between national registries where identifying information is not available or cannot be released.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
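For context, a hedged sketch of how an SLK-581-style key is commonly derived (per the AIHW convention as I understand it: 2nd, 3rd and 5th letters of the family name, 2nd and 3rd letters of the given name, date of birth as DDMMYYYY, and a one-digit sex code, padding short names with '2'). Treat these details as assumptions and consult the official specification before real use.

```python
def slk581(family: str, given: str, dob_ddmmyyyy: str, sex_code: str) -> str:
    """Derive a 14-character SLK-581-style statistical linkage key (sketch)."""
    def letters(name: str, positions: list[int]) -> str:
        clean = "".join(c for c in name.upper() if c.isalpha())
        # Pad with '2' when the name is too short (assumed convention).
        return "".join(clean[p - 1] if p <= len(clean) else "2"
                       for p in positions)

    return (letters(family, [2, 3, 5])
            + letters(given, [2, 3])
            + dob_ddmmyyyy
            + sex_code)

print(slk581("Smith", "Jane", "01021950", "2"))  # -> 'MIHAN010219502'
```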
  • 43
    Publication Date: 2021-03-29
    Description: Background Exploring the relationship between disease and gene is of great significance for understanding the pathogenesis of disease and developing corresponding therapeutic measures. Prediction of disease-gene associations by computational methods accelerates this process. Results Many existing methods cannot fully utilize multi-dimensional relationships between biological entities to predict disease-gene associations, because the underlying data are heterogeneous and drawn from multiple sources. This paper proposes FactorHNE, a factor-graph-aggregated heterogeneous network embedding method for disease-gene association prediction, which captures a variety of semantic relationships between heterogeneous nodes by factorization. It produces different semantic factor graphs and effectively aggregates the various semantic relationships, using an end-to-end multi-perspective loss function to optimize the model. This yields good node embeddings for predicting disease-gene associations. Conclusions Experimental verification and analysis show that FactorHNE has better performance and scalability than existing models. It also has good interpretability and can be extended to large-scale biomedical network data analysis.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 44
    Publication Date: 2021-03-26
    Description: Background Coronavirus Disease 2019 (COVID-19) is a viral pandemic disease that may induce severe pneumonia in humans. In this paper, we investigated the putative implication of 12 vaccines, including BCG, OPV and MMR, in protection against COVID-19. Sequences of the main antigenic proteins in the investigated vaccines and SARS-CoV-2 proteins were compared to identify similar patterns. The immunogenic effect of the identified segments was then assessed using a combination of structural and antigenicity prediction tools. Results A total of 14 highly similar segments were identified in the investigated vaccines. Structural and antigenicity prediction analysis showed that, among the identified patterns, three segments in hepatitis B, tetanus, and measles proteins presented antigenic properties that could induce a putative protective effect against COVID-19. Conclusions Our results suggest a possible protective effect of the HBV, tetanus and measles vaccines against COVID-19, which may explain the variation of disease severity among regions.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 45
    Publication Date: 2021-03-30
    Description: Background Women are at more than 1.5-fold higher risk for clinically relevant adverse drug events. While this higher prevalence is partially due to gender-related effects, biological sex differences likely also impact drug response. Publicly available gene expression databases provide a unique opportunity for examining drug response at a cellular level. However, missingness and heterogeneity of metadata prevent large-scale identification of drug exposure studies and limit assessments of sex bias. To address this, we trained organism-specific models to infer sample sex from gene expression data (a toy classifier along these lines is sketched after this record), and used entity normalization to map metadata cell line and drug mentions to existing ontologies. Using this method, we inferred sex labels for 450,371 human and 245,107 mouse microarray and RNA-seq samples from refine.bio. Results Overall, we find a slight female bias (52.1%) in human samples and a male bias (62.5%) in mouse samples; this corresponds to a majority of mixed-sex studies in humans and single-sex studies in mice, split between female-only and male-only (25.8% vs. 18.9% in human and 21.6% vs. 31.1% in mouse, respectively). In drug studies, we find limited evidence for sex-sampling bias overall; however, specific categories of drugs, including human cancer and mouse nervous system drugs, are enriched in female-only and male-only studies, respectively. We leverage our expression-based sex labels to further examine the complexity of cell line sex and to assess the frequency of metadata sex label misannotations (2–5%). Conclusions Our results demonstrate limited overall sex bias, while highlighting high bias in specific subfields and underscoring the importance of including sex labels to better understand the underlying biology. We make our inferred and normalized labels, along with flags for misannotated samples, publicly available to catalyze the routine use of sex as a study variable in future analyses.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
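A hedged sketch of the kind of expression-based sex classifier the abstract describes: logistic regression on a few sex-linked transcripts. The gene choices (XIST, RPS4Y1) are a common, plausible feature set, not necessarily the authors'; the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
n = 200
sex = rng.integers(0, 2, n)                     # 0 = male, 1 = female

# Synthetic log-expression: XIST high in females, RPS4Y1 high in males.
xist = np.where(sex == 1, 8.0, 1.0) + rng.normal(0, 1.0, n)
rps4y1 = np.where(sex == 0, 7.0, 0.5) + rng.normal(0, 1.0, n)
X = np.column_stack([xist, rps4y1])

clf = LogisticRegression().fit(X[:150], sex[:150])
print("held-out accuracy:", clf.score(X[150:], sex[150:]))  # ~1.0 on toy data
```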
  • 46
    Publication Date: 2021-03-25
    Description: Background Deep immune receptor sequencing (RepSeq) provides unprecedented opportunities for identifying and studying condition-associated T-cell clonotypes, represented by T-cell receptor (TCR) CDR3 sequences. However, due to the immense diversity of the immune repertoire, identification of condition-relevant TCR CDR3s from total repertoires has mostly been limited either to “public” CDR3 sequences or to comparisons of CDR3 frequencies observed in a single individual. A methodology for identifying condition-associated TCR CDR3s by direct population-level comparison of RepSeq samples is currently lacking. Results We present a method for direct population-level comparison of RepSeq samples using immune repertoire sub-units (or sub-repertoires) that are shared across individuals. The method first performs unsupervised clustering of CDR3s within each sample. It then finds matching clusters across samples, called immune sub-repertoires, performs statistical differential abundance testing at the level of the identified sub-repertoires, and finally ranks CDR3s in differentially abundant sub-repertoires for relevance to the condition. We applied the method to total TCR CDR3β RepSeq datasets of celiac disease patients, as well as to public datasets of yellow fever vaccination. The method successfully identified celiac disease-associated CDR3β sequences, as evidenced by considerable agreement of TRBV-gene and positional amino acid usage patterns in the detected CDR3β sequences with previously known CDR3βs specific to gluten in celiac disease. It also recovered significantly more previously known CDR3β sequences relevant to each condition than would be expected by chance. Conclusion We conclude that immune sub-repertoires of similar immuno-genomic features shared across unrelated individuals can serve as viable units of immune repertoire comparison, serving as a proxy for the identification of condition-associated CDR3s.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 47
    Publication Date: 2021-03-25
    Description: Background Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal (a minimal filter is sketched after this record). Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared to using unfiltered data. Conclusions Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
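A minimal sketch of the core filtering idea described above: replace each node's measurement with a weighted combination of itself and its network neighbours, letting connected neighbours reinforce the signal. The simple mean filter and toy graph below are illustrative, not the paper's exact filter family.

```python
import numpy as np

# Toy undirected interaction network as an adjacency matrix (5 proteins).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

def network_filter(x: np.ndarray, adj: np.ndarray, alpha: float = 0.5):
    """Blend each measurement with the mean of its neighbours' values."""
    deg = adj.sum(axis=1)
    neigh_mean = (adj @ x) / np.where(deg > 0, deg, 1)
    return alpha * x + (1 - alpha) * neigh_mean

signal = np.array([1.0, 1.2, 0.9, -1.0, -1.1])     # underlying signal
noisy = signal + np.random.default_rng(6).normal(0, 0.5, 5)
print(np.round(network_filter(noisy, A), 2))       # closer to signal than noisy
```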
  • 48
    Publication Date: 2021-03-25
    Description: Background Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging attempts to achieve stable and improved prediction. In this paper, we develop a two-stage model averaging procedure to enhance accuracy and stability of prediction for high-dimensional linear regression. First, we employ a high-dimensional variable selection method such as LASSO to screen redundant predictors and construct a class of candidate models; then, we apply jackknife cross-validation to optimize the model weights for averaging (a two-stage sketch follows this record). Results In simulation studies, the proposed technique outperforms commonly used alternative methods in the high-dimensional regression setting, in terms of minimizing the mean squared prediction error. We apply the proposed method to a riboflavin dataset; the results show that the method is quite efficient in forecasting the riboflavin production rate when there are thousands of genes and only tens of subjects. Conclusions Compared with a recent high-dimensional model averaging procedure (Ando and Li in J Am Stat Assoc 109:254–65, 2014), the proposed approach enjoys three appealing features and thus has better predictive performance: (1) more suitable methods are applied for model construction and weighting; (2) computational flexibility is retained, since each candidate model and its corresponding weight are determined in the low-dimensional setting and quadratic programming is utilized in the cross-validation; (3) model selection and averaging are combined in the procedure, making full use of the strengths of both techniques. As a consequence, the proposed method can achieve stable and accurate predictions in high-dimensional linear models, and can greatly help practical researchers analyze genetic data in medical research.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
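A hedged sketch of the two-stage recipe: LASSO screening to form nested candidate models, then non-negative least squares on held-out predictions as a simple stand-in for the quadratic-programming weight optimization; all data are synthetic.

```python
import numpy as np
from scipy.optimize import nnls
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(7)
n, p = 70, 500                                  # tens of subjects, many predictors
X = rng.standard_normal((n, p))
y = X[:, :5] @ [3, -2, 2, 1.5, -1] + rng.normal(0, 1, n)

# Stage 1: LASSO screening; nested candidate models from top-ranked predictors.
lasso = Lasso(alpha=0.2).fit(X, y)
ranked = np.argsort(-np.abs(lasso.coef_))
candidates = [ranked[:k] for k in (2, 5, 10, 20)]

# Stage 2: out-of-fold (leave-one-out) predictions per candidate model,
# then non-negative weights that best combine them.
P = np.column_stack([
    cross_val_predict(LinearRegression(), X[:, idx], y, cv=n)
    for idx in candidates
])
w, _ = nnls(P, y)
w /= w.sum()                                    # normalize to sum to one
print("model weights:", np.round(w, 3))
```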
  • 49
    Publication Date: 2021-03-25
    Description: Background Translation is a fundamental process in gene expression. Ribosome profiling is a method that enables the study of transcriptome-wide translation. A fundamental, technical challenge in analyzing Ribo-Seq data is identifying the A-site location on ribosome-protected mRNA fragments. Identification of the A-site is essential as it is at this location on the ribosome where a codon is translated into an amino acid. Incorrect assignment of a read to the A-site can lead to lower signal-to-noise ratio and loss of correlations necessary to understand the molecular factors influencing translation. Therefore, an easy-to-use and accurate analysis tool is needed to accurately identify the A-site locations. Results We present RiboA, a web application that identifies the most accurate A-site location on a ribosome-protected mRNA fragment and generates the A-site read density profiles. It uses an Integer Programming method that reflects the biological fact that the A-site of actively translating ribosomes is generally located between the second codon and stop codon of a transcript, and utilizes a wide range of mRNA fragment sizes in and around the coding sequence (CDS). The web application is containerized with Docker, and it can be easily ported across platforms. Conclusions The Integer Programming method that RiboA utilizes is the most accurate in identifying the A-site on Ribo-Seq mRNA fragments compared to other methods. RiboA makes it easier for the community to use this method via a user-friendly and portable web application. In addition, RiboA supports reproducible analyses by tracking all the input datasets and parameters, and it provides enhanced visualization to facilitate scientific exploration. RiboA is available as a web service at https://a-site.vmhost.psu.edu/. The code is publicly available at https://github.com/obrien-lab/aip_web_docker under the MIT license.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 50
    Publication Date: 2021-03-17
    Description: Background Studies that examine the adoption of clinical decision support (CDS) by healthcare providers have generally lacked a theoretical underpinning. The Unified Theory of Acceptance and Use of Technology (UTAUT) model may provide such a theory-based explanation; however, it is unknown whether the model can be applied to the CDS literature. Objective Our overall goal was to develop a taxonomy based on UTAUT constructs that could reliably characterize CDS interventions. Methods We used a two-step process: (1) identify randomized controlled trials meeting comparative effectiveness criteria, e.g., evaluating the impact of CDS interventions with and without specific features or implementation strategies; (2) iteratively develop and validate a taxonomy for characterizing differential CDS features or implementation strategies using three raters (an agreement computation is sketched after this record). Results Twenty-five studies with 48 comparison arms were identified. We applied three constructs from the UTAUT model and added motivational control to characterize CDS interventions. Inter-rater reliability was as follows for the model constructs: performance expectancy (κ = 0.79), effort expectancy (κ = 0.85), social influence (κ = 0.71), and motivational control (κ = 0.87). Conclusion We found that constructs from the UTAUT model, plus motivational control, can reliably characterize CDS features and associated implementation strategies. Our next step is to examine the quantitative relationships between these constructs and CDS adoption.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
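A minimal sketch of the inter-rater agreement statistic reported above, using scikit-learn's Cohen's kappa. Note that Cohen's kappa is pairwise; with three raters the paper's exact procedure may differ (e.g., averaging pairwise kappas or using Fleiss' kappa). The label vectors are invented.

```python
from sklearn.metrics import cohen_kappa_score

# Two raters' construct labels for 12 CDS comparison arms (toy data):
# PE = performance expectancy, EE = effort expectancy,
# SI = social influence, MC = motivational control.
rater_a = ["PE", "PE", "EE", "SI", "MC", "PE", "EE", "EE", "SI", "MC", "PE", "SI"]
rater_b = ["PE", "PE", "EE", "SI", "MC", "PE", "EE", "PE", "SI", "MC", "PE", "SI"]

print(f"kappa = {cohen_kappa_score(rater_a, rater_b):.2f}")
```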
  • 51
    Publication Date: 2021-03-15
    Description: Background Tissues are often heterogeneous in their single-cell molecular expression, and this can govern the regulation of cell fate. For the understanding of development and disease, it is important to quantify heterogeneity in a given tissue. Results We present the R package stochprofML, which uses the maximum likelihood principle to parameterize heterogeneity from the cumulative expression of small random pools of cells. We evaluate the algorithm’s performance in simulation studies and present further application opportunities. Conclusion The benefits of stochastic profiling outweigh the demixing of pooled samples that it requires, saving experimental cost and effort and reducing measurement error. It offers possibilities for parameterizing heterogeneity, estimating underlying pool compositions and detecting differences in cell populations between samples.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
  • 52
    Publication Date: 2021-03-09
    Description: Background We developed transformer-based deep learning models based on natural language processing (NLP) for early risk assessment of Alzheimer’s disease (AD) from the picture description test. Methods The lack of large datasets poses the most important limitation for using complex models that do not require feature engineering. Transformer-based pre-trained deep language models have recently made a large leap in NLP research and application. These models are pre-trained on available large datasets to understand natural language texts appropriately, and have been shown to subsequently perform well on classification tasks with small training sets. The overall classification model is a simple classifier on top of the pre-trained deep language model (a minimal embedding-plus-classifier pipeline is sketched after this record). Results The models are evaluated on picture description test transcripts of the Pitt corpus, which contains data from 170 AD patients with 257 interviews and 99 healthy controls with 243 interviews. The BERT-Large (bidirectional encoder representations from transformers) embedding with a logistic regression classifier achieves a classification accuracy of 88.08%, which improves on the state of the art by 2.48%. Conclusions Using pre-trained language models can improve AD prediction. This not only addresses the lack of sufficiently large datasets, but also reduces the need for expert-defined features.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
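A hedged sketch of the described pipeline: extract a fixed sentence embedding from a pre-trained BERT and train a simple logistic-regression classifier on top, using the Hugging Face transformers API. bert-base-uncased is used here for speed (the paper used the large model), the pooling choice is an assumption, and the transcripts are invented.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    """Mean-pooled last-hidden-state embedding for each transcript."""
    enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = bert(**enc).last_hidden_state        # (batch, tokens, 768)
    mask = enc["attention_mask"].unsqueeze(-1)
    return ((out * mask).sum(1) / mask.sum(1)).numpy()

texts = ["the boy is taking a cookie from the jar", "uh the the water um"]
labels = [0, 1]                                    # 0 = control, 1 = AD (toy)
clf = LogisticRegression().fit(embed(texts), labels)
print(clf.predict(embed(["the girl is washing dishes"])))
```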
  • 53
    Publication Date: 2021-03-09
    Description: Background Automated text classification has many important applications in the clinical setting; however, obtaining labelled data for training machine learning and deep learning models is often difficult and expensive. Active learning techniques may mitigate this challenge by reducing the amount of labelled data required to effectively train a model. In this study, we analyze the effectiveness of 11 active learning algorithms for classifying subsite and histology from cancer pathology reports, using a convolutional neural network (CNN) as the text classification model. Results We compare the performance of each active learning strategy using two differently sized datasets and two different classification tasks. Our results show that on all tasks and dataset sizes, all active learning strategies except diversity-sampling strategies outperformed random sampling, i.e., no active learning. On our large dataset (15K initial labelled samples, adding 15K additional labelled samples at each iteration of active learning), there was no clear winner between the different active learning strategies. On our small dataset (1K initial labelled samples, adding 1K additional labelled samples at each iteration of active learning), marginal and ratio uncertainty sampling performed better than all other active learning techniques (both are sketched after this record). We found that, compared to random sampling, active learning strongly helps performance on rare classes by focusing on underrepresented classes. Conclusions Active learning can save annotation cost by helping human annotators efficiently and intelligently select which samples to label. Our results show that a dataset constructed using effective active learning techniques requires less than half the amount of labelled data to achieve the same performance as a dataset constructed using random sampling.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
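A minimal sketch of the two best-performing strategies on the small dataset, marginal (margin) and ratio uncertainty sampling, which select the unlabelled samples whose top two predicted class probabilities are closest; the probabilities are random stand-ins for CNN outputs.

```python
import numpy as np

rng = np.random.default_rng(8)
probs = rng.dirichlet(np.ones(4), size=1000)   # model probs for 1000 unlabelled docs

top2 = np.sort(probs, axis=1)[:, -2:]          # two largest class probabilities
margin = top2[:, 1] - top2[:, 0]               # small margin  = uncertain
ratio = top2[:, 0] / top2[:, 1]                # ratio near 1  = uncertain

k = 10                                          # query budget per iteration
query_margin = np.argsort(margin)[:k]           # smallest margins first
query_ratio = np.argsort(-ratio)[:k]            # largest ratios first
print(query_margin, query_ratio)
```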
  • 54
    Publication Date: 2021-03-09
    Description: Background Assessing the quality of healthcare data is a complex task that includes selecting suitable measurement methods (MMs) and adequately assessing their results. Objectives To present an interoperable data quality (DQ) assessment method that formalizes MMs based on standardized data definitions and is intended to support collaborative governance of DQ-assessment knowledge, e.g. which MMs to apply and how to assess their results in different situations. Methods We describe and explain central concepts of our method using the example of its first real-world application in a study on predictive biomarkers for rejection and other injuries of kidney transplants. We applied our open-source tool, openCQA, which implements our method utilizing the openEHR specifications. The means to support collaborative governance of DQ-assessment knowledge are the version-control system git and openEHR clinical information models. Results Applying the method to the study’s dataset showed satisfactory practicability of the described concepts and produced useful results for DQ assessment. Conclusions The main contribution of our work is to provide applicable concepts and a tested exemplary open-source implementation for interoperable and knowledge-based DQ assessment in healthcare that accommodates flexible task- and domain-specific requirements.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 55
    Publication Date: 2021-03-05
    Description: Background Cost control and usage regulation of medical materials (MMs) are practical issues to which the government pays close attention. Although it is well established that there is great potential in mobilizing doctors and patients to participate in MM-related clinical decisions, few interventions adopt effective measures against specific behavioral deficiencies. This study aims at developing and validating an independent consultation and feedback system (ICFS) for optimizing clinical decisions on the use of MMs for inpatients needing joint replacement surgeries. Methods Development of the research protocol is based on a problem and deficiency list derived from a trans-theoretical framework which incorporates mainly soft systems thinking, information asymmetry, crisis-coping, dual delegation and planned behavior. The intervention consists of two main components targeting patients and doctors respectively, and each intervention ingredient is designed to tackle the doctor- and patient-side problems with MM use in joint replacement surgeries. The intervention arm receives an 18-month ICFS intervention program on top of routine medical services, while the control arm receives only routine medical services. Implementation of the intervention is supported by an online platform established and maintained by the Quality Assurance Center for Medical Care in Anhui Province, a smartphone-based application (app) and a web-based clinical support system. Discussion The implementation of this study is expected to significantly reduce the deficiencies and moral hazards in decision-making about MM use through the delivery of economical, efficient, sustainable and easy-to-promote cooperative intervention programs, thus greatly reducing medical costs and standardizing medical behaviors. Trial registration number ISRCTN10152297.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
  • 56
    Publication Date: 2021-03-09
    Description: Background Today an unprecedented amount of genetic sequence data is stored in publicly available repositories. For decades now, mitochondrial DNA (mtDNA) has been the workhorse of genetic studies, and as a result there is a large volume of mtDNA data available in these repositories for a wide range of species. Indeed, whilst whole genome sequencing is an exciting prospect for the future, for most non-model organisms classical markers such as mtDNA remain widely used. By compiling existing data from multiple original studies, it is possible to build powerful new datasets capable of exploring many questions in ecology, evolution and conservation biology. One key question that these data can help inform is what happened in a species’ demographic past. However, compiling data in this manner is not trivial; there are many complexities associated with data extraction, data quality and data handling. Results Here we present the mtDNAcombine package, a collection of tools developed to manage some of the major decisions associated with handling multi-study sequence data, with a particular focus on preparing sequence data for Bayesian skyline plot demographic reconstructions. Conclusions There is now more genetic information available than ever before, and large multi-study data sets offer great opportunities to explore new and exciting avenues of research. However, compiling multi-study datasets still remains a technically challenging prospect. The mtDNAcombine package provides a pipeline to streamline the process of downloading, curating, and analysing sequence data (the download step is sketched after this record), guiding the process of compiling data sets from the online database GenBank.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
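mtDNAcombine itself is an R package, but the download-and-curate step it streamlines can be illustrated in Python with Biopython's Entrez interface; the accession numbers and email address below are placeholders, not data from the study.

```python
from Bio import Entrez, SeqIO

Entrez.email = "you@example.org"       # NCBI requires a contact address

# Hypothetical GenBank accessions for one species' mtDNA marker sequences.
accessions = ["AB123456", "AB123457"]  # placeholders only

handle = Entrez.efetch(db="nucleotide", id=",".join(accessions),
                       rettype="gb", retmode="text")
records = list(SeqIO.parse(handle, "genbank"))
handle.close()

for rec in records:
    print(rec.id, len(rec.seq), rec.description)
```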
  • 57
    Publication Date: 2021-03-08
    Description: Background There have been few studies describing how production EMR systems can be systematically queried to identify clinically-defined populations, and few utilising free text in this process. The aim of this study is to provide a generalisable methodology for constructing clinically-defined EMR-derived patient cohorts using structured and unstructured data in EMRs. Methods Patients with possible acute coronary syndrome (ACS) were used as an exemplar. Cardiologists defined clinical criteria for patients presenting with possible ACS. These were mapped to data tables within the production EMR system, creating seven inclusion criteria comprised of structured data fields (orders and investigations, procedures, scanned electrocardiogram (ECG) images, and diagnostic codes) and unstructured clinical documentation. Data were extracted from two local health districts (LHD) in Sydney, Australia. Outcome measures included examination of the relative contribution of individual inclusion criteria to the identification of eligible encounters, comparisons between inclusion criteria, and evaluation of the consistency of data extracts across years and LHDs. Results Among 802,742 encounters in a 5-year dataset (1/1/13–30/12/17), the presence of an ECG image (54.8% of encounters) and symptoms and keywords in clinical documentation (41.4–64.0%) were most often used to identify presentations of possible ACS. Orders and investigations (27.3%) and procedures (1.4%) were less often present for identified presentations. Relevant ICD-10/SNOMED CT codes were present for 3.7% of identified encounters. Similar trends were seen when the two LHDs were examined separately, and across years. Conclusions Clinically-defined EMR-derived cohorts combining structured and unstructured data during cohort identification are a necessary prerequisite for the critical validation work required for development of real-time clinical decision support and learning health systems.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2021-03-09
Description: Background In the intensive care unit (ICU), delirium is a common, acute, confusional state associated with high risk for short- and long-term morbidity and mortality. Machine learning (ML) has promise to address research priorities and improve delirium outcomes. However, due to clinical and billing conventions, delirium is often inconsistently or incompletely labeled in electronic health record (EHR) datasets. Here, we identify clinical actions abstracted from clinical guidelines in EHR data that indicate risk of delirium among ICU patients. We develop a novel prediction model to label patients with delirium based on a large data set and assess model performance. Methods EHR data on 48,451 admissions from 2001 to 2012, available through the Medical Information Mart for Intensive Care III (MIMIC-III) database, were used to identify features to develop our prediction models. Five binary ML classification models (Logistic Regression; Classification and Regression Trees; Random Forests; Naïve Bayes; and Support Vector Machines) were fit and ranked by Area Under the Curve (AUC) scores. We compared our best model with two models previously proposed in the literature in terms of goodness of fit and precision, and through biological validation. Results Our best performing model with threshold reclassification for predicting delirium was based on multiple logistic regression using the 31 clinical actions (AUC 0.83). Our model outperformed the other proposed models in biological validation on clinically meaningful, delirium-associated outcomes. Conclusions Hurdles in identifying accurate labels in large-scale datasets limit clinical applications of ML in delirium. We developed a novel labeling model for delirium in the ICU using a large, public data set. By using guideline-directed clinical actions independent from risk factors, treatments, and outcomes as model predictors, our classifier could be used as a delirium label for future clinically targeted models.
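As a rough sketch of the model-comparison step, the five classifier families named above can be fit and ranked by cross-validated AUC with scikit-learn; hyperparameters and the inputs `X`, `y` are placeholders, not the study's settings.

```python
# Minimal sketch: fit the five classifier families and rank by AUC.
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

MODELS = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "cart": DecisionTreeClassifier(),
    "random_forest": RandomForestClassifier(n_estimators=500),
    "naive_bayes": GaussianNB(),
    "svm": SVC(probability=True),
}

def rank_by_auc(X, y):
    """Return (model_name, mean AUC) pairs, best first."""
    scores = {name: cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
              for name, m in MODELS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```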
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    Publication Date: 2021-03-12
Description: Background Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated from de novo assembly are often very fragmented and rarely longer than a few kilobase pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs. Results To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools on synthetic, animal, and human metagenomics datasets. Conclusions A novel software tool, ContigExtender, has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated into most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
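ContigExtender's recursive, multi-path strategy is more sophisticated than this, but the underlying idea of extension by read overlap can be shown with a toy greedy sketch (illustrative only):

```python
# Toy illustration of contig extension by suffix-prefix read overlap.
# This is NOT ContigExtender's algorithm, just the basic idea.
def best_extension(contig, reads, min_overlap=20):
    """Return (overlap, read) for the read whose prefix best matches the
    contig's 3' suffix and still adds new bases, or None."""
    best = None
    for read in reads:
        max_o = min(len(contig), len(read))
        for o in range(max_o, min_overlap - 1, -1):
            if contig.endswith(read[:o]):
                if o < len(read) and (best is None or o > best[0]):
                    best = (o, read)
                break  # largest overlap for this read found
    return best

def extend_contig(contig, reads, min_overlap=20, max_rounds=100):
    for _ in range(max_rounds):  # guard against runaway repeats
        hit = best_extension(contig, reads, min_overlap)
        if hit is None:
            break
        o, read = hit
        contig += read[o:]       # append the non-overlapping tail
    return contig
```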
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2021-03-12
Description: Background Recently, copy number variations (CNV) impacting genes involved in oncogenic pathways have attracted increasing attention for managing disease susceptibility. CNV is one of the most important somatic aberrations in the genome of tumor cells. Oncogene activation and tumor suppressor gene inactivation are often attributed to copy number gain/amplification or deletion, respectively, in many cancer types and stages. Recent advances in next-generation sequencing protocols allow for the addition of unique molecular identifiers (UMI) to each read. Each targeted DNA fragment is labeled with a unique random nucleotide sequence added to the sequencing primers. UMI are especially useful for CNV detection by making each DNA molecule in a population of reads distinct. Results Here, we present molecular Copy Number Alteration (mCNA), a new methodology allowing the detection of copy number changes using UMI. The algorithm is composed of four main steps: construction of UMI count matrices, use of control samples to construct a pseudo-reference, computation of log-ratios, segmentation, and finally statistical inference of abnormal segment breaks. We demonstrate the success of mCNA on a dataset of patients suffering from Diffuse Large B-cell Lymphoma, and we highlight that mCNA results have a strong correlation with comparative genomic hybridization. Conclusion We provide mCNA, a new approach for CNV detection, freely available at https://gitlab.com/pierrejulien.viailly/mcna/ under MIT license. mCNA can significantly improve detection accuracy of CNV changes by using UMI.
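A minimal numpy sketch of the log-ratio step (normalised UMI counts against a control-derived pseudo-reference) might look like this; the normalisation details are assumptions, not the paper's exact procedure.

```python
# Sketch of the pseudo-reference / log-ratio step described above.
import numpy as np

def log_ratios(umi_counts, control_idx, pseudocount=1.0):
    """umi_counts: (samples x regions) UMI count matrix;
    control_idx: indices of control samples. Returns per-region log2
    ratios ready for segmentation."""
    counts = umi_counts + pseudocount
    # library-size normalisation: counts per million UMIs
    cpm = counts / counts.sum(axis=1, keepdims=True) * 1e6
    pseudo_ref = np.median(cpm[control_idx], axis=0)  # pseudo-reference
    return np.log2(cpm / pseudo_ref)
```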
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2021-03-12
Description: Background The identification of protein families is of outstanding practical importance for in silico protein annotation and is at the basis of several bioinformatic resources. Pfam is possibly the best-known protein family database, built over many years of work by domain experts with extensive use of manual curation. This approach is generally very accurate, but it is quite time-consuming and may suffer from a bias generated by the hand-curation itself, which is often guided by the available experimental evidence. Results We introduce a procedure that aims to identify putative protein families automatically. The procedure is based on Density Peak Clustering and uses as input only local pairwise alignments between protein sequences. In the experiment we present here, we ran the algorithm on about 4000 full-length proteins with at least one domain classified by Pfam as belonging to the Pseudouridine synthase and Archaeosine transglycosylase (PUA) clan. We obtained 71 automatically-generated sequence clusters with at least 100 members. While our clusters were largely consistent with the Pfam classification, showing good overlap with either single or multi-domain Pfam family architectures, we also observed some inconsistencies. The latter were inspected using structural and sequence-based evidence, which suggested that the automatic classification captured evolutionary signals reflecting non-trivial features of protein family architectures. Based on this analysis we identified a putative novel pre-PUA domain as well as alternative boundaries for a few PUA or PUA-associated families. As a first indication that our approach was unlikely to be clan-specific, we performed the same analysis on the P53 clan, obtaining comparable results. Conclusions The clustering procedure described in this work takes advantage of the information contained in a large set of pairwise alignments and successfully identifies a set of putative families and family architectures in an unsupervised manner. Comparison with the Pfam classification highlights significant overlap and points to interesting differences, suggesting that our new algorithm could have potential in applications related to automatic protein classification. Testing this hypothesis, however, will require further experiments on large and diverse sequence datasets.
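For readers unfamiliar with Density Peak Clustering, the two quantities it assigns to each point (a local density rho and the distance delta to the nearest denser point, following Rodriguez and Laio, 2014) can be computed from a distance matrix as in this sketch; cluster centres are points with anomalously high rho and delta.

```python
# Sketch of the core Density Peak Clustering quantities. `dist` could be
# derived from local pairwise alignment scores, as in the procedure above.
import numpy as np

def density_peaks(dist, dc):
    """dist: (n x n) symmetric distance matrix; dc: cutoff distance.
    Returns (rho, delta) for every point."""
    n = dist.shape[0]
    rho = (dist < dc).sum(axis=1) - 1      # neighbours within dc (minus self)
    delta = np.zeros(n)
    order = np.argsort(-rho)               # densest point first
    delta[order[0]] = dist[order[0]].max() # convention for the global peak
    for rank, i in enumerate(order[1:], start=1):
        denser = order[:rank]              # all points with higher density
        delta[i] = dist[i, denser].min()
    return rho, delta
```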
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2021-03-26
Description: Background Hidden Markov models (HMM) are a powerful tool for analyzing biological sequences in a wide variety of applications, from profiling functional protein families to identifying functional domains. The standard methods for HMM training are maximum likelihood using counting when sequences are labelled, or expectation maximization, such as the Baum–Welch algorithm, when sequences are unlabelled. However, there are increasingly situations where sequences are only partially labelled. In this paper, we designed a new training method based on the Baum–Welch algorithm to train HMMs for situations in which only partial labeling is available for certain biological problems. Results Compared with a similar method previously reported that was designed for active learning in text mining, our method achieves significant improvements in model training, as demonstrated by higher accuracy when the trained models are tested for decoding on both synthetic data and real data. Conclusions A novel training method was developed to improve the training of hidden Markov models by utilizing partially labelled data. The method will have an impact on the detection of de novo motifs and signals in biological sequence data. In particular, the method will be deployed in active learning mode in ongoing research on detecting plasmodesmata targeting signals, with performance assessed through validation in wet-lab experiments.
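One simple way to picture how partial labels can enter such training is to mask the forward recursion so that labelled positions carry probability only in their known state; this is an illustrative sketch of the idea, not the paper's exact update.

```python
# Forward pass with partial-label constraints (illustrative sketch).
# pi: (K,) initial distribution; A: (K x K) transitions; B: (K x M)
# emissions; obs: integer observation indices; labels[t] is the known
# state at position t, or None when unlabelled.
import numpy as np

def constrained_forward(obs, pi, A, B, labels):
    K, T = A.shape[0], len(obs)
    mask = np.ones((T, K))
    for t, lab in enumerate(labels):
        if lab is not None:
            mask[t] = 0.0
            mask[t, lab] = 1.0          # only the labelled state survives
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, obs[0]] * mask[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]] * mask[t]
    return alpha  # would feed the E-step of a Baum-Welch-style update
```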
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2021-03-26
Description: Background Strabismus is a complex disease with various treatment approaches, each with its own advantages and drawbacks. In this context, shared decision making (SDM) is a communication process in which the provider shares all the relevant treatment alternatives and the benefits and risks of each procedure, while the patient shares their preferences and values regarding the choices. In that way, SDM is a bidirectional process that goes beyond the typical informed consent. However, little is known about the extent to which SDM influences satisfaction with the treatment outcome among strabismus patients. To study this correlation, the SDM-Q-9 questionnaire was administered within surgical consultations where treatment decisions were made; the aim was to assess the relationship between post-operative patient satisfaction and the SDM score. Methods This was a prospective observational pilot study. Eligible patients were adults diagnosed with strabismus who had multiple treatment options and were given the right of choice without being driven towards a physician’s preference. Ninety-three strabismus patients were asked to fill out the SDM-Q-9 questionnaire related to their perception of SDM during the entire period of strabismus treatment. After the treatment, patients were asked to rate their satisfaction level with the surgical outcome as excellent, good, fair, or poor. Descriptive statistics, linear regression, and nonparametric statistical tests (Spearman, Mann–Whitney U, and Kruskal–Wallis) were used as analysis tools. Results The average age of the participants was 24, and 50.6% were women. The mean SDM-Q-9 score among patients was 32 (IQR = 3). The postoperative patient satisfaction was rated as excellent by 16 (17.2%) patients, good by 38 (40.9%), fair by 32 (34.4%), and poor by 7 patients (7.5%). Data analysis by linear regression showed a positive correlation between the SDM-Q-9 score and patient satisfaction with the surgery outcome (B = 0.005, p
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2021-03-17
Description: Background Identification of features is a critical task in microbiome studies that is complicated by the fact that microbial data are high dimensional and heterogeneous. Masked by the complexity of the data, the problem of separating signals (differential features between groups) from noise (features that are not differential between groups) becomes challenging. For instance, when performing differential abundance tests, multiple testing adjustments tend to be overconservative, as the probability of a type I error (false positive) increases dramatically with large numbers of hypotheses. Moreover, the grouping effect of interest can be obscured by heterogeneity. These factors can incorrectly lead to the conclusion that there are no differences in the microbiome compositions. Results We translate and represent the problem of identifying differential features in two-group comparisons (e.g., treatment versus control) as a dynamic process of separating the signal from its random background. More specifically, we progressively permute the grouping factor labels of the microbiome samples and perform multiple differential abundance tests in each scenario. We then compare the signal strength of the most differential features from the original data with their performance in permutations, and observe a visually apparent decreasing trend if these features are true positives. Simulations and applications on real data show that the proposed method creates a U-curve when plotting the number of significant features versus the proportion of mixing. The shape of the U-curve can convey the strength of the overall association between the microbiome and the grouping factor. We also define a fragility index to measure the robustness of the discoveries. Finally, we recommend the identified features by comparing p-values in the observed data with p-values in the fully mixed data. Conclusions We have developed this into a user-friendly and efficient R-shiny tool with visualizations. By default, we use the Wilcoxon rank sum test to compute the p-values, since it is a robust nonparametric test. Our proposed method can also utilize p-values obtained from other testing methods, such as DESeq. This demonstrates the potential of the progressive permutation method to be extended to new settings.
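A bare-bones sketch of the progressive permutation idea, using the Wilcoxon rank-sum (Mann–Whitney) test mentioned as the default; `X` is a samples-by-features abundance matrix and `groups` a 0/1 label array, both placeholders.

```python
# Progressively mix shuffled labels into the data and count how many
# features stay significant; real signal decays as mixing grows.
import numpy as np
from scipy.stats import mannwhitneyu

def n_significant(X, groups, alpha=0.05):
    pvals = [mannwhitneyu(f[groups == 0], f[groups == 1]).pvalue
             for f in X.T]
    return int(np.sum(np.array(pvals) < alpha))

def progressive_permutation(X, groups, steps=10, seed=0):
    rng = np.random.default_rng(seed)
    counts = []
    for k in range(steps + 1):
        mixed = groups.copy()
        idx = rng.choice(len(groups), size=int(len(groups) * k / steps),
                         replace=False)
        mixed[idx] = rng.permutation(mixed[idx])  # shuffle a fraction
        counts.append(n_significant(X, mixed))
    return counts  # plot against k/steps to look for the U-curve
```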
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2021-03-17
Description: Background An increasing number of clinical trials require biomarker-driven patient stratification, especially for revolutionary immune checkpoint blockade therapy. Due to the complicated interaction between a tumor and its microenvironment, single biomarkers, such as PDL1 protein level, tumor mutational burden (TMB), and single gene mutation and expression, are far from satisfactory for response prediction or patient stratification. Recently, combinatorial biomarkers were reported to be more precise and powerful for predicting therapy response and identifying potential target populations with superior survival. However, there is a lack of dedicated tools for such combinatorial biomarker analysis. Results Here, we present dualmarker, an R package designed to facilitate data exploration for dual biomarker combinations. Given two biomarkers, dualmarker comprehensively visualizes their association with drug response and patient survival through 14 types of plots, such as boxplots, scatterplots, ROCs, and Kaplan–Meier plots. Using logistic regression and Cox regression models, dualmarker evaluates the superiority of dual markers over single markers by comparing the data fitness of dual-marker versus single-marker models; the same comparison is used for de novo searching for new biomarker pairs. We demonstrated this straightforward workflow and comprehensive capability using public biomarker data from one bladder cancer patient cohort (IMvigor210 study); we confirmed the previously reported biomarker pairs TMB/TGF-beta signature and CXCL13 expression/ARID1A mutation for response and survival analyses, respectively. In addition, dualmarker de novo identified new biomarker partners: for example, in overall survival modelling, the model combining HMGB1 expression and ARID1A mutation had statistically better goodness-of-fit than the models with either HMGB1 or ARID1A as a single marker. Conclusions The dualmarker package is an open-source tool for the visualization and identification of combinatorial dual biomarkers. It streamlines the dual-marker analysis flow into user-friendly functions and can be used for data exploration and hypothesis generation. Its code is freely available at GitHub at https://github.com/maxiaopeng/dualmarker under MIT license.
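The single- versus dual-marker comparison can be sketched as a likelihood-ratio test between nested logistic models; dualmarker itself is an R package, and the column names below are placeholders.

```python
# Sketch of comparing a dual-marker model against a single-marker model
# for a binary response, via a likelihood-ratio test.
import statsmodels.formula.api as smf
from scipy.stats import chi2

def dual_vs_single(df, m1, m2, outcome="response"):
    """df: DataFrame with binary `outcome` and marker columns m1, m2.
    Returns (LR statistic, p-value); a small p suggests the second
    marker adds signal beyond the first."""
    single = smf.logit(f"{outcome} ~ {m1}", data=df).fit(disp=0)
    dual = smf.logit(f"{outcome} ~ {m1} + {m2}", data=df).fit(disp=0)
    lr = 2 * (dual.llf - single.llf)
    df_diff = dual.df_model - single.df_model
    return lr, chi2.sf(lr, df_diff)
```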
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Publication Date: 2021-03-19
Description: Background Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. The experimental methods used for predicting ncRNA–protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods to accurately and efficiently predict ncRNA–protein interactions. Results In this work, we present an ensemble deep learning-based method, EDLMFC, to predict ncRNA–protein interactions using a combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mers were used to extract protein/ncRNA sequence features, which were integrated with tertiary structure features and fed into an ensemble deep learning model combining a convolutional neural network (CNN), to learn the dominant biological information, with a bi-directional long short-term memory network (BLSTM) to capture long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance, with accuracies of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrated that EDLMFC can effectively predict potential ncRNA–protein interactions from different organisms. Furthermore, EDLMFC is also shown to successfully predict hub ncRNAs and proteins present in ncRNA–protein networks of Mus musculus. Conclusions In general, our proposed method EDLMFC improves the accuracy of ncRNA–protein interaction predictions and is anticipated to provide helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC.
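A schematic PyTorch stand-in for the CNN-plus-BLSTM architecture described above; layer sizes and input encoding are illustrative assumptions, not EDLMFC's actual configuration.

```python
# Schematic CNN -> bidirectional LSTM -> binary head, as in the abstract.
import torch
import torch.nn as nn

class CnnBiLstm(nn.Module):
    def __init__(self, in_channels=8, conv_channels=64, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels, conv_channels, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.blstm = nn.LSTM(conv_channels, hidden, batch_first=True,
                             bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, x):            # x: (batch, channels, seq_len)
        h = self.conv(x)             # local patterns via the CNN
        h, _ = self.blstm(h.transpose(1, 2))  # long-range dependencies
        return self.head(h[:, -1])   # logit: interaction vs no interaction
```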
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2021-03-20
Description: Background Diabetes Mellitus (DM) has become the third most common chronic non-communicable disease, after tumors and cardiovascular and cerebrovascular diseases, and one of the major public health problems in the world. It is therefore of great importance to identify individuals at high risk for DM in order to establish prevention strategies. Methods To address the high dimensionality and feature redundancy of medical data, as well as the frequently encountered problem of class imbalance, this study explored different supervised classifiers combined with SVM-SMOTE and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with unbalanced categories and complex related factors. The classification results of 4 supervised classifiers based on 4 data processing methods were analysed and discussed. Five indicators, including Accuracy, Precision, Recall, F1-Score and AUC, were selected as the key measures to evaluate the performance of the classification models. Results According to the results, the Random Forest classifier combining SVM-SMOTE resampling technology and the LASSO feature screening method (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved best at identifying those at high risk of DM. The combined algorithm thus enhances the classification performance for prediction of high-risk individuals. In addition, age, region, heart rate, hypertension, hyperlipidemia and BMI were the top six most important characteristic variables affecting diabetes. Conclusions The Random Forest classifier combined with SVM-SMOTE and the LASSO feature reduction method performs best in identifying individuals at high risk of DM. The combined method proposed in this study would be a good tool for early screening of DM.
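The best-performing combination reported (SVM-SMOTE resampling plus LASSO screening feeding a random forest) can be sketched as an imbalanced-learn pipeline; hyperparameters are illustrative, and using the imblearn pipeline keeps resampling restricted to training folds.

```python
# Sketch of SVM-SMOTE -> LASSO-style L1 screening -> random forest.
from imblearn.over_sampling import SVMSMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

pipe = Pipeline([
    ("smote", SVMSMOTE(random_state=0)),        # rebalance classes
    ("lasso", SelectFromModel(                  # L1-penalised screening
        LogisticRegression(penalty="l1", solver="liblinear", C=0.1))),
    ("rf", RandomForestClassifier(n_estimators=500, random_state=0)),
])

# With feature matrix X and binary labels y (placeholders):
# auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()
```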
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Publication Date: 2021-03-20
Description: Background A central goal among researchers and policy makers seeking to implement clinical interventions is to identify key facilitators and barriers that contribute to implementation success. Despite calls from a number of scholars, empirical insight into the complex structural and cultural predictors of why decision aids (DAs) become routinely embedded in health care settings remains limited and highly variable across implementation contexts. Methods We examined associations between “reach”, a widely used indicator (from the RE-AIM model) of implementation success, and multi-level site characteristics of nine LVAD clinics engaged over 18 months in implementation and dissemination of a decision aid for left ventricular assist device (LVAD) treatment. Based on data collected from nurse coordinators, we explored factors at the level of the organization (e.g. patient volume), patient population (e.g. health literacy; average sickness level), clinician characteristics (e.g. attitudes towards the decision aid; readiness for change) and process (how the aid was administered). We generated descriptive statistics for each site and calculated zero-order correlations (Pearson’s r) between all multi-level site variables, including cumulative reach at 12 months and 18 months, for all sites. We used principal components analysis (PCA) to examine any latent factors governing relationships between and among all site characteristics, including reach. Results We observed the steepest increases in reach of our decision aid across the first year, with uptake fluctuating over the second year. Average reach across sites was 63% (s.d. = 19.56) at 12 months and 66% (s.d. = 19.39) at 18 months. Our PCA revealed that site characteristics were positively associated with reach on two distinct dimensions: a first dimension reflecting greater organizational infrastructure and standardization (characteristic of larger, more established clinics) and a second dimension reflecting positive attitudinal orientations, specifically openness and capacity to give and receive decision support among coordinators and patients. Conclusions Successful implementation plans should incorporate specific efforts to promote supportive and mutually informative interactions between clinical staff members and to institute systematic and standardized protocols to enhance the availability, convenience and salience of the intervention tool in routine practice. Further research is needed to understand whether “core predictors” of success vary across different intervention types.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Publication Date: 2021-03-21
Description: Background Numerous studies have demonstrated that long non-coding RNAs are associated with many human diseases. Therefore, it is crucial to predict potential lncRNA-disease associations for disease prognosis, diagnosis and therapy. Dozens of machine learning and deep learning algorithms have been adopted for this problem, yet it is still challenging to learn efficient low-dimensional representations from high-dimensional features of lncRNAs and diseases to predict unknown lncRNA-disease associations accurately. Results We propose an end-to-end model, VGAELDA, which integrates variational inference and graph autoencoders for lncRNA-disease association prediction. VGAELDA contains two kinds of graph autoencoders. Variational graph autoencoders (VGAE) infer representations from features of lncRNAs and diseases respectively, while graph autoencoders propagate labels via known lncRNA-disease associations. These two kinds of autoencoders are trained alternately by adopting a variational expectation maximization algorithm. The integration of both the VGAE for graph representation learning and the alternate training via variational inference strengthens the capability of VGAELDA to capture efficient low-dimensional representations from high-dimensional features, and hence promotes the robustness and precision of predicting unknown lncRNA-disease associations. Further analysis illuminates that the designed co-training framework of lncRNA and disease for VGAELDA solves a geometric matrix completion problem, capturing efficient low-dimensional representations via a deep learning approach. Conclusion Cross validations and numerical experiments illustrate that VGAELDA outperforms the current state-of-the-art methods in lncRNA-disease association prediction. Case studies indicate that VGAELDA is capable of detecting potential lncRNA-disease associations. The source code and data are available at https://github.com/zhanglabNKU/VGAELDA.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    Publication Date: 2021-03-22
Description: Background Flow and mass cytometry are important modern immunology tools for measuring expression levels of multiple proteins on single cells. The goal is to better understand the mechanisms of responses on a single cell basis by studying differential expression of proteins. Most current data analysis tools compare expression across many computationally discovered cell types. Our goal is to focus on just one cell type. Our narrower field of application allows us to define a more specific statistical model with easier-to-control statistical guarantees. Results Differential analysis of marker expressions can be difficult due to marker correlations and inter-subject heterogeneity, particularly for studies of human immunology. We address these challenges with two multiple regression strategies: a bootstrapped generalized linear model and a generalized linear mixed model. On simulated datasets, we compare the robustness of both strategies to marker correlations and heterogeneity. For paired experiments, we find that both strategies maintain the target false discovery rate under medium correlations and that mixed models are statistically more powerful under the correct model specification. For unpaired experiments, our results indicate that much larger patient sample sizes are required to detect differences. We illustrate the R package and workflow for both strategies on a pregnancy dataset. Conclusion Our approach to finding differential proteins in flow and mass cytometry data reduces biases arising from marker correlations and safeguards against false discoveries induced by patient heterogeneity.
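The mixed-model strategy can be pictured as fitting, per marker within one cell type, a model with a fixed condition effect and a random per-donor intercept so that inter-subject heterogeneity is absorbed by the random effect; the sketch below uses a linear mixed model from statsmodels as a stand-in, and the authors' R implementation may differ in detail.

```python
# Minimal mixed-model stand-in: marker expression ~ condition,
# with a random intercept per donor.
import statsmodels.formula.api as smf

def marker_condition_effect(df):
    """df columns (placeholders): expression, condition (0/1), donor.
    Returns the estimated condition effect and its p-value."""
    model = smf.mixedlm("expression ~ condition", df, groups=df["donor"])
    fit = model.fit()
    return fit.params["condition"], fit.pvalues["condition"]
```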
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Publication Date: 2021-02-06
Description: Background An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation, making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets. Results We present IUPACpal, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences, which also allows for potential mismatches and gaps in the inverted repeats. Conclusion Within the parameters that were tested, our experimental results show that IUPACpal compares favourably to a similar application packaged with EMBOSS. We show that IUPACpal identifies many previously unidentified inverted repeats when compared with EMBOSS, and that it does so with orders of magnitude improved speed.
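A naive reference sketch of the task (quadratic and for illustration only; IUPACpal itself is an efficient exact tool) clarifies what counts as a match under IUPAC encoding: two symbols pair if the complement of one symbol's base set intersects the other's base set.

```python
# Naive inverted repeat finder over IUPAC-encoded DNA (uppercase, no U).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}
COMP = str.maketrans("ACGT", "TGCA")

def compatible(a, b):
    """True if symbol b can pair opposite symbol a."""
    return bool(set(IUPAC[a].translate(COMP)) & set(IUPAC[b]))

def inverted_repeats(seq, arm=6, max_gap=3):
    """Yield (start, gap) for every arm-length inverted repeat."""
    hits, n = [], len(seq)
    for i in range(n - 2 * arm + 1):
        for gap in range(max_gap + 1):
            j = i + arm + gap          # start of the right arm
            if j + arm > n:
                break
            if all(compatible(seq[i + k], seq[j + arm - 1 - k])
                   for k in range(arm)):
                hits.append((i, gap))
    return hits
```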
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Publication Date: 2021-02-08
Description: Background Mass spectrometry imaging (MSI) is a family of acquisition techniques producing images of the distribution of molecules in a sample, without any prior tagging of the molecules. This makes it a very interesting technique for exploratory research. However, the images are difficult to analyze because the data they contain are high-dimensional, and their content does not necessarily reflect the shape of the object of interest. In contrast, magnetic resonance imaging (MRI) scans reflect the anatomy of the tissue. MRI also provides information complementary to MSI, such as the content and distribution of water. Results We propose a new workflow to merge the information from 2D MALDI–MSI and MRI images. Our workflow can be applied to large MSI datasets in a limited amount of time. Moreover, the workflow is fully automated and based on deterministic methods, which ensures the reproducibility of the results. Our methods were evaluated and compared with state-of-the-art methods. Results show that the images are combined precisely and in a time-efficient manner. Conclusion Our workflow reveals molecules which co-localize with water in biological images. It can be applied to any MSI and MRI datasets which satisfy a few conditions: the images must enclose the same regions of the object and have similar intensity distributions.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Publication Date: 2021-02-08
Description: Background The coronavirus disease 2019 (COVID-19) pandemic has caused health concerns worldwide since December 2019. From the beginning of infection, patients progress through different symptom stages, such as fever and dyspnea, and may even die. Identifying disease progression and predicting patient outcome at an early stage helps target treatment and resource allocation. However, there is no clear COVID-19 stage definition, and few studies have addressed characterizing COVID-19 progression, making the need for this study evident. Methods We proposed a temporal deep learning method, based on a time-aware long short-term memory (T-LSTM) neural network, and used an online open dataset, including blood samples of 485 patients from Wuhan, China, to train the model. Our method can grasp the dynamic relations in irregularly sampled time series, which is ignored by existing works. Specifically, our method predicted the outcome of COVID-19 patients by considering both the biomarkers and the irregular time intervals. Then, we used the patient representations, extracted from T-LSTM units, to subtype the patient stages and describe the disease progression of COVID-19. Results Using our method, the accuracy of outcome prediction was more than 90% at 12 days, and 98%, 95% and 93% at 3, 6, and 9 days, respectively. Most importantly, we found 4 stages of COVID-19 progression with different patient statuses and mortality risks. We ranked 40 disease-related biomarkers and gave reference values for them at each stage; the top 5 were Lymph, LDH, hs-CRP, indirect bilirubin, and creatinine. We also found 3 complications: myocardial injury, liver function injury and renal function injury. Predicting which of the 4 stages a patient is currently in can help doctors better assess and treat the patient. Conclusions To combat the COVID-19 epidemic, this paper aims to help clinicians better assess and treat infected patients, provide relevant researchers with potential disease progression patterns, and enable more effective use of medical resources. Our method predicted patient outcomes with high accuracy and identified a four-stage disease progression. We hope that the obtained results and patterns will aid in fighting the disease.
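The abstract does not give the network's equations, but T-LSTM-style models typically adjust the memory cell by discounting its short-term component with a decreasing function of the elapsed time Δt between observations (after Baytas et al., 2017); one common formulation, which may differ in detail from this paper's variant, is:

```latex
% Sketch of a T-LSTM-style time-aware memory adjustment.
C^{S}_{t-1} = \tanh\left(W_d\, C_{t-1} + b_d\right)           % short-term component
\hat{C}^{S}_{t-1} = C^{S}_{t-1} \, g(\Delta t)                % discounted by elapsed time
C^{*}_{t-1} = \left(C_{t-1} - C^{S}_{t-1}\right) + \hat{C}^{S}_{t-1}
g(\Delta t) = 1 / \log\left(e + \Delta t\right)               % a common choice of decay
```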
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Publication Date: 2021-02-10
Description: Background Rheumatoid arthritis (RA) is an autoimmune disorder with systemic inflammation and may be induced by oxidative stress that affects an inflamed joint. Our objectives were to examine isotypes of autoantibodies against 4-hydroxy-2-nonenal (HNE) modifications in RA and to associate them with increased levels of autoantibodies in RA patients. Methods Serum samples from 155 female patients [60 with RA, 35 with osteoarthritis (OA), and 60 healthy controls (HCs)] were obtained. Four novel differential HNE-modified peptide adducts, complement factor H (CFAH)1211–1230, haptoglobin (HPT)78–108, immunoglobulin (Ig) kappa chain C region (IGKC)2–19, and prothrombin (THRB)328–345, were re-analyzed using tandem mass spectrometric (MS/MS) spectra (ProteomeXchange: PXD004546) from RA patients vs. HCs. Further, we determined serum protein levels of CFAH, HPT, IGKC and THRB, HNE-protein adducts, and autoantibodies against unmodified and HNE-modified peptides. Significant correlations and odds ratios (ORs) were calculated. Results Levels of HPT in RA patients were markedly higher than those in HCs. Levels of HNE-protein adducts and autoantibodies in RA patients were significantly greater than those of HCs. IgM anti-HPT78−108 HNE, IgM anti-IGKC2−19, and IgM anti-IGKC2−19 HNE may be considered as diagnostic biomarkers for RA. Importantly, elevated levels of IgM anti-HPT78−108 HNE, IgM anti-IGKC2−19, and IgG anti-THRB328−345 were positively correlated with the disease activity score in 28 joints for C-reactive protein (DAS28-CRP). Further, the ORs of RA development through IgM anti-HPT78−108 HNE (OR 5.235, p
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2021-02-06
Description: Background Researchers developing prediction models are faced with numerous design choices that may impact model performance. One key decision is how to include patients who are lost to follow-up. In this paper we perform a large-scale empirical evaluation investigating the impact of this decision. In addition, we aim to provide guidelines for how to deal with loss to follow-up. Methods We generate a partially synthetic dataset with complete follow-up and simulate loss to follow-up based either on random selection or on selection based on comorbidity. In addition to our synthetic data study, we investigate 21 real-world data prediction problems. We compare four simple strategies for developing models when using a cohort design that encounters loss to follow-up. Three strategies employ a binary classifier with data that: (1) include all patients (including those lost to follow-up), (2) exclude all patients lost to follow-up or (3) only exclude patients lost to follow-up who do not have the outcome before being lost to follow-up. The fourth strategy uses a survival model with data that include all patients. We empirically evaluate the discrimination and calibration performance. Results The partially synthetic data study results show that excluding patients who are lost to follow-up can introduce bias when loss to follow-up is common and does not occur at random. However, when loss to follow-up was completely at random, the choice of how to address it had negligible impact on model discrimination performance. Our empirical real-world data results showed that the four design choices investigated to deal with loss to follow-up resulted in comparable performance when the time-at-risk was 1 year, but demonstrated differential bias when we looked into 3-year time-at-risk. Removing patients who are lost to follow-up before experiencing the outcome but keeping patients who are lost to follow-up after the outcome can bias a model and should be avoided. Conclusion Based on this study we therefore recommend (1) developing models using data that include patients who are lost to follow-up and (2) evaluating the discrimination and calibration of models twice: on a test set including patients lost to follow-up and on a test set excluding patients lost to follow-up.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Publication Date: 2021-02-08
    Description: Following publication of the original article [1], it was reported that the contents of Additional file 2 were a duplicate of the files for Additional file 1.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Publication Date: 2021-02-09
Description: Background Information literacy competency is one of the requirements for implementing Evidence-Based Practice (EBP) in nursing. It is necessary to pay attention to curricular development and to use new educational methods, such as virtual education, to strengthen information literacy competency in nursing students. Given the scarcity of studies on the effectiveness of virtual education in nursing, particularly in Iran, and the positive university atmosphere regarding the use of virtual education, this study investigated the effect of virtual education on undergraduate nursing students’ information literacy competency for EBP. Methods This interventional study was performed with intervention and control groups and a pretest–posttest design. Seventy-nine nursing students were selected and assigned to the intervention or control group by random sampling. The virtual information literacy course was uploaded to a website as six modules delivered over four weeks. Questionnaires on demographic information and information literacy for EBP were used to collect data before and one month after the virtual education. Results The results showed no significant difference between the control and intervention groups in any dimension of information literacy competency at the pre-test stage. At the post-test, the virtual education had improved information seeking skills (t = 3.14, p = 0.002) and knowledge about search operators (t = 39.84, p = 0.001) in the intervention group compared with the control group. The virtual education did not have a significant effect on the use of different information resources or on the development of search strategies, as assessed by the frequency of selecting the most appropriate search statement, in the intervention group. Conclusion Virtual education had a significant effect on information seeking skills and knowledge about search operators in nursing students. Nurse educators can benefit from our experience in designing this method when developing virtual education programs in nursing schools. Given the lack of effectiveness of this program for the use of different information resources and the development of search strategies, nurse educators are advised to teach information literacy for EBP by integrating several approaches, such as virtual (online and offline) and face-to-face education.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    Publication Date: 2021-02-09
Description: Background Long noncoding RNAs represent a large class of transcripts with two common features: they exceed an arbitrary length threshold of 200 nt and are assumed not to encode proteins. Although a growing body of evidence indicates that the vast majority of lncRNAs are potentially nonfunctional, hundreds of them have already been revealed to perform essential gene regulatory functions or to be linked to a number of cellular processes, including those associated with the etiology of human diseases. To better understand the biology of lncRNAs, it is essential to perform a more in-depth study of their evolution. In contrast to protein-encoding transcripts, however, they do not show the strong sequence conservation that usually results from purifying selection; therefore, software that is typically used to resolve the evolutionary relationships of protein-encoding genes and transcripts is not applicable to the study of lncRNAs. Results To tackle this issue, we developed lncEvo, a computational pipeline that consists of three modules: (1) transcriptome assembly from RNA-Seq data, (2) prediction of lncRNAs, and (3) conservation study—a genome-wide comparison of lncRNA transcriptomes between two species of interest, including a search for orthologs. Importantly, one can choose to apply lncEvo solely for transcriptome assembly or lncRNA prediction, without calling the conservation-related part. Conclusions lncEvo is an all-in-one tool built with the Nextflow framework, utilizing state-of-the-art software and algorithms with customizable trade-offs between speed and sensitivity, ease of use and built-in reporting functionalities. The source code of the pipeline is freely available for academic and nonacademic use under the MIT license at https://gitlab.com/spirit678/lncrna_conservation_nf.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    Publication Date: 2021-02-09
    Description: Background U.S. hospitals and dialysis centers are penalized for 30-day hospital readmissions of dialysis patients, despite little infrastructure to facilitate care transitions between these settings. We are developing a third-party web-based information exchange platform, DialysisConnect, to enable clinicians to view and exchange information about dialysis patients during admission, hospitalization, and discharge. This health information technology solution could serve as a flexible and relatively affordable solution for dialysis facilities and hospitals across the nation who are seeking to serve as true partners in the improved care of dialysis patients. The purpose of this study was to evaluate the perceived coherence of DialysisConnect to key clinical stakeholders, to prepare messaging for implementation. Methods As part of a hybrid effectiveness-implementation study guided by Normalization Process Theory, we collected data on stakeholder perceptions of continuity of care for patients receiving maintenance dialysis and a DialysisConnect prototype before completing development and piloting the system. We conducted four focus groups with stakeholders from one academic hospital and associated dialysis centers [hospitalists (n = 5), hospital staff (social workers, nurses, pharmacists; n = 9), nephrologists (n = 7), and dialysis clinic staff (social workers, nurses; n = 10)]. Transcriptions were analyzed thematically within each component of the construct of coherence (differentiation, communal specification, individual specification, and internalization). Results Participants differentiated DialysisConnect from usual care variously as an information dashboard, a quick-exchange communication channel, and improved discharge information delivery; some could not differentiate it in terms of workflow. The purpose of DialysisConnect (communal specification) was viewed as fully coherent only for communicating outside of the same healthcare system. Current system workarounds were acknowledged as deterrents for practice change. All groups delegated DialysisConnect tasks (individual specification) to personnel besides themselves. Partial internalization of DialysisConnect was achieved only by dialysis clinic staff, based on experience with similar technology. Conclusions Implementing DialysisConnect for clinical users in both settings will require presenting a composite picture of current communication processes from all stakeholder groups to correct single-group misunderstandings, as well as providing data about care transitions communication beyond the local context to ease resistance to practice change.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2021-02-06
    Description: Background Researchers and policy makers have long suspected that people have differing, and potentially nefarious, motivations for participating in stated-preference studies such as discrete-choice experiments (DCE). While anecdotes and theories exist on why people participate in surveys, there is a paucity of evidence exploring variation in preferences for participating in stated-preference studies. Methods We used a DCE to estimate preferences for participating in preference research among an online survey panel sample. Preferences for the characteristics of a study to be conducted at a local hospital were assessed across five attributes (validity, relevance, bias, burden, time and payment) and described across three levels using a starring system. A D-efficient experimental design was used to construct three blocks of 12 choice tasks with two profiles each. Respondents were also asked about factors that motivated their choices. Mixed logistic regression was used to analyze the aggregate sample and latent class analysis identified segments of respondents. Results 629 respondents completed the experiment. In aggregate “study validity” was most important. Latent class results identified two segments based on underlying motivations: a quality-focused segment (76%) who focused most on validity, relevance, and bias and a convenience-focused segment (24%) who focused most on reimbursement and time. Quality-focused respondents spent more time completing the survey (p 
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    Publication Date: 2021-02-08
Description: Background Biological phenomena usually evolve over time, and recent advances in high-throughput microscopy have made it possible to collect multiple 3D images over time, generating 3D+t (or 4D) datasets. To extract useful information, spatial and temporal data on the particles in the images must be extracted, but particle tracking and feature extraction need some kind of assistance. Results This manuscript introduces our new freely downloadable toolbox, Visual4DTracker. It is a MATLAB package implementing several useful functionalities to navigate, analyse and proof-read the track of each particle detected in any 3D+t stack. Furthermore, it allows users to proof-read and to evaluate the traces with respect to a given gold standard. The Visual4DTracker toolbox permits users to visualize and save all the generated results through a user-friendly graphical user interface. This tool has been successfully used in three applicative examples. The first processes synthetic data to show all the software functionalities. The second shows how to process a 4D image stack showing the time-lapse growth of Drosophila cells in an embryo. The third example presents the quantitative analysis of insulin granules in living beta-cells, showing that such particles have two main dynamics that coexist inside the cells. Conclusions Visual4DTracker is a software package for MATLAB to visualize, handle and manually track 3D+t stacks of microscopy images containing objects such as cells, granules, etc. With its unique set of functions, it permits the user to analyze and proof-read 4D data in a friendly 3D fashion. The tool is freely available at https://drive.google.com/drive/folders/19AEn0TqP-2B8Z10kOavEAopTUxsKUV73?usp=sharing
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2021-02-09
Description: Background Hub transcription factors, regulating many target genes in gene regulatory networks (GRNs), play important roles as disease regulators and potential drug targets. However, while numerous methods have been developed to predict individual regulator-gene interactions from gene expression data, few methods focus on inferring these hubs. Results We have developed ComHub, a tool to predict hubs in GRNs. ComHub makes a community prediction of hubs by averaging over the predictions of a compendium of network inference methods. Benchmarking ComHub against the DREAM5 challenge data and two independent gene expression datasets showed robust performance across all datasets. Conclusions In contrast to other evaluated methods, ComHub consistently scored among the top-performing methods on data from different sources. Lastly, we implemented ComHub to work both with predefined networks and as a stand-alone network inference method, which makes it generally applicable.
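The community idea can be sketched in a few lines: each inference method scores candidate regulators (for instance by out-degree in its predicted network), and the per-method rankings are averaged; this is an illustration of the principle, not ComHub's exact implementation.

```python
# Average per-method regulator rankings to nominate hubs.
import numpy as np
from scipy.stats import rankdata

def community_hubs(outdegree_by_method, top=10):
    """outdegree_by_method: dict method -> (n_regulators,) array of
    out-degrees, all in the same regulator order. Returns indices of
    predicted hubs, highest average rank first."""
    ranks = [rankdata(d) for d in outdegree_by_method.values()]
    avg = np.mean(ranks, axis=0)       # the community prediction
    return np.argsort(-avg)[:top]
```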
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    Publication Date: 2021-02-09
Description: Background Current high-throughput technologies—i.e. whole genome sequencing, RNA-Seq, ChIP-Seq, etc.—generate huge amounts of data, and their usage gets more widespread with each passing year. Complex analysis pipelines involving several computationally-intensive steps have to be applied to an increasing number of samples. Workflow management systems allow parallelization and more efficient usage of computational power. Nevertheless, this mostly happens by assigning the available cores to the pipeline of a single sample, or a few samples, at a time. We refer to this approach as the naive parallel strategy (NPS). Here, we discuss an alternative approach, which we refer to as the concurrent execution strategy (CES), which equally distributes the available processors across every sample's pipeline. Results Theoretically, we show that, under loose conditions, the CES results in a substantial speedup, with an ideal gain ranging from 1 to the number of samples. We also observe that the CES yields even faster executions in practice, since parallelizable tasks scale sub-linearly. Practically, we tested both strategies on a whole exome sequencing pipeline applied to three publicly available matched tumour-normal sample pairs of gastrointestinal stromal tumour. The CES achieved speedups in latency of up to 2–2.4 compared to the NPS. Conclusions Our results hint that if resource distribution is further tailored to fit specific situations, an even greater gain in performance of multiple-sample pipeline execution could be achieved. For this to be feasible, benchmarking of the tools included in the pipeline would be necessary. In our opinion, these benchmarks should be performed consistently by the tools' developers. Finally, these results suggest that concurrent strategies might also lead to energy and cost savings by making the usage of low-power machine clusters feasible.
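A toy Amdahl-style model makes the claimed gain range concrete: with per-sample runtime t(p) = serial + parallel/p on p cores, the CES-over-NPS speedup is 1 when the work is perfectly parallel and approaches the number of samples S when it is entirely serial. The numbers below are illustrative, not the paper's measurements.

```python
# Toy latency model for NPS (samples in sequence, all cores each) versus
# CES (all samples concurrently, cores split evenly).
def t(p, serial=1.0, parallel=10.0):
    """Runtime of one sample's pipeline on p cores."""
    return serial + parallel / p

P, S = 16, 4                      # cores, samples (illustrative)
nps_latency = S * t(P)            # 4 * (1 + 10/16) = 6.5
ces_latency = t(P // S)           # 1 + 10/4        = 3.5
print(nps_latency / ces_latency)  # ~1.86x speedup for CES
# With serial=0 the ratio is 1; with parallel=0 it is S, matching the
# "from 1 to the number of samples" ideal gain range stated above.
```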
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2021-02-08
Description: Background The high variability in envelope regions of some viruses, such as HIV, allows the virus to establish infection and to escape subsequent immune surveillance. This variability, as well as the increasing incorporation of N-linked glycosylation sites, is fundamental to this evasion. It also creates difficulties for the multiple sequence alignment (MSA) methods that provide the first step in their analysis. Existing MSA tools often fail to properly align highly variable HIV envelope sequences, requiring extensive manual editing that is impractical with even a moderate number of these variable sequences. Results We developed an automated library-building tool, NGlyAlign, that organizes similar N-linked glycosylation sites as block constraints and statistically conserved global sites as single-site constraints to automatically enforce partial columns in consistency-based MSA methods such as Dialign. This combined method accurately aligns variable HIV-1 envelope sequences. We tested the method on two datasets: a set of 156 founder and chronic gp160 HIV-1 subtype B sequences as well as a set of reference sequences of gp120 in the highly variable region 1. On measures such as entropy scores, sum-of-pair scores, column scores, and similarity heat maps, NGlyAlign+Dialign proved superior to methods such as T-Coffee, ClustalOmega, ClustalW, Praline, HIValign and Muscle. The method is scalable to large sequence sets, producing accurate alignments without requiring manual editing. As well as this application to HIV, our method can be used for other highly variable glycoproteins such as the hepatitis C virus envelope. Conclusions NGlyAlign is an automated tool for mapping and building glycosylation motif libraries to accurately align highly variable regions in HIV sequences. It can provide the basis for many studies reliant on single robust alignments. NGlyAlign has been developed as an open-source tool and is freely available at https://github.com/UNSW-Mathematical-Biology/NGlyAlign_v1.0 .
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    Publication Date: 2021-02-08
    Description: Background Identification and selection of protein particles in cryo-electron micrographs is an important step in single particle analysis. In this study, we developed a deep learning-based particle picking network to automatically detect particle centers from cryoEM micrographs. This is a challenging task due to the nature of cryoEM data, having low signal-to-noise ratios with variable particle sizes, shapes, distributions, grayscale variations as well as other undesirable artifacts. Results We propose a double convolutional neural network (CNN) cascade for automated detection of particles in cryo-electron micrographs. This approach, entitled Deep Regression Picker Network or “DRPnet”, is simple but very effective in recognizing different particle sizes, shapes, distributions and grayscale patterns corresponding to 2D views of 3D particles. Particles are detected by the first network, a fully convolutional regression network (FCRN), which maps the particle image to a continuous distance map that acts like a probability density function of particle centers. Particles identified by FCRN are further refined to reduce false particle detections by the second classification CNN. DRPnet’s first CNN pretrained with only a single cryoEM dataset can be used to detect particles from different datasets without retraining. Compared to RELION template-based autopicking, DRPnet results in better particle picking performance with drastically reduced user interactions and processing time. DRPnet also outperforms the state-of-the-art particle picking networks in terms of the supervised detection evaluation metrics recall, precision, and F-measure. To further highlight quality of the picked particle sets, we compute and present additional performance metrics assessing the resulting 3D reconstructions such as number of 2D class averages, efficiency/angular coverage, Rosenthal-Henderson plots and local/global 3D reconstruction resolution. Conclusion DRPnet shows greatly improved time-savings to generate an initial particle dataset compared to manual picking, followed by template-based autopicking. Compared to other networks, DRPnet has equivalent or better performance. DRPnet excels on cryoEM datasets that have low contrast or clumped particles. Evaluating other performance metrics, DRPnet is useful for higher resolution 3D reconstructions with decreased particle numbers or unknown symmetry, detecting particles with better angular orientation coverage.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
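    The first stage described above is a fully convolutional regression network that turns a micrograph into a map whose peaks mark candidate particle centers. The PyTorch sketch below illustrates only that idea; the layer counts and widths are assumptions chosen for brevity, not DRPnet's published architecture:

      import torch
      import torch.nn as nn

      class TinyFCRN(nn.Module):
          """Minimal fully convolutional regression network: grayscale
          micrograph in, same-sized center-density map out. Illustrative
          depth/width only; not DRPnet's actual layers."""
          def __init__(self):
              super().__init__()
              self.body = nn.Sequential(
                  nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
                  nn.Conv2d(32, 1, 3, padding=1),  # regression head, no activation
              )

          def forward(self, x):
              return self.body(x)

      model = TinyFCRN()
      fake_micrograph = torch.randn(1, 1, 256, 256)  # batch, channel, H, W
      density_map = model(fake_micrograph)           # peaks ~ candidate centers
      print(density_map.shape)                       # torch.Size([1, 1, 256, 256])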
  • 86
    Publication Date: 2021-02-08
    Description: Background Drug repositioning refers to the identification of new indications for existing drugs. Drug-based inference methods for drug repositioning apply unique features of drugs to predict new indications, and these different features provide complementary information. It is therefore necessary to integrate them for more accurate in silico drug repositioning. Results In this study, we collect 3 different types of drug features (i.e., chemical, genomic and pharmacological spaces) from public databases. Similarities between drugs are calculated separately from each of the features. We further develop a fusion method to combine the 3 similarity measurements. We test the inference abilities of the 4 similarity datasets in drug repositioning under the guilt-by-association principle; a toy sketch of this setup follows this record. Leave-one-out cross-validations show that the integrated similarity measurement IntegratedSim achieves the best prediction performance, with the highest AUC value of 0.8451 and the highest AUPR value of 0.2201. Case studies demonstrate that IntegratedSim produces the largest number of confirmed predictions in most cases. Moreover, we compare our integration method with 3 other similarity-fusion methods using the datasets in our study. Cross-validation results suggest that our method improves the prediction accuracy in terms of AUC and AUPR values. Conclusions Our study suggests that the 3 drug features used in our manuscript provide valuable information for drug repositioning. The comparative results indicate that integrating the 3 drug features improves drug-disease association prediction. Our study provides a strategy for fusing different drug features for in silico drug repositioning.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
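    The fusion-plus-inference setup above can be made concrete with toy matrices. The sketch below fuses three synthetic similarity spaces with a plain element-wise mean and scores drugs by guilt-by-association; the paper's IntegratedSim fusion is more elaborate, and every number here is synthetic:

      import numpy as np

      rng = np.random.default_rng(0)
      n_drugs = 5

      # Three toy drug-drug similarity matrices (chemical, genomic, pharmacological).
      sims = [rng.random((n_drugs, n_drugs)) for _ in range(3)]
      sims = [(s + s.T) / 2 for s in sims]            # make each symmetric

      # Simplest possible fusion: the element-wise mean of the three spaces.
      integrated = np.mean(sims, axis=0)

      # Guilt-by-association: a drug's disease score is the similarity-weighted
      # average of the known associations of the other drugs (leave-one-out).
      known = np.array([1, 0, 1, 0, 0], dtype=float)  # toy drug-disease labels

      def gba_score(i):
          w = np.delete(integrated[i], i)             # drop the query drug itself
          y = np.delete(known, i)
          return w @ y / w.sum()

      print([round(gba_score(i), 3) for i in range(n_drugs)])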
  • 87
    Publication Date: 2021-02-10
    Description: Background This study explores an information chain management model, led by information nurses, for the interworking of large instruments and equipment in the operating room (OR). Methods Based on information nurses, a chain management process covering the interworking and integration of the information chain was established for large instruments and equipment in the OR: key links were placed under control, and whole-life-cycle management of instruments and equipment, from planned procurement to decommissioning, was realized. Using cluster sampling, 1562 surgical patients were selected; 749 were assigned to the control group before the model was introduced, and 813 to the observation group after its introduction. Management indexes for large instruments and equipment in the department were compared before and after the model was introduced. Results In the observation group, the average equipment registration time was 22.05 ± 2.36, costs were reduced by 2220 yuan/year, and the satisfaction rate of the nursing staff was 97.62%; all significantly better than in the control group (P
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2021-02-18
    Description: Background Social networking sites such as Facebook® can contribute to health promotion and behaviour change activities, but are currently underused for this purpose. In Germany, health insurance companies are relevant public health agencies that are responsible for health promotion, primary prevention, and health education. We intended to analyse the Facebook® accounts of health insurance providers to explore the range of prevention topics addressed, identify the communication formats used, and analyse user activity stimulated by prevention-related posts. Methods We performed a quantitative content analysis of text and picture data on Facebook® accounts (9 months in retrospect) in a cross-sectional study design. 64 of 159 German health insurance providers hosted a Facebook® page, and 25 of these 64 posted ≥ 10 posts/month. Among those 25, we selected 17 health insurance companies (12 public, 5 private) for analysis. All posts were categorized according to the domains of a classification system developed for this study, and the numbers of likes and comments were counted. The data were analysed using descriptive statistics. Results We collected 3,763 Facebook® posts, 32% of which had a focus on prevention. The frequency of prevention-related posts varied among health insurance providers (1–25 per month). The behaviours addressed most frequently were healthy nutrition, physical activity, and stress/anxiety relief, often in combination with each other. All these topics yielded moderate user engagement (30–120 likes, 2–10 comments per post). User engagement was highest when a competition or quiz was posted (11% of posts). The predominant communication pattern was health education, often supplemented by photos or links, or by information about offline events (e.g. a public run). Some providers regularly engaged in two-way communication with users, inviting tips, stories or recipes, or responding to individual comments. Still, the interactive potential offered by Facebook® was only partly exploited. Conclusions Those few health insurance companies that regularly post content about prevention or healthy lifestyles on their Facebook® accounts comply with suggestions given for social media communication. Still, many health insurance providers fail to actively interact with wider audiences. Whether health communication on Facebook® can actually increase health literacy and lead to behaviour changes still needs to be evaluated.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    Publication Date: 2021-02-18
    Description: Background Systemic inflammatory response syndrome (SIRS) is defined as a non-specific inflammatory process in the absence of infection. SIRS increases susceptibility to organ dysfunction and frequently affects the clinical outcome of affected patients. We evaluated a knowledge-based, interoperable clinical decision-support system (CDSS) for SIRS detection on a pediatric intensive care unit (PICU). Methods The CDSS retrieves routine data, previously transformed into an interoperable format, using model-based queries and guideline- and knowledge-based rules. We evaluated the CDSS in a prospective diagnostic study from 08/2018 to 03/2019. 168 patients from a pediatric intensive care unit of a tertiary university hospital, aged 0 to 18 years, were assessed for SIRS by the CDSS and by physicians during clinical routine. Sensitivity and specificity (compared to the reference standard) with 95% Wald confidence intervals (CI) were estimated on the level of patients and of patient-days; a short sketch of this interval calculation follows this record. Results Sensitivity and specificity were 91.7% (95% CI 85.5–95.4%) and 54.1% (95% CI 45.4–62.5%) on the patient level, and 97.5% (95% CI 95.1–98.7%) and 91.5% (95% CI 89.3–93.3%) on the level of patient-days. Physicians’ SIRS recognition during clinical routine was considerably less accurate (sensitivity of 62.0% (95% CI 56.8–66.9%), specificity of 83.3% (95% CI 80.4–85.9%)) when measured on the level of patient-days. The evaluation revealed valuable insights for the general design of the CDSS as well as specific rule modifications. Despite a lower than expected specificity, diagnostic accuracy was higher than that of the daily routine ratings, demonstrating the high potential of our CDSS to help detect SIRS in clinical routine. Conclusions We successfully evaluated an interoperable CDSS for SIRS detection in the PICU. Our study demonstrated the general feasibility and potential of the implemented algorithms, but also some limitations. In the next step, the CDSS will be optimized to overcome these limitations and will be evaluated in a multi-center study. Trial registration: NCT03661450 (ClinicalTrials.gov); registered September 7, 2018.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
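    The Wald intervals reported above follow the textbook formula: for a proportion p estimated from n observations, the 95% interval is p ± 1.96·sqrt(p(1−p)/n). A self-contained sketch with invented confusion counts (not the study's data):

      import math

      def wald_ci(successes, total, z=1.96):
          """Proportion with its 95% Wald confidence interval."""
          p = successes / total
          half = z * math.sqrt(p * (1 - p) / total)
          return p, max(0.0, p - half), min(1.0, p + half)

      # Toy 2x2 confusion counts (CDSS vs. reference standard), for illustration.
      tp, fn, tn, fp = 55, 5, 60, 48
      sens = wald_ci(tp, tp + fn)
      spec = wald_ci(tn, tn + fp)
      print(f"sensitivity {sens[0]:.3f} (95% CI {sens[1]:.3f}-{sens[2]:.3f})")
      print(f"specificity {spec[0]:.3f} (95% CI {spec[1]:.3f}-{spec[2]:.3f})")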
  • 90
    Publication Date: 2021-02-18
    Description: Background One component of precision medicine is to construct prediction models with predictive ability as high as possible, e.g. to enable individual risk prediction. In genetic epidemiology, complex diseases like coronary artery disease, rheumatoid arthritis, and type 2 diabetes have a polygenic basis, and a common assumption is that biological and genetic features affect the outcome under consideration via interactions. In the case of omics data, the use of standard approaches such as generalized linear models may be suboptimal, and machine learning methods are appealing for making individual predictions. However, most of these algorithms focus on main or marginal effects of the single features in a dataset. On the other hand, the detection of interacting features is an active area of research in genetic epidemiology. One major class of algorithms to detect interacting features is based on multifactor dimensionality reduction (MDR); a toy sketch of its core pooling step follows this record. Here, we further develop the model-based MDR (MB-MDR), a powerful extension of the original MDR algorithm, to enable interaction-empowered individual prediction. Results Using a comprehensive simulation study we show that our new algorithm (median AUC: 0.66) can use information hidden in interactions and outperforms two other state-of-the-art algorithms, namely Random Forest (median AUC: 0.54) and Elastic Net (median AUC: 0.50), if interactions are present in a scenario of two pairs of two features having small effects. The performance of these algorithms is comparable if no interactions are present. Further, we show that our new algorithm is applicable to real data by comparing the performance of the three algorithms on a dataset of rheumatoid arthritis cases and healthy controls. As our new algorithm is applicable not only to biological/genetic data but to all datasets with discrete features, it may have practical implications in other research fields where interactions between features have to be considered as well; we made our method available as an R package (https://github.com/imbs-hl/MBMDRClassifieR). Conclusions The explicit use of interactions between features can improve prediction performance and thus should be included in further attempts to move precision medicine forward.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
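    The MDR family that MB-MDR extends reduces a multi-locus genotype table to a single binary feature: each genotype cell is labeled high- or low-risk depending on whether its case/control ratio exceeds the overall ratio. The toy sketch below shows only that basic pooling step; MB-MDR's statistical testing layer is not reproduced:

      from collections import defaultdict

      # Toy samples: (genotype at SNP1, genotype at SNP2, is_case).
      samples = [(0, 0, 0), (0, 0, 1), (1, 2, 1), (1, 2, 1),
                 (2, 1, 0), (2, 1, 0), (0, 1, 1), (0, 1, 0)]

      cases, controls = defaultdict(int), defaultdict(int)
      for g1, g2, is_case in samples:
          (cases if is_case else controls)[(g1, g2)] += 1

      # Label each two-locus cell by comparing its case/control ratio with the
      # overall ratio, pooling the genotype table into one binary feature.
      overall = sum(cases.values()) / sum(controls.values())
      for cell in sorted(set(cases) | set(controls)):
          ratio = cases[cell] / max(controls[cell], 1)
          print(cell, "high-risk" if ratio > overall else "low-risk")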
  • 91
    Publication Date: 2021-02-24
    Description: Background Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDT). Reference samples and benchmark calls published by the Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across the different genomic regions of interest and variant types. Several benchmarking methods such as hap.py, vcfeval, and vcflib are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such methods, plus experienced bioinformaticians to interpret the results. In addition, these methods depend on the hardware, operating system and other software libraries, making it impossible to reliably repeat the analytical assessment when any of the underlying dependencies of the assay changes. Here we present a scalable, reproducible, cloud-based benchmarking workflow, independent of the laboratory, the technician executing the workflow, and the underlying compute hardware, that rapidly and continually assesses the performance of LDT assays across their regions of interest and reportable range using a broad set of benchmarking samples; a toy sketch of the underlying truth-set comparison follows this record. Results The benchmarking workflow was used to evaluate the performance characteristics of secondary analysis pipelines commonly used by clinical genomics laboratories in their LDT assays, such as GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference sample truth sets generated by the Genome in a Bottle (GIAB) consortium, six samples from the Personal Genome Project (PGP) and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as the whole exome and the clinical exome. Conclusions We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision and sensitivity for germline SNPs and InDels within a reportable range using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
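    At the core of any such workflow is the comparison of a call set against a truth set to obtain precision, recall/sensitivity and F1. The sketch below does this naively on toy variant keys; production tools like hap.py or vcfeval additionally normalize variant representations before matching:

      # Toy truth/query variants keyed by (chrom, pos, ref, alt).
      truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 50, "G", "A")}
      calls = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A"), ("chr2", 99, "T", "C")}

      tp = len(truth & calls)   # true positives: called and in truth
      fp = len(calls - truth)   # false positives: called but not in truth
      fn = len(truth - calls)   # false negatives: in truth but missed

      precision = tp / (tp + fp)
      recall = tp / (tp + fn)   # a.k.a. sensitivity
      f1 = 2 * precision * recall / (precision + recall)
      print(f"TP={tp} FP={fp} FN={fn} precision={precision:.2f} "
            f"recall={recall:.2f} F1={f1:.2f}")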
  • 92
    Publication Date: 2021-02-25
    Description: Background Microbes perform fundamental economic, social, and environmental roles in our society. Metagenomics makes it possible to investigate microbes in their natural environments (the complex communities) and their interactions. How they act is usually estimated by looking at the functions they perform in those environments, and their responsibilities are measured through their genes. Advances in next-generation sequencing technology have facilitated metagenomics research; however, they have also created a heavy computational burden, and large, complex biological datasets are available as never before. Many gene predictors are available that can aid the gene annotation process, though they do not handle the complexities of metagenomic data appropriately. Moreover, there is no standard metagenomic benchmark dataset for gene prediction, so gene predictors may inflate their results by obfuscating low false discovery rates. Results We introduce geneRFinder, an ML-based gene predictor able to outperform state-of-the-art gene prediction tools on this benchmark using only one pre-trained Random Forest model. Average prediction rates of geneRFinder differed by 54 and 64 percentage points from those of Prodigal and FragGeneScan, respectively, on high-complexity metagenomes. The specificity rate of geneRFinder showed the largest distance from FragGeneScan, 79 percentage points, and 66 percentage points from Prodigal. According to McNemar’s test, all percentage differences between predictor performances are statistically significant for all datasets at a 99% confidence level; a minimal sketch of this test follows this record. Conclusions We provide geneRFinder, an approach for gene prediction in distinct metagenomic complexities, available at gitlab.com/r.lorenna/generfinder and https://osf.io/w2yd6/, and we also provide a novel, comprehensive benchmark dataset for gene prediction, based on the Critical Assessment of Metagenome Interpretation (CAMI) challenge and containing labeled data from gene regions, available at https://sourceforge.net/p/generfinder-benchmark.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
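    McNemar's test, cited above, depends only on the discordant predictions of two tools on the same items: under the null hypothesis, the b + c discordant outcomes split as Binomial(b + c, 0.5). A minimal exact version with invented counts (the paper's actual counts are not reproduced here):

      from scipy.stats import binom

      # Discordant pairs between two gene predictors on the same gene set:
      # b = genes only predictor A got right, c = genes only predictor B got right.
      b, c = 40, 12

      # Exact two-sided McNemar test on the discordant pairs.
      n = b + c
      p_value = min(1.0, 2 * binom.cdf(min(b, c), n, 0.5))
      print(f"discordant pairs = {n}, exact McNemar p = {p_value:.4g}")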
  • 93
    Publication Date: 2021-02-25
    Description: Background The Ministry of Health of Malaysia has invested significant resources to implement an electronic health record (EHR) system to ensure the full automation of hospitals for coordinated care delivery. It is therefore necessary to evaluate whether the system has been utilized effectively, particularly with regard to how its use predicts the post-implementation performance impact on primary care providers. Methods Convenience sampling was employed for data collection in three government hospitals over 7 months. A standardized effectiveness survey for EHR systems was administered to primary health care providers (specialists, medical officers, and nurses) as they participated in medical education programs. The empirical data were assessed by employing partial least squares-structural equation modeling for hypothesis testing. Results The results demonstrated that knowledge quality had the highest score for predicting performance and had a large effect size, whereas system compatibility was the most substantial system quality component; a small sketch of the effect-size calculation follows this record. The findings indicated that EHR systems supported the clinical tasks and workflows of care providers, which increased system quality, whereas the increased quality of knowledge improved user performance. Conclusion Given these findings, knowledge quality and effective use should be incorporated into evaluations of EHR system effectiveness in health institutions. Data mining features can be integrated into current systems to generate population health and disease trend analyses efficiently and systematically, improve the clinical knowledge of care providers, and increase their productivity. The validated survey instrument can be further tested with empirical surveys in other public and private hospitals with different interoperable EHR systems.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
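    Effect sizes in PLS-SEM are conventionally reported as Cohen's f² = (R²_full − R²_reduced) / (1 − R²_full), with 0.02, 0.15 and 0.35 read as small, medium and large. A tiny sketch with illustrative R² values (not the study's estimates):

      def cohens_f2(r2_full, r2_reduced):
          """Effect size of one predictor in a structural model."""
          return (r2_full - r2_reduced) / (1 - r2_full)

      # Toy R^2 for the performance construct with/without knowledge quality.
      f2 = cohens_f2(r2_full=0.55, r2_reduced=0.38)
      label = "large" if f2 >= 0.35 else "medium" if f2 >= 0.15 else "small"
      print(f"f2 = {f2:.2f} -> {label}")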
  • 94
    Publication Date: 2021-02-18
    Description: Background Competing endogenous RNA (ceRNA) regulation is a newly discovered post-transcriptional regulation mechanism that plays significant roles in physiological and pathological processes. CeRNA networks provide global views that help in understanding the regulation of ceRNAs. They have been widely used to detect survival biomarkers, select candidate regulators of disease genes, and predict long noncoding RNA functions. However, no software platform covers the full span from construction to analysis of ceRNA networks. Results To fill this gap, we introduce CeNet Omnibus, an R/Shiny application that provides a unified framework for the construction and analysis of ceRNA networks. CeNet Omnibus enables users to select multiple measurements, such as the Pearson correlation coefficient (PCC), mutual information (MI), and liquid association (LA), to identify ceRNA pairs and construct ceRNA networks; a toy sketch of the correlation-based pairing step follows this record. Furthermore, CeNet Omnibus provides a one-stop solution to analyze the topological properties of ceRNA networks, detect modules, and perform gene enrichment analysis and survival analysis. CeNet Omnibus aims at comprehensiveness, high efficiency, high expandability, and user customizability, and it offers a user-friendly web interface through which users can obtain the output intuitively. Conclusion CeNet Omnibus is a comprehensive platform for the construction and analysis of ceRNA networks. It is highly customizable and presents its results in an intuitive and interactive form. We expect that CeNet Omnibus will help researchers understand the properties of ceRNA networks and the associated biological phenomena. CeNet Omnibus is an R/Shiny application based on the Shiny framework developed by RStudio. The R package and a detailed tutorial are available on our GitHub page at https://github.com/GaoLabXDU/CeNetOmnibus.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
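    Of the pairing measurements listed above, the Pearson correlation coefficient is the simplest to illustrate: under the ceRNA hypothesis, two transcripts competing for the same miRNAs should show positively correlated expression. A toy sketch with synthetic profiles and illustrative cut-offs (not CeNet Omnibus's defaults):

      import numpy as np
      from scipy.stats import pearsonr

      rng = np.random.default_rng(1)
      n_samples = 50

      # Synthetic expression: an mRNA tracking a lncRNA plus noise, as expected
      # for a ceRNA pair sharing miRNA response elements.
      lnc = rng.normal(size=n_samples)
      mrna = 0.7 * lnc + rng.normal(scale=0.5, size=n_samples)

      r, p = pearsonr(lnc, mrna)
      if r > 0.5 and p < 0.05:   # illustrative thresholds only
          print(f"candidate ceRNA pair: PCC={r:.2f}, p={p:.1e}")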
  • 95
    Publication Date: 2021-02-17
    Description: Background Bioimaging techniques offer a robust tool for studying molecular pathways and morphological phenotypes of cell populations subjected to various conditions. As modern high-resolution 3D microscopy provides access to an ever-increasing amount of high-quality images, there arises a need for their analysis in an automated, unbiased, and simple way. Segmentation of structures within the cell nucleus, which is the focus of this paper, presents a new layer of complexity in the form of dense packing and significant signal overlap. At the same time, the available segmentation tools present a steep learning curve for new users with a limited technical background. This is especially apparent in the bulk processing of image sets, which requires the use of some form of programming notation. Results In this paper, we present PartSeg, a tool for segmentation and reconstruction of 3D microscopy images, optimised for the study of the cell nucleus. PartSeg integrates refined versions of several state-of-the-art algorithms, including a new multi-scale approach for segmentation and quantitative analysis of 3D microscopy images; a baseline sketch of the kind of pipeline such algorithms improve upon follows this record. The features and user-friendly interface of PartSeg were carefully planned with biologists in mind, based on analysis of multiple use cases and difficulties encountered with other tools, to offer an ergonomic interface with a minimal entry barrier. Bulk processing in an ad-hoc manner is possible without the need for programmer support. As the size of datasets of interest grows, such bulk processing solutions become essential for proper statistical analysis of results. Advanced users can use PartSeg components as a library within Python data processing and visualisation pipelines, for example within Jupyter notebooks. The tool is extensible so that new functionality and algorithms can be added by the use of plugins. For biologists, the utility of PartSeg is presented in several scenarios, showing the quantitative analysis of nuclear structures. Conclusions In this paper, we have presented PartSeg, a tool for precise and verifiable segmentation and reconstruction of 3D microscopy images. PartSeg is optimised for cell nucleus analysis and offers multi-scale segmentation algorithms best suited for this task. PartSeg can also be used for the bulk processing of multiple images, and its components can be reused in other systems or computational experiments.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
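    For orientation, the baseline that multi-scale nucleus segmentation improves upon is a plain smooth-threshold-label pipeline over the 3D volume. The sketch below shows that baseline with generic SciPy calls on a synthetic volume; it deliberately does not use PartSeg's own API:

      import numpy as np
      from scipy import ndimage

      rng = np.random.default_rng(2)

      # Synthetic 3D stack: two bright blobs on a noisy background.
      volume = rng.normal(0.1, 0.05, size=(32, 64, 64))
      volume[10:16, 10:20, 10:20] += 1.0
      volume[20:26, 40:52, 40:52] += 1.0

      # Baseline pipeline: Gaussian smoothing, global threshold, connected
      # components; multi-scale methods refine exactly these steps.
      smoothed = ndimage.gaussian_filter(volume, sigma=1.0)
      mask = smoothed > 0.5
      labels, n = ndimage.label(mask)
      sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
      print(f"found {n} structures, voxel counts: {sizes.astype(int)}")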
  • 96
    Publication Date: 2021-02-03
    Description: Background Differential expression and feature selection analyses are essential steps in the development of accurate diagnostic/prognostic classifiers of complicated human diseases using transcriptomics data. These steps are particularly challenging due to the curse of dimensionality and the presence of technical and biological noise. A promising strategy for overcoming these challenges is the incorporation of pre-existing transcriptomics data in the identification of differentially expressed (DE) genes. This approach has the potential to improve the quality of selected genes, increase classification performance, and enhance biological interpretability. While a number of methods have been developed that use pre-existing data for differential expression analysis, existing methods do not leverage the identities of experimental conditions to create a robust metric for identifying DE genes. Results In this study, we propose a novel differential expression and feature selection method, GEOlimma, which combines pre-existing microarray data from the Gene Expression Omnibus (GEO) with the widely applied Limma method for differential expression analysis. We first quantify differential gene expression across 2481 pairwise comparisons from 602 curated GEO Datasets, and we convert differential expression frequencies to DE prior probabilities; a toy sketch of this conversion step follows this record. Genes with high DE prior probabilities show enrichment in cell growth and death, signal transduction, and cancer-related biological pathways, while genes with low prior probabilities are enriched in sensory system pathways. We then applied GEOlimma to four differential expression comparisons within two human disease datasets and performed differential expression, feature selection, and supervised classification analyses. Our results suggest that GEOlimma provides greater experimental power to detect DE genes than Limma, due to its increased effective sample size. Furthermore, in a supervised classification analysis using GEOlimma as a feature selection method, we observed similar or better classification performance than with Limma given small, noisy subsets of an asthma dataset. Conclusions Our results demonstrate that GEOlimma is a more effective method for differential gene expression and feature selection analyses than the standard Limma method. Due to its focus on gene-level differential expression, GEOlimma also has the potential to be applied to other high-throughput biological datasets.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
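    The preprocessing idea highlighted above, turning how often a gene was called DE across many past comparisons into a prior probability, fits in a few lines. The pseudocount mapping below is an assumption made for illustration; GEOlimma's exact conversion may differ:

      import numpy as np

      rng = np.random.default_rng(3)
      n_genes, n_comparisons = 6, 100

      # Toy history: whether each gene was called DE in each past comparison.
      de_calls = rng.random((n_genes, n_comparisons)) < 0.3

      # DE frequency -> prior probability, with pseudocounts so no gene gets a
      # prior of exactly 0 or 1.
      freq = de_calls.sum(axis=1)
      prior = (freq + 1) / (n_comparisons + 2)
      for g in range(n_genes):
          print(f"gene_{g}: DE in {freq[g]}/{n_comparisons} comparisons, "
                f"prior = {prior[g]:.2f}")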
  • 97
    Publication Date: 2021-02-17
    Description: Background We know little about the best approaches to designing training for healthcare professionals. We thus studied how user-centered and theory-based design contribute to the development of a distance learning program for professionals, intended to increase their shared decision-making (SDM) with older adults living with neurocognitive disorders and their caregivers. Methods In this mixed-methods study, healthcare professionals who worked in family medicine clinics and homecare services evaluated a training program using a user-centered approach with several iterative phases of quantitative and qualitative evaluation, each followed by modifications. The program comprised an e-learning activity and five evidence summaries. A subsample assessed the e-learning activity during semi-structured think-aloud sessions. A second subsample assessed the evidence summaries they received by email. All participants completed a theory-based questionnaire to assess their intention to adopt SDM. Descriptive statistical analyses and qualitative thematic analyses were integrated at each round to prioritize training improvements with regard to the determinants most likely to influence participants’ intention. Results Of 106 participants, 98 (93%) completed their evaluations of either the e-learning activity or an evidence summary. The professions most represented were physicians (60%) and nurses (15%). Professionals valued the e-learning component for gaining knowledge on the theory and practice of SDM, and the evidence summaries for applying the knowledge gained through the e-learning activity to diverse clinical contexts. The iterative design process allowed most reported weaknesses to be addressed. Participants’ intentions to adopt SDM and to use the summaries were high at baseline and remained positive as the rounds progressed. Attitude and social influence significantly influenced participants' intention to use the evidence summaries (P
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    Publication Date: 2021-02-17
    Description: Background Summative eHealth evaluations frequently lack quality, which affects the generalizability of the evidence and its use in practice and further research. To guarantee quality, a number of activities are recommended in guidelines for evaluation planning. This study aimed to examine a case of eHealth evaluation planning in a multi-national and interdisciplinary setting and to provide recommendations for eHealth evaluation planning guidelines. Methods An empirical eHealth evaluation process was developed through a case study. The empirical process was compared with selected guidelines for eHealth evaluation planning using a pattern-matching technique. Results Planning in the interdisciplinary and multi-national team demanded extensive negotiation and alignment to support the future use of the evidence created. The evaluation planning guidelines did not provide specific strategies for the different set-ups of evaluation teams. Further, they did not address important aspects of quality evaluation, such as feasibility analysis of the outcome measures and data collection, monitoring of data quality, and consideration of the methods and measures employed in similar evaluations. Conclusions Activities to prevent quality problems need to be incorporated into guidelines for evaluation planning. Additionally, evaluators could benefit from guidance in evaluation planning related to the different set-ups of evaluation teams.
    Electronic ISSN: 1472-6947
    Topics: Computer Science , Medicine
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    Publication Date: 2021-02-01
    Description: Background The drive to understand how microbial communities interact with their environments has inspired innovations across many fields. The data generated from sequence-based analyses of microbial communities are typically of high dimensionality and can involve multiple data tables consisting of taxonomic or functional gene/pathway counts. Merging multiple high-dimensional tables with study-related metadata can be challenging. Existing microbiome pipelines available in R have created their own data structures to manage this problem. However, these data structures may be unfamiliar to analysts new to microbiome data or R, and do not allow for deviations from internal workflows. Existing analysis tools also focus primarily on community-level analyses and exploratory visualizations, as opposed to analyses of individual taxa. Results We developed the R package “tidyMicro” to serve as a more complete microbiome analysis pipeline. This open source software provides all of the essential tools available in other popular packages (e.g., management of sequence count tables, standard exploratory visualizations, and diversity inference tools), supplemented with multiple options for regression modelling (e.g., negative binomial, beta binomial, and/or rank-based testing) and novel visualizations to improve interpretability (e.g., Rocky Mountain plots, longitudinal ordination plots); a toy sketch of the negative binomial option follows this record. This comprehensive pipeline for microbiome analysis also maintains data structures familiar to R users to improve analysts’ control over workflow. A complete vignette is provided to aid new users in the analysis workflow. Conclusions tidyMicro provides a reliable alternative to popular microbiome analysis packages in R. We provide standard tools as well as novel extensions of standard analyses to improve the interpretability of results, while maintaining object malleability to encourage open source collaboration. The simple examples and full workflow from the package are reproducible and applicable to external data sets.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
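    tidyMicro itself is an R package; as a language-neutral illustration of one of the regression options it offers, the sketch below fits a negative binomial model to overdispersed taxon counts with sequencing depth as an offset, using Python's statsmodels (all data are synthetic and the dispersion alpha is fixed for simplicity):

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(4)
      n = 40

      # Synthetic taxon counts for two groups, with per-sample sequencing depth.
      group = np.repeat([0.0, 1.0], n // 2)
      depth = rng.integers(8_000, 12_000, size=n)
      mu = depth * np.where(group == 1, 0.002, 0.001)       # group doubles abundance
      counts = rng.poisson(mu * rng.gamma(5, 0.2, size=n))  # overdispersed counts

      # Negative binomial GLM with log(depth) offset; exp(slope) ~ fold change.
      X = sm.add_constant(group)
      fit = sm.GLM(counts, X,
                   family=sm.families.NegativeBinomial(alpha=0.2),
                   offset=np.log(depth)).fit()
      print(fit.params)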
  • 100
    Publication Date: 2021-02-10
    Description: Background We present here a computational shortcut to improve a powerful wavelet-based method by Shim and Stephens (Ann Appl Stat 9(2):665–686, 2015. 10.1214/14-AOAS776) called WaveQTL that was originally designed to identify DNase I hypersensitivity quantitative trait loci (dsQTL). Results WaveQTL relies on permutations to evaluate the significance of an association. We applied a recent method by Zhou and Guan (J Am Stat Assoc 113(523):1362–1371, 2017. 10.1080/01621459.2017.1328361) to boost computational speed; it involves calculating the distribution of Bayes factors and estimating the significance of an association by simulations rather than permutations. We called this simulation-based approach “fast functional wavelet” (FFW), and tested it on a publicly available DNA methylation (DNAm) dataset on colorectal cancer; a toy sketch of the shared wavelet-regression step follows this record. The simulations confirmed a substantial gain in computational speed compared to the permutation-based approach in WaveQTL. Furthermore, we show that FFW controls the type I error satisfactorily and has good power for detecting differentially methylated regions. Conclusions Our approach has broad utility and can be applied to detect associations between different types of functions and phenotypes. As more and more DNAm datasets are being made available through public repositories, an attractive application of FFW would be to re-analyze these data and identify associations that might have been missed by previous efforts. The full R package for FFW is freely available at GitHub https://github.com/william-denault/ffw.
    Electronic ISSN: 1471-2105
    Topics: Biology , Computer Science
    Published by BioMed Central
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
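    The wavelet step shared by WaveQTL and FFW is to transform each sample's functional phenotype and test genotype association per wavelet coefficient rather than per site; FFW then calibrates significance by simulation instead of permutation, which is not reproduced here. A toy sketch of the shared transform-then-score step, with synthetic data and a deliberately simple correlation score:

      import numpy as np
      import pywt  # PyWavelets

      rng = np.random.default_rng(5)
      n_sites, n_samples = 64, 30

      # Synthetic functional phenotype: methylation over 64 sites per sample,
      # with a genotype effect concentrated in one region.
      genotype = rng.integers(0, 3, size=n_samples)
      signal = rng.normal(size=(n_samples, n_sites))
      signal[:, 20:28] += 0.5 * genotype[:, None]

      # Transform each sample, then score every Haar coefficient against genotype.
      coeffs = np.array([np.concatenate(pywt.wavedec(s, "haar")) for s in signal])
      scores = np.array([abs(np.corrcoef(genotype, c)[1, 0]) for c in coeffs.T])
      print("strongest coefficient:", scores.argmax(),
            "| score:", round(float(scores.max()), 2))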