ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Articles  (4,536)
  • Oxford University Press  (4,536)
  • 2020-2022  (2,022)
  • 1990-1994  (1,587)
  • 1985-1989  (927)
  • 1950-1954
  • Computer Science  (4,536)
  • 1
    Publication date: 2021-08-20
    Description: Motivation: Accurate automatic annotation of protein function relies on both innovative models and robust data sets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the data sets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the data sets used in previous DNA-binding protein literature and provide several new data sets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved data sets to two previously published models. Additionally, we provide extensive tests showing how the best models predict across taxonomies. Results: Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxonomies, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms. Code and Data Availability: The data and results for this paper can be found at https://doi.org/10.5281/zenodo.5153906. The code for this paper can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 2
    Publication date: 2021-08-17
    Description: Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 3
    Publication date: 2021-08-06
    Description: Motivation: The investigation of quantitative trait loci (QTL) is an essential component in our understanding of how organisms vary phenotypically. However, many important crop species are polyploid (carrying more than two copies of each chromosome), requiring specialized tools for such analyses. Moreover, deciphering meiotic processes at higher ploidy levels is not straightforward, but is necessary to understand the reproductive dynamics of these species, or uncover potential barriers to their genetic improvement. Results: Here, we present polyqtlR, a novel software tool to facilitate such analyses in (auto)polyploid crops. It performs QTL interval mapping in F1 populations of outcrossing polyploids of any ploidy level using identity-by-descent probabilities. The allelic composition of discovered QTL can be explored, enabling favourable alleles to be identified and tracked in the population. Visualization tools within the package facilitate this process, and options to include genetic co-factors and experimental factors are included. Detailed analysis of polyploid meiosis, including prediction of multivalent pairing structures, detection of preferential chromosomal pairing and location of double reduction events, can be performed. Availability and implementation: polyqtlR is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/package=polyqtlR. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 4
    Publication date: 2021-08-20
    Description: Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by a word2vec model. A bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 5
    Publication date: 2021-08-20
    Description: Intratumoral heterogeneity is a well-documented feature of human cancers and is associated with outcome and treatment resistance. However, a heterogeneous tumor transcriptome contributes an unknown level of variability to analyses of differentially expressed genes (DEGs) that may contribute to phenotypes of interest, including treatment response. Although current clinical practice and the vast majority of research studies use a single sample from each patient, decreasing costs of sequencing technologies and computing power have made repeated-measures analyses increasingly economical. Repeatedly sampling the same tumor increases the statistical power of DEG analysis, which is indispensable for downstream analysis, and also increases one’s understanding of within-tumor variance, which may affect conclusions. Here, we compared five different methods for analyzing gene expression profiles derived from repeated sampling of human prostate tumors in two separate cohorts of patients. We also benchmarked the sensitivity of generalized linear models against linear mixed models for identifying DEGs contributing to relevant prostate cancer pathways based on a ground-truth model.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 6
    Publication date: 2021-08-20
    Description: Efforts to elucidate protein–DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, a first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physiochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross-predictions. Empirical tests show that DNAgenie’s outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 7
    Publication date: 2021-08-20
    Description: Accurate prediction of immunogenic peptides recognized by the T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most of the antigen peptides predicted in silico fail to elicit immune responses in vivo without considering TCR as a key factor. This inevitably causes costly and time-consuming experimental validation tests for predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptides recognized by TCR. Here, we described DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power with area under the curve up to 0.91 on COVID-19 data while predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved the overall accuracy of 81.03% on IEDB data while predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package can be downloaded from https://github.com/jiangBiolab/DLpTCR.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 8
    Publication date: 2021-07-11
    Description: Motivation: The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. Results: Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure–function paradigm toward a protein structure-surface(s)-function paradigm. Availability and implementation: All data are available online at http://datasetmachat.drugdesign.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 9
    Publication date: 2021-08-18
    Description: Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 10
    Publication date: 2021-08-14
    Description: Good knowledge of a peptide’s tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5–40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9 Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 11
    Publication date: 2021-08-20
    Description: Deep generative models have seen an upsurge of interest in the deep learning community since they were proposed. These models are designed for generating new synthetic data including images, videos and texts by fitting approximate distributions of the data. In the last few years, deep generative models have shown superior performance in drug discovery, especially de novo molecular design. In this study, deep generative models are reviewed to survey the recent advances of de novo molecular design for drug discovery. In addition, we divide those models into two categories based on molecular representations in silico. These two classical types of models are then reported in detail, and their pros and cons are discussed. We also indicate the current challenges in deep generative models for de novo molecular design. Automatic de novo molecular design is promising, but there is still a long road to be explored.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 12
    Publication date: 2021-08-19
    Description: DNA methylation may be regulated by genetic variants within a genomic region, referred to as methylation quantitative trait loci (mQTLs). The changes of methylation levels can further lead to alterations of gene expression, and influence the risk of various complex human diseases. Detecting mQTLs may provide insights into the underlying mechanism of how genotypic variations may influence the disease risk. In this article, we propose a methylation random field (MRF) method to detect mQTLs by testing the association between the methylation level of a CpG site and a set of genetic variants within a genomic region. The proposed MRF has two major advantages over existing approaches. First, it uses a beta distribution to characterize the bimodal and interval properties of the methylation trait at a CpG site. Second, it considers multiple common and rare genetic variants within a genomic region to identify mQTLs. Through simulations, we demonstrated that the MRF had improved power over other existing methods in detecting rare variants of relatively large effect, especially when the sample size is small. We further applied our method to a study of congenital heart defects with 83 cardiac tissue samples and identified two mQTL regions, MRPS10 and PSORS1C1, which were colocalized with expression QTL in cardiac tissue. In conclusion, the proposed MRF is a useful tool to identify novel mQTLs, especially for studies with limited sample sizes.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 13
    Publication date: 2021-06-29
    Description: Motivation: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. Results: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages. The first is a modal exploration stage, in which a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. The second is a modal maintaining stage, where an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on a large set of 320 non-redundant proteins, where MMpred obtains models with TM-score≥0.5 on 291 cases, which is 28% higher than that of Rosetta guided with the same set of distance constraints. In addition, on 320 benchmark proteins, the enhanced version of MMpred (E-MMpred) has 167 targets better than trRosetta when the best of five models are evaluated. The average TM-score of the best model of E-MMpred is 0.732, which is comparable to trRosetta (0.730). Availability and implementation: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 14
    Publication date: 2021-08-20
    Description: Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method, the heterogeneous graph attention network (‘HGAT–AMR’), to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 15
    Publication date: 2021-08-20
    Description: Protein engineering and design principles employing the 20 standard amino acids have been extensively used to achieve stable protein scaffolds and deliver their specific activities. Although this confers some advantages, it often restricts the sequence, chemical space, and ultimately the functional diversity of proteins. Moreover, although site-specific incorporation of non-natural amino acids (nnAAs) has been proven to be a valuable strategy in protein engineering and therapeutics development, its utility in the affinity-maturation of nanobodies is not fully explored. Besides, current experimental methods do not routinely employ nnAAs due to their enormous library size and infinite combinations. To address this, we have developed an integrated computational pipeline employing structure-based protein design methodologies, molecular dynamics simulations and free energy calculations, for the binding affinity prediction of an nnAA-incorporated nanobody toward its target and selection of potent binders. We show that by incorporating halogenated tyrosines, the affinity of 9G8 nanobody can be improved toward epidermal growth factor receptor (EGFR), a crucial cancer target. Surface plasmon resonance (SPR) assays showed that the binding of several 3-chloro-l-tyrosine (3MY)-incorporated nanobodies was improved up to 6-fold into a picomolar range, and the computationally estimated binding affinities shared a Pearson’s r of 0.87 with SPR results. The improved affinity was found to be due to enhanced van der Waals interactions of key 3MY-proximate nanobody residues with EGFR, and an overall increase in the nanobody’s structural stability. In conclusion, we show that our method can facilitate screening large libraries and predict potent site-specific nnAA-incorporated nanobody binders against crucial disease-targets.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 16
    Publication date: 2021-08-20
    Description: Over the past few years, meta-analysis has become popular among biomedical researchers for detecting biomarkers across multiple cohort studies with increased predictive power. Combining datasets from different sources increases sample size, thus overcoming the issue related to limited sample size from each individual study and boosting the predictive power. This leads to an increased likelihood of more accurately predicting differentially expressed genes/proteins or significant biomarkers underlying the biological condition of interest. Currently, several meta-analysis methods and tools exist, each having its own strengths and limitations. In this paper, we survey existing meta-analysis methods, and assess the performance of different methods based on results from different datasets as well as assessment from prior knowledge of each method. This provides a reference summary of meta-analysis models and tools, which helps to guide end-users on the choice of appropriate models or tools for given types of datasets and enables developers to consider current advances when planning the development of new meta-analysis models and more practical integrative tools.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 17
    Publication date: 2021-08-12
    Description: Motivation: Co-evolution analysis can be used to accurately predict residue–residue contacts from multiple sequence alignments. The introduction of machine-learning techniques has enabled substantial improvements in precision and a shift from predicting binary contacts to predicting distances between pairs of residues. These developments have significantly improved the accuracy of de novo prediction of static protein structures. With AlphaFold2 lifting the accuracy of some predicted protein models close to experimental levels, structure prediction research will move on to other challenges. One of those areas is the prediction of more than one conformation of a protein. Here, we examine the potential of residue–residue distance predictions to be informative of protein flexibility rather than simply static structure. Results: We used DMPfold to predict distance distributions for every residue pair in a set of proteins that showed both rigid and flexible behaviour. Residue pairs that were in contact in at least one reference structure were classified as rigid, flexible or neither. The predicted distance distribution of each residue pair was analysed for local maxima of probability indicating the most likely distance or distances between a pair of residues. We found that rigid residue pairs tended to have only a single local maximum in their predicted distance distributions while flexible residue pairs more often had multiple local maxima. These results suggest that the shape of predicted distance distributions contains information on the rigidity or flexibility of a protein and its constituent residues. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
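The abstract of record 17 describes analysing each predicted distance distribution for local maxima of probability. A minimal Python sketch of that counting step follows. It is an illustration only and not the authors' code: the function name, the `min_height` threshold and the two toy histograms are hypothetical, and a real analysis would run on DMPfold's binned output.

```python
def count_local_maxima(probabilities, min_height=0.05):
    """Count bins that are strictly higher than both neighbours.

    `probabilities` is a binned distance distribution (one value per
    distance bin). `min_height` is a hypothetical noise threshold; it is
    not taken from the paper. Flat plateaus are not counted as peaks.
    """
    peaks = 0
    last = len(probabilities) - 1
    for i, p in enumerate(probabilities):
        # Treat the outside of the histogram as lower than any bin.
        left = probabilities[i - 1] if i > 0 else float("-inf")
        right = probabilities[i + 1] if i < last else float("-inf")
        if p >= min_height and p > left and p > right:
            peaks += 1
    return peaks


# Toy distributions (made up for illustration): one peak vs. two peaks.
rigid_like = [0.0, 0.1, 0.6, 0.25, 0.05, 0.0]
flexible_like = [0.0, 0.3, 0.1, 0.05, 0.4, 0.15]

print(count_local_maxima(rigid_like))
print(count_local_maxima(flexible_like))
```

Under this sketch, a residue pair with a single peak would be a candidate for the rigid class and a pair with several peaks a candidate for the flexible class, which mirrors the tendency the abstract reports.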
  • 18
    Publication date: 2021-08-16
    Description: Motivation: The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on the structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for the determination of evolutionary relationships between proteins and for the identification of recurrent structural patterns present in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA, a novel effective environment capable of computing and analyzing multiple structure alignments. Results: DAMA is based on local structural similarities, using local 3D structure descriptors and thus accounts for nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by its global structure. DAMA is an extension of our previous study (DEDAL) which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest, consistent ensemble of similar descriptors. The new method is capable of capturing most of the biologically significant similarities present in canonical test sets and is discriminatory enough to prevent the emergence of larger, but meaningless, solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA’s capability of identifying equivalent residues, which should be very useful in discovering the biological nature of protein similarity. Performance profiles show the advantage of DAMA over other methods, in particular when using a strict similarity measure QC, which is the ratio of correctly aligned columns, and when applying the methods to more difficult cases. Availability and implementation: DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
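The DAMA entry above describes QC only as "the ratio of correctly aligned columns". A minimal sketch of such a measure follows. It assumes that a column counts as correct only when it matches a reference column exactly, and that the denominator is the number of reference columns; the function name `qc_score` and the toy alignments are hypothetical and are not taken from the DAMA software.

```python
def qc_score(test_columns, reference_columns):
    """Fraction of reference columns reproduced exactly in the test alignment.

    Assumption: a column is a tuple holding one residue index per structure.
    """
    reference = {tuple(col) for col in reference_columns}
    if not reference:
        raise ValueError("reference alignment has no columns")
    test = {tuple(col) for col in test_columns}
    return len(reference & test) / len(reference)


# Toy alignments of three structures (hypothetical data).
reference = [(1, 1, 2), (2, 2, 3), (3, 4, 4), (4, 5, 5)]
test = [(1, 1, 2), (2, 2, 3), (3, 3, 4), (4, 5, 5)]

print(qc_score(test, reference))  # 3 of the 4 reference columns match
```

Because a single misplaced residue invalidates the whole column, a column-based score of this kind is stricter than a score that counts correctly aligned residue pairs.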
  • 19
    Publication date: 2021-03-31
    Description: Motivation: Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged, but they rely on gene trees, whose inference is computationally expensive. Results: Here, we first show that in multiple animal and plant datasets, 18 to 62% of assignments by closest sequence are incorrect, typically placing the query in an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily informed k-mers for alignment-free mapping to ancestral protein subfamilies. We show that OMAmer, while able to reject non-homologous family-level assignments, provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. Availability: OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
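The OMAmer entry above mentions alignment-free mapping with k-mers. The sketch below illustrates only the general idea of k-mer based family assignment with a rejection threshold; it is not OMAmer's algorithm, which additionally places queries in ancestral subfamilies. All names (`build_index`, `assign`, `min_fraction`) and the toy sequences are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Set of all overlapping substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(families, k):
    """Map each k-mer to the set of families whose members contain it."""
    index = defaultdict(set)
    for name, seqs in families.items():
        for seq in seqs:
            for kmer in kmers(seq, k):
                index[kmer].add(name)
    return index


def assign(query, index, k, min_fraction=0.3):
    """Return the family sharing the most k-mers with the query, or None.

    Assumption: a query is rejected when fewer than `min_fraction` of its
    k-mers hit the best family (a crude stand-in for a significance test).
    """
    query_kmers = kmers(query, k)
    if not query_kmers:
        return None
    votes = Counter()
    for kmer in query_kmers:
        for name in index.get(kmer, ()):
            votes[name] += 1
    if not votes:
        return None
    best, hits = votes.most_common(1)[0]
    return best if hits / len(query_kmers) >= min_fraction else None


# Toy reference families (hypothetical sequences).
families = {
    "globin": ["MVLSPADKTNVKAAW", "MVHLTPEEKSAVTAL"],
    "kinase": ["GSGAFGTVYKGLWIP"],
}
index = build_index(families, k=3)

print(assign("MVLSPADKTNVK", index, k=3))  # shares all its 3-mers with a globin
print(assign("WWWWWWWW", index, k=3))      # no shared 3-mers, so rejected
```

No alignment is computed at any point: the query is compared to the reference only through exact k-mer lookups, which is what makes this family of methods fast.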
  • 20
    Publication date: 2021-03-31
    Description: Summary: VCF files containing the results of sequencing projects take up a lot of space. We propose VCFShark, which compresses VCF files up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). The advantage over competitors is greatest when compressing VCF files containing large amounts of genotype data. Processing speeds of up to 100 MB/s and main memory requirements below 30 GB allow the tool to be used on typical workstations even for large datasets. Availability and implementation: https://github.com/refresh-bio/vcfshark Supplementary information: Supplementary data are available at the publisher’s website.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 21
  • 22
    Publication date: 2021-03-28
    Description: Motivation: As the generation of complex single-cell RNA sequencing datasets becomes more commonplace, it is the responsibility of researchers to provide access to these data in a way that can be easily explored and shared. Whilst data are often deposited for future bioinformatic analysis, many studies do not release their data in a way that is easy for non-computational researchers to explore. Results: To help address this, we have developed ShinyCell, an R package that converts single-cell RNA sequencing datasets into explorable and shareable interactive interfaces. These interfaces can be easily customised to maximise their usability and can be easily uploaded to online platforms to facilitate wider access to published data. Availability: ShinyCell is available at https://github.com/SGDDNB/ShinyCell and https://figshare.com/projects/ShinyCell/100439.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 23
    Publication date: 2021-03-28
    Description: Motivation: Genomic selection (GS) is currently deemed the most effective approach to speed up the breeding of agricultural varieties. It has been recognized that considering multiple traits in GS can improve the accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNPs). Results: Here we propose an L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The use of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with data sets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model. Availability and implementation: The model is implemented in the R programming language and the code is freely available from https://github.com/alainmbebi/L21-norm-GS. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
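The L2,1-norm named in the entry above is the sum of the Euclidean norms of the rows of a matrix. A minimal sketch of the quantity follows, assuming a coefficient matrix with one row per SNP and one column per trait (that orientation is an assumption; the abstract does not state it). The helper names are hypothetical and this is not the L2,1-joint optimization algorithm itself.

```python
import math


def l21_norm(matrix):
    """Sum over rows of the Euclidean (L2) norm of each row."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in matrix)


def selected_rows(matrix, tol=1e-8):
    """Indices of rows whose coefficients are not all (numerically) zero."""
    return [i for i, row in enumerate(matrix)
            if math.sqrt(sum(x * x for x in row)) > tol]


# Hypothetical coefficients: 3 SNPs (rows) by 2 traits (columns).
B = [[3.0, 4.0],
     [0.0, 0.0],
     [0.0, 1.0]]

print(l21_norm(B))       # 5.0 + 0.0 + 1.0
print(selected_rows(B))  # SNPs that keep a nonzero effect on some trait
```

Penalizing this norm tends to drive entire rows to zero, so a SNP is either retained for all traits jointly or dropped for all of them, which is what makes it useful for variable selection in a multi-trait setting.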
  • 24
    Publication date: 2021-03-28
    Description: Summary: Finding informative predictive features in high-dimensional biological case-control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets. Availability: The software package for Python is available online at https://github.com/roohy/eps
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 25
    Publication date: 2021-03-24
    Description: Motivation: There is high demand for joint genotyping of structural variations with short-read sequencing, but efficient and accurate genotyping at population scale is a challenging task. Results: We developed muCNV, which aggregates per-sample summary pileups for joint genotyping of >100,000 samples. Pilot results show very low Mendelian inconsistencies. Applications to large-scale projects in the cloud show the computational efficiency of the muCNV genotyping pipeline. Availability: muCNV is publicly available for download at https://github.com/gjun/muCNV Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 26
    Publication date: 2021-03-26
    Description: Motivation: Molecular property prediction has been a hot topic in recent years. Existing graph-based models ignore the hierarchical structures of molecules. According to chemical and pharmaceutical knowledge, the functional groups of molecules are closely related to their physicochemical properties and binding affinities. It should therefore be helpful, for molecular property prediction, to represent molecular graphs by fragments that contain functional groups. Results: In this paper, to boost the performance of molecular property prediction, we first propose a definition of molecule graph fragments that may be or contain functional groups, which are relevant to molecular properties, and then develop a fragment-oriented multi-scale graph attention network for molecular property prediction, called FraGAT. Experiments on several widely used benchmarks are conducted to evaluate FraGAT. Experimental results show that FraGAT achieves state-of-the-art predictive performance in most cases. Furthermore, our case studies show that when the fragments used to represent the molecule graphs contain functional groups, the model can make better predictions. This conforms to our expectation and demonstrates the interpretability of the proposed model. Availability and implementation: The code and data underlying this work are available on GitHub at https://github.com/ZiqiaoZhang/FraGAT. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 27
    Publication date: 2021-03-26
    Description: Motivation: The Anatomical Therapeutic Chemical (ATC) system is an official classification system for medicines established by the World Health Organization. Correctly assigning ATC classes to given compounds is an important research problem in drug discovery, since it can not only reveal the possible active ingredients of the compounds but also allow their therapeutic, pharmacological and chemical properties to be inferred. Results: In this paper, we develop an end-to-end multi-label classifier called CGATCPred to predict the 14 main ATC classes for given compounds. To extract rich features of each compound, we use a deep Convolutional Neural Network (CNN) with shortcut connections to represent and learn the seven association scores between the given compound and others. Moreover, we construct the correlation graph of ATC classes and then apply a graph convolutional network (GCN) to the graph for label embedding abstraction. We use all label embeddings to guide the learning process of compound representation. As a result, in the jackknife test, CGATCPred obtains an Aiming of 81.94%, Coverage of 82.88%, Accuracy of 80.81%, Absolute True of 76.58% and Absolute False of 2.75%, yielding significant improvements over existing multi-label classifiers. Availability: The code of CGATCPred is available at https://github.com/zhc940702/CGATCPred and https://zenodo.org/record/4552917. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
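The CGATCPred entry above reports five multi-label scores without defining them. The sketch below uses the set-based definitions that are common in the multi-label classification literature; that these are the exact definitions used by the authors is an assumption, as are the function name and the toy labels.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Average five set-based multi-label scores over all samples.

    Assumed definitions, per sample with true set T and predicted set P:
      aiming         = |T & P| / |P|
      coverage       = |T & P| / |T|
      accuracy       = |T & P| / |T | P|
      absolute_true  = 1 if T == P else 0
      absolute_false = (|T | P| - |T & P|) / n_labels
    """
    n = len(true_sets)
    totals = dict.fromkeys(
        ["aiming", "coverage", "accuracy", "absolute_true", "absolute_false"], 0.0
    )
    for true, pred in zip(true_sets, pred_sets):
        true, pred = set(true), set(pred)
        inter, union = len(true & pred), len(true | pred)
        totals["aiming"] += inter / len(pred) if pred else 0.0
        totals["coverage"] += inter / len(true) if true else 0.0
        totals["accuracy"] += inter / union if union else 1.0
        totals["absolute_true"] += 1.0 if true == pred else 0.0
        totals["absolute_false"] += (union - inter) / n_labels
    return {name: value / n for name, value in totals.items()}


# Three hypothetical compounds; labels are indices of the 14 main ATC classes.
true_labels = [{1, 2}, {3}, {4, 5}]
pred_labels = [{1, 2}, {3, 4}, {5}]

scores = multilabel_metrics(true_labels, pred_labels, n_labels=14)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Absolute False is the only one of the five for which lower values are better, which is why the abstract reports it as a small percentage next to four large ones.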
  • 28
    Publication date: 2021-03-24
    Description: Motivation: Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins. Results: The STREME algorithm presented here advances the state of the art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME’s capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely used MEME Suite of sequence analysis tools. The name STREME stands for “Simple, Thorough, Rapid, Enriched Motif Elicitation”. Availability: The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 29
    Publication date: 2021-03-24
    Description: Motivation: Understanding the mechanisms by which the zebrafish pectoral fin develops is expected to produce insights into how vertebrate limbs grow from a 2D cell layer to a 3D structure. Two mechanisms have been proposed to drive limb morphogenesis in tetrapods: growth-based morphogenesis, with a higher proliferation rate at the distal tip of the limb bud than at the proximal side, and directed cell behaviors that include elongation, division and migration in a nonrandom manner. Based on quantitative experimental biological data at the level of individual cells in the whole developing organ, we test the conditions for the dynamics of early pectoral fin morphogenesis. Results: We found that during the development of the zebrafish pectoral fin, cells have a preferential elongation axis that gradually aligns along the proximodistal (PD) axis of the organ. Based on these quantitative observations, we build a center-based cell model enhanced with a polarity term and cell proliferation to simulate fin growth. Our simulations resulted in 3D fins similar in shape to the observed ones, suggesting that the existence of a preferential axis of cell polarization is essential to drive fin morphogenesis in zebrafish, as observed in the development of limbs in the mouse, but that distal tip-based expansion is not. Availability: Upon publication, biological data will be available at http://bioemergences.eu/modelingFin, and source code at https://github.com/guijoe/MaSoFin. Supplementary information: Supplementary data are included in this manuscript.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 30
    Publication date: 2021-03-24
    Description: Motivation: Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representations of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity, using an embedding method such as a Word2Vec-based method to represent entities as numeric vectors. However, this approach has the limitation that embedding large graph-structured data in Euclidean space cannot prevent a loss of information about latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré ball are more suitable for modeling hierarchies, as they have a geometric property whereby, because of negative curvature, distance increases exponentially toward the boundary. Results: In this paper, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding, which is specialized in the representation of hierarchy, through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products as well as or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing GO and gene semantics, and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. Availability: https://github.com/JaesikKim/HiG2Vec Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
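The HiG2Vec entry above relies on the fact that distances in the Poincaré ball grow rapidly toward the boundary. The sketch below evaluates the standard Poincaré ball distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))), for a few points. It illustrates the geometry only and is not the HiG2Vec training code; the function name is hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


origin = (0.0, 0.0)
for radius in (0.5, 0.9, 0.99):
    # Equal Euclidean steps toward the boundary cost ever more hyperbolic distance.
    print(radius, round(poincare_distance(origin, (radius, 0.0)), 3))
```

This is the property that makes the space suitable for trees: general terms can sit near the origin, while the many specific terms fan out toward the boundary and still remain far from one another.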
  • 31
    Publication date: 2021-03-23
    Description: Motivation: Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability when they rely on protein data besides sequences, or lack generalizability to novel sequences, species and functions. Results: To overcome the aforementioned barriers in applicability and generalizability, we propose a novel deep learning model that uses only sequence information for proteins, named Transformer-based protein function Annotation through joint sequence–Label Embedding (TALE). For generalizability to novel sequences, we use self-attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions (tail labels), we embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (1D sequences) in a joint latent space. TALE+, which combines TALE with a sequence similarity-based method, outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method that uses network information besides sequence in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low similarity, new species, or rarely annotated functions compared to training data, revealing deep insights into the protein sequence–function relationship. Ablation studies elucidated the contributions of algorithmic components toward accuracy and generalizability. Availability: The data, source code and models are available at https://github.com/Shen-Lab/TALE Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 32
    Publication date: 2021-03-23
    Description: Motivation: Random sampling of metabolic fluxes can provide a comprehensive description of the capabilities of a metabolic network. However, current sampling approaches do not model thermodynamics explicitly, leading to inaccurate predictions of an organism’s potential or actual metabolic operations. Results: We present a probabilistic framework combining thermodynamic quantities with steady-state flux constraints to analyze the properties of a metabolic network. It includes methods for probabilistic metabolic optimization and for joint sampling of thermodynamic and flux spaces. Applying the methods to a model of E. coli, we reveal known and novel mechanisms of substrate channeling and accurately predict reaction directions and metabolite concentrations. Interestingly, predicted flux distributions are multimodal, leading to discrete hypotheses on E. coli’s metabolic capabilities. Availability: Python and MATLAB packages are available at https://gitlab.com/csb.ethz/pta. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
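The entry above combines thermodynamic quantities with flux constraints. The sketch below shows only the textbook deterministic version of that link: the reaction Gibbs energy is ΔG' = ΔG'° + RT Σ sᵢ ln cᵢ, and a reaction can carry net forward flux only if ΔG' < 0. It is not the probabilistic framework of the paper; function names, the example reaction and all numbers are hypothetical.

```python
import math

R = 8.314462618e-3  # gas constant in kJ/(mol*K)


def reaction_gibbs_energy(dg0, stoichiometry, concentrations, temperature=298.15):
    """dG' = dG'0 + R*T * sum(s_i * ln c_i); products have s_i > 0, substrates s_i < 0."""
    log_q = sum(coeff * math.log(concentrations[met])
                for met, coeff in stoichiometry.items())
    return dg0 + R * temperature * log_q


def is_thermodynamically_consistent(flux, dg, tol=1e-9):
    """A nonzero net flux must run in the direction of negative Gibbs energy."""
    if abs(flux) <= tol:
        return True
    return flux * dg < 0


# Hypothetical reaction A -> B with a standard Gibbs energy of -5 kJ/mol.
stoichiometry = {"A": -1, "B": 1}
dg_equal = reaction_gibbs_energy(-5.0, stoichiometry, {"A": 1e-3, "B": 1e-3})
dg_product_rich = reaction_gibbs_energy(-5.0, stoichiometry, {"A": 1e-4, "B": 1e-1})

print(round(dg_equal, 2), is_thermodynamically_consistent(1.0, dg_equal))
print(round(dg_product_rich, 2), is_thermodynamically_consistent(1.0, dg_product_rich))
```

The second case shows why concentrations matter: the same reaction becomes infeasible in the forward direction once the product accumulates, so a sampler that ignores thermodynamics may propose flux directions that cannot occur.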
  • 33
    Publication date: 2021-03-26
    Description: Motivation: Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDIs using machine learning models has become possible. However, how to effectively utilize large and noisy biomedical KGs for DDI detection remains largely an open problem. Owing to the sheer size of KGs and the amount of noise in them, directly integrating KGs with other, smaller but higher-quality data (e.g. experimental data) is often of limited benefit. Most existing approaches ignore KGs altogether. Some try to integrate KGs directly with other data via graph neural networks, with limited success. Furthermore, most previous works focus on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task. Results: To fill these gaps, we propose a new method, SumGNN (knowledge summarization graph neural network), which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention-based subgraph summarization scheme that generates reasoning paths within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant for low-data relation types. In addition, SumGNN provides interpretable predictions via the reasoning paths generated for each prediction. Availability: The code is available in the supplementary material. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 34
    Publication date: 2021-03-27
    Description: Motivation: Most protein-structure superimposition tools consider only Cartesian coordinates. Yet much of biology happens on the surface of proteins, which is why proteins with shared ancestry and similar function often have comparable surface shapes. Superposition of proteins based on surface shape can enable comparison of highly divergent proteins, identify convergent evolution and enable detailed comparison of surface features and binding sites. Results: We present ZEAL, an interactive tool to superpose global and local protein structures based on their shape resemblance, using 3D (Zernike-Canterakis) functions to represent the molecular surface. In a benchmark study of structures with the same fold, we show that ZEAL outperforms two other methods for shape-based superposition. In addition, alignments from ZEAL were of comparable quality to the coordinate-based superpositions provided by TM-align. For comparisons of proteins with limited sequence and backbone-fold similarity, where coordinate-based methods typically fail, ZEAL can often find alignments with substantial surface-shape correspondence. In combination with shape-based matching, ZEAL can be used as a general tool to study relationships between shape and protein function. We identify several categories of protein functions where global shape similarity is significantly more likely than expected by random chance when comparing proteins with little similarity at the fold level. In particular, we find that global surface shape similarity is particularly common among DNA-binding proteins. Availability: ZEAL can be used online at https://andrelab.org/zeal or as a standalone program with a command-line or graphical user interface. Source files and installers are available at https://github.com/Andre-lab/ZEAL Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 35
    Publication date: 2021-03-17
    Description: Motivation: For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. Results: In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived five of its important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments with the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. Availability: The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
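The entry above compares MPMI against conditional mutual information (CMI), the classical way to separate direct from indirect associations. The sketch below is a plug-in estimate of CMI for discrete samples, I(X;Y|Z) = Σ p(x,y,z) ln[p(z) p(x,y,z) / (p(x,z) p(y,z))]. It illustrates that baseline only and is not the MPMI measure; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def conditional_mutual_information(xs, ys, zs):
    """Plug-in estimate of I(X;Y|Z) in nats from paired discrete samples."""
    n = len(xs)
    c_xyz = Counter(zip(xs, ys, zs))
    c_xz = Counter(zip(xs, zs))
    c_yz = Counter(zip(ys, zs))
    c_z = Counter(zs)
    cmi = 0.0
    for (x, y, z), count in c_xyz.items():
        # The sample size cancels in p(z) p(x,y,z) / (p(x,z) p(y,z)).
        ratio = count * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)])
        cmi += (count / n) * math.log(ratio)
    return cmi


z = [0, 0, 1, 1]

# Indirect association: X and Y are both copies of Z.
cmi_indirect = conditional_mutual_information(z, z, z)

# Direct association: Y copies X, and neither depends on Z.
x = [0, 1, 0, 1]
cmi_direct = conditional_mutual_information(x, x, z)

print(cmi_indirect, cmi_direct)
```

Conditioning on Z removes the association that Z alone explains, so the first value vanishes while the second stays at the entropy of X.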
  • 36
    Publication date: 2021-03-18
    Description: Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, computational methods that fuse multi-source biological data to identify associations between circRNAs and diseases can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose SGANRDA, a semi-supervised generative adversarial network (GAN) model for predicting circRNA–disease associations. The model first fuses the natural language features of the circRNA sequence with the features of disease semantics and the Gaussian interaction profile kernels of circRNAs and diseases, then uses all circRNA–disease pairs to pre-train the GAN, and fine-tunes the network parameters with labeled samples. Finally, an extreme learning machine classifier is employed to obtain the prediction result. Compared with previous supervised models, SGANRDA innovatively introduces circRNA sequences and utilizes all the information in circRNA–disease pairs during pre-training. This step can increase the information content of the features to some extent and reduce the impact of too few known associations on model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the 30 circRNA–disease pairs scored highest by SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model for predicting circRNA–disease associations and can provide reliable candidates for biological experiments.
    Print ISSN: 1467-5463
    Digital ISSN: 1477-4054
    Subject: Biology, Computer Science
    Published by Oxford University Press
  • 37
    Publication date: 2021-03-15
    Description: Motivation: Ribosome profiling (Ribo-seq) has revolutionized the study of RNA translation by providing information on ribosome positions across all translated RNAs at nucleotide resolution. Yet several technical limitations restrict the sequencing depth of such experiments, the most common of which is the overabundance of rRNA fragments. Various strategies can be employed to tackle this issue, including the use of commercial rRNA depletion kits. However, as these are designed for more standardized RNA-seq experiments, they may perform suboptimally in Ribo-seq. To overcome this, it is possible to use custom biotinylated oligos complementary to the most abundant rRNA fragments; however, no computational framework currently exists to aid the design of optimal oligos. Results: Here, we first show that a major confounding issue is that the rRNA fragments generated via Ribo-seq vary significantly with differing experimental conditions, suggesting that a “one-size-fits-all” approach may be inefficient. Therefore, we developed Ribo-ODDR, an oligo design pipeline integrated with a user-friendly interface that assists in oligo selection for efficient experiment-specific rRNA depletion. Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments and calculates the rRNA depletion efficiency of potential oligos. We show experimentally that Ribo-ODDR-designed oligos outperform commercially available kits and lead to a significant increase in rRNA depletion in Ribo-seq. Availability: Ribo-ODDR is freely accessible at https://github.com/fallerlab/Ribo-ODDR Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digital ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 38
    Publication date: 2021-03-15
    Description: Summary: Many experimental approaches have been developed to identify transcription start sites (TSS) from genomic-scale data. However, experiment-specific biases lead to large numbers of false-positive calls. Here, we present our integrative approach iTiSS, an accurate and generic TSS caller for any TSS profiling experiment in eukaryotes, which substantially reduces the number of false positives through a joint analysis of several complementary datasets. Availability and implementation: iTiSS is platform-independent, implemented in Java (v1.8) and freely available at https://www.erhard-lab.de/software and https://github.com/erhard-lab/iTiSS. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 39
    Publication date: 2021-03-15
    Description: Summary: Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e. they are highly frustrated. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein–protein interactions, small ligand recognition, catalytic sites and allostery. Here, we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and molecular dynamics (MD) trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. Availability and implementation: https://github.com/proteinphysiologylab/frustratometeR. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 40
    Publication date: 2021-02-19
    Description: Summary: MitoFlex is a Linux-based mitochondrial genome analysis toolkit that provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high-throughput sequencing data. The overall performance of MitoFlex was compared with that of its analogue MitoZ in terms of protein-coding gene recovery, memory consumption and processing speed. Availability: MitoFlex is available at https://github.com/Prunoideae/MitoFlex under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 41
    Publication date: 2021-03-19
    Description: Motivation: Breast cancer is one of the leading causes of cancer deaths among women worldwide. It is necessary to develop new breast cancer drugs because of the shortcomings of existing therapies. The traditional discovery process is time-consuming and expensive. Repositioning of clinically approved drugs has emerged as a novel approach for breast cancer therapy. However, serendipitous or experiential repurposing cannot be used as a routine method. Results: In this study, we proposed GraphRepur, a graph neural network model based on GraphSAGE, for drug repurposing against breast cancer. GraphRepur integrates two major classes of computational methods, drug network-based and drug signature-based. The differentially expressed genes of the disease, drug-exposure gene expression data and drug–drug link information were collected. By extracting the drug signatures and the topological structure information contained in the drug relationships, GraphRepur can predict new drugs for breast cancer, outperforming previous state-of-the-art approaches and some classic machine learning methods. The highly ranked drugs have indeed recently been reported as new treatments for breast cancer. Availability: The source code of our model and datasets are available at: https://github.com/cckamy/GraphRepur and https://figshare.com/articles/software/GraphRepur_Breast_Cancer_Drug_Repurposing/14220050 Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 42
    Publication date: 2021-03-18
    Description: Summary: LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparing with the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets. Availability and implementation: LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages for all versions are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 43
    Publication date: 2021-03-19
    Description: Biometric recognition uses feature extraction and pattern recognition to analyze the physical and behavioral characteristics of biological individuals in order to identify them. As typical biometric technologies, palm print and palm vein recognition offer a high recognition rate, stable features, easy location and good image quality, which has attracted the attention of researchers. This paper designs and develops a multispectral palm print and palm vein acquisition platform, which can quickly acquire palm spectrum and palm vein multispectral images at seven different wavelengths. We propose a multispectral palm print and palm vein recognition framework in which feature-level image fusion is performed after extracting features from palm print and palm vein images at different wavelengths. Through a multispectral palm print and palm vein image fusion experiment, a more feasible image fusion scheme is proposed. Based on the results of image fusion, we further propose an improved convolutional neural network (CNN) for model training to achieve identity recognition based on multispectral palm print and palm vein images. Finally, the effects of different CNN structures and learning rates on the recognition results were analyzed and compared experimentally.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Subject: Computer Science
    Published by Oxford University Press
  • 44
    Publication date: 2021-03-31
    Description: In this article we prove the following results: (i) Every hemimaximal set has minimal $c_{1}$-degree, i.e. if $B$ is hemimaximal and $A$ is a c.e. set such that $A \leq_{c_{1}} B$ then either $B \leq_{c_{1}} A$ or $A$ is computable. (ii) The $sQ$-degree of a c.e. set contains either only one or infinitely many c.e. $c$-degrees. (iii) If $A,B$ are c.e. cylinders in the same $sQ_{1}$-degree and $A
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Subject: Computer Science, Mathematics
    Published by Oxford University Press
  • 45
    Publication date: 2021-03-14
    Description: Motivation: R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to subset the data matrix before applying downstream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of the assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results: To overcome these challenges, we developed an R package called ExperimentSubset, a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation: The ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and GitHub: https://github.com/campbio/ExperimentSubset. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 46
    Publication date: 2021-03-14
    Description: Summary: In this article, we introduce a hierarchical clustering and Gaussian mixture model with an expectation-maximization (EM) algorithm for detecting copy number variants (CNVs) using whole exome sequencing (WES) data. The R shiny package ‘HCMMCNVs’ is also developed for processing user-provided BAM files, running the CNV detection algorithm and visualizing the results. By applying our approach to 325 cancer cell lines in 22 tumor types from the Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with existing methods and that using multiple cancer cell lines for CNV estimation is feasible. In addition, when applied to WES data of 120 oral squamous cell carcinoma (OSCC) samples, our algorithm, using the tumor sample only, exhibits more power in detecting CNVs than methods that use both tumors and matched normal counterparts. Availability and implementation: HCMMCNVs R shiny software is freely available at the GitHub repository https://github.com/lunching/HCMM_CNVs and at Zenodo https://doi.org/10.5281/zenodo.4593371. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
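The Gaussian mixture model with EM named in the description above can be illustrated with a one-dimensional sketch. This is not the HCMMCNVs implementation: the function name `em_gmm_1d`, the caller-supplied starting means, the initial variance and the toy log-ratio values are all assumptions made for illustration.

```python
import math


def em_gmm_1d(values, init_means, init_var=0.05, n_iter=20):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances).

    Hypothetical helper: starting means are passed in by the caller and
    every component starts with the same variance and weight.
    """
    k = len(init_means)
    means = list(init_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for x in values:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow to zero
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(values)
            means[j] = sum(r[j] * x for r, x in zip(resp, values)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, values)) / nj
            variances[j] = max(var, 1e-6)  # floor keeps the density finite
    return weights, means, variances


# Toy log2 copy ratios (made up): one group near -1 (loss), one near 0 (neutral).
ratios = [-1.1, -0.9, -1.0, -1.05, -0.95, 0.1, -0.1, 0.0, 0.05, -0.05]
weights, means, variances = em_gmm_1d(ratios, init_means=[-0.8, 0.2])
print(weights, means, variances)
```

On this toy input the two fitted means settle close to the centres of the two groups; real exome data would need per-region normalization and a principled choice of the number of components, neither of which is shown here.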
  • 47
    Publication date: 2021-03-14
    Description: Summary: The need for an efficient and cost-effective method is compelling in biomolecular NMR. To address this need, we have developed the Poky suite, a revolutionized platform with boundless possibilities for advancing research and technology development in signal detection, resonance assignment, structure calculation and relaxation studies with the help of many automation and user interface tools. The software is extensible and scalable through scripting and batch processing, and provides modern graphical user interfaces and a diverse range of modules out of the box. Availability: Poky is freely available to non-commercial users at https://poky.clas.ucdenver.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 48
    Publication date: 2021-03-17
    Description: Motivation: Breast cancer is a very heterogeneous disease and there is an urgent need for computational methods that can accurately predict its prognosis so that an appropriate therapeutic regimen can be chosen. Recently, deep learning-based methods have achieved great success in prognosis prediction, but many of them directly combine features from different modalities, which may ignore the complex inter-modality relations. In addition, existing deep learning-based methods do not take into consideration intra-modality relations, which are also beneficial to prognosis prediction. Therefore, it is of great importance to develop a deep learning-based method that can take advantage of the complementary information between intra-modality and inter-modality relations by integrating data from different modalities for more accurate prognosis prediction of breast cancer. Results: We present a novel unified framework named genomic and pathological deep bilinear network (GPDBN) for prognosis prediction of breast cancer by effectively integrating both genomic data and pathological images. In GPDBN, an inter-modality bilinear feature encoding module is proposed to model complex inter-modality relations and fully exploit the intrinsic relationship of the features across different modalities. Meanwhile, intra-modality relations, which are also beneficial to prognosis prediction, are captured by two intra-modality bilinear feature encoding modules. Moreover, to take advantage of the complementary information between inter-modality and intra-modality relations, GPDBN further combines the inter- and intra-modality bilinear features by using a multi-layer deep neural network for final prognosis prediction. Comprehensive experimental results demonstrate that the proposed GPDBN significantly improves the performance of breast cancer prognosis prediction and compares favorably with existing methods. Availability and implementation: GPDBN is freely available at https://github.com/isfj/GPDBN. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 49
    Publication date: 2021-03-17
    Description: Motivation: Co-expression networks are a powerful gene expression analysis method for studying how genes are co-expressed in functionally coherent clusters that usually reflect cell-type-specific behaviour of the genes involved. They can be applied to bulk-tissue gene expression profiling and assign function, and usually cell type specificity, to a high percentage of the gene pool used to construct the network. One of the limitations of this method is that each gene is predicted to play a role in a specific set of coherent functions in a single cell type (i.e. we get at most one gene-function-cell type triplet for each gene). We present here GMSCA (Gene Multifunctionality Secondary Co-expression Analysis), a software tool that exploits the co-expression paradigm to increase the number of functions and cell types ascribed to a gene in bulk-tissue co-expression networks. Results: We applied GMSCA to 27 co-expression networks derived from bulk-tissue gene expression profiling of a variety of brain tissues. Neurons and glial cells (microglia, astrocytes and oligodendrocytes) were considered the main cell types. Applying this approach, we increase the overall number of predicted triplets by 46.73%. Moreover, GMSCA predicts that the SNCA gene, traditionally thought to act mainly in neurons, also plays a relevant function in oligodendrocytes. Availability: The tool is available at GitHub, https://github.com/drlaguna/GMSCA, as open-source software. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 50
    Publication date: 2021-03-18
    Description: Summary: MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments. Availability and implementation: MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
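The weighted-voting step in the description above can be sketched as follows. This is only an illustration of the general idea and not the MMseqs2 implementation: the function name `weighted_vote`, the lineage tuples, the weights and the strict-majority threshold are all hypothetical.

```python
from collections import defaultdict


def weighted_vote(fragments, min_fraction=0.5):
    """Pick a contig label from per-fragment labels by weighted voting.

    fragments: list of (lineage, weight) pairs, where lineage is a tuple of
    taxon names ordered from root to leaf. Returns the most specific lineage
    prefix supported by MORE than min_fraction of the total weight, or ()
    when nothing qualifies.
    """
    total = sum(weight for _, weight in fragments)
    if total <= 0:
        return ()
    support = defaultdict(float)
    for lineage, weight in fragments:
        # A fragment supports its own label and every ancestor of it.
        for depth in range(1, len(lineage) + 1):
            support[lineage[:depth]] += weight
    candidates = [p for p, s in support.items() if s / total > min_fraction]
    if not candidates:
        return ()
    # The deepest qualifying prefix is the most specific supported label.
    return max(candidates, key=len)


# Made-up fragment labels and weights for one contig.
contig_fragments = [
    (("Bacteria", "Proteobacteria", "Escherichia"), 3.0),
    (("Bacteria", "Proteobacteria", "Salmonella"), 2.0),
    (("Bacteria", "Firmicutes", "Bacillus"), 1.0),
]
print(weighted_vote(contig_fragments))
```

With a strict majority threshold at most one prefix can qualify at each depth, so the result is unambiguous; in this toy input no genus holds a majority, and the vote falls back to the shared phylum.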
  • 51
    Publication date: 2021-03-16
    Description: Motivation: Post-transcriptional epigenetic modification of mRNA is an emerging field for studying gene regulatory mechanisms and their association with diseases. A recently developed high-throughput sequencing technology, Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq), enables transcriptome-wide profiling of mRNA epigenetic modification. A few computational methods are available to identify transcriptome-wide mRNA modification, but they are either limited by over-simplified models that ignore the biological variance across replicates or suffer from low accuracy and efficiency. Results: In this work, we develop a novel statistical method, based on an empirical Bayesian hierarchical model, to identify mRNA epigenetic modification regions from MeRIP-seq data. Our method accounts for various sources of variation in the data through rigorous modeling, and applies shrinkage estimation by borrowing information from transcriptome-wide data to stabilize the parameter estimation. Simulation and real data analyses demonstrate that our method is more accurate, robust and efficient than the existing peak calling methods. Availability: Our method TRES is implemented as an R package and is freely available on GitHub at https://github.com/ZhenxingGuo0015/TRES. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 52
    Publication date: 2021-03-29
    Description: We propose an internal calculus to check the satisfiability of a set of formulas in $\boldsymbol{S4}$. Our calculus directly supports model extraction and is designed to implement a forward proof-search strategy that can be understood as a top-down construction of a model. We prove that the extracted models have minimal height.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Subject: Computer Science, Mathematics
    Published by Oxford University Press
  • 53
    Publication date: 2021-03-02
    Description: Motivation: Identifying meaningful cancer driver genes in a cohort of tumors is a challenging task in cancer genomics. Although existing studies have identified known cancer drivers, most of them focus on detecting coding drivers with mutations. It is acknowledged that non-coding drivers can regulate driver mutations to promote cancer growth. In this work, we propose a novel node importance-based network analysis (NIBNA) framework to detect coding and non-coding cancer drivers. We hypothesize that cancer drivers are crucial to the formation of community structures in the cancer network, and that removing them greatly perturbs the network structure, thereby critically affecting the functioning of the network. NIBNA detects cancer drivers using a three-step process: first, a condition-specific network is built by incorporating gene expression data and gene networks; second, the community structures in the network are estimated; and third, a centrality-based metric is applied to compute node importance. Results: We apply NIBNA to the BRCA dataset, where it outperforms existing state-of-the-art methods in detecting coding cancer drivers. NIBNA also predicts 265 miRNA drivers, and the majority of these drivers have been validated in the literature. Further, we apply NIBNA to detect cancer subtype-specific drivers, and several predicted drivers have been validated to be associated with cancer subtypes. Lastly, we evaluate NIBNA’s performance in detecting epithelial-mesenchymal transition (EMT) drivers, and we confirmed 8 coding and 13 miRNA drivers in the list of known genes. Availability: The source code can be accessed at: https://github.com/mandarsc/NIBNA. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 54
    Publication date: 2021-03-08
    Description: Motivation: A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results: Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. Availability and implementation: https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/megadepth. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
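Summarizing coverage within genomic intervals, the task described above, can be illustrated with a small prefix-sum sketch. It does not reflect how Megadepth is implemented and reads no BigWig or BAM files: the function name `interval_coverage`, the in-memory per-base coverage list and the 0-based half-open interval convention are assumptions for illustration.

```python
from itertools import accumulate


def interval_coverage(per_base, intervals):
    """Return (total, mean) coverage for each 0-based half-open interval.

    per_base: list of per-base coverage values for one sequence.
    intervals: list of (start, end) pairs with 0 <= start <= end <= len(per_base).
    """
    # prefix[i] holds the summed coverage of bases 0..i-1, so any interval
    # total is a single subtraction regardless of interval length.
    prefix = [0] + list(accumulate(per_base))
    summaries = []
    for start, end in intervals:
        total = prefix[end] - prefix[start]
        length = end - start
        summaries.append((total, total / length if length else 0.0))
    return summaries


# Made-up coverage track and two intervals.
coverage = [0, 2, 2, 5, 1, 0]
print(interval_coverage(coverage, [(1, 4), (0, 6)]))
```

Building the prefix array once costs time linear in the sequence length, after which each interval is answered in constant time, which is why this layout suits annotations with many intervals.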
  • 55
    Publication date: 2021-03-10
    Description: Motivation: Bio-entity coreference resolution focuses on identifying coreferential links in biomedical texts, which is crucial for completing the attributes of bio-events and interconnecting events into bio-networks. Previously, deep neural network-based general-domain systems, among the most powerful tools available, have been applied to the biomedical domain with domain-specific information integration. However, such methods may introduce considerable noise because they combine context and complex domain-specific information insufficiently. Results: In this paper, we explore how to leverage an external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short-Term Memory network (LSTM), which encodes the knowledge information inside the LSTM more flexibly. Moreover, we propose a knowledge attention module to extract informative knowledge effectively based on context. Experiments on the BioNLP and CRAFT datasets show state-of-the-art performance, with a gain of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on cross-sentence coreferences. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 56
    Publication date: 2021-03-15
    Description: Motivation: Adverse drug–drug interactions (DDIs) are crucial for drug research and are a major cause of morbidity and mortality. Thus, the identification of potential DDIs is essential for doctors, patients and society. Existing traditional machine learning models rely heavily on handcrafted features and lack generalization. Recently, deep learning approaches that can automatically learn drug features from the molecular graph or drug-related network have improved the ability of computational models to predict unknown DDIs. However, previous works used large labeled datasets and either considered only the structure or sequence information of drugs, ignoring the relations or topological information between drugs and other biomedical objects (e.g. gene, disease and pathway), or considered the knowledge graph (KG) without the information from the drug molecular structure. Results: Accordingly, to effectively explore the joint effect of drug molecular structure and the semantic information of drugs in the knowledge graph for DDI prediction, we propose a multi-scale feature fusion deep learning model named MUFFIN. MUFFIN can jointly learn the drug representation based on both the drug-self structure information and the KG with rich biomedical information. In MUFFIN, we designed a bi-level cross strategy that includes cross- and scalar-level components to fuse multi-modal features well. MUFFIN can alleviate the restriction of limited labeled data on deep learning models by crossing the features learned from the large-scale KG and the drug molecular graph. We evaluated our approach on three datasets and three different tasks: binary-class, multi-class and multi-label DDI prediction. The results showed that MUFFIN outperformed other state-of-the-art baselines. Availability and implementation: The source code and data are available at https://github.com/xzenglab/MUFFIN.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 57
    Publication date: 2021-03-15
    Description: Motivation: A major challenge in analyzing cancer patient transcriptomes is that the tumors are inherently heterogeneous and evolving. We analyzed 214 bulk RNA samples of a longitudinal, prospective ovarian cancer cohort and found that the sample composition changes systematically due to chemotherapy and between the anatomical sites, preventing direct comparison of treatment-naive and treated samples. Results: To overcome this, we developed PRISM, a latent statistical framework to simultaneously extract the sample composition and cell-type-specific whole-transcriptome profiles adapted to each individual sample. Our results indicate that the PRISM-derived composition-free transcriptomic profiles and signatures derived from them predict the patient response better than the composite raw bulk data. We validated our findings in independent ovarian cancer and melanoma cohorts, and verified that PRISM accurately estimates the composition and cell-type-specific expression through whole-genome sequencing and RNA in situ hybridization experiments. Availability and implementation: https://bitbucket.org/anthakki/prism. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 58
    Publication date: 2021-03-08
    Description: Motivation: Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. Results: We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data so as to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies, which show that it provides accurate canonical correlation estimates and a well-controlled Type-1 error. We also show an application of the proposed method with EEG data. Availability and implementation: RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 59
    Publication date: 2021-03-10
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 60
    Publication date: 2021-03-14
    Description: Motivation: Polypharmacy side effects should be carefully considered for new drug development. However, considering all the complex drug–drug interactions that cause polypharmacy side effects is challenging. Recently, graph neural network (GNN) models have handled these complex interactions successfully and shown great predictive performance. Nevertheless, the GNN models have difficulty providing intelligible factors of the prediction for biomedical and pharmaceutical domain experts. Method: A novel approach, graph feature attention network (GFAN), is presented for interpretable prediction of polypharmacy side effects by emphasizing target genes differently. To artificially simulate polypharmacy situations, where two different drugs are taken together, we formulated a node classification problem by using the concept of a line graph in graph theory. Results: Experiments with benchmark datasets validated the interpretability of the GFAN and demonstrated performance competitive with the graph attention network in a previous work. Specific cases in the polypharmacy side effect prediction experiments showed that the GFAN model is capable of very sensitively extracting the target genes for each side effect prediction. Availability and implementation: https://github.com/SunjooBang/Polypharmacy-side-effect-prediction
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Subject: Biology, Computer Science, Medicine
    Published by Oxford University Press
  • 61
    Publikationsdatum: 2021-02-09
    Beschreibung: Motivation The prediction performance of the Cox proportional hazards model suffers when there are only a few uncensored events in the training data. Results We propose a Sparse-Group regularized Cox regression method to improve the prediction performance on large-scale, high-dimensional survival data with few observed events. Our approach is applicable when there are one or more other survival responses that (1) have a large number of observed events and (2) share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank (Sudlow et al., 2015) dataset, where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. (2020). Availability https://github.com/rivas-lab/multisnpnet-Cox Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 62
    Publikationsdatum: 2021-02-12
    Beschreibung: Summary Searching for open reading frames is a routine task and a critical step prior to annotating protein coding regions in newly sequenced genomes or de novo transcriptome assemblies. With the tremendous increase in genomic and transcriptomic data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new Python-based tool, orfipy, which allows the user to flexibly search for open reading frames in genomic and transcriptomic sequences. The search is rapid and fully customizable, with a choice of FASTA and BED output formats. Availability and implementation orfipy is implemented in Python and is compatible with Python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPI (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
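    As an illustration of the task described in this record, a minimal open reading frame search might look like the sketch below. It is not orfipy's implementation or API: the function names `find_orfs` and `to_bed`, the `min_len` parameter, and the restriction to the forward strand with ATG starts and the three standard stop codons are all assumptions made for the example.

```python
# Hypothetical sketch of a forward-strand ORF search; not orfipy's code or API.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=6, start="ATG"):
    """Return (start, end, frame) tuples for start...stop ORFs.

    Coordinates are 0-based and half-open, so seq[start:end] is the ORF
    including its stop codon. Only the three forward frames are scanned.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        orf_start = None
        # Walk the sequence codon by codon in this reading frame.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon == start:
                orf_start = i
            elif orf_start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - orf_start >= min_len:
                    orfs.append((orf_start, end, frame))
                orf_start = None
    return orfs


def to_bed(name, orfs):
    """Format ORFs as simple six-column BED lines (forward strand only)."""
    return [
        f"{name}\t{s}\t{e}\tORF_{n}\t0\t+"
        for n, (s, e, _frame) in enumerate(orfs, start=1)
    ]


example = "CCATGAAATAGGG"
orfs = find_orfs(example)
bed_lines = to_bed("seq1", orfs)
```

    A real tool would also scan the reverse complement, support alternative start codons and codon tables, and stream large FASTA files rather than hold sequences in memory.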
  • 63
    Publikationsdatum: 2021-02-15
    Beschreibung: Summary Despite the continuous discovery of new transcript isoforms, fueled by the recent increase in accessibility and accuracy of long-read RNA sequencing data, functional differences between isoforms originating from the same gene often remain obscure. To address this issue and enable researchers to assess potential functional consequences of transcript isoform variation on the proteome, we developed IsoTV. IsoTV is a versatile pipeline to process, predict, and visualize the functional features of translated transcript isoforms. Attributes such as gene and isoform expression, transcript composition, and functional features are summarized in an easy-to-interpret visualization. IsoTV is able to analyze a variety of data types from all eukaryotic organisms, including short- and long-read RNA-seq data. Using Oxford Nanopore long read data, we demonstrate that IsoTV facilitates the understanding of potential protein isoform function in different cancer cell types. Availability IsoTV is available at https://github.molgen.mpg.de/MayerGroup/IsoTV, with the corresponding documentation at https://isotv.readthedocs.io/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 64
    Publikationsdatum: 2021-02-17
    Beschreibung: Motivation Viruses are ubiquitous in the living world, and their ability to infect more than one host defines their host range. However, information about which virus infects which host, and about which host is infected by which virus, is not readily available. Results We developed a web-based tool called the Viral Host Range database to record, analyze and disseminate experimental host range data for viruses infecting archaea, bacteria and eukaryotes. Availability The ViralHostRangeDB application is available from https://viralhostrangedb.pasteur.cloud. Its source code is freely available from the Gitlab hub of Institut Pasteur (https://gitlab.pasteur.fr/hub/viralhostrangedb).
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 65
    Publikationsdatum: 2021-02-17
    Beschreibung: Summary Modern bioimaging and related areas such as sensor technology have undergone tremendous development over the last few years. As a result, contemporary imaging techniques, particularly electron microscopy (EM) and light sheet microscopy, can frequently generate datasets attaining sizes of several terabytes (TB). As a consequence, even seemingly simple data operations such as cropping, chromatic and drift corrections, and even visualisation pose challenges when applied to thousands of time points or tiles. To address this, we developed BigDataProcessor2, a Fiji plugin facilitating processing workflows for TB-sized image datasets. Availability and implementation BigDataProcessor2 is available as a Fiji plugin via the BigDataProcessor update site. The application is implemented in Java and the code is publicly available on GitHub (https://github.com/bigdataprocessor/bigdataprocessor2).
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 66
    Publikationsdatum: 2021-02-12
    Beschreibung: Motivation Hi-C is the most widely used assay for investigating genome-wide 3D organization of chromatin. When working with Hi-C data, it is often useful to calculate the similarity between contact matrices in order to assess experimental reproducibility or to quantify relationships among Hi-C data from related samples. The HiCRep algorithm has been widely adopted for this task, but the existing R implementation suffers from run time limitations on high-resolution Hi-C data or on large single-cell Hi-C datasets. Results We introduce a Python implementation of HiCRep and demonstrate that it is much faster and consumes much less memory than the existing R implementation. Furthermore, we give examples of HiCRep’s ability to accurately distinguish replicates from non-replicates and to reveal cell type structure among collections of Hi-C data. Availability and implementation HiCRep.py and its documentation are available with a GPL license at https://github.com/Noble-Lab/hicrep. The software may be installed automatically using the pip package installer. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 67
    Publikationsdatum: 2021-02-12
    Beschreibung: Motivation Inferring gene regulatory networks (GRNs) from high-throughput data is an important and challenging problem in systems biology. Although numerous GRN methods have been developed, most have focused on the verification of the specific data set. However, it is difficult to establish directed topological networks that are both suitable for time-series and non-time-series datasets due to the complexity and diversity of biological networks. Results Here, we proposed a novel method, GNIPLR (Gene networks inference based on projection and lagged regression) to infer GRNs from time-series or non-time-series gene expression data. GNIPLR projected gene data twice using the LASSO projection (LSP) algorithm and the linear projection (LP) approximation to produce a linear and monotonous pseudo-time series, and then determined the direction of regulation in combination with lagged regression analyses. The proposed algorithm was validated using simulated and real biological data. Moreover, we also applied the GNIPLR algorithm to the liver hepatocellular carcinoma (LIHC) and bladder urothelial carcinoma (BLCA) cancer expression datasets. These analyses revealed significantly higher accuracy and AUC values than other popular methods. Availability The GNIPLR tool is freely available at https://github.com/zyllluck/GNIPLR. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 68
    Publikationsdatum: 2021-02-12
    Beschreibung: Summary Microorganisms infect and contaminate eukaryotic cells during the course of biological experiments. Because microbes influence host cell biology and may therefore lead to erroneous conclusions, a computational platform that facilitates decontamination is indispensable. Recent studies show that next-generation sequencing (NGS) data can be used to identify the presence of exogenous microbial species. Previously, we proposed an algorithm to improve detection of microbes in NGS data. Here, we developed an online application, OpenContami, which allows researchers easy access to the algorithm via interactive web-based interfaces. We have designed the application by incorporating a database comprising analytical results from a large-scale public dataset and data uploaded by users. The database serves as a reference for assessing user data and provides a list of genera detected from negative blank controls as a ‘blacklist’, which is useful for studying human infectious diseases. OpenContami offers a comprehensive overview of exogenous species in NGS datasets; as such, it will increase our understanding of the impact of microbial contamination on biological and pathological traits. Availability and implementation OpenContami is freely available at: https://openlooper.hgc.jp/opencontami/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 69
    Publikationsdatum: 2021-04-13
    Print ISSN: 0010-4620
    Digitale ISSN: 1460-2067
    Thema: Informatik
    Publiziert von Oxford University Press
  • 70
    Publikationsdatum: 2021-03-04
    Beschreibung: Motivation Tissue array (TA) staining, combined with whole slide imaging (WSI) methods, facilitates discovery of biomarkers for diagnosis, prognostication and disease stratification. A key impediment in TA WSI analysis is handling missing tissue and artefacts when identifying tissue cores before quantitative, standardized downstream analysis. There is a need for an open access, user-friendly, integrated analysis of the WSI generated using TAs in clinical and scientific research laboratories. Results We have developed QuArray (Quantitative Array Application) for image export and signal analysis of TAs using WSI. The application input is a WSI and a corresponding TA configuration file. QuArray identifies and exports core images and analyses chromogen staining in a simple graphical user interface. Output data is saved to file for further analysis including indexed data. Availability and implementation Available for download from https://github.com/c-arthurs/QuArray under an MIT licence.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 71
    Publikationsdatum: 2021-03-04
    Beschreibung: Motivation The growing production of massive heterogeneous biological data offers opportunities for new discoveries. However, performing multi-omics data analysis is challenging, and researchers are forced to handle the ever-increasing complexity of both data management and evolution of our biological understanding. Substantial efforts have been made to unify biological datasets into integrated systems. Unfortunately, they are not easily scalable, deployable and searchable, locally or globally. Results This publication presents two tools with a simple structure that can help any data provider, organization or researcher requiring a reliable data search and analysis base. The first tool is Kibio, a scalable and adaptable data storage based on the Elasticsearch search engine. The second tool is KibioR, an R package to pull, push and search Kibio datasets or any accessible Elasticsearch-based databases. These tools apply a uniform data exchange model and minimize the burden of data management by organizing data into a decentralized, versatile, searchable and shareable structure. Several case studies are presented using multiple databases, from drug characterization to miRNAs and pathways identification, emphasizing the ease of use and versatility of the Kibio/KibioR framework. Availability Both KibioR and Elasticsearch are open source. KibioR package source is available at https://github.com/regisoc/kibior and the library on CRAN at https://cran.r-project.org/package=kibior. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 72
    Publikationsdatum: 2021-03-05
    Beschreibung: Motivation In genome-wide association studies, the additive genetic model is conventionally used to explore whether a single nucleotide polymorphism (SNP) is associated with a quantitative trait. For variants that do not follow an intermediate mode of inheritance (MOI), however, the recessive or the dominant genetic model can have more power to detect associations; furthermore, the MOI is important for downstream analyses and clinical interpretation. When multiple MOIs are modelled, the question arises of which best describes the true underlying MOI. Results We developed an R package that, for the first time, makes it possible to determine study-specific critical values for deciding when one of the three models is more informative than the others for a quantitative trait locus. The package allows for user-friendly simulations to determine these critical values with predefined minor allele frequencies and study sizes. For application scenarios with extensive multiple testing, we integrated an interpolation functionality to determine critical values based on only a moderate number of random draws. Availability and implementation The R package pgainsim is freely available for download on GitHub at https://github.com/genepi-freiburg/pgainsim. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 73
    Publikationsdatum: 2021-03-08
    Beschreibung: Motivation Protein–protein interactions drive wide-ranging molecular processes, and characterizing at the atomic level how proteins interact (beyond just the fact that they interact) can provide key insights into understanding and controlling this machinery. Unfortunately, experimental determination of three-dimensional protein complex structures remains difficult and does not scale to the increasingly large sets of proteins whose interactions are of interest. Computational methods are thus required to meet the demands of large-scale, high-throughput prediction of how proteins interact, but unfortunately, both physical modeling and machine learning methods suffer from poor precision and/or recall. Results In order to improve performance in predicting protein interaction interfaces, we leverage the best properties of both data- and physics-driven methods to develop a unified Geometric Deep Neural Network, ‘PInet’ (Protein Interface Network). PInet consumes pairs of point clouds encoding the structures of two partner proteins, in order to predict their structural regions mediating interaction. To make such predictions, PInet learns and utilizes models capturing both geometrical and physicochemical molecular surface complementarity. In application to a set of benchmarks, PInet simultaneously predicts the interface regions on both interacting proteins, achieving performance equivalent to or even much better than the state-of-the-art predictor for each dataset. Furthermore, since PInet is based on joint segmentation of a representation of protein surfaces, its predictions are meaningful in terms of the underlying physical complementarity driving molecular recognition. Availability and implementation PInet scripts and models are available at https://github.com/FTD007/PInet. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 74
    Publikationsdatum: 2021-03-03
    Beschreibung: Motivation Integrative multi-feature fusion analysis on biomedical data has gained much attention recently. In breast cancer, existing studies have demonstrated that combining genomic mRNA data and DNA methylation data can better stratify cancer patients with distinct prognosis than using a single signature. However, those existing methods simply combine these gene features in series and ignore the correlations between separate omics dimensions over time. Results In the present study, we propose an adaptive multi-task learning method, which combines the Cox loss task with the ordinal loss task, for survival prediction of breast cancer patients using multi-modal learning instead of performing survival analysis on each feature dataset. First, we use the local maximum quasi-clique merging (lmQCM) algorithm to reduce the mRNA and methylation feature dimensions and extract cluster eigengenes respectively. Then, we add an auxiliary ordinal loss to the original Cox model to improve the ability to optimize the learning process in training and regularization. The auxiliary loss helps to reduce the vanishing gradient problem for earlier layers and helps to decrease the loss of the primary task. Meanwhile, we use an adaptive weights approach to multi-task learning which weighs multiple loss functions by considering the homoscedastic uncertainty of each task. Finally, we build an ordinal Cox hazards model for survival analysis and use a long short-term memory (LSTM) method to predict patients’ survival risk. We use cross-validation and the concordance index (C-index) to assess prediction performance. Stringent cross-verification testing processes for the benchmark dataset and two additional datasets demonstrate that the developed approach is effective, achieving very competitive performance with existing approaches. Availability and implementation https://github.com/bhioswego/ML_ordCOX.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
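    The concordance index named in this record as the evaluation metric can be sketched as follows. This is a generic pairwise (Harrell-style) formulation written for illustration, not code from the ML_ordCOX repository; the function name `concordance_index` and its argument names are assumptions. A pair is treated as comparable when the subject with the shorter time had an observed event, and ties in predicted risk count as half-concordant.

```python
# Hypothetical sketch of a pairwise concordance index; not the authors' code.
def concordance_index(times, events, risks):
    """Return the fraction of comparable pairs ranked correctly by risk.

    times  -- observed follow-up times
    events -- 1/True if the event was observed, 0/False if censored
    risks  -- predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied predictions get half credit
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


times = [1, 2, 3]
events = [1, 1, 0]
perfect = concordance_index(times, events, [3, 2, 1])
reversed_ranking = concordance_index(times, events, [1, 2, 3])
```

    A value of 1.0 means every comparable pair is ordered correctly, 0.5 corresponds to random ranking, and 0.0 to a fully inverted ranking. The double loop is quadratic in the number of subjects, which is fine for a sketch but slow for large cohorts.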
  • 75
    Publikationsdatum: 2021-03-03
    Beschreibung: Summary Epigenetic modifications reflect key aspects of transcriptional regulation, and many epigenomic datasets have been generated under different biological contexts to provide insights into regulatory processes. However, the technical noise in epigenomic datasets and the many dimensions (features) examined make it challenging to effectively extract biologically meaningful inferences from these datasets. We developed a package that reduces noise while normalizing the epigenomic data by a novel normalization method, followed by integrative dimensional reduction by learning and assigning epigenetic states. This package, called S3V2-IDEAS, can be used to identify epigenetic states for multiple features, or identify discretized signal intensity levels and a master peak list across different cell types for a single feature. We illustrate the outputs and performance of S3V2-IDEAS using 137 epigenomics datasets from the VISION project that provides ValIdated Systematic IntegratiON of epigenomic data in hematopoiesis. Availability and implementation S3V2-IDEAS pipeline is freely available as open source software released under an MIT license at: https://github.com/guanjue/S3V2_IDEAS_ESMP. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 76
    Publikationsdatum: 2021-03-16
    Beschreibung: Motivation Queueing theory can be effective in simulating biochemical reactions taking place in living cells, and the paper takes a step towards the development of a comprehensive model of cell metabolism. Such a model could help to accelerate and reduce the costs of developing and testing investigational drugs by reducing the number of laboratory animals needed to evaluate drugs. Results The paper presents a Krebs cycle model based on queueing theory. The model allows for tracking of metabolite concentration changes in real time. To validate the model, a drug-induced inhibition affecting the activity of enzymes involved in the Krebs cycle was simulated and compared with available experimental data. Availability The source code is freely available for download at https://github.com/UTP-WTIiE/KrebsCycleUsingQueueingTheory, implemented in C# and supported on Linux or MS Windows. Supplementary information Supplementary data (tables with kinetic constants, kinetic equations and pseudocode) are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 77
    Publikationsdatum: 2021-03-15
    Beschreibung: Summary Identification of functional transcriptional regulators (TRs) associated with chromatin interactions is an important problem in studies of 3-dimensional genome organization and gene regulation. Direct inference of TR binding has been limited by the resolution of Hi-C data. Here, we present BART3D, a computational method for inferring TRs associated with genome-wide differential chromatin interactions by comparing Hi-C maps from two states, leveraging public ChIP-seq data for human and mouse. We demonstrate that BART3D can detect relevant TRs from dynamic Hi-C profiles with TR perturbation or cell differentiation. BART3D can be a useful tool in 3D genome data analysis and functional genomics research. Availability and implementation BART3D is implemented in Python and the source code is available at https://github.com/zanglab/bart3d. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 78
    Publikationsdatum: 2021-03-02
    Beschreibung: Motivation Infectious diseases caused by novel viruses have become a major public health concern. Rapid identification of virus–host interactions can reveal mechanistic insights into infectious diseases and shed light on potential treatments. Current computational prediction methods for novel viruses are based mainly on protein sequences. However, it is not clear to what extent other important features, such as the symptoms caused by the viruses, could contribute to a predictor. Disease phenotypes (i.e., signs and symptoms) are readily accessible from clinical diagnosis and we hypothesize that they may act as a potential proxy and an additional source of information for the underlying molecular interactions between the pathogens and hosts. Results We developed DeepViral, a deep learning based method that predicts protein–protein interactions (PPI) between humans and viruses. Motivated by the potential utility of infectious disease phenotypes, we first embedded human proteins and viruses in a shared space using their associated phenotypes and functions, supported by formalized background knowledge from biomedical ontologies. By jointly learning from protein sequences and phenotype features, DeepViral significantly improves over existing sequence-based methods for intra- and inter-species PPI prediction. Availability Code and datasets for reproduction and customization are available at https://github.com/bio-ontology-research-group/DeepViral. Prediction results for 14 virus families are available at https://doi.org/10.5281/zenodo.4429824.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 79
    Publikationsdatum: 2021-03-02
    Beschreibung: Motivation Transposable Elements (TEs) or jumping genes are DNA sequences that have an intrinsic capability to move within a host genome from one genomic location to another. Studies show that the presence of a TE within or adjacent to a functional gene may alter its expression. TEs can also cause an increase in the rate of mutation and can even mediate duplications and large insertions and deletions in the genome, promoting gross genetic rearrangements. The proper classification of identified jumping genes is important for analyzing their genetic and evolutionary effects. An effective classifier, which can explain the role of TEs in germline and somatic evolution more accurately, is needed. In this study, we examine the performance of a variety of machine learning (ML) techniques and propose a robust method, ClassifyTE, for the hierarchical classification of TEs with high accuracy, using a stacking-based ML method. Results We propose a stacking-based approach for the hierarchical classification of TEs. When trained on three different benchmark datasets, our proposed system achieved 4%, 10.68% and 10.13% average percentage improvement (using the hF measure) compared to several state-of-the-art methods. We developed an end-to-end automated hierarchical classification tool based on the proposed approach, ClassifyTE, to classify TEs up to the super-family level. We further evaluated our method on a new TE library generated by a homology-based classification method and found relatively high concordance at higher taxonomic levels. Thus, ClassifyTE paves the way for a more accurate analysis of the role of TEs. Availability and implementation The source code and data are available at https://github.com/manisa/ClassifyTE. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 80
    Publikationsdatum: 2021-03-02
    Beschreibung: Motivation Transcriptional surges generated by two-component systems (TCSs) have been observed experimentally in various bacteria. Suppression of the transcriptional surge may reduce the activity, virulence and drug resistance of bacteria. In order to investigate the general mechanisms, we use a PhoP/PhoQ TCS as a model system to derive a comprehensive mathematical model that governs the surge. PhoP is a response regulator, which serves as a transcription factor under a phosphorylation-dependent modulation by PhoQ, a histidine kinase. Results Our model reveals two major signaling pathways that modulate the phosphorylated PhoP (P-PhoP) level, one of which promotes the generation of P-PhoP, while the other depresses the level of P-PhoP. The competition between the P-PhoP-promoting and the P-PhoP-depressing pathways determines the generation of the P-PhoP surge. Furthermore, besides PhoQ, PhoP is also a bifunctional modulator that contributes to the dynamic control of the P-PhoP state, leading to a biphasic regulation of the surge by the gene feedback loop. In summary, the mechanisms derived from the PhoP/PhoQ system for the transcriptional surges provide a better understanding of such a sophisticated signal transduction system and aid in developing new antimicrobial strategies targeting TCSs. Availability and implementation https://github.com/jianweishuai/TCS. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 81
    Publikationsdatum: 2021-03-22
    Beschreibung: DPA Contest is a world-famous side-channel competition that aims to analyze and evaluate the implementation security of recent countermeasures. The Improved Rotating S-box Masking Scheme (RSM2.0) is one of the most popular countermeasures designed during DPA Contest V4.2; it combines Low Entropy Masking Schemes with a shuffling strategy to ensure the software security of AES-128, particularly its non-profiled security. Conducting a highly efficient non-profiled attack with low resource costs has so far remained a challenge. In this paper, we first propose a general, non-profiled leakage fingerprint attack (named NP-LFA) for secret cracking and use it to crack RSM2.0 random masks with almost 100% accuracy. Further, we analyze the hidden vulnerabilities in the RSM2.0 implementation and exploit them to bypass the shuffling defense and recover the master key. Official evaluation results show that NP-LFA is capable of compromising RSM2.0 within 14 traces, each of which costs only 60 ms of processing time. This result validates the high efficiency and lightweight character of our attack scheme, which has ranked first on the official website to date. In addition, we discuss and put forward some possible strategies to mitigate NP-LFA threats.
    Print ISSN: 0010-4620
    Digitale ISSN: 1460-2067
    Thema: Informatik
    Publiziert von Oxford University Press
  • 82
    Publikationsdatum: 2021-03-08
    Beschreibung: Motivation Batch effects heavily impact results in omics studies, causing bias and false positive results, but software to control them preemptively is lacking. Sample randomization prior to measurement is vital for minimizing these effects, but current approaches are often ad hoc, poorly documented and ill-equipped to handle multiple batches and outcomes. Results We developed Omixer, a Bioconductor package implementing multivariate and reproducible sample randomization for omics studies. It proactively counters correlations between technical factors and biological variables of interest by optimizing sample distribution across batches. Availability and implementation Omixer is available from Bioconductor at http://bioconductor.org/packages/release/bioc/html/Omixer.html. Scripts and data used to generate figures are available upon request. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 83
    Publikationsdatum: 2021-03-24
    Beschreibung: Metabolomics, the comprehensive study of the metabolome, and lipidomics—the large-scale study of pathways and networks of cellular lipids—are major driving forces in enabling personalized medicine. Complicated and error-prone data analysis still remains a bottleneck, however, especially for identifying novel metabolites. Comparing experimental mass spectra to curated databases containing reference spectra has been the gold standard for identification of compounds, but constructing such databases is a costly and time-demanding task. Many software applications try to circumvent this process by utilizing cutting-edge advances in computational methods—including quantum chemistry and machine learning—and simulate mass spectra by performing theoretical, so called in silico fragmentations of compounds. Other solutions concentrate directly on experimental spectra and try to identify structural properties by investigating reoccurring patterns and the relationships between them. The considerable progress made in the field allows recent approaches to provide valuable clues to expedite annotation of experimental mass spectra. This review sheds light on individual strengths and weaknesses of these tools, and attempts to evaluate them—especially in view of lipidomics, when considering complex mixtures found in biological samples as well as mass spectrometer inter-instrument variability.
    Print ISSN: 1467-5463
    Digitale ISSN: 1477-4054
    Thema: Biologie , Informatik
    Publiziert von Oxford University Press
  • 84
    Publikationsdatum: 2021-03-08
    Beschreibung: Summary Here, we propose Fourier ring correlation-based quality estimation (FRC-QE) as a new metric for automated image quality estimation in 3D fluorescence microscopy acquisitions of cleared organoids. It yields comparable measurements across experimental replicates and clearing protocols, and it works for different microscopy modalities. Availability and implementation FRC-QE is written in ImgLib2/Java and provided as an easy-to-use and macro-scriptable plugin for Fiji. Code, documentation, sample images and further information can be found at https://github.com/PreibischLab/FRC-QE. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 85
    Publikationsdatum: 2021-03-26
    Beschreibung: This paper proposes natural deduction systems for the representation of inferences in which several agents participate in deriving conclusions about what they believe or know, where belief and knowledge are understood in an intuitionistic sense. Multi-agent derivations in these systems may involve relatively complex belief (resp. knowledge) constructions which may include forms of nested, reciprocal, shared, distributed or universal belief/knowledge as well as attitudes de dicto/re/se. The systems consist of two main components: multi-agent belief bases which assign to each agent a subatomic system that represents the agent’s beliefs concerning atomic sentences and a set of multi-agent labelled rules for logically compound formulae. Derivations in these systems normalize. Moreover, normal derivations possess the subexpression property (a refinement of the subformula property) which makes them fully analytic. Relying on the normalization result, a proof-theoretic approach to the semantics of the intensional operators for intuitionistic belief/knowledge is presented which explains their meaning entirely by appeal to the structure of derivations. Importantly, this proof-theoretic semantics is autarkic with respect to its foundations as the systems (unlike, e.g. external/labelled proof systems which internalize possible worlds truth conditions) are not defined on the basis of a possible worlds semantics. Detailed applications to a logical puzzle (McCarthy’s three wise men puzzle) and to a semantical difficulty (Geach’s problem of intentional identity), respectively, illustrate the systems. The paper also provides comparisons with other approaches to intuitionistic belief/knowledge and multi-agent natural deduction.
    Print ISSN: 0955-792X
    Digitale ISSN: 1465-363X
    Thema: Informatik , Mathematik
    Publiziert von Oxford University Press
  • 86
    Publikationsdatum: 2021-03-02
    Beschreibung: Summary We achieve a significant improvement in thermodynamic-based flux analysis (TFA) by introducing multivariate treatment of thermodynamic variables and leveraging component contribution, the state-of-the-art implementation of the group contribution methodology. Overall, the method greatly reduces the uncertainty of thermodynamic variables. Results We present multiTFA, a Python implementation of our framework. We evaluated our application using the core Escherichia coli model and achieved a median reduction of 6.8 kJ/mol in reaction Gibbs free energy ranges, while three out of 12 reactions in glycolysis changed from reversible to irreversible. Availability and implementation Our framework along with documentation is available on https://github.com/biosustain/multitfa. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 87
    Publikationsdatum: 2021-03-03
    Beschreibung: Motivation Chemical cross-linking coupled to mass spectrometry (XLMS) has emerged as a powerful technique for studying protein structures and large-scale protein-protein interactions. Nonetheless, XLMS lacks software tailored toward dealing with multiple conformers; this scenario can lead to high-quality identifications that are mutually exclusive. This limitation hampers the applicability of XLMS in structural experiments of dynamic protein systems, where less abundant conformers of the target protein are expected in the sample. Results We present QUIN-XL, software that uses unsupervised clustering to group cross-link identifications by their quantitative profile across multiple samples. QUIN-XL highlights regions of the protein or system whose conformation changes between biological conditions. We demonstrate our software’s usefulness by revisiting the HSP90 protein, comparing three of its different conformers. QUIN-XL’s clusters correlate directly with known 3D structures of the conformers, which validates our software. Availability and implementation QUIN-XL and a user tutorial are freely available at http://patternlabforproteomics.org/quinxl for academic users. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 88
    Publikationsdatum: 2021-03-02
    Beschreibung: Motivation Microscopy technology plays an important role in many biological research fields. High-resolution (HR) 3D image reconstruction of solvent-cleared brain is an important microscopy application. However, 3D microscopy image generation is time-consuming and expensive. Therefore, we have developed a deep learning framework (DeepS) for both optical sectioning and super-resolution microscopy. Results Using DeepS for super-resolution 3D microscopy imaging of solvent-cleared mouse brain yields improved performance in comparison with the standard image processing workflow. We have also developed a web server to allow online usage of DeepS. Users can train their own models with only one pair of training images using the transfer learning function of the web server. Availability and implementation http://deeps.cibr.ac.cn. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 89
    Publikationsdatum: 2021-03-05
    Beschreibung: Proteins interact with each other to play critical roles in many biological processes in cells. Although promising, laboratory experiments are usually time-consuming and labor-intensive, and the results obtained are often not robust and carry considerable uncertainty. Owing to recent advances in high-throughput technologies, a large amount of proteomics data has been collected, which presents both a significant opportunity and a challenge to develop computational models that predict protein–protein interactions (PPIs) from these data. In this paper, we present a comprehensive survey of recent efforts towards the development of effective computational models for PPI prediction. The survey introduces the algorithms that can be used to learn computational models for predicting PPIs, and it classifies these models into different categories. To understand their relative merits, the paper discusses different validation schemes and metrics for evaluating prediction performance. Biological databases that are commonly used for performance comparison are also described, and their use in a series of extensive experiments comparing different prediction models is discussed. Finally, we present some open issues in PPI prediction for future work and explain how the performance of PPI prediction can be improved if these issues are effectively tackled.
    Print ISSN: 1467-5463
    Digitale ISSN: 1477-4054
    Thema: Biologie , Informatik
    Publiziert von Oxford University Press
  • 90
    Publikationsdatum: 2021-03-03
    Beschreibung: Motivation In pharmacogenomic studies, the biological context of cell lines influences the predictive ability of drug-response models and the discovery of biomarkers. Thus, similar cell lines are often studied together based on prior knowledge of biological annotations. However, this selection approach is not scalable with the number of annotations, and the relationship between gene-drug association patterns and biological context may not be obvious. Results We present a procedure to compare cell lines based on their gene-drug association patterns. Starting with a grouping of cell lines from biological annotation, we model gene-drug association patterns for each group as a bipartite graph between genes and drugs. This is accomplished by applying sparse canonical correlation analysis (SCCA) to extract the gene-drug associations, and using the canonical vectors to construct the edge weights. Then, we introduce a nuclear norm-based dissimilarity measure to compare the bipartite graphs. Accompanying our procedure is a permutation test to evaluate the significance of similarity of cell line groups in terms of gene-drug associations. In the pharmacogenomics datasets CTRP2, GDSC2, and CCLE, hierarchical clustering of carcinoma groups based on this dissimilarity measure uniquely reveals clustering patterns driven by carcinoma subtype rather than primary site. Next, we show that the top associated drugs or genes from SCCA can be used to characterize the clustering patterns of haematopoietic and lymphoid malignancies. Finally, we confirm by simulation that when drug responses are linearly dependent on expression, our approach is the only one that can effectively infer the true hierarchy compared to existing approaches. Availability Bipartite graph-based hierarchical clustering is implemented in R and can be obtained from CRAN: https://CRAN.R-project.org/package=hierBipartite. The source code is available at https://github.com/CalvinTChi/hierBipartite Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 91
    Publikationsdatum: 2021-03-17
    Beschreibung: MicroRNAs (miRNAs) are master regulators of gene expression in cancers. Their sequence variants or isoforms (isomiRs) are highly abundant and possess unique functions. Given their short sequence length and high heterogeneity, mapping isomiRs can be challenging; without adequate depth and data aggregation, low-frequency events are often disregarded. To address these challenges, we present the Tumor IsomiR Encyclopedia (TIE): a dynamic database of isomiRs from over 10,000 adult and pediatric tumor samples in The Cancer Genome Atlas (TCGA) and The Therapeutically Applicable Research to Generate Effective Treatments (TARGET) projects. A key novelty of TIE is its ability to annotate heterogeneous isomiR sequences and aggregate the variants obtained across all datasets. Results can be browsed online or downloaded as spreadsheets. Here we show an analysis of isomiRs of miR-21 and miR-30a to demonstrate the utility of TIE. Availability and implementation The TIE search engine and data are freely available at https://isomir.ccr.cancer.gov/.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 92
    Publikationsdatum: 2021-03-17
    Print ISSN: 0955-792X
    Digitale ISSN: 1465-363X
    Thema: Informatik , Mathematik
    Publiziert von Oxford University Press
  • 93
    Publikationsdatum: 2021-02-27
    Print ISSN: 0010-4620
    Digitale ISSN: 1460-2067
    Thema: Informatik
    Publiziert von Oxford University Press
  • 94
    Publikationsdatum: 2021-03-09
    Beschreibung: Motivation COVID-19 has several distinct clinical phases: a viral replication phase, an inflammatory phase, and in some patients, a hyper-inflammatory phase. High mortality is associated with patients developing cytokine storm syndrome. Treatment of hyper-inflammation in these patients using existing, approved therapies with proven safety profiles could address the immediate need to reduce mortality. Results We analyzed the changes in the gene expression, pathways and putative mechanisms induced by SARS-CoV2 in NHBE, and A549 cells, as well as COVID-19 lung vs. their respective controls. We used these changes to identify FDA approved drugs that could be repurposed to help COVID-19 patients with severe symptoms related to hyper-inflammation. We identified methylprednisolone (MP) as a potential leading therapy. The results were then confirmed in five independent validation data sets including Vero E6 cells, lung and intestinal organoids, as well as additional patient lung sample vs. their respective controls. Finally, the efficacy of MP was validated in an independent clinical study. Thirty-day all-cause mortality occurred at a significantly lower rate in the MP-treated group compared to control group (29.6% vs. 16.6%, p = 0.027). Clinical results confirmed the in silico prediction that MP could improve outcomes in severe cases of COVID-19. A low number needed to treat (NNT = 5) suggests MP may be more efficacious than dexamethasone or hydrocortisone. Availability iPathwayGuide is available at https://ipathwayguide.advaitabio.com/ Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 95
    Publikationsdatum: 2021-03-08
    Beschreibung: Summary Peptide microarrays have emerged as a powerful technology in immunoproteomics as they provide a tool to measure the abundance of different antibodies in patient serum samples. The high dimensionality and small sample size of many experiments challenge conventional statistical approaches, including those aiming to control the false discovery rate (FDR). Motivated by limitations in reproducibility and power of current methods, we advance an empirical Bayesian tool that computes local FDR statistics and local false sign rate statistics when provided with data on estimated effects and estimated standard errors from all the measured peptides. As the name suggests, the MixTwice tool involves the estimation of two mixing distributions, one on underlying effects and one on underlying variance parameters. Constrained optimization techniques provide for model fitting of mixing distributions under weak shape constraints (unimodality of the effect distribution). Numerical experiments show that MixTwice can accurately estimate generative parameters and powerfully identify non-null peptides. In a peptide array study of rheumatoid arthritis, MixTwice recovers meaningful peptide markers in one case where the signal is weak, and has strong reproducibility properties in one case where the signal is strong. Availability and implementation MixTwice is available as an R software package https://cran.r-project.org/web/packages/MixTwice/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 96
    Publikationsdatum: 2021-03-08
    Beschreibung: Motivation The processing of k-mers (subsequences of length k) is at the foundation of many sequence processing algorithms in bioinformatics, including k-mer counting for genome size estimation, genome assembly, and taxonomic classification for metagenomics. Minimizers—ordered m-mers where m 
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
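The abstract of record 96 is cut off in this listing before it finishes defining minimizers. As a rough illustration only, the sketch below uses the commonly cited definition: the minimizer of a k-mer is its smallest m-mer, with m no larger than k, under some fixed ordering. Lexicographic ordering and the function names `kmers`, `minimizer` and `minimizers` are assumptions made for this example and are not taken from the article.

```python
def kmers(seq, k):
    """Return all overlapping subsequences of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def minimizer(kmer, m):
    """Return the smallest m-mer inside kmer.

    Assumption: plain lexicographic order; real tools often use a
    hashed or randomized order instead.
    """
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


def minimizers(seq, k, m):
    """Pair every k-mer of seq with its minimizer."""
    return [(km, minimizer(km, m)) for km in kmers(seq, k)]


if __name__ == "__main__":
    for km, mz in minimizers("GATTACA", k=5, m=3):
        print(km, mz)
```

Adjacent k-mers frequently share a minimizer, which is why minimizers are a common way to bucket k-mers.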
  • 97
    Publikationsdatum: 2021-02-17
    Beschreibung: Motivation Single-cell RNA-Seq (scRNA-Seq) data is useful in discovering cell heterogeneity and signature genes in specific cell populations in cancer and other complex diseases. Specifically, the investigation of condition-specific functional gene modules (FGM) can help to understand interactive gene networks and complex biological processes in different cell clusters. QUBIC2 is recognized as one of the most efficient and effective biclustering tools for condition-specific FGM identification from scRNA-Seq data. However, its availability only as a C implementation has restricted its application to a few downstream analysis functionalities. We developed an R package named IRIS-FGM (Integrative scRNA-Seq Interpretation System for Functional Gene Module analysis) to support the investigation of FGMs and cell clustering using scRNA-Seq data. Empowered by QUBIC2, IRIS-FGM can effectively identify condition-specific FGMs, predict cell types/clusters, uncover differentially expressed genes, and perform pathway enrichment analysis. Notably, IRIS-FGM can also take Seurat objects as input, facilitating easy integration with existing analysis pipelines. Availability and Implementation IRIS-FGM is implemented in the R environment (as of version 3.6) with the source code freely available at https://github.com/BMEngineeR/IRISFGM. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 98
    Publikationsdatum: 2021-02-25
    Beschreibung: Motivation Genome data has been a subject of study for both biology and computer science since the start of the Human Genome Project in 1990. Since then, genome sequencing for medical and social purposes has become more and more available and affordable. Genome data can be shared on public websites or with service providers. However, this sharing compromises the privacy of donors even under partial sharing conditions. We mainly focus on the liability aspect arising from the unauthorized sharing of these genome data. One of the techniques to address liability issues in data sharing is watermarking. Results To detect malicious correspondents and service providers (SPs), whose aim is to share genome data without individuals’ consent and undetected, we propose a novel watermarking method for sequential genome data using the belief propagation algorithm. Our method has two criteria to satisfy: (i) embedding robust watermarks so that malicious adversaries cannot tamper with the watermark by modification and are identified with high probability; (ii) achieving ε-local differential privacy in all data sharing with SPs. To preserve system robustness against single-SP and collusion attacks, we consider publicly available genomic information such as Minor Allele Frequency, Linkage Disequilibrium, Phenotype Information and Familial Information. Our proposed scheme achieves a 100% detection rate against single-SP attacks with only 3% watermark length. For the worst-case scenario of collusion attacks (50% of SPs are malicious), 80% detection is achieved with 5% watermark length and 90% detection is achieved with 10% watermark length. For all cases, the impact of ε on precision remained negligible and high privacy is ensured. Availability https://github.com/acoksuz/PPRW_SGD_BPLDP Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 99
    Publikationsdatum: 2021-02-24
    Beschreibung: The Probabilistic Identification of Causal SNPs (PICS) algorithm and web application was developed as a fine-mapping tool to determine the likelihood that each single nucleotide polymorphism (SNP) in LD with a reported index SNP is a true causal polymorphism. PICS is notable for its ability to identify candidate causal SNPs within a locus using only index SNPs, which are widely available from published GWAS, whereas other methods require full summary statistics or full genotype data. However, the original PICS web application operates on a single SNP at a time and is slow, which severely limits its usability. We have developed a next-generation PICS tool, PICS2, which runs PICS analyses on large batches of index SNPs with much faster performance. Additional updates and extensions include use of LD reference data generated from 1000 Genomes phase 3; annotation of variant consequences; annotation of GTEx eQTL genes and downloadable PICS SNPs from GTEx eQTLs; the option of generating PICS probabilities from experimental summary statistics; and generation of PICS SNPs from all SNPs of the GWAS catalog, automatically updated weekly. These free and easy-to-use resources will enable efficient determination of candidate loci for biological studies to investigate the true causal variants underlying disease processes. Availability PICS2 is available at https://pics2.ucsf.edu. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press
  • 100
    Publikationsdatum: 2021-02-23
    Beschreibung: Motivation Organization of organoid models, imaged in 3D with a confocal microscope, is an essential morphometric index for assessing responses to stress or therapeutic targets. Differentiating malignant and normal cells is often difficult in monolayer cultures, but in 3D culture, colony organization can provide a clear set of indices for differentiating them. The limiting factors are delineating each cell in a 3D colony in the presence of perceptual boundaries between adjacent cells and heterogeneity associated with cells being at different cell cycles. Results In a previous paper, we defined a potential field for delineating adjacent nuclei, with perceptual boundaries, in 2D histology images by coupling three deep networks. This concept is now extended to 3D and simplified by an enhanced cost function that replaces three deep networks with one. Validation includes four cell lines with diverse mutations, and a comparative analysis with the UNet models of microscopy indicates improved performance, with an F1-score of 0.83. Availability All software and annotated images are available through GitHub and Bioinformatics online. The software includes the proposed method, UNet for microscopy that was extended to 3D, and report generation for profiling colony organization. Supplementary information Supplementary data are available at Bioinformatics online and GitHub.
    Print ISSN: 1367-4803
    Digitale ISSN: 1460-2059
    Thema: Biologie , Informatik , Medizin
    Publiziert von Oxford University Press