ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Filter
  • Books
  • Articles  (3,332)
  • Oxford University Press  (2,108)
  • Public Library of Science  (1,224)
  • MDPI Publishing
  • 2020-2022  (3,332)
  • Computer Science  (3,246)
  • Nature of Science, Research, Systems of Higher Education, Museum Science  (86)
Years
Year
Journal
  • 1
    Publication Date: 2021-08-20
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 2
    Publication Date: 2021-08-20
    Description: Motivation: Accurate automatic annotation of protein function relies on both innovative models and robust data sets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the data sets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the data sets used in previous DNA-binding protein literature and provide several new data sets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved data sets to two previously published models. Additionally, we provide extensive tests showing how the best models predict across taxonomies. Results: Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxonomies, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms. Code and Data Availability: The data and results for this paper can be found at https://doi.org/10.5281/zenodo.5153906. The code for this paper can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 3
    Publication Date: 2021-08-10
    Description: Human travel is one of the primary drivers of infectious disease spread. Models of travel are often used that assume the amount of travel to a specific destination decreases as cost of travel increases with higher travel volumes to more populated destinations. Trip duration, the length of time spent in a destination, can also impact travel patterns. We investigated the spatial patterns of travel conditioned on trip duration and find distinct differences between short and long duration trips. In short-trip duration travel networks, trips are skewed towards urban destinations, compared with long-trip duration networks where travel is more evenly spread among locations. Using gravity models to inform connectivity patterns in simulations of disease transmission, we show that pathogens with shorter generation times exhibit initial patterns of spatial propagation that are more predictable among urban locations. Further, pathogens with a longer generation time have more diffusive patterns of spatial spread reflecting more unpredictable disease dynamics.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 4
    Publication Date: 2021-08-17
    Description: Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 5
    Publication Date: 2021-08-06
    Description: Motivation: The investigation of quantitative trait loci (QTL) is an essential component in our understanding of how organisms vary phenotypically. However, many important crop species are polyploid (carrying more than two copies of each chromosome), requiring specialized tools for such analyses. Moreover, deciphering meiotic processes at higher ploidy levels is not straightforward, but is necessary to understand the reproductive dynamics of these species, or uncover potential barriers to their genetic improvement. Results: Here, we present polyqtlR, a novel software tool to facilitate such analyses in (auto)polyploid crops. It performs QTL interval mapping in F1 populations of outcrossing polyploids of any ploidy level using identity-by-descent probabilities. The allelic composition of discovered QTL can be explored, enabling favourable alleles to be identified and tracked in the population. Visualization tools within the package facilitate this process, and options to include genetic co-factors and experimental factors are included. Detailed information on polyploid meiosis including prediction of multivalent pairing structures, detection of preferential chromosomal pairing and location of double reduction events can be performed. Availability and implementation: polyqtlR is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/package=polyqtlR. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 6
    Publication Date: 2021-08-20
    Description: Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 7
    Publication Date: 2021-08-20
    Description: Intratumoral heterogeneity is a well-documented feature of human cancers and is associated with outcome and treatment resistance. However, a heterogeneous tumor transcriptome contributes an unknown level of variability to analyses of differentially expressed genes (DEGs) that may contribute to phenotypes of interest, including treatment response. Although current clinical practice and the vast majority of research studies use a single sample from each patient, decreasing costs of sequencing technologies and computing power have made repeated-measures analyses increasingly economical. Repeatedly sampling the same tumor increases the statistical power of DEG analysis, which is indispensable toward downstream analysis and also increases one’s understanding of within-tumor variance, which may affect conclusions. Here, we compared five different methods for analyzing gene expression profiles derived from repeated sampling of human prostate tumors in two separate cohorts of patients. We also benchmarked the sensitivity of generalized linear models to linear mixed models for identifying DEGs contributing to relevant prostate cancer pathways based on a ground-truth model.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 8
    Publication Date: 2021-08-20
    Description: Academic human capital (AHC) is a key element in the explanation of scientific productivity. However, few studies have analysed this topic in the academic context, and their conclusions about composition and measurement remain ambiguous. This study proposes a measurement scale to assess AHC, following a systemic procedure composed of two steps: qualitative and quantitative phases. First, the Delphi technique was applied to reach a consensus on the AHC factors, resulting in a scale of 22 items. Second, exploratory and confirmatory factor analyses were conducted to determine the underlying factorial structure of the scale, using a sample of 2,223 researchers in Spanish universities. The results provided a five-dimensional structure of AHC, measuring the knowledge and abilities required to perform research activities, as well as skills related to the organisation of scientific processes, alertness to research opportunities, and the openness to provide and receive criticism. This study poses interesting challenges for knowledge management in universities.
    Print ISSN: 0302-3427
    Electronic ISSN: 1471-5430
    Topics: Nature of Science, Research, Systems of Higher Education, Museum Science
  • 9
    Publication Date: 2021-08-20
    Description: Efforts to elucidate protein–DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physiochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross predictions. Empirical tests show that DNAgenie’s outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie’s webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 10
    Publication Date: 2021-08-20
    Description: Accurate prediction of immunogenic peptide recognized by T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most of the antigen peptides predicted in silico fail to elicit immune responses in vivo without considering TCR as a key factor. This inevitably causes costly and time-consuming experimental validation test for predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptide recognized by TCR. Here, we described DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power with area under the curve up to 0.91 on COVID-19 data while predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved the overall accuracy of 81.03% on IEDB data while predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package that can be downloaded from https://github.com/jiangBiolab/DLpTCR.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 11
    Publication Date: 2021-08-21
    Description: Unlike other developed countries, the Fourth Industrial Revolution (4IR) discourse has become the central element within technology governance in Korea. This paper examines the reasons for the discourse’s success and its political and social implications. Based on the analysis of policy documents and the media coverage, I argue that political and economic elites have actively introduced the 4IR discourse to create novel momentum for promoting Information and Communications Technology (ICT) and to justify deregulatory measures while re-enacting the developmentalist imaginary. I also highlight that the 4IR discourse’s promoters have drawn upon the dialectics between the desirable future and the nation’s shared fear to urge the Korean society to accept the measures privileging the industry as the means of making the nation a developed country and avoiding being colonized again.
    Print ISSN: 0302-3427
    Electronic ISSN: 1471-5430
    Topics: Nature of Science, Research, Systems of Higher Education, Museum Science
  • 12
    Publication Date: 2021-07-11
    Description: Motivation: The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. Results: Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure–function paradigm toward a protein structure-surface(s)-function paradigm. Availability and implementation: All data are available online at http://datasetmachat.drugdesign.fr. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 13
    Publication Date: 2021-08-18
    Description: Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 14
    Publication Date: 2021-08-14
    Description: Good knowledge of a peptide’s tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5–40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 15
    Publication Date: 2021-08-20
    Description: Deep generative models have been an upsurge in the deep learning community since they were proposed. These models are designed for generating new synthetic data including images, videos and texts by fitting the data approximate distributions. In the last few years, deep generative models have shown superior performance in drug discovery especially de novo molecular design. In this study, deep generative models are reviewed to witness the recent advances of de novo molecular design for drug discovery. In addition, we divide those models into two categories based on molecular representations in silico. Then these two classical types of models are reported in detail and discussed about both pros and cons. We also indicate the current challenges in deep generative models for de novo molecular design. De novo molecular design automatically is promising but a long road to be explored.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 16
    Publication Date: 2021-08-19
    Description: DNA methylation may be regulated by genetic variants within a genomic region, referred to as methylation quantitative trait loci (mQTLs). The changes of methylation levels can further lead to alterations of gene expression, and influence the risk of various complex human diseases. Detecting mQTLs may provide insights into the underlying mechanism of how genotypic variations may influence the disease risk. In this article, we propose a methylation random field (MRF) method to detect mQTLs by testing the association between the methylation level of a CpG site and a set of genetic variants within a genomic region. The proposed MRF has two major advantages over existing approaches. First, it uses a beta distribution to characterize the bimodal and interval properties of the methylation trait at a CpG site. Second, it considers multiple common and rare genetic variants within a genomic region to identify mQTLs. Through simulations, we demonstrated that the MRF had improved power over other existing methods in detecting rare variants of relatively large effect, especially when the sample size is small. We further applied our method to a study of congenital heart defects with 83 cardiac tissue samples and identified two mQTL regions, MRPS10 and PSORS1C1, which were colocalized with expression QTL in cardiac tissue. In conclusion, the proposed MRF is a useful tool to identify novel mQTLs, especially for studies with limited sample sizes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 17
    Publication Date: 2021-08-20
    Description: While public research organisations (PROs) are increasingly expected to transfer knowledge to businesses and other stakeholders, their engagement in knowledge transfer (KT) activities is still under-researched. Better understanding of PROs’ KT engagement, including how it is shaped by PROs’ organisational characteristics, could lead to better tailored policies in support to PROs’ effort to transfer knowledge. We develop a conceptual framework linking PROs’ specialisation in different fields of knowledge to their profiles of KT engagement and validate it empirically using a six-year panel data set of 33 PROs in the UK. We use multidimensional scaling and cluster analysis techniques to identify three distinct KT profiles, which are stable over time, and strongly associated with the PROs’ knowledge field specialisation. We argue that these profiles may depend on the different market readiness and user specificity of knowledge outputs arising from different fields of knowledge and derive implications for theory, policy, and practice.
    Print ISSN: 0302-3427
    Electronic ISSN: 1471-5430
    Topics: Nature of Science, Research, Systems of Higher Education, Museum Science
  • 18
    Publication Date: 2021-06-29
    Description: Motivation: The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. Results: A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages: The first is a modal exploration stage, in which a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. The second is a modal maintaining stage, where an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on a large set of 320 non-redundant proteins, where MMpred obtains models with TM-score≥0.5 on 291 cases, which is 28% higher than that of Rosetta guided with the same set of distance constraints. In addition, on 320 benchmark proteins, the enhanced version of MMpred (E-MMpred) has 167 targets better than trRosetta when the best of five models are evaluated. The average TM-score of the best model of E-MMpred is 0.732, which is comparable to trRosetta (0.730). Availability and implementation: The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 19
    Publication Date: 2021-08-20
    Description: Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method—heterogeneous graph attention network (‘HGAT–AMR’)—to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 20
    Publication Date: 2021-08-20
    Description: Protein engineering and design principles employing the 20 standard amino acids have been extensively used to achieve stable protein scaffolds and deliver their specific activities. Although this confers some advantages, it often restricts the sequence, chemical space, and ultimately the functional diversity of proteins. Moreover, although site-specific incorporation of non-natural amino acids (nnAAs) has been proven to be a valuable strategy in protein engineering and therapeutics development, its utility in the affinity-maturation of nanobodies is not fully explored. Besides, current experimental methods do not routinely employ nnAAs due to their enormous library size and infinite combinations. To address this, we have developed an integrated computational pipeline employing structure-based protein design methodologies, molecular dynamics simulations and free energy calculations, for the binding affinity prediction of an nnAA-incorporated nanobody toward its target and selection of potent binders. We show that by incorporating halogenated tyrosines, the affinity of 9G8 nanobody can be improved toward epidermal growth factor receptor (EGFR), a crucial cancer target. Surface plasmon resonance (SPR) assays showed that the binding of several 3-chloro-l-tyrosine (3MY)-incorporated nanobodies were improved up to 6-fold into a picomolar range, and the computationally estimated binding affinities shared a Pearson’s r of 0.87 with SPR results. The improved affinity was found to be due to enhanced van der Waals interactions of key 3MY-proximate nanobody residues with EGFR, and an overall increase in the nanobody’s structural stability. In conclusion, we show that our method can facilitate screening large libraries and predict potent site-specific nnAA-incorporated nanobody binders against crucial disease-targets.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 21
    Publication Date: 2021-08-20
    Description: Over the past few years, meta-analysis has become popular among biomedical researchers for detecting biomarkers across multiple cohort studies with increased predictive power. Combining datasets from different sources increases sample size, thus overcoming the issue related to limited sample size from each individual study and boosting the predictive power. This leads to an increased likelihood of more accurately predicting differentially expressed genes/proteins or significant biomarkers underlying the biological condition of interest. Currently, several meta-analysis methods and tools exist, each having its own strengths and limitations. In this paper, we survey existing meta-analysis methods, and assess the performance of different methods based on results from different datasets as well as assessment from prior knowledge of each method. This provides a reference summary of meta-analysis models and tools, which helps to guide end-users on the choice of appropriate models or tools for given types of datasets and enables developers to consider current advances when planning the development of new meta-analysis models and more practical integrative tools.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 22
    Publication Date: 2021-08-12
    Description: Motivation Co-evolution analysis can be used to accurately predict residue–residue contacts from multiple sequence alignments. The introduction of machine-learning techniques has enabled substantial improvements in precision and a shift from predicting binary contacts to predicting distances between pairs of residues. These developments have significantly improved the accuracy of de novo prediction of static protein structures. With AlphaFold2 lifting the accuracy of some predicted protein models close to experimental levels, structure prediction research will move on to other challenges. One of those areas is the prediction of more than one conformation of a protein. Here, we examine the potential of residue–residue distance predictions to be informative of protein flexibility rather than simply static structure. Results We used DMPfold to predict distance distributions for every residue pair in a set of proteins that showed both rigid and flexible behaviour. Residue pairs that were in contact in at least one reference structure were classified as rigid, flexible or neither. The predicted distance distribution of each residue pair was analysed for local maxima of probability indicating the most likely distance or distances between a pair of residues. We found that rigid residue pairs tended to have only a single local maximum in their predicted distance distributions while flexible residue pairs more often had multiple local maxima. These results suggest that the shape of predicted distance distributions contains information on the rigidity or flexibility of a protein and its constituent residues. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
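    Illustration (not from the article): the peak-counting idea in the abstract above can be sketched in a few lines. The bin values, the `min_height` threshold and the function name `local_maxima` are assumptions made for this example; the article's own peak-detection procedure is not described in the record.

```python
from typing import List, Sequence


def local_maxima(probs: Sequence[float], min_height: float = 0.05) -> List[int]:
    """Return indices of bins that are strict local maxima of at least min_height."""
    peaks = []
    for i, p in enumerate(probs):
        # Treat positions outside the histogram as -inf so edge bins can be peaks.
        left = probs[i - 1] if i > 0 else float("-inf")
        right = probs[i + 1] if i < len(probs) - 1 else float("-inf")
        if p >= min_height and p > left and p > right:
            peaks.append(i)
    return peaks


# Hypothetical predicted distributions over six distance bins (each sums to 1).
rigid_pair = [0.02, 0.08, 0.70, 0.15, 0.03, 0.02]      # one dominant distance
flexible_pair = [0.05, 0.40, 0.10, 0.05, 0.35, 0.05]   # two competing distances

print(local_maxima(rigid_pair))     # [2]
print(local_maxima(flexible_pair))  # [1, 4]
```

    A pair with one peak would be read as rigid-like and a pair with several peaks as flexible-like; the threshold guards against counting noise in near-empty bins.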
  • 23
    Publication Date: 2021-08-16
    Description: Motivation The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on the structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for the determination of evolutionary relationships between proteins and for the identification of recurrent structural patterns present in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA, a novel effective environment capable of computing and analyzing multiple structure alignments. Results DAMA is based on local structural similarities, using local 3D structure descriptors and thus accounts for nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by its global structure. DAMA is an extension of our previous study (DEDAL) which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest, consistent ensemble of similar descriptors. The new method is capable of capturing most of the biologically significant similarities present in canonical test sets and is discriminatory enough to prevent the emergence of larger, but meaningless, solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA’s capability of identifying equivalent residues, which should be very useful in discovering the biological nature of protein similarity.
Performance profiles show the advantage of DAMA over other methods, in particular when using a strict similarity measure QC, which is the ratio of correctly aligned columns, and when applying the methods to more difficult cases. Availability and implementation DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 24
    Publication Date: 2021-02-25
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 25
    Publication Date: 2021-02-25
    Description: With more microbiome studies being conducted by African-based research groups, there is an increasing demand for knowledge and skills in the design and analysis of microbiome studies and data. However, high-quality bioinformatics courses are often impeded by differences in computational environments, complicated software stacks, numerous dependencies, and versions of bioinformatics tools along with a lack of local computational infrastructure and expertise. To address this, H3ABioNet developed a 16S rRNA Microbiome Intermediate Bioinformatics Training course, extending its remote classroom model. The course was developed alongside experienced microbiome researchers, bioinformaticians, and systems administrators, who identified key topics to address. Development of containerised workflows has previously been undertaken by H3ABioNet, and Singularity containers were used here to enable the deployment of a standard replicable software stack across different hosting sites. The pilot ran successfully in 2019 across 23 sites registered in 11 African countries, with more than 200 participants formally enrolled and 106 volunteer staff for onsite support. The pulling, running, and testing of the containers, software, and analyses on various clusters were performed prior to the start of the course by hosting classrooms. The containers allowed the replication of analyses and results across all participating classrooms running a cluster and remained available posttraining ensuring analyses could be repeated on real data. Participants thus received the opportunity to analyse their own data, while local staff were trained and supported by experienced experts, increasing local capacity for ongoing research support. 
This provides a model for delivering topic-specific bioinformatics courses across Africa and other remote/low-resourced regions which overcomes barriers such as inadequate infrastructures, geographical distance, and access to expertise and educational materials.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 26
    Publication Date: 2021-02-25
    Description: Increased availability of drug response and genomics data for many tumor cell lines has accelerated the development of pan-cancer prediction models of drug response. However, it is unclear how much between-tissue differences in drug response and molecular characteristics may contribute to pan-cancer predictions. Also unknown is whether the performance of pan-cancer models could vary by cancer type. Here, we built a series of pan-cancer models using two datasets containing 346 and 504 cell lines, each with MEK inhibitor (MEKi) response and mRNA expression, point mutation, and copy number variation data, and found that, while the tissue-level drug responses are accurately predicted (between-tissue ρ = 0.88–0.98), only 5 of 10 cancer types showed successful within-tissue prediction performance (within-tissue ρ = 0.11–0.64). Between-tissue differences make substantial contributions to the performance of pan-cancer MEKi response predictions, as exclusion of between-tissue signals leads to a decrease in Spearman’s ρ from a range of 0.43–0.62 to 0.30–0.51. In practice, joint analysis of multiple cancer types usually has a larger sample size, hence greater power, than for one cancer type; and we observe that higher accuracy of pan-cancer prediction of MEKi response is almost entirely due to the sample size advantage. Success of pan-cancer prediction reveals how drug response in different cancers may invoke shared regulatory mechanisms despite tissue-specific routes of oncogenesis, yet predictions in different cancer types require flexible incorporation of between-cancer and within-cancer signals. As most datasets in genome sciences contain multiple levels of heterogeneity, careful parsing of group characteristics and within-group, individual variation is essential when making robust inference.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
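    Illustration (not from the article): the distinction between between-tissue and within-tissue prediction performance can be made concrete with a toy calculation. All numbers, tissue names and helper functions below are invented for this sketch, and the data are deliberately extreme; the rank function assumes no tied values.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def ranks(xs: Sequence[float]) -> List[float]:
    """Rank values from 1..n (assumes no ties, which holds for the toy data)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    out = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        out[i] = float(rank)
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical (observed, predicted) drug responses per tissue.
data: Dict[str, Tuple[List[float], List[float]]] = {
    "skin":  ([0.80, 0.90, 0.85], [0.88, 0.84, 0.86]),
    "lung":  ([0.50, 0.60, 0.55], [0.58, 0.52, 0.56]),
    "colon": ([0.20, 0.30, 0.25], [0.27, 0.22, 0.26]),
}

# Between-tissue: correlate the per-tissue means of observed and predicted values.
between = spearman([mean(o) for o, _ in data.values()],
                   [mean(p) for _, p in data.values()])

# Within-tissue: correlate observed and predicted values inside each tissue.
within = {tissue: spearman(o, p) for tissue, (o, p) in data.items()}

print(between)  # 1.0
print(within)   # every tissue: -1.0
```

    In this toy case the model ranks tissues perfectly while ranking cell lines inside each tissue in the wrong order, which is the kind of gap the abstract reports in a much milder form.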
  • 27
    Publication Date: 2021-02-25
    Description: Workflow management systems represent, manage, and execute multistep computational analyses and offer many benefits to bioinformaticians. They provide a common language for describing analysis workflows, contributing to reproducibility and to building libraries of reusable components. They can support both incremental build and re-entrancy—the ability to selectively re-execute parts of a workflow in the presence of additional inputs or changes in configuration and to resume execution from where a workflow previously stopped. Many workflow management systems enhance portability by supporting the use of containers, high-performance computing (HPC) systems, and clouds. Most importantly, workflow management systems allow bioinformaticians to delegate how their workflows are run to the workflow management system and its developers. This frees the bioinformaticians to focus on what these workflows should do, on their data analyses, and on their science. RiboViz is a package to extract biological insight from ribosome profiling data to help advance understanding of protein synthesis. At the heart of RiboViz is an analysis workflow, implemented in a Python script. To conform to best practices for scientific computing which recommend the use of build tools to automate workflows and to reuse code instead of rewriting it, the authors reimplemented this workflow within a workflow management system. To select a workflow management system, a rapid survey of available systems was undertaken, and candidates were shortlisted: Snakemake, cwltool, Toil, and Nextflow. Each candidate was evaluated by quickly prototyping a subset of the RiboViz workflow, and Nextflow was chosen. The selection process took 10 person-days, a small cost for the assurance that Nextflow satisfied the authors’ requirements. 
The use of prototyping can offer a low-cost way of making a more informed selection of software to use within projects, rather than relying solely upon reviews and recommendations by others.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 28
    Publication Date: 2021-02-25
    Description: We live in an increasingly data-driven world, where high-throughput sequencing and mass spectrometry platforms are transforming biology into an information science. This has shifted major challenges in biological research from data generation and processing to interpretation and knowledge translation. However, postsecondary training in bioinformatics, or more generally data science for life scientists, lags behind current demand. In particular, development of accessible, undergraduate data science curricula has the potential to improve research and learning outcomes as well as better prepare students in the life sciences to thrive in public and private sector careers. Here, we describe the Experiential Data science for Undergraduate Cross-Disciplinary Education (EDUCE) initiative, which aims to progressively build data science competency across several years of integrated practice. Through EDUCE, students complete data science modules integrated into required and elective courses augmented with coordinated cocurricular activities. The EDUCE initiative draws on a community of practice consisting of teaching assistants (TAs), postdocs, instructors, and research faculty from multiple disciplines to overcome several reported barriers to data science for life scientists, including instructor capacity, student prior knowledge, and relevance to discipline-specific problems. Preliminary survey results indicate that even a single module improves student self-reported interest and/or experience in bioinformatics and computer science. Thus, EDUCE provides a flexible and extensible active learning framework for integration of data science curriculum into undergraduate courses and programs across the life sciences.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 29
    Publication Date: 2021-03-31
    Description: Motivation Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive. Results Here, we first show that in multiple animal and plant datasets, 18 to 62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily-informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. Availability OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
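    Illustration (not from the article): alignment-free assignment by shared k-mers can be sketched as below. This is a deliberately flat toy, not OMAmer's algorithm, which the abstract describes as using evolutionarily-informed k-mers mapped to ancestral subfamilies. The sequences, family names, `k` and `min_shared` are all assumptions for this example.

```python
from typing import Dict, Optional, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign(query: str, families: Dict[str, Set[str]], k: int = 3,
           min_shared: int = 2) -> Optional[str]:
    """Pick the family sharing the most k-mers with the query.

    Returns None when no family reaches min_shared, which stands in for
    rejecting a non-homologous query.
    """
    q = kmers(query, k)
    best_name, best_shared = None, 0
    for name, ref in families.items():
        shared = len(q & ref)
        if shared > best_shared:
            best_name, best_shared = name, shared
    return best_name if best_shared >= min_shared else None


# Toy reference "families" built from made-up sequences.
families = {
    "famA": kmers("MKVLAAGIVK"),
    "famB": kmers("MSTNPKPQRK"),
}

print(assign("KVLAAGI", families))  # famA
print(assign("WWWWWWW", families))  # None
```

    No alignment is computed at any point: the query is reduced to a set of short words and compared by set intersection, which is what makes this family of methods fast.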
  • 30
    Publication Date: 2021-03-29
    Description: The Sec complex catalyzes the translocation of proteins of the secretory pathway into the endoplasmic reticulum and the integration of membrane proteins into the endoplasmic reticulum membrane. Some substrate peptides require the presence and involvement of accessory proteins such as Sec63. Recently, a structure of the Sec complex from Saccharomyces cerevisiae, consisting of the Sec61 channel and the Sec62, Sec63, Sec71 and Sec72 proteins, was determined by cryo-electron microscopy (cryo-EM). Here, we show by co-precipitation that the accessory membrane protein Sec62 is not required for formation of stable Sec63-Sec61 contacts. Molecular dynamics simulations started from the cryo-EM conformation of Sec61 bound to Sec63 and of unbound Sec61 revealed how Sec63 affects the conformation of the Sec61 lateral gate, plug, pore region and pore ring diameter via three intermolecular contact regions. Molecular docking of SRP-dependent vs. SRP-independent peptide chains into the Sec61 channel showed that the pore regions affected by presence/absence of Sec63 play a crucial role in positioning the signal anchors of SRP-dependent substrates near the lateral gate.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 31
    Publication Date: 2021-03-29
    Description: Candida albicans, an opportunistic fungal pathogen, is a significant cause of human infections, particularly in immunocompromised individuals. Phenotypic plasticity between two morphological phenotypes, yeast and hyphae, is a key mechanism by which C. albicans can thrive in many microenvironments and cause disease in the host. Understanding the decision points and key driver genes controlling this important transition and how these genes respond to different environmental signals is critical to understanding how C. albicans causes infections in the host. Here we build and analyze a Boolean dynamical model of the C. albicans yeast to hyphal transition, integrating multiple environmental factors and regulatory mechanisms. We validate the model by a systematic comparison to prior experiments, which led to agreement in 17 out of 22 cases. The discrepancies motivate alternative hypotheses that are testable by follow-up experiments. Analysis of this model revealed two time-constrained windows of opportunity that must be met for the complete transition from the yeast to hyphal phenotype, as well as control strategies that can robustly prevent this transition. We experimentally validate two of these control predictions in C. albicans strains lacking the transcription factor UME6 and the histone deacetylase HDA1, respectively. This model will serve as a strong base from which to develop a systems biology understanding of C. albicans morphogenesis.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
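    Illustration (not from the article): a Boolean dynamical model updates each node to true or false from the current state of its regulators. The three-node network below is a made-up toy with hypothetical node names and rules; it is not the published C. albicans model and uses synchronous updating purely for simplicity.

```python
from typing import Dict

State = Dict[str, bool]


def step(state: State) -> State:
    """Synchronously update every node from the previous state."""
    return {
        # External environmental input, held fixed during a run.
        "signal": state["signal"],
        # The signal switches the repressor off.
        "repressor": not state["signal"],
        # The hyphal program needs the signal and an inactive repressor.
        "hyphal_program": state["signal"] and not state["repressor"],
    }


def run_to_fixed_point(state: State, max_steps: int = 20) -> State:
    """Iterate until the state stops changing (or max_steps is reached)."""
    for _ in range(max_steps):
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return state


yeast = {"signal": False, "repressor": True, "hyphal_program": False}
induced = dict(yeast, signal=True)

print(run_to_fixed_point(yeast)["hyphal_program"])    # False
print(run_to_fixed_point(induced)["hyphal_program"])  # True
```

    Fixed points of the update rule play the role of stable phenotypes; in this toy, forcing `repressor` to stay true would be the analogue of a control strategy that blocks the transition.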
  • 32
    Publication Date: 2021-03-29
    Description: Atherosclerotic plaque rupture is responsible for a majority of acute vascular syndromes and this study aims to develop a prediction tool for plaque progression and rupture. Based on the follow-up coronary intravascular ultrasound imaging data, we performed a patient-specific multi-physical modeling study on four patients to obtain the evolutional processes of the microenvironment during plaque progression. Four main pathophysiological processes, i.e., lipid deposition, inflammatory response, migration and proliferation of smooth muscle cells (SMCs), and neovascularization were coupled based on the interactions demonstrated by experimental and clinical observations. A scoring table integrating the dynamic microenvironmental indicators with the classical risk index was proposed to differentiate their progression to stable and unstable plaques. The heterogeneity of plaque microenvironment for each patient was demonstrated by the growth curves of the main microenvironmental factors. The possible plaque developments were predicted by incorporating the systematic index with microenvironmental indicators. Five microenvironmental factors (LDL, ox-LDL, MCP-1, SMC, and foam cell) showed significant differences between the stable and unstable groups (p < 0.01). The inflammatory microenvironments (monocyte and macrophage) had negative correlations with the necrotic core (NC) expansion in the stable group, but very strong positive correlations in the unstable group. The inflammatory microenvironment is strongly correlated to the NC expansion in unstable plaques, suggesting that the inflammatory factors may play an important role in the formation of a vulnerable plaque. This prediction tool will improve our understanding of the mechanism of plaque progression and provide a new strategy for early detection and prediction of high-risk plaques.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 33
    Publication Date: 2021-03-29
    Description: Vocalization in mammals, birds, reptiles, and amphibians occurs with airways that have wide openings to free-space for efficient sound radiation, but sound is also produced with occluded or semi-occluded airways that have small openings to free-space. It is hypothesized that pressures produced inside the airway with semi-occluded vocalizations have an overall widening effect on the airway. This overall widening then provides more opportunity to produce wide-narrow contrasts along the airway for variation in sound quality and loudness. For human vocalization described here, special emphasis is placed on the epilaryngeal airway, which can be adjusted for optimal aerodynamic power transfer and for optimal acoustic source-airway interaction. The methodology is threefold: (1) geometric measurement of airway dimensions from CT scans, (2) aerodynamic and acoustic impedance calculation of the airways, and (3) simulation of acoustic signals with a self-oscillating computational model of the sound source and wave propagation.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 34
    Publication Date: 2021-03-29
    Description: The SARS-CoV-2 pathogen is currently spreading worldwide and its propensity for presymptomatic and asymptomatic transmission makes it difficult to control. The control measures adopted in several countries aim at isolating individuals once diagnosed, limiting their social interactions and consequently their transmission probability. These interventions, which have a strong impact on the disease dynamics, can affect the inference of the epidemiological quantities. We first present a theoretical explanation of the effect caused by non-pharmaceutical intervention measures on the mean serial and generation intervals. Then, in a simulation study, we vary the assumed efficacy of control measures and quantify the effect on the mean and variance of realized generation and serial intervals. The simulation results show that the realized serial and generation intervals both depend on control measures and their values contract according to the efficacy of the intervention strategies. Interestingly, the mean serial interval differs from the mean generation time. The deviation between these two values depends on two factors: first, the number of undiagnosed infectious individuals; second, the relationship between infectiousness, symptom onset and timing of isolation. Similarly, the standard deviations of realized serial and generation intervals do not coincide, with the former shorter than the latter on average. The findings of this study are directly relevant to estimates performed for the current COVID-19 pandemic. In particular, the effective reproduction number is often inferred using both daily incidence data and the generation interval. Failing to account for either contraction or mis-specification by using the serial interval could lead to biased estimates of the effective reproduction number. Consequently, this might affect the choices made by decision makers when deciding which control measures to apply based on the value of this quantity.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 35
    Publication Date: 2021-03-29
    Description: Predictions of COVID-19 case growth and mortality are critical to the decisions of political leaders, businesses, and individuals grappling with the pandemic. This predictive task is challenging due to the novelty of the virus, limited data, and dynamic political and societal responses. We embed a Bayesian time series model and a random forest algorithm within an epidemiological compartmental model for empirically grounded COVID-19 predictions. The Bayesian case model fits a location-specific curve to the velocity (first derivative) of the log transformed cumulative case count, borrowing strength across geographic locations and incorporating prior information to obtain a posterior distribution for case trajectories. The compartmental model uses this distribution and predicts deaths using a random forest algorithm trained on COVID-19 data and population-level characteristics, yielding daily projections and interval estimates for cases and deaths in U.S. states. We evaluated the model by training it on progressively longer periods of the pandemic and computing its predictive accuracy over 21-day forecasts. The substantial variation in predicted trajectories and associated uncertainty between states is illustrated by comparing three unique locations: New York, Colorado, and West Virginia. The sophistication and accuracy of this COVID-19 model offer reliable predictions and uncertainty estimates for the current trajectory of the pandemic in the U.S. and provide a platform for future predictions as shifting political and societal responses alter its course.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
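    Illustration (not from the article): the "velocity (first derivative) of the log transformed cumulative case count" mentioned in the abstract can be approximated from daily data by a first difference. The sketch below only computes that empirical quantity; the Bayesian curve fitting, the compartmental model and the random forest are not reproduced. The case counts and the function name `log_velocity` are invented for the example, and counts are assumed to be strictly positive.

```python
import math
from typing import List, Sequence


def log_velocity(cumulative: Sequence[float]) -> List[float]:
    """Day-to-day difference of log cumulative counts (an empirical growth rate)."""
    logs = [math.log(c) for c in cumulative]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical cumulative case counts on six consecutive days.
cases = [100, 200, 400, 600, 750, 825]
velocity = log_velocity(cases)

print([round(v, 3) for v in velocity])  # [0.693, 0.693, 0.405, 0.223, 0.095]
```

    A constant velocity corresponds to exponential growth (here a daily doubling, log 2 ≈ 0.693), and a velocity falling toward zero corresponds to the cumulative curve flattening.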
  • 36
    Publication Date: 2021-03-29
    Description: Current dominant views hold that perceptual confidence reflects the probability that a decision is correct. Although these views have enjoyed some empirical support, recent behavioral results indicate that confidence and the probability of being correct can be dissociated. An alternative hypothesis suggests that confidence instead reflects the magnitude of evidence in favor of a decision while being relatively insensitive to the evidence opposing the decision. We considered how this alternative hypothesis might be biologically instantiated by developing a simple neural network model incorporating a known property of sensory neurons: tuned inhibition. The key idea of the model is that the level of inhibition that each accumulator unit receives from units with the opposite tuning preference, i.e. its inhibition ‘tuning’, dictates its contribution to perceptual decisions versus confidence judgments, such that units with higher tuned inhibition (computing relative evidence for different perceptual interpretations) determine perceptual discrimination decisions, and units with lower tuned inhibition (computing absolute evidence) determine confidence. We demonstrate that this biologically plausible model can account for several counterintuitive findings reported in the literature where confidence and decision accuracy dissociate. By comparing model fits, we further demonstrate that a full complement of behavioral data across several previously published experimental results—including accuracy, reaction time, mean confidence, and metacognitive sensitivity—is best accounted for when confidence is computed from units without, rather than units with, tuned inhibition. Finally, we discuss predictions of our results and model for future neurobiological studies. These findings suggest that the brain has developed and implements this alternative, heuristic theory of perceptual confidence computation by relying on the diversity of neural resources available.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 37
    Publication Date: 2021-03-29
    Description: High-throughput B-cell sequencing has opened up new avenues for investigating complex mechanisms underlying our adaptive immune response. These technological advances drive data generation and the need to mine and analyze the information contained in these large datasets, in particular the identification of therapeutic antibodies (Abs) or those associated with disease exposure and protection. Here, we describe our efforts to use artificial intelligence (AI)-based image-analyses for prospective classification of Abs based solely on sequence information. We hypothesized that Abs recognizing the same part of an antigen share a limited set of features at the binding interface, and that the binding site regions of these Abs share common structure and physicochemical property patterns that can serve as a “fingerprint” to recognize uncharacterized Abs. We combined large-scale sequence-based protein-structure predictions to generate ensembles of 3-D Ab models, reduced the Ab binding interface to a 2-D image (fingerprint), used pre-trained convolutional neural networks to extract features, and trained deep neural networks (DNNs) to classify Abs. We evaluated this approach using Ab sequences derived from human HIV and Ebola viral infections to differentiate between two Abs, Abs belonging to specific B-cell family lineages, and Abs with different epitope preferences. In addition, we explored a different type of DNN method to detect one class of Abs from a larger pool of Abs. Testing on Ab sets that had been kept aside during model training, we achieved average prediction accuracies ranging from 71–96% depending on the complexity of the classification task. The high level of accuracies reached during these classification tests suggests that the DNN models were able to learn a series of structural patterns shared by Abs belonging to the same class.
The developed methodology provides a means to apply AI-based image recognition techniques to analyze high-throughput B-cell sequencing datasets (repertoires) for Ab classification.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 38
    Publication Date: 2021-03-31
    Description: Summary VCF files with results of sequencing projects take a lot of space. We propose VCFShark, which is able to compress VCF files up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). The advantage over competitors is the greatest when compressing VCF files containing large amounts of genotype data. Processing speeds of up to 100 MB/s and main memory requirements below 30 GB allow our tool to be used on typical workstations even for large datasets. Availability and Implementation https://github.com/refresh-bio/vcfshark Supplementary information Supplementary data are available at publisher’s Web site.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 39
  • 40
    Publication Date: 2021-03-29
    Description: To better combat the expansion of antibiotic resistance in pathogens, new compounds, particularly those with novel mechanisms-of-action [MOA], represent a major research priority in biomedical science. However, rediscovery of known antibiotics demonstrates a need for approaches that accurately identify potential novelty with higher throughput and reduced labor. Here we describe an explainable artificial intelligence classification methodology that emphasizes prediction performance and human interpretability by using a Hierarchical Ensemble of Classifiers model optimized with a novel feature selection algorithm called Clairvoyance; collectively referred to as a CoHEC model. We evaluated our methods using whole transcriptome responses from Escherichia coli challenged with 41 FDA-approved antibiotics and 9 crude extracts while depositing 306 transcriptomes. Our CoHEC model can properly predict the primary MOA of previously unobserved compounds in both purified forms and crude extracts at an accuracy above 99%, while also correctly identifying darobactin, a newly discovered antibiotic, as having a novel MOA. In addition, we deploy our methods on a recent E. coli transcriptomics dataset in a different strain and a Mycobacterium smegmatis metabolomics timeseries dataset and showcase exceptionally high performance, improving upon the performance metrics of the original publications. We not only provide insight into the biological interpretation of our model but also show that the concept of MOA is a non-discrete heuristic with diverse effects for different compounds within the same MOA, suggesting substantial antibiotic diversity awaiting discovery within existing MOA.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 41
    Publication Date: 2021-03-29
    Description: The mechanisms underlying the emergence of seizures are one of the most important unresolved issues in epilepsy research. In this paper, we study how perturbations, exogenous or endogenous, may promote or delay seizure emergence. To this aim, due to the increasingly adopted view of epileptic dynamics in terms of slow-fast systems, we perform a theoretical analysis of the phase response of a generic relaxation oscillator. As relaxation oscillators are effectively bistable systems at the fast time scale, it is intuitive that perturbations of the non-seizing state with a suitable direction and amplitude may cause an immediate transition to seizure. By contrast, and perhaps less intuitively, smaller amplitude perturbations have been found to delay the spontaneous seizure initiation. By studying the isochrons of relaxation oscillators, we show that this is a generic phenomenon, with the size of such delay depending on the slow flow component. Therefore, depending on perturbation amplitudes, frequency and timing, a train of perturbations causes an occurrence increase, decrease or complete suppression of seizures. This dependence lends itself to analysis and mechanistic understanding through methods outlined in this paper. We illustrate this methodology by computing the isochrons, phase response curves and the response to perturbations in several epileptic models possessing different slow vector fields. While our theoretical results are applicable to any planar relaxation oscillator, in the motivating context of epilepsy they elucidate mechanisms of triggering and abating seizures, thus suggesting stimulation strategies with effects ranging from mere delaying to full suppression of seizures.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 42
    Publication Date: 2021-03-29
    Description: Imaging Mass Cytometry (IMC) combines laser ablation and mass spectrometry to quantitate metal-conjugated primary antibodies incubated in intact tumor tissue slides. This strategy allows spatially-resolved multiplexing of dozens of simultaneous protein targets with 1μm resolution. Each slide is a spatial assay consisting of high-dimensional multivariate observations (m-dimensional feature space) collected at different spatial positions and capturing data from a single biological sample or even representative spots from multiple samples when using tissue microarrays. Often, each of these spatial assays could be characterized by several regions of interest (ROIs). To extract meaningful information from the multi-dimensional observations recorded at different ROIs across different assays, we propose to analyze such datasets using a two-step graph-based approach. We first construct for each ROI a graph representing the interactions between the m covariates and compute an m dimensional vector characterizing the steady state distribution among features. We then use all these m-dimensional vectors to construct a graph between the ROIs from all assays. This second graph is subjected to a nonlinear dimension reduction analysis, retrieving the intrinsic geometric representation of the ROIs. Such a representation provides the foundation for efficient and accurate organization of the different ROIs that correlates with their phenotypes. Theoretically, we show that when the ROIs have a particular bi-modal distribution, the new representation gives rise to a better distinction between the two modalities compared to the maximum a posteriori (MAP) estimator. We applied our method to predict the sensitivity to PD-1 axis blockers treatment of lung cancer subjects based on IMC data, achieving 97.3% average accuracy on two IMC datasets. 
This serves as empirical evidence that the graph of graphs approach enables us to integrate multiple ROIs and the intra-relationships between the features at each ROI, giving rise to an informative representation that is strongly associated with the phenotypic state of the entire image.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 43
    Publication Date: 2021-03-28
    Description: Motivation As the generation of complex single-cell RNA sequencing datasets becomes more commonplace it is the responsibility of researchers to provide access to these data in a way that can be easily explored and shared. Whilst it is often the case that data is deposited for future bioinformatic analysis many studies do not release their data in a way that is easy to explore by non-computational researchers. Results In order to help address this we have developed ShinyCell, an R package that converts single-cell RNA sequencing datasets into explorable and shareable interactive interfaces. These interfaces can be easily customised in order to maximise their usability and can be easily uploaded to online platforms to facilitate wider access to published data. Availability ShinyCell is available at https://github.com/SGDDNB/ShinyCell and https://figshare.com/projects/ShinyCell/100439.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 44
    Publication Date: 2021-03-28
    Description: Motivation Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNP). Results Here we propose a L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The usage of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with data sets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model. Availability and implementation The model is implemented using R programming language and the code is freely available from https://github.com/alainmbebi/L21-norm-GS. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 45
    Publication Date: 2021-03-28
    Description: Summary Finding informative predictive features in high dimensional biological case-control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use, and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets. Availability The software package for Python is available online at https://github.com/roohy/eps
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 46
    Publication Date: 2021-03-24
    Description: Slow-timescale (tonic) changes in dopamine (DA) contribute to a wide variety of processes in reinforcement learning, interval timing, and other domains. Furthermore, changes in tonic DA exert distinct effects depending on when they occur (e.g., during learning vs. performance) and what task the subject is performing (e.g., operant vs. classical conditioning). Two influential theories of tonic DA—the average reward theory and the Bayesian theory in which DA controls precision—have each been successful at explaining a subset of empirical findings. But how the same DA signal performs two seemingly distinct functions without creating crosstalk is not well understood. Here we reconcile the two theories under the unifying framework of ‘rational inattention,’ which (1) conceptually links average reward and precision, (2) outlines how DA manipulations affect this relationship, and in so doing, (3) captures new empirical phenomena. In brief, rational inattention asserts that agents can increase their precision in a task (and thus improve their performance) by paying a cognitive cost. Crucially, whether this cost is worth paying depends on average reward availability, reported by DA. The monotonic relationship between average reward and precision means that the DA signal contains the information necessary to retrieve the precision. When this information is needed after the task is performed, as presumed by Bayesian inference, acute manipulations of DA will bias behavior in predictable ways. We show how this framework reconciles a remarkably large collection of experimental findings. In reinforcement learning, the rational inattention framework predicts that learning from positive and negative feedback should be enhanced in high and low DA states, respectively, and that DA should tip the exploration-exploitation balance toward exploitation. 
In interval timing, this framework predicts that DA should increase the speed of the internal clock and decrease the extent of interference by other temporal stimuli during temporal reproduction (the central tendency effect). Finally, rational inattention makes the new predictions that these effects should be critically dependent on the controllability of rewards, that post-reward delays in intertemporal choice tasks should be underestimated, and that average reward manipulations should affect the speed of the clock—thus capturing empirical findings that are unexplained by either theory alone. Our results suggest that a common computational repertoire may underlie the seemingly heterogeneous roles of DA.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 47
    Publication Date: 2021-03-22
    Description: Across the life sciences, processing next generation sequencing data commonly relies upon a computationally expensive process where reads are mapped onto a reference sequence. Prior to such processing, however, there is a vast amount of information that can be ascertained from the reads, potentially obviating the need for processing, or allowing optimized mapping approaches to be deployed. Here, we present a method termed FlexTyper which facilitates a “reverse mapping” approach in which high throughput sequence queries, in the form of k-mer searches, are run against indexed short-read datasets in order to extract useful information. This reverse mapping approach enables the rapid counting of target sequences of interest. We demonstrate FlexTyper’s utility for recovering depth of coverage, and accurate genotyping of SNP sites across the human genome. We show that genotyping unmapped reads can correctly inform a sample’s population, sex, and relatedness in a family setting. Detection of pathogen sequences within RNA-seq data was sensitive and accurate, performing comparably to existing methods, but with increased flexibility. We present two examples of ways in which this flexibility allows the analysis of genome features not well-represented in a linear reference. First, we analyze contigs from African genome sequencing studies, showing how they distribute across families from three distinct populations. Second, we show how gene-marking k-mers for the killer immune receptor locus allow allele detection in a region that is challenging for standard read mapping pipelines. The future adoption of the reverse mapping approach represented by FlexTyper will be enabled by more efficient methods for FM-index generation and biology-informed collections of reference queries. In the long-term, selection of population-specific references or weighting of edges in pan-population reference genome graphs will be possible using the FlexTyper approach. 
FlexTyper is available at https://github.com/wassermanlab/OpenFlexTyper.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 48
    Publication Date: 2021-03-22
    Description: Our sense of touch helps us encounter the richness of our natural world. Across a myriad of contexts and repetitions, we have learned to deploy certain exploratory movements in order to elicit perceptual cues that are salient and efficient. The task of identifying optimal exploration strategies and somatosensory cues that underlie our softness perception remains relevant and incomplete. Leveraging psychophysical evaluations combined with computational finite element modeling of skin contact mechanics, we investigate an illusion phenomenon in exploring softness; where small-compliant and large-stiff spheres are indiscriminable. By modulating contact interactions at the finger pad, we find this elasticity-curvature illusion is observable in passive touch, when the finger is constrained to be stationary and only cutaneous responses from mechanosensitive afferents are perceptible. However, these spheres become readily discriminable when explored volitionally with musculoskeletal proprioception available. We subsequently exploit this phenomenon to dissociate relative contributions from cutaneous and proprioceptive signals in encoding our percept of material softness. Our findings shed light on how we volitionally explore soft objects, i.e., by controlling surface contact force to optimally elicit and integrate proprioceptive inputs amidst indiscriminable cutaneous contact cues. Moreover, in passive touch, e.g., for touch-enabled displays grounded to the finger, we find those spheres are discriminable when rates of change in cutaneous contact are varied between the stimuli, to supplant proprioceptive feedback.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 49
    Publication Date: 2021-03-24
    Description: Motivation There are high demands for joint genotyping of structural variations with short-read sequencing, but efficient and accurate genotyping at population scale is a challenging task. Results We developed muCNV, which aggregates per-sample summary pileups for joint genotyping of >100,000 samples. Pilot results show very low Mendelian inconsistencies. Applications to large-scale projects in the cloud show the computational efficiency of the muCNV genotyping pipeline. Availability muCNV is publicly available for download at: https://github.com/gjun/muCNV Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 50
    Publication Date: 2021-03-22
    Description: Sesquiterpene synthases (STSs) catalyze the formation of a large class of plant volatiles called sesquiterpenes. While thousands of putative STS sequences from diverse plant species are available, only a small number of them have been functionally characterized. Sequence identity-based screening for desired enzymes, often used in biotechnological applications, is difficult to apply here as STS sequence similarity is strongly affected by species. This calls for more sophisticated computational methods for functionality prediction. We investigate the specificity of precursor cation formation in these elusive enzymes. By inspecting multi-product STSs, we demonstrate that STSs have a strong selectivity towards one precursor cation. We use a machine learning approach combining sequence and structure information to accurately predict precursor cation specificity for STSs across all plant species. We combine this with a co-evolutionary analysis on the wealth of uncharacterized putative STS sequences, to pinpoint residues and distant functional contacts influencing cation formation and reaction pathway selection. These structural factors can be used to predict and engineer enzymes with specific functions, as we demonstrate by predicting and characterizing two novel STSs from Citrus bergamia.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 51
    Publication Date: 2021-03-26
    Description: Motivation Molecular property prediction has been a hot topic in recent years. Existing graph-based models ignore the hierarchical structures of molecules. According to the knowledge of chemistry and pharmacy, the functional groups of molecules are closely related to their physio-chemical properties and binding affinities. So, it should be helpful to represent molecular graphs by fragments that contain functional groups for molecular property prediction. Results In this paper, to boost the performance of molecule property prediction, we first propose a definition of molecule graph fragments that may be or contain functional groups, which are relevant to molecular properties, then develop a fragment-oriented multi-scale graph attention network for molecular property prediction, which is called FraGAT. Experiments on several widely-used benchmarks are conducted to evaluate FraGAT. Experimental results show that FraGAT achieves state-of-the-art predictive performance in most cases. Furthermore, our case studies show that when the fragments used to represent the molecule graphs contain functional groups, the model can make better predictions. This conforms to our expectation and demonstrates the interpretability of the proposed model. Availability and implementation The code and data underlying this work are available in GitHub, at https://github.com/ZiqiaoZhang/FraGAT. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 52
    Publication Date: 2021-03-26
    Description: Motivation The Anatomical Therapeutic Chemical (ATC) system is an official classification system established by the World Health Organization for medicines. Correctly assigning ATC classes to given compounds is an important research problem in drug discovery, which can not only discover the possible active ingredients of the compounds, but also infer their therapeutic, pharmacological, and chemical properties. Results In this paper, we develop an end-to-end multi-label classifier called CGATCPred to predict 14 main ATC classes for given compounds. In order to extract rich features of each compound, we use the deep Convolutional Neural Network (CNN) and shortcut connections to represent and learn the seven association scores between the given compound and others. Moreover, we construct the correlation graph of ATC classes and then apply graph convolutional network (GCN) on the graph for label embedding abstraction. We use all label embeddings to guide the learning process of compound representation. As a result, by using the Jackknife test, CGATCPred obtains reliable Aiming of 81.94%, Coverage of 82.88%, Accuracy of 80.81%, Absolute True of 76.58% and Absolute False of 2.75%, yielding significant improvements compared to existing multi-label classifiers. Availability The codes of CGATCPred are available at https://github.com/zhc940702/CGATCPred and https://zenodo.org/record/4552917. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 53
    Publication Date: 2021-03-24
    Description: Motivation Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences, for example the binding site motifs of DNA- and RNA-binding proteins. Results The STREME algorithm presented here advances the state-of-the-art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME’s capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely-used MEME Suite of sequence analysis tools. The name STREME stands for “Simple, Thorough, Rapid, Enriched Motif Elicitation”. Availability The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 54
    Publication Date: 2021-03-24
    Description: Motivation Understanding the mechanisms by which the zebrafish pectoral fin develops is expected to produce insights on how vertebrate limbs grow from a 2D cell layer to a 3D structure. Two mechanisms have been proposed to drive limb morphogenesis in tetrapods: a growth-based morphogenesis with a higher proliferation rate at the distal tip of the limb bud than at the proximal side, and directed cell behaviors that include elongation, division and migration in a nonrandom manner. Based on quantitative experimental biological data at the level of individual cells in the whole developing organ, we test the conditions for the dynamics of pectoral fin early morphogenesis. Results We found that during the development of the zebrafish pectoral fin, cells have a preferential elongation axis that gradually aligns along the proximodistal axis (PD) of the organ. Based on these quantitative observations, we build a center-based cell model enhanced with a polarity term and cell proliferation to simulate fin growth. Our simulations resulted in 3D fins similar in shape to the observed ones, suggesting that the existence of a preferential axis of cell polarization is essential to drive fin morphogenesis in zebrafish, as observed in the development of limbs in the mouse, but distal tip-based expansion is not. Availability Upon publication, biological data will be available at http://bioemergences.eu/modelingFin, and source code at https://github.com/guijoe/MaSoFin. Supplementary information Supplementary data are included in this manuscript.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 55
    Publication Date: 2021-03-22
    Description: The maintenance of synaptic changes resulting from long-term potentiation (LTP) is essential for brain function such as memory and learning. Different LTP phases have been associated with diverse molecular processes and pathways, and the molecular underpinnings of LTP on the short, as well as long time scales, are well established. However, the principles on the intermediate time scale of 1-6 hours that mediate the early phase of LTP (E-LTP) remain elusive. We hypothesize that the interplay between specific features of postsynaptic receptor trafficking is responsible for sustaining synaptic changes during this LTP phase. We test this hypothesis by formalizing a biophysical model that integrates several experimentally-motivated mechanisms. The model captures a wide range of experimental findings and predicts that synaptic changes are preserved for hours when the receptor dynamics are shaped by the interplay of structural changes of the spine in conjunction with increased trafficking from recycling endosomes and the cooperative binding of receptors. Furthermore, our model provides several predictions to verify our findings experimentally.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 56
    Publication Date: 2021-03-24
    Description: Motivation Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré balls are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature. Results In this paper, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products similar to or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. Availability https://github.com/JaesikKim/HiG2Vec Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 57
    Publication Date: 2021-02-01
    Description: Nuclear Magnetic Resonance (NMR) spectroscopy is one of the three primary experimental means of characterizing macromolecular structures, including protein structures. Structure determination by solution NMR spectroscopy has traditionally relied heavily on distance restraints derived from nuclear Overhauser effect (NOE) measurements. While structure determination of proteins from NOE-based restraints is well understood and broadly used, structure determination from Residual Dipolar Couplings (RDCs) is relatively less well developed. Here, we describe the new features of the protein structure modeling program REDCRAFT and focus on the new Adaptive Decimation (AD) feature. The AD plays a critical role in improving the robustness of REDCRAFT to missing or noisy data, while allowing structure determination of larger proteins from less data. In this report we demonstrate the successful application of REDCRAFT in structure determination of proteins ranging in size from 50 to 145 residues using experimentally collected data, and of larger proteins (145 to 573 residues) using simulated RDC data. In both cases, REDCRAFT uses only RDC data that can be collected from perdeuterated proteins. Finally, we compare the accuracy of structure determination from RDCs alone with traditional NOE-based methods for the structurally novel PF.2048.1 protein. The RDC-based structure of PF.2048.1 exhibited 1.0 Å BB-RMSD with respect to a high-quality NOE-based structure. Although optimal strategies would include using RDC data together with chemical shift, NOE, and other NMR data, these studies provide proof-of-principle for robust structure determination of largely-perdeuterated proteins from RDC data alone using REDCRAFT.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 58
    Publication Date: 2021-02-01
    Description: RNA is considered an attractive target for new small molecule drugs. Designing active compounds can be facilitated by computational modeling. Most of the available tools developed for these prediction purposes, such as molecular docking or scoring functions, are parametrized for protein targets. The performance of these methods, when applied to RNA-ligand systems, is insufficient. To overcome these problems, we developed AnnapuRNA, a new knowledge-based scoring function designed to evaluate RNA-ligand complex structures generated by any computational docking method. We also evaluated three main factors that may influence the structure prediction, i.e., the starting conformer of a ligand, the docking program, and the scoring function used. We applied the AnnapuRNA method for a post-hoc study of the recently published structures of the FMN riboswitch. Software is available at https://github.com/filipspl/AnnapuRNA.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 59
    Publication Date: 2021-03-23
    Description: Motivation Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability while relying on protein data besides sequences, or lack generalizability to novel sequences, species and functions. Results To overcome the aforementioned barriers in applicability and generalizability, we propose a novel deep learning model using only sequence information for proteins, named Transformer-based protein function Annotation through joint sequence–Label Embedding (TALE). For generalizability to novel sequences, we use self-attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions (tail labels), we embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (1D sequences) in a joint latent space. Combining TALE and a sequence similarity-based method, TALE+ outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method using network information besides sequence, in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low similarity, new species, or rarely annotated functions compared to training data, revealing deep insights into the protein sequence–function relationship. Ablation studies elucidated contributions of algorithmic components toward the accuracy and the generalizability. Availability The data, source codes and models are available at https://github.com/Shen-Lab/TALE Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 60
    Publication Date: 2021-03-23
    Description: Motivation Random sampling of metabolic fluxes can provide a comprehensive description of the capabilities of a metabolic network. However, current sampling approaches do not model thermodynamics explicitly, leading to inaccurate predictions of an organism’s potential or actual metabolic operations. Results We present a probabilistic framework combining thermodynamic quantities with steady-state flux constraints to analyze the properties of a metabolic network. It includes methods for probabilistic metabolic optimization and for joint sampling of thermodynamic and flux spaces. Applied to a model of E. coli, we use the methods to reveal known and novel mechanisms of substrate channeling, and to accurately predict reaction directions and metabolite concentrations. Interestingly, predicted flux distributions are multimodal, leading to discrete hypotheses on E. coli’s metabolic capabilities. Availability Python and MATLAB packages available at https://gitlab.com/csb.ethz/pta. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 61
    Publication Date: 2021-03-19
    Description: Adenosine receptors (ARs) have been demonstrated to be potential therapeutic targets against Parkinson’s disease (PD). In the present study, we describe a multistage virtual screening approach that identifies dual adenosine A1 and A2A receptor antagonists using deep learning, pharmacophore models, and molecular docking methods. Nineteen hits from the ChemDiv library containing 1,178,506 compounds were selected and further tested by in vitro assays (cAMP functional assay and radioligand binding assay); of these hits, two compounds (C8 and C9) with 1,2,4-triazole scaffolds possessing the most potent binding affinity and antagonistic activity for A1/A2A ARs at the nanomolar level (pKi of 7.16–7.49 and pIC50 of 6.31–6.78) were identified. Further molecular dynamics (MD) simulations suggested similarly strong binding interactions of the complexes between the A1/A2A ARs and two compounds (C8 and C9). Notably, the 1,2,4-triazole derivatives (compounds C8 and C9) were identified as the most potent dual A1/A2A AR antagonists in our study and could serve as a basis for further development. The effective multistage screening approach developed in this study can be utilized to identify potent ligands for other drug targets.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 62
    Publication Date: 2021-03-26
    Description: Motivation Thanks to the increasing availability of drug-drug interaction (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDIs using machine learning models has become possible. However, how to effectively utilize large and noisy biomedical KGs for DDI detection remains largely an open problem. Due to the sheer size of KGs and the amount of noise in them, it is often less beneficial to directly integrate KGs with other smaller but higher-quality data (e.g., experimental data). Most existing approaches ignore KGs altogether. Some try to directly integrate KGs with other data via graph neural networks, with limited success. Furthermore, most previous works focus on binary DDI prediction, whereas multi-typed DDI pharmacological effect prediction is a more meaningful but harder task. Results To fill these gaps, we propose a new method, SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention based subgraph summarization scheme that generates reasoning paths within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and the performance gain is particularly significant in low-data relation types. In addition, SumGNN provides interpretable predictions via the generated reasoning paths for each prediction. Availability The code is available in the supplementary. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 63
    Publication Date: 2021-03-27
    Description: Motivation Most protein-structure superimposition tools consider only Cartesian coordinates. Yet, much of biology happens on the surface of proteins, which is why proteins with shared ancestry and similar function often have comparable surface shapes. Superposition of proteins based on surface shape can enable comparison of highly divergent proteins, identify convergent evolution and enable detailed comparison of surface features and binding sites. Results We present ZEAL, an interactive tool to superpose global and local protein structures based on their shape resemblance using 3D (Zernike-Canterakis) functions to represent the molecular surface. In a benchmark study of structures with the same fold, we show that ZEAL outperforms two other methods for shape-based superposition. In addition, alignments from ZEAL were of comparable quality to the coordinate-based superpositions provided by TM-align. For comparisons of proteins with limited sequence and backbone-fold similarity, where coordinate-based methods typically fail, ZEAL can often find alignments with substantial surface-shape correspondence. In combination with shape-based matching, ZEAL can be used as a general tool to study relationships between shape and protein function. We identify several categories of protein functions where global shape similarity is significantly more likely than expected by random chance, when comparing proteins with little similarity on the fold level. In particular, we find that global surface shape similarity is particularly common among DNA binding proteins. Availability ZEAL can be used online at https://andrelab.org/zeal or as a standalone program with command line or graphical user interface. Source files and installers are available at https://github.com/Andre-lab/ZEAL Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 64
    Publication Date: 2021-03-18
    Description: Extensive amounts of multi-omics data and multiple cancer subtyping methods have been developed rapidly, and generate discrepant clustering results, which poses challenges for cancer molecular subtype research. Thus, the development of methods for the identification of cancer consensus molecular subtypes is essential. The lack of intuitive and easy-to-use analytical tools has posed a barrier. Here, we report on the development of the COnsensus Molecular SUbtype of Cancer (COMSUC) web server. With COMSUC, users can explore consensus molecular subtypes of more than 30 cancers based on eight clustering methods, five types of omics data from public reference datasets or users’ private data, and three consensus clustering methods. The web server provides interactive and modifiable visualization, and publishable output of analysis results. Researchers can also exchange consensus subtype results with collaborators via project IDs. COMSUC is now publicly and freely available with no login requirement at http://comsuc.bioinforai.tech/ (IP address: http://59.110.25.27/). For a video summary of this web server, see S1 Video and S1 File.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 65
    Publication Date: 2021-03-24
    Description: While vision evokes a dense network of feedforward and feedback neural processes in the brain, visual processes are primarily modeled with feedforward hierarchical neural networks, leaving the computational role of feedback processes poorly understood. Here, we developed a generative autoencoder neural network model and adversarially trained it on a categorically diverse data set of images. We hypothesized that the feedback processes in the ventral visual pathway can be represented by reconstruction of the visual information performed by the generative model. We compared representational similarity of the activity patterns in the proposed model with temporal (magnetoencephalography) and spatial (functional magnetic resonance imaging) visual brain responses. The proposed generative model identified two segregated neural dynamics in the visual brain: a temporal hierarchy of processes transforming low-level visual information into high-level semantics in the feedforward sweep, and temporally later dynamics of inverse processes reconstructing low-level visual information from a high-level latent representation in the feedback sweep. Our results add to previous studies on neural feedback processes by presenting a new insight into the algorithmic function and the information carried by the feedback processes in the ventral visual pathway.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 66
    Publication Date: 2021-03-17
    Description: Motivation For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. Results In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments with the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. Availability The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 67
    Publication Date: 2021-03-18
    Description: Developing mathematical models to accurately predict microbial growth dynamics remains a key challenge in ecology, evolution, biotechnology, and public health. To reproduce and grow, microbes need to take up essential nutrients from the environment, and mathematical models classically assume that the nutrient uptake rate is a saturating function of the nutrient concentration. In nature, microbes experience different levels of nutrient availability at all environmental scales, yet parameters shaping the nutrient uptake function are commonly estimated for a single initial nutrient concentration. This hampers the models from accurately capturing microbial dynamics when the environmental conditions change. To address this problem, we conduct growth experiments for a range of micro-organisms, including human fungal pathogens, baker’s yeast, and common coliform bacteria, and uncover the following patterns. We observed that the maximal nutrient uptake rate and biomass yield were both decreasing functions of initial nutrient concentration. While a functional form for the relationship between biomass yield and initial nutrient concentration has been previously derived from first metabolic principles, here we also derive the form of the relationship between maximal nutrient uptake rate and initial nutrient concentration. Incorporating these two functions into a model of microbial growth allows for variable growth parameters and enables us to substantially improve predictions for microbial dynamics in a range of initial nutrient concentrations, compared to keeping growth parameters fixed.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 68
    Publication Date: 2021-03-18
    Description: Spatial expansion of a population of cells can arise from growth of microorganisms, plant cells, and mammalian cells. It underlies normal or dysfunctional tissue development, and it can be exploited as the foundation for programming spatial patterns. This expansion is often driven by continuous growth and division of cells within a colony, which in turn pushes the peripheral cells outward. This process generates a repulsion velocity field at each location within the colony. Here we show that this process can be approximated as coarse-grained repulsive-expansion kinetics. This framework enables accurate and efficient simulation of growth and gene expression dynamics in radially symmetric colonies with homogeneous z-directional distribution. It is robust even if cells are not spherical and vary in size. The simplicity of the resulting mathematical framework also greatly facilitates generation of mechanistic insights.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 69
    Publication Date: 2021-03-18
    Description: Many initiatives have addressed the global need to upskill biologists in bioinformatics tools and techniques. Australia is not unique in its requirement for such training, but due to its large size and relatively small and geographically dispersed population, Australia faces specific challenges. A combined training approach was implemented by the authors to overcome these challenges. The “hybrid” method combines guidance from experienced trainers with the benefits of both webinar-style delivery and concurrent face-to-face hands-on practical exercises in classrooms. Since 2017, the hybrid method has been used to conduct 9 hands-on bioinformatics training sessions at international scale in which over 800 researchers have been trained in diverse topics on a range of software platforms. The method has become a key tool to ensure scalable and more equitable delivery of short-course bioinformatics training across Australia and can be easily adapted to other locations, topics, or settings.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 70
    Publication Date: 2021-03-10
    Description: Human red blood cells (RBCs) have a circulatory lifespan of about four months. Under constant oxidative and mechanical stress, but devoid of organelles and deprived of biosynthetic capacity for protein renewal, RBCs undergo substantial homeostatic changes, progressive densification followed by late density reversal among others, changes assumed to have been harnessed by evolution to sustain the rheological competence of the RBCs for as long as possible. The unknown mechanisms by which this is achieved are the subject of this investigation. Each RBC traverses capillaries between 1000 and 2000 times per day, roughly one transit per minute. A dedicated Lifespan model of RBC homeostasis was developed as an extension of the RCM introduced in the previous paper to explore the cumulative patterns predicted for repetitive capillary transits over a standardized lifespan period of 120 days, using experimental data to constrain the range of acceptable model outcomes. Capillary transits were simulated by periods of elevated cell/medium volume ratios and by transient deformation-induced permeability changes attributed to PIEZO1 channel mediation as outlined in the previous paper. The first unexpected finding was that quantal density changes generated during single capillary transits cease accumulating after a few days and cannot account for the observed progressive densification of RBCs on their own, thus ruling out the quantal hypothesis. The second unexpected finding was that the documented patterns of RBC densification and late reversal could only be emulated by the implementation of a strict time-course of decay in the activities of the calcium and Na/K pumps, suggestive of a selective mechanism enabling the extended longevity of RBCs. The densification pattern over most of the circulatory lifespan was determined by calcium pump decay whereas late density reversal was shaped by the pattern of Na/K pump decay. 
A third finding was that both quantal changes and pump-decay regimes were necessary to account for the documented lifespan pattern, neither being sufficient on its own. A fourth new finding revealed that RBCs exposed to levels of PIEZO1-mediated calcium permeation above certain thresholds in the circulation could develop a pattern of early or late hyperdense collapse followed by delayed density reversal. When tested over much reduced lifespan periods, the results reproduced the known circulatory fate of irreversible sickle cells, the cell subpopulation responsible for vaso-occlusion and for most of the clinical manifestations of sickle cell disease. Analysis of the results provided an insightful new understanding of the mechanisms driving the changes in RBC homeostasis during circulatory aging in health and disease.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 71
    Publication Date: 2021-03-10
    Description: Cell migration in 3D microenvironments is a complex process which depends on the coordinated activity of leading edge protrusive force and rear retraction in a push-pull mechanism. While the potentiation of protrusions has been widely studied, the precise signalling and mechanical events that lead to retraction of the cell rear are much less well understood, particularly in physiological 3D extra-cellular matrix (ECM). We previously discovered that rear retraction in fast moving cells is a highly dynamic process involving the precise spatiotemporal interplay of mechanosensing by caveolae and signalling through RhoA. To further interrogate the dynamics of rear retraction, we have adopted three distinct mathematical modelling approaches here based on (i) Boolean logic, (ii) deterministic kinetic ordinary differential equations (ODEs) and (iii) stochastic simulations. The aims of this multi-faceted approach are twofold: firstly to derive new biological insight into cell rear dynamics via generation of testable hypotheses and predictions; and secondly to compare and contrast the distinct modelling approaches when used to describe the same, relatively under-studied system. Overall, our modelling approaches complement each other, suggesting that such a multi-faceted approach is more informative than methods based on a single modelling technique to interrogate biological systems. Whilst Boolean logic was not able to fully recapitulate the complexity of rear retraction signalling, an ODE model could make plausible population level predictions. Stochastic simulations added a further level of complexity by accurately mimicking previous experimental findings and acting as a single cell simulator. Our approach highlighted the unanticipated role for CDK1 in rear retraction, a prediction we confirmed experimentally. 
Moreover, our models led to a novel prediction regarding the potential existence of a ‘set point’ in local stiffness gradients that promotes polarisation and rapid rear retraction.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 72
    Publication Date: 2021-03-18
    Description: Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, the computational method of fusing multi-source biological data to identify the association between circRNA and disease can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose a semi-supervised generative adversarial network (GAN) model, SGANRDA, for predicting circRNA–disease association. This model first fuses the natural language features of the circRNA sequence and the features of disease semantics, circRNA and disease Gaussian interaction profile kernel, then uses all circRNA–disease pairs to pre-train the GAN network, and fine-tunes the network parameters through labeled samples. Finally, the extreme learning machine classifier is employed to obtain the prediction result. Compared with previous supervised models, SGANRDA innovatively introduces circRNA sequences and utilizes all the information of circRNA–disease pairs during the pre-training process. This step can increase the information content of the feature to some extent and reduce the impact of too few known associations on the model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA–disease pairs with the highest scores of SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model to predict the circRNA–disease association and can provide reliable candidates for biological experiments.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
  • 73
    Publication Date: 2021-03-02
    Description: In the last two decades rodents have been on the rise as a dominant model for visual neuroscience. This is particularly true for earlier levels of information processing, but a number of studies have suggested that also higher levels of processing such as invariant object recognition occur in rodents. Here we provide a quantitative and comprehensive assessment of this claim by comparing a wide range of rodent behavioral and neural data with convolutional deep neural networks. These networks have been shown to capture hallmark properties of information processing in primates through a succession of convolutional and fully connected layers. We find that performance on rodent object vision tasks can be captured using low to mid-level convolutional layers only, without any convincing evidence for the need of higher layers known to simulate complex object recognition in primates. Our approach also reveals surprising insights on assumptions made before, for example, that the best performing animals would be the ones using the most abstract representations–which we show to likely be incorrect. Our findings suggest a road ahead for further studies aiming at quantifying and establishing the richness of representations underlying information processing in animal models at large.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 74
    Publication Date: 2021-03-02
    Description: Microbes can metabolize more chemical compounds than any other group of organisms. As a result, their metabolism is of interest to investigators across biology. Despite the interest, information on metabolism of specific microbes is hard to access. Information is buried in the text of books and journals, and investigators have no easy way to extract it. Here we investigate if neural networks can extract this information and predict metabolic traits. For proof of concept, we predicted two traits: whether microbes carry one type of metabolism (fermentation) or produce one metabolite (acetate). We collected written descriptions of 7,021 species of bacteria and archaea from Bergey’s Manual. We read the descriptions and manually identified (labeled) which species were fermentative or produced acetate. We then trained neural networks to predict these labels. In total, we identified 2,364 species as fermentative, and 1,009 species as also producing acetate. Neural networks could predict which species were fermentative with 97.3% accuracy. Accuracy was even higher (98.6%) when predicting species also producing acetate. Phylogenetic trees of species and their traits confirmed that predictions were accurate. Our approach with neural networks can extract information efficiently and accurately. It paves the way for putting more metabolic traits into databases, providing investigators with easy access to information.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 75
    Publication Date: 2021-03-12
    Description: PDkit is an open source software toolkit supporting the collaborative development of novel methods of digital assessment for Parkinson’s Disease, using symptom measurements captured continuously by wearables (passive monitoring) or by high-use-frequency smartphone apps (active monitoring). The goal of the toolkit is to help address the current lack of algorithmic and model transparency in this area by facilitating open sharing of standardised methods that allow the comparison of results across multiple centres and hardware variations. PDkit adopts the information-processing pipeline abstraction incorporating stages for data ingestion, quality of information augmentation, feature extraction, biomarker estimation and finally, scoring using standard clinical scales. Additionally, a dataflow programming framework is provided to support high performance computations. The practical use of PDkit is demonstrated in the context of the CUSSP clinical trial in the UK. The toolkit is implemented in the Python programming language, the de facto standard for modern data science applications, and is widely available under the MIT license.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 76
    Publication Date: 2021-02-17
    Description: Computational models of animal biosonar seek to identify critical aspects of echo processing responsible for the superior, real-time performance of echolocating bats and dolphins in target tracking and clutter rejection. The Spectrogram Correlation and Transformation (SCAT) model replicates aspects of biosonar imaging in both species by processing wideband biosonar sounds and echoes with auditory mechanisms identified from experiments with bats. The model acquires broadband biosonar broadcasts and echoes, represents them as time-frequency spectrograms using parallel bandpass filters, translates the filtered signals into ten parallel amplitude threshold levels, and then operates on the resulting time-of-occurrence values at each frequency to estimate overall echo range delay. It uses the structure of the echo spectrum by depicting it as a series of local frequency nulls arranged regularly along the frequency axis of the spectrograms after dechirping them relative to the broadcast. Computations take place entirely on the timing of threshold-crossing events for each echo relative to threshold-events for the broadcast. Threshold-crossing times take into account amplitude-latency trading, a physiological feature absent from conventional digital signal processing. Amplitude-latency trading transposes the profile of amplitudes across frequencies into a profile of time-registrations across frequencies. Target shape is extracted from the spacing of the object’s individual acoustic reflecting points, or glints, using the mutual interference pattern of peaks and nulls in the echo spectrum. These are merged with the overall range-delay estimate to produce a delay-based reconstruction of the object’s distance as well as its glints. 
Clutter echoes indiscriminately activate multiple parts in the null-detecting system, which then produces the equivalent glint-delay spacings in images, thus blurring the overall echo-delay estimates by adding spurious glint delays to the image. Blurring acts as an anticorrelation process that rejects clutter intrusion into perceptions.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 77
    Publication Date: 2021-02-17
    Description: Lung cancer is one of the leading causes of cancer-related deaths worldwide and is characterized by hijacking the immune system for active growth and aggressive metastasis. Neutrophils, which in their original form should establish immune activities against the tumor as a first line of defense, are undermined by tumor cells to promote tumor invasion in several ways. In this study, we investigate the mutual interactions between the tumor cells and the neutrophils that facilitate tumor invasion by developing a mathematical model that involves taxis-reaction-diffusion equations for the critical components in the interaction. These include the densities of tumor cells and neutrophils, and the concentrations of signaling molecules and structures such as neutrophil extracellular traps (NETs). We apply the mathematical model to a Boyden invasion assay used in the experiments to demonstrate that the tumor-associated neutrophils can enhance tumor cell invasion by secreting neutrophil elastase. We show that the model can both reproduce the major experimental observation on NET-mediated cancer invasion and make several important predictions to guide future experiments with the goal of developing new anti-tumor strategies. Moreover, using this model, we investigate the fundamental mechanism of NET-mediated invasion of cancer cells and the impact of internal and external heterogeneity on the migration patterning of tumor cells and their response to different treatment schedules.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 78
    Publication Date: 2021-02-02
    Description: Finding non-standard or new metabolic pathways has important applications in metabolic engineering, synthetic biology and the analysis and reconstruction of metabolic networks. Branched metabolic pathways dominate in metabolic networks and depict a more comprehensive picture of metabolism compared to linear pathways. Although progress has been made in finding branched metabolic pathways, few efforts have been made to identify them via atom group tracking. In this paper, we present a pathfinding method called BPFinder for finding branched metabolic pathways by atom group tracking, which aims to guide the synthetic design of metabolic pathways. BPFinder enumerates linear metabolic pathways by tracking the movements of atom groups in a metabolic network and merges the linear atom group conserving pathways into branched pathways. Two merging rules based on the structure of conserved atom groups are proposed to accurately merge the branched compounds of linear pathways to identify branched pathways. Furthermore, the integrated information of compound similarity, thermodynamic feasibility and conserved atom groups is also used to rank the pathfinding results for feasible branched pathways. Experimental results show that BPFinder is more capable of recovering known branched metabolic pathways than other existing methods, and is able to return biologically relevant branched pathways and discover alternative branched pathways of biochemical interest. The online server of BPFinder is available at http://114.215.129.245:8080/atomic/. The program, source code and data can be downloaded from https://github.com/hyr0771/BPFinder.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 79
    Publication Date: 2021-03-10
    Description: Pacemaking dysfunction (PD) may result in heart rhythm disorders, syncope or even death. Current treatment of PD using implanted electronic pacemakers has some limitations, such as finite battery life and the risk of repeated surgery. As such, the biological pacemaker has been proposed as a potential alternative to the electronic pacemaker for PD treatment. Experimentally and computationally, it has been shown that bio-engineered pacemaker cells can be generated from non-rhythmic ventricular myocytes (VMs) by knocking out genes related to the inward rectifier potassium channel current (IK1) or by overexpressing hyperpolarization-activated cyclic nucleotide gated channel genes responsible for the “funny” current (If). However, it is unclear if a bio-engineered pacemaker based on the modification of IK1- and If-related channels simultaneously would enhance the ability and stability of bio-engineered pacemaking action potentials. In this study, the possible mechanism(s) responsible for VMs to generate spontaneous pacemaking activity by regulating IK1 and If density were investigated by a computational approach. Our results showed that there was a reciprocal interaction between IK1 and If in the ventricular pacemaker model. The effect of IK1 depression on generating ventricular pacemaking was mono-phasic while that of If augmentation was bi-phasic. A moderate increase of If promoted pacemaking activity, but an excessive increase of If resulted in a slowdown in the pacemaking rate and even an unstable pacemaking state. The dedicated interplay between IK1 and If in generating stable pacemaking and dysrhythmias was evaluated. Finally, a theoretical analysis in the IK1/If parameter space for generating pacemaking action potentials in different states was provided.
In conclusion, to the best of our knowledge, this study provides broad theoretical insight into generating stable and robust pacemaker cells from non-pacemaking VMs through the interplay of IK1 and If, which may be helpful in designing engineered biological pacemakers for application purposes.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 80
    Publication Date: 2021-03-18
    Description: Disease epidemic outbreaks on human metapopulation networks are often driven by a small number of superspreader nodes, which are primarily responsible for spreading the disease throughout the network. Superspreader nodes are typically characterized either by their locations within the network, by their degree of connectivity and centrality, or by their habitat suitability for the disease, described by their reproduction number (R). Here we introduce a model that considers simultaneously the effects of network properties and R on superspreaders, as opposed to previous research, which considered each factor separately. This type of model is applicable to diseases for which habitat suitability varies by climate or land cover, and to directly transmitted diseases for which population density and mitigation practices influence R. We present analytical models that quantify the superspreader capacity of a population node by two measures: probability-dependent superspreader capacity, the expected number of neighboring nodes to which the node in consideration will randomly spread the disease per epidemic generation, and time-dependent superspreader capacity, the rate at which the node spreads the disease to each of its neighbors. We validate our analytical models with a Monte Carlo analysis of repeated stochastic Susceptible-Infected-Recovered (SIR) simulations on randomly generated human population networks, and we use a random forest statistical model to relate superspreader risk to connectivity, R, centrality, clustering, and diffusion. We demonstrate that either degree of connectivity or R above a certain threshold is a sufficient condition for a node to have a moderate superspreader risk factor, but both are necessary for a node to have a high risk factor.
The statistical model presented in this article can be used to predict the location of superspreader events in future epidemics, and to predict the effectiveness of mitigation strategies that seek to reduce the value of R, alter host movements, or both.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
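The Monte Carlo idea in this abstract (repeated stochastic SIR runs, counting how many neighbors a node infects per generation) can be illustrated with a minimal sketch. It is not the authors' model: the adjacency-dict network format, the single constant transmission probability `p_transmit`, the one-generation infectious period, and the function names are all assumptions made for illustration.

```python
import random


def simulate_sir(neighbors, seed_node, p_transmit, rng):
    """Run one generation-based stochastic SIR outbreak.

    neighbors:  dict mapping node -> list of adjacent nodes (assumed format)
    p_transmit: per-neighbor, per-generation transmission probability (assumed constant)
    Returns a dict mapping every ever-infected node to the number of
    neighbors it infected directly.
    """
    status = {node: "S" for node in neighbors}
    status[seed_node] = "I"
    infected_now = [seed_node]
    secondary = {seed_node: 0}
    while infected_now:
        next_infected = []
        for node in infected_now:
            for nb in neighbors[node]:
                if status[nb] == "S" and rng.random() < p_transmit:
                    status[nb] = "I"
                    secondary[nb] = 0
                    secondary[node] += 1
                    next_infected.append(nb)
            status[node] = "R"  # infectious for exactly one generation
        infected_now = next_infected
    return secondary


def mean_secondary_cases(neighbors, seed_node, p_transmit, runs=1000, seed=0):
    """Monte Carlo estimate of the mean number of neighbors the seed node infects."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += simulate_sir(neighbors, seed_node, p_transmit, rng)[seed_node]
    return total / runs


# Hypothetical star network: node 0 is a hub connected to four leaves.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(mean_secondary_cases(star, 0, 0.5))
```

For the hub of this star network, the estimate should approach the degree times the transmission probability, which is the kind of quantity the abstract calls probability-dependent superspreader capacity.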
  • 81
    Publication Date: 2021-03-18
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 82
    Publication Date: 2021-03-18
    Description: A systematic and reproducible “workflow”—the process that moves a scientific investigation from raw data to coherent research question to insightful contribution—should be a fundamental part of academic data-intensive research practice. In this paper, we elaborate basic principles of a reproducible data analysis workflow by defining 3 phases: the Explore, Refine, and Produce Phases. Each phase is roughly centered around the audience to whom research decisions, methodologies, and results are being immediately communicated. Importantly, each phase can also give rise to a number of research products beyond traditional academic publications. Where relevant, we draw analogies between design principles and established practice in software development. The guidance provided here is not intended to be a strict rulebook; rather, the suggestions for practices and tools to advance reproducible, sound data-intensive analysis may furnish support for both students new to research and current researchers who are new to data-intensive work.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 83
    Publication Date: 2021-03-15
    Description: Motivation: Ribosome Profiling (Ribo-seq) has revolutionized the study of RNA translation by providing information on ribosome positions across all translated RNAs with nucleotide resolution. Yet several technical limitations restrict the sequencing depth of such experiments, the most common of which is the overabundance of rRNA fragments. Various strategies can be employed to tackle this issue, including the use of commercial rRNA depletion kits. However, as they are designed for more standardized RNAseq experiments, they may perform suboptimally in Ribo-seq. To overcome this, it is possible to use custom biotinylated oligos complementary to the most abundant rRNA fragments; however, no computational framework currently exists to aid the design of optimal oligos. Results: Here, we first show that a major confounding issue is that the rRNA fragments generated via Ribo-seq vary significantly with differing experimental conditions, suggesting that a “one-size-fits-all” approach may be inefficient. Therefore, we developed Ribo-ODDR, an oligo design pipeline integrated with a user-friendly interface that assists in oligo selection for efficient experiment-specific rRNA depletion. Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments, and calculates the rRNA depletion efficiency of potential oligos. We experimentally show that Ribo-ODDR-designed oligos outperform commercially available kits and lead to a significant increase in rRNA depletion in Ribo-seq. Availability: Ribo-ODDR is freely accessible at https://github.com/fallerlab/Ribo-ODDR. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 84
    Publication Date: 2021-03-16
    Print ISSN: 0302-3427
    Electronic ISSN: 1471-5430
    Topics: Nature of Science, Research, Systems of Higher Education, Museum Science
  • 85
    Publication Date: 2021-03-15
    Description: Summary: Many experimental approaches have been developed to identify transcription start sites (TSS) from genomic-scale data. However, experiment-specific biases lead to large numbers of false-positive calls. Here, we present our integrative approach iTiSS, an accurate and generic TSS caller for any TSS profiling experiment in eukaryotes that substantially reduces the number of false positives by a joint analysis of several complementary datasets. Availability and implementation: iTiSS is platform independent, implemented in Java (v1.8), and freely available at https://www.erhard-lab.de/software and https://github.com/erhard-lab/iTiSS. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 86
    Publication Date: 2021-03-01
    Description: Human pluripotent stem cells hold significant promise for regenerative medicine. However, long differentiation protocols and immature characteristics of stem cell-derived cell types remain challenges to the development of many therapeutic applications. In contrast to the slow differentiation of human stem cells in vitro that mirrors a nine-month gestation period, mouse stem cells develop according to a much faster three-week gestation timeline. Here, we tested if co-differentiation with mouse pluripotent stem cells could accelerate the differentiation speed of human embryonic stem cells. Following a six-week RNA-sequencing time course of neural differentiation, we identified 929 human genes that were upregulated earlier and 535 genes that exhibited earlier peaked expression profiles in chimeric cell cultures than in human cell cultures alone. Genes with accelerated upregulation were significantly enriched in Gene Ontology terms associated with neurogenesis, neuron differentiation and maturation, and synapse signaling. Moreover, chimeric mixed samples correlated with in utero human embryonic samples earlier than human cells alone, and acceleration was dose-dependent on human-mouse co-culture ratios. The altered gene expression patterns and developmental rates described in this report have implications for accelerating human stem cell differentiation and the use of interspecies chimeric embryos in developing human organs for transplantation.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 87
    Publication Date: 2021-03-15
    Description: Summary: Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do, however, contain regions where energetic conflicts remain after folding, i.e. they are highly frustrated. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein–protein interactions, small ligand recognition, catalytic sites and allostery. Here, we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large-scale analysis of local frustration, point mutants and molecular dynamics (MD) trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. Availability and implementation: https://github.com/proteinphysiologylab/frustratometeR. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 88
    Publication Date: 2021-03-25
    Description: The activity of a border ownership selective (BOS) neuron indicates where a foreground object is located relative to its (classical) receptive field (RF). A population of BOS neurons thus provides an important component of perceptual grouping, the organization of the visual scene into objects. In previous theoretical work, it has been suggested that this grouping mechanism is implemented by a population of dedicated grouping (“G”) cells that integrate the activity of the distributed feature cells representing an object and, by feedback, modulate the same cells, thus making them border ownership selective. The feedback modulation by G cells is thought to also provide the mechanism for object-based attention. A recent modeling study showed that modulatory common feedback, implemented by synapses with N-methyl-D-aspartate (NMDA)-type glutamate receptors, accounts for the experimentally observed synchrony in spike trains of BOS neurons and the shape of cross-correlations between them, including its dependence on the attentional state. However, that study was limited to pairs of BOS neurons with consistent border ownership preferences, defined as two neurons tuned to respond to the same visual object, in which attention decreases synchrony. But attention has also been shown to increase synchrony in neurons with inconsistent border ownership selectivity. Here we extend the computational model from the previous study to fully understand these effects of attention. We postulate the existence of a second type of G-cell that represents spatial attention by modulating the activity of all BOS cells in a spatially defined area. Simulations of this model show that a combination of spatial and object-based mechanisms fully accounts for the observed pattern of synchrony between BOS neurons. Our results suggest that modulatory feedback from G-cells may underlie both spatial and object-based attention.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 89
    Publication Date: 2021-03-25
    Description: The sequences of antibodies from a given repertoire are highly diverse at a few sites located on the surface of a genome-encoded larger scaffold. The scaffold is often considered to play a lesser role than highly diverse, non-genome-encoded sites in controlling binding affinity and specificity. To gauge the impact of the scaffold, we carried out quantitative phage display experiments in which we compare the response to selection for binding to four different targets of three different antibody libraries based on distinct scaffolds but harboring the same diversity at randomized sites. We first show that the response to selection of an antibody library may be captured by two measurable parameters. Second, we provide evidence that one of these parameters is determined by the degree of affinity maturation of the scaffold, affinity maturation being the process by which antibodies accumulate somatic mutations to evolve towards higher affinities during the natural immune response. In all cases, we find that libraries of antibodies built around maturated scaffolds have a lower response to selection to other arbitrary targets than libraries built around germline-based scaffolds. We thus propose that germline-encoded scaffolds have a higher selective potential than maturated ones as a consequence of a selection for this potential over the long-term evolution of germline antibody genes. Our results are a first step towards quantifying the evolutionary potential of biomolecules.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 90
    Publication Date: 2021-02-16
    Description: Evolutionary branching occurs when a population with a unimodal phenotype distribution diversifies into a multimodally distributed population consisting of two or more strains. Branching results from frequency-dependent selection, which is caused by interactions between individuals. For example, a population performing a social task may diversify into a cooperator strain and a defector strain. Branching can also occur in multi-dimensional phenotype spaces, such as when two tasks are performed simultaneously. In such cases, the strains may diverge in different directions: possible outcomes include division of labor (with each population performing one of the tasks) or the diversification into a strain that performs both tasks and another that performs neither. Here we show that the shape of the population’s phenotypic distribution plays a role in determining the direction of branching. Furthermore, we show that the shape of the distribution is, in turn, contingent on the direction of approach to the evolutionary branching point. This results in a distribution–selection feedback that is not captured in analytical models of evolutionary branching, which assume monomorphic populations. Finally, we show that this feedback can influence long-term evolutionary dynamics and promote the evolution of division of labor.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 91
    Publication Date: 2021-02-16
    Description: Contemporary accounts of the initiation of cardiac arrhythmias typically rely on after-depolarizations as the trigger for reentrant activity. The after-depolarizations are usually triggered by calcium entry or spontaneous release within the cells of the myocardium or the conduction system. Here we propose an alternative mechanism whereby arrhythmias are triggered autonomously by cardiac cells that fail to repolarize after a normal heartbeat. We investigated the proposal by representing the heart as an excitable medium of FitzHugh-Nagumo cells where a proportion of cells were capable of remaining depolarized indefinitely. As such, those cells exhibit bistable membrane dynamics. We found that heterogeneous media can tolerate a surprisingly large number of bistable cells and still support normal rhythmic activity. Yet there is a critical limit beyond which the medium is persistently arrhythmogenic. Numerical analysis revealed that the critical threshold for arrhythmogenesis depends on both the strength of the coupling between cells and the extent to which the abnormal cells resist repolarization. Moreover, arrhythmogenesis was found to emerge preferentially at tissue boundaries where cells naturally have fewer neighbors to influence their behavior. These findings may explain why atrial fibrillation typically originates from tissue boundaries such as the cuff of the pulmonary vein.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
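For readers unfamiliar with the FitzHugh-Nagumo cell named in this abstract, the following is a minimal sketch of a single, uncoupled cell integrated with forward Euler. It does not reproduce the paper's bistable variant or its spatially coupled medium. The parameter values (`a`, `b`, `eps`), the step size, and the function name are common textbook choices assumed here for illustration.

```python
def simulate_fhn(i_ext, v0=-1.2, w0=-0.62, a=0.7, b=0.8, eps=0.08,
                 dt=0.01, steps=40000):
    """Integrate one FitzHugh-Nagumo cell with forward Euler.

    dv/dt = v - v^3/3 - w + i_ext   (fast membrane-potential-like variable)
    dw/dt = eps * (v + a - b * w)   (slow recovery variable)
    Returns the list of v values, one per time step.
    """
    v, w = v0, w0
    trace = []
    for _ in range(steps):
        dv = v - v ** 3 / 3.0 - w + i_ext
        dw = eps * (v + a - b * w)
        v += dt * dv
        w += dt * dw
        trace.append(v)
    return trace


# Without input the cell rests; with a constant input it fires rhythmically.
resting = simulate_fhn(0.0)
driven = simulate_fhn(0.5)
print(max(resting) - min(resting), max(driven) - min(driven))
```

With these assumed parameters the undriven cell settles at its resting state, while a constant input of 0.5 destabilizes that state and produces sustained oscillations, the normal rhythmic regime against which the paper's non-repolarizing cells are contrasted.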
  • 92
    Publication Date: 2021-02-16
    Description: Spectral similarity is used as a proxy for structural similarity in many tandem mass spectrometry (MS/MS) based metabolomics analyses such as library matching and molecular networking. Although weaknesses in the relationship between spectral similarity scores and the true structural similarities have been described, little development of alternative scores has been undertaken. Here, we introduce Spec2Vec, a novel spectral similarity score inspired by a natural language processing algorithm, Word2Vec. Spec2Vec learns fragmental relationships within a large set of spectral data to derive abstract spectral embeddings that can be used to assess spectral similarities. Using data derived from GNPS MS/MS libraries including spectra for nearly 13,000 unique molecules, we show that Spec2Vec scores correlate better with structural similarity than cosine-based scores. We demonstrate the advantages of Spec2Vec in library matching and molecular networking. Spec2Vec is also computationally more scalable, allowing structural analogue searches in large databases within seconds.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
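The cosine-based scores that this abstract uses as its baseline can be sketched as follows. This is a simplified illustration, not Spec2Vec and not any particular library's implementation: peaks are grouped into fixed-width m/z bins (an assumption; practical tools usually match peaks within a tolerance), and the function names and the `(mz, intensity)` peak format are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs (assumed input format)
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity between two binned spectra, in the range [0, 1]."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty spectrum has no defined direction
    return dot / (norm_a * norm_b)


# Two hypothetical spectra that share one of their two peaks.
spectrum_1 = [(100.1, 1.0), (150.2, 0.5)]
spectrum_2 = [(100.3, 0.8), (210.0, 0.6)]
print(cosine_score(spectrum_1, spectrum_2))
```

A score of this kind only rewards peaks that coincide in m/z, which is one way to read the weakness the abstract mentions: structurally related molecules whose fragments are shifted score poorly.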
  • 93
    Publication Date: 2021-02-19
    Description: Summary: MitoFlex is a Linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high-throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein coding gene recovery, memory consumption and processing speed. Availability: MitoFlex is available at https://github.com/Prunoideae/MitoFlex under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
  • 94
    Publication Date: 2021-02-17
    Description: Surgical interventions in epileptic patients aimed at the removal of the epileptogenic zone have success rates of only 60-70%. This failure can be partly attributed to the insufficient spatial sampling by the implanted intracranial electrodes during the clinical evaluation, leading to an incomplete picture of spatio-temporal seizure organization in the regions that are not directly observed. Utilizing the partial observations of the seizure spreading through the brain network, complemented by the assumption that the epileptic seizures spread along the structural connections, we infer if and when the unobserved regions are recruited in the seizure. To this end, we introduce a data-driven model of seizure recruitment and propagation across a weighted network, which we invert using the Bayesian inference framework. Using a leave-one-out cross-validation scheme on a cohort of 45 patients, we demonstrate that the method can improve the predictions of the states of the unobserved regions compared to an empirical estimate that does not use the structural information, yet it is on the same level as the estimate that takes the structure into account. Furthermore, a comparison with the performed surgical resection and the surgery outcome indicates a link between the inferred excitable regions and the actual epileptogenic zone. The results emphasize the importance of the structural connectome in the large-scale spatio-temporal organization of epileptic seizures and introduce a novel way to integrate the patient-specific connectome and intracranial seizure recordings in a whole-brain computational model of seizure spread.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 95
    Publication Date: 2021-02-02
    Description: A population’s spatial structure affects the rate of genetic change and the outcome of natural selection. These effects can be modeled mathematically using the Birth-death process on graphs. Individuals occupy the vertices of a weighted graph, and reproduce into neighboring vertices based on fitness. A key quantity is the probability that a mutant type will sweep to fixation, as a function of the mutant’s fitness. Graphs that increase the fixation probability of beneficial mutations, and decrease that of deleterious mutations, are said to amplify selection. However, fixation probabilities are difficult to compute for an arbitrary graph. Here we derive an expression for the fixation probability of a weakly selected mutation in terms of the time for two lineages to coalesce. This expression enables weak-selection fixation probabilities to be computed for an arbitrary weighted graph in polynomial time. Applying this method, we explore the range of possible effects of graph structure on natural selection, genetic drift, and the balance between the two. Using exhaustive analysis of small graphs and a genetic search algorithm, we identify families of graphs with striking effects on fixation probability, and we analyze these families mathematically. Our work reveals the nuanced effects of graph structure on natural selection and neutral drift. In particular, we show how these notions depend critically on the process by which mutations arise.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
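The Birth-death process described in this abstract can be sketched by direct simulation. The code below is an illustration under simplifying assumptions, not the coalescence-based method of the paper: the graph is unweighted, the mutant starts at a uniformly chosen vertex, and the function names are hypothetical. The closed-form expression is the standard Moran-process result for a well-mixed population, which corresponds to the complete graph.

```python
import random


def moran_fixation_probability(r, n):
    """Fixation probability of a single mutant with relative fitness r
    in a well-mixed population of size n (standard Moran-process result)."""
    if r == 1.0:
        return 1.0 / n
    return (1.0 - 1.0 / r) / (1.0 - r ** (-n))


def simulate_fixation(neighbors, r, runs=2000, seed=0):
    """Estimate the fixation probability on an unweighted graph.

    neighbors: dict mapping vertex -> list of adjacent vertices (assumed format)
    Each step picks a parent with probability proportional to fitness
    (r for mutants, 1 for residents); its offspring replaces a uniformly
    chosen neighbor.
    """
    rng = random.Random(seed)
    nodes = list(neighbors)
    fixed = 0
    for _ in range(runs):
        mutants = {rng.choice(nodes)}
        while 0 < len(mutants) < len(nodes):
            weights = [r if v in mutants else 1.0 for v in nodes]
            parent = rng.choices(nodes, weights=weights)[0]
            child = rng.choice(neighbors[parent])
            if parent in mutants:
                mutants.add(child)
            else:
                mutants.discard(child)
        fixed += len(mutants) == len(nodes)
    return fixed / runs


# Complete graph on four vertices, where the closed form should apply.
complete4 = {v: [u for u in range(4) if u != v] for v in range(4)}
print(simulate_fixation(complete4, 2.0), moran_fixation_probability(2.0, 4))
```

Simulation of this kind becomes slow and noisy for large graphs or near-neutral fitness, which is the practical motivation for the polynomial-time weak-selection expression the abstract reports.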
  • 96
    Publication Date: 2021-03-25
    Description: The adaptive immune system uses T cell receptors (TCRs) to recognize pathogens and consequently to initiate immune responses. TCRs can be sequenced from individuals, and methods analyzing the specificity of the TCRs can help us better understand individuals’ immune status in different disorders. For this task, we have developed TCRGP, a novel Gaussian process method that predicts if TCRs recognize specified epitopes. TCRGP can utilize the amino acid sequences of the complementarity determining regions (CDRs) from TCRα and TCRβ chains and learn which CDRs are important in recognizing different epitopes. Our comprehensive evaluation with epitope-specific TCR sequencing data shows that TCRGP achieves on average higher prediction accuracy in terms of AUROC score than existing state-of-the-art methods in epitope-specificity predictions. We also propose a novel analysis approach for combined single-cell RNA and TCRαβ (scRNA+TCRαβ) sequencing data by quantifying epitope-specific TCRs with TCRGP, and we identify HBV epitope-specific T cells and their transcriptomic states in hepatocellular carcinoma patients.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 97
    Publication Date: 2021-03-26
    Description: Simple choices (e.g., eating an apple vs. an orange) are made by integrating noisy evidence that is sampled over time and influenced by visual attention; as a result, fluctuations in visual attention can affect choices. But what determines what is fixated and when? To address this question, we model the decision process for simple choice as an information sampling problem, and approximate the optimal sampling policy. We find that it is optimal to sample from options whose value estimates are both high and uncertain. Furthermore, the optimal policy provides a reasonable account of fixations and choices in binary and trinary simple choice, as well as the differences between the two cases. Overall, the results show that the fixation process during simple choice is influenced dynamically by the value estimates computed during the decision process, in a manner consistent with optimal information sampling.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 98
    Publication Date: 2021-03-25
    Description: Sequential behaviour is often compositional and organised across multiple time scales: a set of individual elements developing on short time scales (motifs) are combined to form longer functional sequences (syntax). Such organisation leads to a natural hierarchy that can be used advantageously for learning, since the motifs and the syntax can be acquired independently. Despite mounting experimental evidence for hierarchical structures in neuroscience, models for temporal learning based on neuronal networks have mostly focused on serial methods. Here, we introduce a network model of spiking neurons with a hierarchical organisation aimed at sequence learning on multiple time scales. Using biophysically motivated neuron dynamics and local plasticity rules, the model can learn motifs and syntax independently. Furthermore, the model can relearn sequences efficiently and store multiple sequences. Compared to serial learning, the hierarchical model displays faster learning, more flexible relearning, increased capacity, and higher robustness to perturbations. The hierarchical model redistributes the variability: it achieves high motif fidelity at the cost of higher variability in the between-motif timings.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 99
    Publication Date: 2021-03-26
    Description: Understanding CRISPR-Cas systems, the adaptive defence mechanism that about half of bacterial species and most archaea use to neutralise viral attacks, is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates small pieces from phage genomes called spacers into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is the distribution of spacers across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, and the standard deviation is greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics which fits observed data from almost four thousand metagenomes well. In analogy to the classical ‘rich-get-richer’ mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and why proliferation of phages can be highly successful despite the effectiveness of CRISPR-Cas systems.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science
  • 100
    Publication Date: 2021-03-26
    Description: Translation elongation is regulated by a series of complicated mechanisms in both prokaryotes and eukaryotes. Although recent advances in ribosome profiling techniques have made it possible to capture genome-wide ribosome footprints along transcripts at codon resolution, the regulatory codes of elongation dynamics are still not fully understood. Most existing computational approaches for modeling translation elongation from ribosome profiling data focus mainly on local contextual patterns, while ignoring the continuity of the elongation process and the relations between ribosome densities of remote codons. To the best of our knowledge, modeling the translation elongation process at the level of the full-length coding sequence (CDS) has not been studied. In this paper, we developed a deep learning based approach with a multi-input and multi-output framework, named RiboMIMO, for modeling the ribosome density distributions of full-length mRNA CDS regions. By considering the underlying correlations in translation efficiency among neighboring and remote codons and extracting hidden features from the input full-length coding sequence, RiboMIMO can greatly outperform the state-of-the-art baseline approaches and accurately predict the ribosome density distributions along whole mRNA CDS regions. In addition, RiboMIMO explores the contributions of individual input codons to the predictions of output ribosome densities, which can help reveal important biological factors influencing the translation elongation process. The analyses, based on our interpretable metric named codon impact score, not only identified several patterns consistent with the previously published literature, but also revealed for the first time (to the best of our knowledge) that codons located at a long distance from the ribosomal A site may also be associated with the translation elongation rate. This finding of long-range impact on translation elongation velocity may shed new light on the regulatory mechanisms of protein synthesis. Overall, these results indicate that RiboMIMO can provide a useful tool for studying the regulation of translation elongation across the full-length CDS.
    Print ISSN: 1553-734X
    Electronic ISSN: 1553-7358
    Topics: Biology , Computer Science