ALBERT

All Library Books, journals and Electronic Records Telegrafenberg

Your email was sent successfully. Check your inbox.

An error occurred while sending the email. Please try again.

Proceed reservation?

Export
Filter
  • Books
  • Articles  (2,303)
  • Oxford University Press  (2,022)
  • PeerJ  (281)
  • MDPI Publishing
  • 2020-2022  (2,303)
  • Computer Science  (2,303)
Collection
  • Books
  • Articles  (2,303)
Years
Year
Journal
  • 1
    Publication Date: 2021-08-20
    Description: Motivation Accurate automatic annotation of protein function relies on both innovative models and robust data sets. Due to their importance in biological processes, the identification of DNA-binding proteins directly from protein sequence has been the focus of many studies. However, the data sets used to train and evaluate these methods have suffered from substantial flaws. We describe some of the weaknesses of the data sets used in previous DNA-binding protein literature and provide several new data sets addressing these problems. We suggest new evaluative benchmark tasks that more realistically assess real-world performance for protein annotation models. We propose a simple new model for the prediction of DNA-binding proteins and compare its performance on the improved data sets to two previously published models. Additionally, we provide extensive tests showing how the best models predict across taxonomies. Results Our new gradient boosting model, which uses features derived from a published protein language model, outperforms the earlier models. Perhaps surprisingly, so does a baseline nearest neighbor model using BLAST percent identity. We evaluate the sensitivity of these models to perturbations of DNA-binding regions and control regions of protein sequences. The successful data-driven models learn to focus on DNA-binding regions. When predicting across taxonomies, the best models are highly accurate across species in the same kingdom and can provide some information when predicting across kingdoms. Code and Data Availability The data and results for this paper can be found at https://doi.org/10.5281/zenodo.5153906. The code for this paper can be found at https://doi.org/10.5281/zenodo.5153683. The code, data and results can also be found at https://github.com/AZaitzeff/tools_for_dna_binding_proteins.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 2
    Publication Date: 2021-08-20
    Description: Background Rumor detection is a popular research topic in natural language processing and data mining. Since the outbreak of COVID-19, related rumors have been widely posted and spread on online social media, which have seriously affected people’s daily lives, national economy, social stability, etc. It is both theoretically and practically essential to detect and refute COVID-19 rumors fast and effectively. As COVID-19 was an emergent event that was outbreaking drastically, the related rumor instances were very scarce and distinct at its early stage. This makes the detection task a typical few-shot learning problem. However, traditional rumor detection techniques focused on detecting existed events with enough training instances, so that they fail to detect emergent events such as COVID-19. Therefore, developing a new few-shot rumor detection framework has become critical and emergent to prevent outbreaking rumors at early stages. Methods This article focuses on few-shot rumor detection, especially for detecting COVID-19 rumors from Sina Weibo with only a minimal number of labeled instances. We contribute a Sina Weibo COVID-19 rumor dataset for few-shot rumor detection and propose a few-shot learning-based multi-modality fusion model for few-shot rumor detection. A full microblog consists of the source post and corresponding comments, which are considered as two modalities and fused with the meta-learning methods. Results Experiments of few-shot rumor detection on the collected Weibo dataset and the PHEME public dataset have shown significant improvement and generality of the proposed model.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 3
    Publication Date: 2021-08-20
    Description: Spectral clustering (SC) is one of the most popular clustering methods and often outperforms traditional clustering methods. SC uses the eigenvectors of a Laplacian matrix calculated from a similarity matrix of a dataset. SC has serious drawbacks: the significant increases in the time complexity derived from the computation of eigenvectors and the memory space complexity to store the similarity matrix. To address the issues, I develop a new approximate spectral clustering using the network generated by growing neural gas (GNG), called ASC with GNG in this study. ASC with GNG uses not only reference vectors for vector quantization but also the topology of the network for extraction of the topological relationship between data points in a dataset. ASC with GNG calculates the similarity matrix from both the reference vectors and the topology of the network generated by GNG. Using the network generated from a dataset by GNG, ASC with GNG achieves to reduce the computational and space complexities and improve clustering quality. In this study, I demonstrate that ASC with GNG effectively reduces the computational time. Moreover, this study shows that ASC with GNG provides equal to or better clustering performance than SC.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 4
    Publication Date: 2021-08-17
    Description: Application of machine and deep learning methods in drug discovery and cancer research has gained a considerable amount of attention in the past years. As the field grows, it becomes crucial to systematically evaluate the performance of novel computational solutions in relation to established techniques. To this end, we compare rule-based and data-driven molecular representations in prediction of drug combination sensitivity and drug synergy scores using standardized results of 14 high-throughput screening studies, comprising 64 200 unique combinations of 4153 molecules tested in 112 cancer cell lines. We evaluate the clustering performance of molecular representations and quantify their similarity by adapting the Centered Kernel Alignment metric. Our work demonstrates that to identify an optimal molecular representation type, it is necessary to supplement quantitative benchmark results with qualitative considerations, such as model interpretability and robustness, which may vary between and throughout preclinical drug development projects.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 5
    Publication Date: 2021-08-06
    Description: Motivation The investigation of quantitative trait loci (QTL) is an essential component in our understanding of how organisms vary phenotypically. However, many important crop species are polyploid (carrying more than two copies of each chromosome), requiring specialized tools for such analyses. Moreover, deciphering meiotic processes at higher ploidy levels is not straightforward, but is necessary to understand the reproductive dynamics of these species, or uncover potential barriers to their genetic improvement. Results Here, we present polyqtlR, a novel software tool to facilitate such analyses in (auto)polyploid crops. It performs QTL interval mapping in F1 populations of outcrossing polyploids of any ploidy level using identity-by-descent probabilities. The allelic composition of discovered QTL can be explored, enabling favourable alleles to be identified and tracked in the population. Visualization tools within the package facilitate this process, and options to include genetic co-factors and experimental factors are included. Detailed information on polyploid meiosis including prediction of multivalent pairing structures, detection of preferential chromosomal pairing and location of double reduction events can be performed. Availabilityand implementation polyqtlR is freely available from the Comprehensive R Archive Network (CRAN) at http://cran.r-project.org/package=polyqtlR. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 6
    Publication Date: 2021-08-20
    Description: Circular RNAs (circRNAs) are widely expressed in highly diverged eukaryotes. Although circRNAs have been known for many years, their function remains unclear. Interaction with RNA-binding protein (RBP) to influence post-transcriptional regulation is considered to be an important pathway for circRNA function, such as acting as an oncogenic RBP sponge to inhibit cancer. In this study, we design a deep learning framework, CRPBsites, to predict the binding sites of RBPs on circRNAs. In this model, the sequences of variable-length binding sites are transformed into embedding vectors by word2vec model. Bidirectional LSTM is used to encode the embedding vectors of binding sites, and then they are fed into another LSTM decoder for decoding and classification tasks. To train and test the model, we construct four datasets that contain sequences of variable-length binding sites on circRNAs, and each set corresponds to an RBP, which is overexpressed in bladder cancer tissues. Experimental results on four datasets and comparison with other existing models show that CRPBsites has superior performance. Afterwards, we found that there were highly similar binding motifs in the four binding site datasets. Finally, we applied well-trained CRPBsites to identify the binding sites of IGF2BP1 on circCDYL, and the results proved the effectiveness of this method. In conclusion, CRPBsites is an effective prediction model for circRNA-RBP interaction site identification. We hope that CRPBsites can provide valuable guidance for experimental studies on the influence of circRNA on post-transcriptional regulation.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 7
    Publication Date: 2021-08-20
    Description: Intratumoral heterogeneity is a well-documented feature of human cancers and is associated with outcome and treatment resistance. However, a heterogeneous tumor transcriptome contributes an unknown level of variability to analyses of differentially expressed genes (DEGs) that may contribute to phenotypes of interest, including treatment response. Although current clinical practice and the vast majority of research studies use a single sample from each patient, decreasing costs of sequencing technologies and computing power have made repeated-measures analyses increasingly economical. Repeatedly sampling the same tumor increases the statistical power of DEG analysis, which is indispensable toward downstream analysis and also increases one’s understanding of within-tumor variance, which may affect conclusions. Here, we compared five different methods for analyzing gene expression profiles derived from repeated sampling of human prostate tumors in two separate cohorts of patients. We also benchmarked the sensitivity of generalized linear models to linear mixed models for identifying DEGs contributing to relevant prostate cancer pathways based on a ground-truth model.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 8
    Publication Date: 2021-08-20
    Description: Efforts to elucidate protein–DNA interactions at the molecular level rely in part on accurate predictions of DNA-binding residues in protein sequences. While there are over a dozen computational predictors of the DNA-binding residues, they are DNA-type agnostic and significantly cross-predict residues that interact with other ligands as DNA binding. We leverage a custom-designed machine learning architecture to introduce DNAgenie, first-of-its-kind predictor of residues that interact with A-DNA, B-DNA and single-stranded DNA. DNAgenie uses a comprehensive physiochemical profile extracted from an input protein sequence and implements a two-step refinement process to provide accurate predictions and to minimize the cross-predictions. Comparative tests on an independent test dataset demonstrate that DNAgenie outperforms the current methods that we adapt to predict residue-level interactions with the three DNA types. Further analysis finds that the use of the second (refinement) step leads to a substantial reduction in the cross predictions. Empirical tests show that DNAgenie’s outputs that are converted to coarse-grained protein-level predictions compare favorably against recent tools that predict which DNA-binding proteins interact with double-stranded versus single-stranded DNAs. Moreover, predictions from the sequences of the whole human proteome reveal that the results produced by DNAgenie substantially overlap with the known DNA-binding proteins while also including promising leads for several hundred previously unknown putative DNA binders. These results suggest that DNAgenie is a valuable tool for the sequence-based characterization of protein functions. The DNAgenie’s webserver is available at http://biomine.cs.vcu.edu/servers/DNAgenie/.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 9
    Publication Date: 2021-08-20
    Description: Accurate prediction of immunogenic peptide recognized by T cell receptor (TCR) can greatly benefit vaccine development and cancer immunotherapy. However, identifying immunogenic peptides accurately is still a huge challenge. Most of the antigen peptides predicted in silico fail to elicit immune responses in vivo without considering TCR as a key factor. This inevitably causes costly and time-consuming experimental validation test for predicted antigens. Therefore, it is necessary to develop novel computational methods for precisely and effectively predicting immunogenic peptide recognized by TCR. Here, we described DLpTCR, a multimodal ensemble deep learning framework for predicting the likelihood of interaction between single/paired chain(s) of TCR and peptide presented by major histocompatibility complex molecules. To investigate the generality and robustness of the proposed model, COVID-19 data and IEDB data were constructed for independent evaluation. The DLpTCR model exhibited high predictive power with area under the curve up to 0.91 on COVID-19 data while predicting the interaction between peptide and single TCR chain. Additionally, the DLpTCR model achieved the overall accuracy of 81.03% on IEDB data while predicting the interaction between peptide and paired TCR chains. The results demonstrate that DLpTCR has the ability to learn general interaction rules and generalize to antigen peptide recognition by TCR. A user-friendly webserver is available at http://jianglab.org.cn/DLpTCR/. Additionally, a stand-alone software package that can be downloaded from https://github.com/jiangBiolab/DLpTCR.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 10
    Publication Date: 2021-07-11
    Description: Motivation The investigation of the structure of biological systems at the molecular level gives insights about their functions and dynamics. Shape and surface of biomolecules are fundamental to molecular recognition events. Characterizing their geometry can lead to more adequate predictions of their interactions. In the present work, we assess the performance of reference shape retrieval methods from the computer vision community on protein shapes. Results Shape retrieval methods are efficient in identifying orthologous proteins and tracking large conformational changes. This work illustrates the interest for the protein surface shape as a higher-level representation of the protein structure that (i) abstracts the underlying protein sequence, structure or fold, (ii) allows the use of shape retrieval methods to screen large databases of protein structures to identify surficial homologs and possible interacting partners and (iii) opens an extension of the protein structure–function paradigm toward a protein structure-surface(s)-function paradigm. Availabilityand implementation All data are available online at http://datasetmachat.drugdesign.fr. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 11
    Publication Date: 2021-08-18
    Description: Over the past decade, genome-wide assays for chromatin interactions in single cells have enabled the study of individual nuclei at unprecedented resolution and throughput. Current chromosome conformation capture techniques survey contacts for up to tens of thousands of individual cells, improving our understanding of genome function in 3D. However, these methods recover a small fraction of all contacts in single cells, requiring specialised processing of sparse interactome data. In this review, we highlight recent advances in methods for the interpretation of single-cell genomic contacts. After discussing the strengths and limitations of these methods, we outline frontiers for future development in this rapidly moving field.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 12
    Publication Date: 2021-08-14
    Description: Good knowledge of a peptide’s tertiary structure is important for understanding its function and its interactions with its biological targets. APPTEST is a novel computational protocol that employs a neural network architecture and simulated annealing methods for the prediction of peptide tertiary structure from the primary sequence. APPTEST works for both linear and cyclic peptides of 5–40 natural amino acids. APPTEST is computationally efficient, returning predicted structures within a number of minutes. APPTEST performance was evaluated on a set of 356 test peptides; the best structure predicted for each peptide deviated by an average of 1.9Å from its experimentally determined backbone conformation, and a native or near-native structure was predicted for 97% of the target sequences. A comparison of APPTEST performance with PEP-FOLD, PEPstrMOD and PepLook across benchmark datasets of short, long and cyclic peptides shows that on average APPTEST produces structures more native than the existing methods in all three categories. This innovative, cutting-edge peptide structure prediction method is available as an online web server at https://research.timmons.eu/apptest, facilitating in silico study and design of peptides by the wider research community.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 13
    Publication Date: 2021-08-20
    Description: Deep generative models have been an upsurge in the deep learning community since they were proposed. These models are designed for generating new synthetic data including images, videos and texts by fitting the data approximate distributions. In the last few years, deep generative models have shown superior performance in drug discovery especially de novo molecular design. In this study, deep generative models are reviewed to witness the recent advances of de novo molecular design for drug discovery. In addition, we divide those models into two categories based on molecular representations in silico. Then these two classical types of models are reported in detail and discussed about both pros and cons. We also indicate the current challenges in deep generative models for de novo molecular design. De novo molecular design automatically is promising but a long road to be explored.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 14
    Publication Date: 2021-08-19
    Description: DNA methylation may be regulated by genetic variants within a genomic region, referred to as methylation quantitative trait loci (mQTLs). The changes of methylation levels can further lead to alterations of gene expression, and influence the risk of various complex human diseases. Detecting mQTLs may provide insights into the underlying mechanism of how genotypic variations may influence the disease risk. In this article, we propose a methylation random field (MRF) method to detect mQTLs by testing the association between the methylation level of a CpG site and a set of genetic variants within a genomic region. The proposed MRF has two major advantages over existing approaches. First, it uses a beta distribution to characterize the bimodal and interval properties of the methylation trait at a CpG site. Second, it considers multiple common and rare genetic variants within a genomic region to identify mQTLs. Through simulations, we demonstrated that the MRF had improved power over other existing methods in detecting rare variants of relatively large effect, especially when the sample size is small. We further applied our method to a study of congenital heart defects with 83 cardiac tissue samples and identified two mQTL regions, MRPS10 and PSORS1C1, which were colocalized with expression QTL in cardiac tissue. In conclusion, the proposed MRF is a useful tool to identify novel mQTLs, especially for studies with limited sample sizes.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 15
    Publication Date: 2021-06-29
    Description: Motivation The mathematically optimal solution in computational protein folding simulations does not always correspond to the native structure, due to the imperfection of the energy force fields. There is therefore a need to search for more diverse suboptimal solutions in order to identify the states close to the native. We propose a novel multimodal optimization protocol to improve the conformation sampling efficiency and modeling accuracy of de novo protein structure folding simulations. Results A distance-assisted multimodal optimization sampling algorithm, MMpred, is proposed for de novo protein structure prediction. The protocol consists of three stages: The first is a modal exploration stage, in which a structural similarity evaluation model DMscore is designed to control the diversity of conformations, generating a population of diverse structures in different low-energy basins. The second is a modal maintaining stage, where an adaptive clustering algorithm MNDcluster is proposed to divide the populations and merge the modal by adjusting the annealing temperature to locate the promising basins. In the last stage of modal exploitation, a greedy search strategy is used to accelerate the convergence of the modal. Distance constraint information is used to construct the conformation scoring model to guide sampling. MMpred is tested on a large set of 320 non-redundant proteins, where MMpred obtains models with TM-score≥0.5 on 291 cases, which is 28% higher than that of Rosetta guided with the same set of distance constraints. In addition, on 320 benchmark proteins, the enhanced version of MMpred (E-MMpred) has 167 targets better than trRosetta when the best of five models are evaluated. The average TM-score of the best model of E-MMpred is 0.732, which is comparable to trRosetta (0.730). Availability and implementation The source code and executable are freely available at https://github.com/iobio-zjut/MMpred. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 16
    Publication Date: 2021-08-20
    Description: Antimicrobial resistance (AMR) poses a threat to global public health. To mitigate the impacts of AMR, it is important to identify the molecular mechanisms of AMR and thereby determine optimal therapy as early as possible. Conventional machine learning-based drug-resistance analyses assume genetic variations to be homogeneous, thus not distinguishing between coding and intergenic sequences. In this study, we represent genetic data from Mycobacterium tuberculosis as a graph, and then adopt a deep graph learning method—heterogeneous graph attention network (‘HGAT–AMR’)—to predict anti-tuberculosis (TB) drug resistance. The HGAT–AMR model is able to accommodate incomplete phenotypic profiles, as well as provide ‘attention scores’ of genes and single nucleotide polymorphisms (SNPs) both at a population level and for individual samples. These scores encode the inputs, which the model is ‘paying attention to’ in making its drug resistance predictions. The results show that the proposed model generated the best area under the receiver operating characteristic (AUROC) for isoniazid and rifampicin (98.53 and 99.10%), the best sensitivity for three first-line drugs (94.91% for isoniazid, 96.60% for ethambutol and 90.63% for pyrazinamide), and maintained performance when the data were associated with incomplete phenotypes (i.e. for those isolates for which phenotypic data for some drugs were missing). We also demonstrate that the model successfully identifies genes and SNPs associated with drug resistance, mitigating the impact of resistance profile while considering particular drug resistance, which is consistent with domain knowledge.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 17
    Publication Date: 2021-08-20
    Description: Protein engineering and design principles employing the 20 standard amino acids have been extensively used to achieve stable protein scaffolds and deliver their specific activities. Although this confers some advantages, it often restricts the sequence, chemical space, and ultimately the functional diversity of proteins. Moreover, although site-specific incorporation of non-natural amino acids (nnAAs) has been proven to be a valuable strategy in protein engineering and therapeutics development, its utility in the affinity-maturation of nanobodies is not fully explored. Besides, current experimental methods do not routinely employ nnAAs due to their enormous library size and infinite combinations. To address this, we have developed an integrated computational pipeline employing structure-based protein design methodologies, molecular dynamics simulations and free energy calculations, for the binding affinity prediction of an nnAA-incorporated nanobody toward its target and selection of potent binders. We show that by incorporating halogenated tyrosines, the affinity of 9G8 nanobody can be improved toward epidermal growth factor receptor (EGFR), a crucial cancer target. Surface plasmon resonance (SPR) assays showed that the binding of several 3-chloro-l-tyrosine (3MY)-incorporated nanobodies were improved up to 6-fold into a picomolar range, and the computationally estimated binding affinities shared a Pearson’s r of 0.87 with SPR results. The improved affinity was found to be due to enhanced van der Waals interactions of key 3MY-proximate nanobody residues with EGFR, and an overall increase in the nanobody’s structural stability. In conclusion, we show that our method can facilitate screening large libraries and predict potent site-specific nnAA-incorporated nanobody binders against crucial disease-targets.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 18
    Publication Date: 2021-08-20
    Description: Over the past few years, meta-analysis has become popular among biomedical researchers for detecting biomarkers across multiple cohort studies with increased predictive power. Combining datasets from different sources increases sample size, thus overcoming the issue related to limited sample size from each individual study and boosting the predictive power. This leads to an increased likelihood of more accurately predicting differentially expressed genes/proteins or significant biomarkers underlying the biological condition of interest. Currently, several meta-analysis methods and tools exist, each having its own strengths and limitations. In this paper, we survey existing meta-analysis methods, and assess the performance of different methods based on results from different datasets as well as assessment from prior knowledge of each method. This provides a reference summary of meta-analysis models and tools, which helps to guide end-users on the choice of appropriate models or tools for given types of datasets and enables developers to consider current advances when planning the development of new meta-analysis models and more practical integrative tools.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 19
    Publication Date: 2021-08-12
    Description: Motivation Co-evolution analysis can be used to accurately predict residue–residue contacts from multiple sequence alignments. The introduction of machine-learning techniques has enabled substantial improvements in precision and a shift from predicting binary contacts to predict distances between pairs of residues. These developments have significantly improved the accuracy of de novo prediction of static protein structures. With AlphaFold2 lifting the accuracy of some predicted protein models close to experimental levels, structure prediction research will move on to other challenges. One of those areas is the prediction of more than one conformation of a protein. Here, we examine the potential of residue–residue distance predictions to be informative of protein flexibility rather than simply static structure. Results We used DMPfold to predict distance distributions for every residue pair in a set of proteins that showed both rigid and flexible behaviour. Residue pairs that were in contact in at least one reference structure were classified as rigid, flexible or neither. The predicted distance distribution of each residue pair was analysed for local maxima of probability indicating the most likely distance or distances between a pair of residues. We found that rigid residue pairs tended to have only a single local maximum in their predicted distance distributions while flexible residue pairs more often had multiple local maxima. These results suggest that the shape of predicted distance distributions contains information on the rigidity or flexibility of a protein and its constituent residues. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 20
    Publication Date: 2021-08-16
    Description: Motivation The well-known fact that protein structures are more conserved than their sequences forms the basis of several areas of computational structural biology. Methods based on the structure analysis provide more complete information on residue conservation in evolutionary processes. This is crucial for the determination of evolutionary relationships between proteins and for the identification of recurrent structural patterns present in biomolecules involved in similar functions. However, algorithmic structural alignment is much more difficult than multiple sequence alignment. This study is devoted to the development and applications of DAMA—a novel effective environment capable to compute and analyze multiple structure alignments. Results DAMA is based on local structural similarities, using local 3D structure descriptors and thus accounts for nearest-neighbor molecular environments of aligned residues. It is constrained neither by protein topology nor by its global structure. DAMA is an extension of our previous study (DEDAL) which demonstrated the applicability of local descriptors to pairwise alignment problems. Since the multiple alignment problem is NP-complete, an effective heuristic approach has been developed without imposing any artificial constraints. The alignment algorithm searches for the largest, consistent ensemble of similar descriptors. The new method is capable to capture most of the biologically significant similarities present in canonical test sets and is discriminatory enough to prevent the emergence of larger, but meaningless, solutions. Tests performed on the test sets, including protein kinases, demonstrate DAMA’s capability of identifying equivalent residues, which should be very useful in discovering the biological nature of proteins similarity. Performance profiles show the advantage of DAMA over other methods, in particular when using a strict similarity measure QC, which is the ratio of correctly aligned columns, and when applying the methods to more difficult cases. Availability and implementation DAMA is available online at http://dworkowa.imdik.pan.pl/EP/DAMA. Linux binaries of the software are available upon request. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 21
    Publication Date: 2021-03-31
    Description: The agricultural sector is still lagging behind from all other sectors in terms of using the newest technologies. For production, the latest machines are being introduced and adopted. However, pre-harvest and post-harvest processing are still done by following traditional methodologies while tracing, storing, and publishing agricultural data. As a result, farmers are not getting deserved payment, consumers are not getting enough information before buying their product, and intermediate person/processors are increasing retail prices. Using blockchain, smart contracts, and IoT devices, we can fully automate the process while establishing absolute trust among all these parties. In this research, we explored the different aspects of using blockchain and smart contracts with the integration of IoT devices in pre-harvesting and post-harvesting segments of agriculture. We proposed a system that uses blockchain as the backbone while IoT devices collect data from the field level, and smart contracts regulate the interaction among all these contributing parties. The system implementation has been shown in diagrams and with proper explanations. Gas costs of every operation have also been attached for a better understanding of the costs. We also analyzed the system in terms of challenges and advantages. The overall impact of this research was to show the immutable, available, transparent, and robustly secure characteristics of blockchain in the field of agriculture while also emphasizing the vigorous mechanism that the collaboration of blockchain, smart contract, and IoT presents.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 22
    Publication Date: 2021-03-31
    Description: Motivation Assigning new sequences to known protein families and subfamilies is a prerequisite for many functional, comparative and evolutionary genomics analyses. Such assignment is commonly achieved by looking for the closest sequence in a reference database, using a method such as BLAST. However, ignoring the gene phylogeny can be misleading because a query sequence does not necessarily belong to the same subfamily as its closest sequence. For example, a hemoglobin which branched out prior to the hemoglobin alpha/beta duplication could be closest to a hemoglobin alpha or beta sequence, whereas it is neither. To overcome this problem, phylogeny-driven tools have emerged but rely on gene trees, whose inference is computationally expensive. Results Here, we first show that in multiple animal and plant datasets, 18 to 62% of assignments by closest sequence are misassigned, typically to an over-specific subfamily. Then, we introduce OMAmer, a novel alignment-free protein subfamily assignment method, which limits over-specific subfamily assignments and is suited to phylogenomic databases with thousands of genomes. OMAmer is based on an innovative method using evolutionarily-informed k-mers for alignment-free mapping to ancestral protein subfamilies. Whilst able to reject non-homologous family-level assignments, we show that OMAmer provides better and quicker subfamily-level assignments than approaches relying on the closest sequence, whether inferred exactly by Smith-Waterman or by the fast heuristic DIAMOND. Availability OMAmer is available from the Python Package Index (as omamer), with the source code and a precomputed database available at https://github.com/DessimozLab/omamer. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 23
    Publication Date: 2021-03-31
    Description: Summary VCF files with results of sequencing projects take a lot of space. We propose the VCFShark, which is able to compress VCF files up to an order of magnitude better than the de facto standards (gzipped VCF and BCF). The advantage over competitors is the greatest when compressing VCF files containing large amounts of genotype data. The processing speeds up to 100 MB/s and main memory requirements lower than 30 GB allow to use our tool at typical workstations even for large datasets. Availability and Implementation https://github.com/refresh-bio/vcfshark Supplementary information Supplementary data are available at publisher’s Web site.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 24
  • 25
    Publication Date: 2021-03-28
    Description: Motivation As the generation of complex single-cell RNA sequencing datasets becomes more commonplace it is the responsibility of researchers to provide access to these data in a way that can be easily explored and shared. Whilst it is often the case that data is deposited for future bioinformatic analysis many studies do not release their data in a way that is easy to explore by non-computational researchers. Results In order to help address this we have developed ShinyCell, an R package that converts single-cell RNA sequencing datasets into explorable and shareable interactive interfaces. These interfaces can be easily customised in order to maximise their usability and can be easily uploaded to online platforms to facilitate wider access to published data. Availability ShinyCell is available at https://github.com/SGDDNB/ShinyCell and https://figshare.com/projects/ShinyCell/100439.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 26
    Publication Date: 2021-03-28
    Description: Motivation Genomic selection (GS) is currently deemed the most effective approach to speed up breeding of agricultural varieties. It has been recognized that consideration of multiple traits in GS can improve accuracy of prediction for traits of low heritability. However, since GS forgoes statistical testing with the idea of improving predictions, it does not facilitate mechanistic understanding of the contribution of particular single nucleotide polymorphisms (SNP). Results Here we propose a L2,1-norm regularized multivariate regression model and devise a fast and efficient iterative optimization algorithm, called L2,1-joint, applicable in multi-trait GS. The usage of the L2,1-norm facilitates variable selection in a penalized multivariate regression that considers the relation between individuals, when the number of SNPs is much larger than the number of individuals. The capacity for variable selection allows us to define master regulators that can be used in a multi-trait GS setting to dissect the genetic architecture of the analyzed traits. Our comparative analyses demonstrate that the proposed model is a favorable candidate compared to existing state-of-the-art approaches. Prediction and variable selection with data sets from Brassica napus, wheat and Arabidopsis thaliana diversity panels are conducted to further showcase the performance of the proposed model. Availability and implementation The model is implemented using R programming language and the code is freely available from https://github.com/alainmbebi/L21-norm-GS. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 27
    Publication Date: 2021-03-28
    Description: Summary Finding informative predictive features in high dimensional biological case-control datasets is challenging. The Extreme Pseudo-Sampling (EPS) algorithm offers a solution to the challenge of feature selection via a combination of deep learning and linear regression models. First, using a variational autoencoder, it generates complex latent representations for the samples. Second, it classifies the latent representations of cases and controls via logistic regression. Third, it generates new samples (pseudo-samples) around the extreme cases and controls in the regression model. Finally, it trains a new regression model over the upsampled space. The most significant variables in this regression are selected. We present an open-source implementation of the algorithm that is easy to set up, use, and customize. Our package enhances the original algorithm by providing new features and customizability for data preparation, model training and classification functionalities. We believe the new features will enable the adoption of the algorithm for a diverse range of datasets. Availability The software package for Python is available online at https://github.com/roohy/eps
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 28
    Publication Date: 2021-03-23
    Description: Background The Internet of Medical Things (IoMTs) is gradually replacing the traditional healthcare system. However, little attention has been paid to their security requirements in the development of the IoMT devices and systems. One of the main reasons can be the difficulty of tuning conventional security solutions to the IoMT system. Machine Learning (ML) has been successfully employed in the attack detection and mitigation process. Advanced ML technique can also be a promising approach to address the existing and anticipated IoMT security and privacy issues. However, because of the existing challenges of IoMT system, it is imperative to know how these techniques can be effectively utilized to meet the security and privacy requirements without affecting the IoMT systems quality, services, and device’s lifespan. Methodology This article is devoted to perform a Systematic Literature Review (SLR) on the security and privacy issues of IoMT and their solutions by ML techniques. The recent research papers disseminated between 2010 and 2020 are selected from multiple databases and a standardized SLR method is conducted. A total of 153 papers were reviewed and a critical analysis was conducted on the selected papers. Furthermore, this review study attempts to highlight the limitation of the current methods and aims to find possible solutions to them. Thus, a detailed analysis was carried out on the selected papers through focusing on their methods, advantages, limitations, the utilized tools, and data. Results It was observed that ML techniques have been significantly deployed for device and network layer security. Most of the current studies improved traditional metrics while ignored performance complexity metrics in their evaluations. Their studies environments and utilized data barely represent IoMT system. Therefore, conventional ML techniques may fail if metrics such as resource complexity and power usage are not considered.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 29
    Publication Date: 2021-03-23
    Description: The robot controller plays an important role in controlling the robot. The controller mainly aims to eliminate or suppress the influence of uncertain factors on the control robot. Furthermore, there are many types of controllers, and different types of controllers have different features. To explore the differences between controllers of the same category, this article studies some controllers from basic controllers and advanced controllers. This article conducts the benchmarking of the selected controller through pre-set tests. The test task is the most commonly used pick and place. Furthermore, to complete the robustness test, a task of external force interference is also set to observe whether the controller can control the robot arm to return to a normal state. Subsequently, the accuracy, control efficiency, jitter and robustness of the robot arm controlled by the controller are analyzed by comparing the Position and Effort data. Finally, some future works of the benchmarking and reasonable improvement methods are discussed.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 30
    Publication Date: 2021-03-24
    Description: Motivation There are high demands for joint genotyping of structural variations with short-read sequencing, but efficient and accurate genotyping in population scale is a challenging task. Results We developed muCNV that aggregates per-sample summary pileups for joint genotyping of 〉 100,000 samples. Pilot results show very low Mendelian inconsistencies. Applications to large-scale projects in cloud show the computational efficiencies of muCNV genotyping pipeline. Availability muCNV is publicly available for download at: https://github.com/gjun/muCNV Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 31
    Publication Date: 2021-03-22
    Description: The security of patient information is important during the transfer of medical data. A hybrid spatial domain watermarking algorithm that includes encryption, integrity protection, and steganography is proposed to strengthen the information originality based on the authentication. The proposed algorithm checks whether the patient’s information has been deliberately changed or not. The created code is distributed at every pixel of the medical image and not only in the regions of non-interest pixels, while the image details are still preserved. To enhance the security of the watermarking code, SHA-1 is used to get the initial key for the Symmetric Encryption Algorithm. The target of this approach is to preserve the content of the image and the watermark simultaneously, this is achieved by synthesizing an encrypted watermark from one of the components of the original image and not by embedding a watermark in the image. To evaluate the proposed code the Least Significant Bit (LSB), Bit2SB, and Bit3SB were used. The evaluation of the proposed code showed that the LSB is of better quality but overall the Bit2SB is better in its ability against the active attacks up to a size of 2*2 pixels, and it preserves the high image quality.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 32
    Publication Date: 2021-03-26
    Description: Motivation Molecular property prediction is a hot topic in recent years. Existing graph-based models ignore the hierarchical structures of molecules. According to the knowledge of chemistry and pharmacy, the functional groups of molecules are closely related to its physio-chemical properties and binding affinities. So, it should be helpful to represent molecular graphs by fragments that contain functional groups for molecular property prediction. Results In this paper, to boost the performance of molecule property prediction, we first propose a definition of molecule graph fragments that may be or contain functional groups, which are relevant to molecular properties, then develop a fragment-oriented multi-scale graph attention network for molecular property prediction, which is called FraGAT. Experiments on several widely-used benchmarks are conducted to evaluate FraGAT. Experimental results show that FraGAT achieves state-of-the-art predictive performance in most cases. Furthermore, our case studies showthat when the fragments used to represent the molecule graphs contain functional groups, the model can make better predictions. This conforms to our expectation and demonstrates the interpretability of the proposed model. Availability and implementation The code and data underlying this work are available in GitHub, at https://github.com/ZiqiaoZhang/FraGAT. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 33
    Publication Date: 2021-03-26
    Description: Motivation The Anatomical Therapeutic Chemical (ATC) system is an official classification system established by the World Health Organization for medicines. Correctly assigning ATC classes to given compounds is an important research problem in drug discovery, which can not only discover the possible active ingredients of the compounds, but also infer theirs therapeutic, pharmacological, and chemical properties. Results In this paper, we develop an end-to-end multi-label classifier called CGATCPred to predict 14 main ATC classes for given compounds. In order to extract rich features of each compound, we use the deep Convolutional Neural Network (CNN) and shortcut connections to represent and learn the seven association scores between the given compound and others. Moreover, we construct the correlation graph of ATC classes and then apply graph convolutional network (GCN) on the graph for label embedding abstraction. We use all label embedding to guide the learning process of compound representation. As a result, by using the Jackknife test, CGATCPred obtain reliable Aiming of 81.94%, Coverage of 82.88%, Accuracy 80.81%, Absolute True 76.58% and Absolute False 2.75%, yielding significantly improvements compared to exiting multi-label classifiers. Availability The codes of CGATCPred are available at https://github.com/zhc940702/CGATCPred and https://zenodo.org/record/4552917. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 34
    Publication Date: 2021-03-24
    Description: Motivation Sequence motif discovery algorithms can identify novel sequence patterns that perform biological functions in DNA, RNA and protein sequences—for example, the binding site motifs of DNA-and RNA-binding proteins. Results The STREME algorithm presented here advances the state-of-the-art in ab initio motif discovery in terms of both accuracy and versatility. Using in vivo DNA (ChIP-seq) and RNA (CLIP-seq) data, and validating motifs with reference motifs derived from in vitro data, we show that STREME is more accurate, sensitive and thorough than several widely used algorithms (DREME, HOMER, MEME, Peak-motifs) and two other representative algorithms (ProSampler and Weeder). STREME’s capabilities include the ability to find motifs in datasets with hundreds of thousands of sequences, to find both short and long motifs (from 3 to 30 positions), to perform differential motif discovery in pairs of sequence datasets, and to find motifs in sequences over virtually any alphabet (DNA, RNA, protein and user-defined alphabets). Unlike most motif discovery algorithms, STREME reports a useful estimate of the statistical significance of each motif it discovers. STREME is easy to use individually via its web server or via the command line, and is completely integrated with the widely-used MEME Suite of sequence analysis tools. The name STREME stands for “Simple, Thorough, Rapid, Enriched Motif Elicitation”. Availability The STREME web server and source code are provided freely for non-commercial use at http://meme-suite.org.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 35
    Publication Date: 2021-03-24
    Description: Motivation Understanding the mechanisms by which the zebrafish pectoral fin develops is expected to produce insights on how vertebrate limbs grow from a 2D cell layer to a 3D structure. Two mechanisms have been proposed to drive limb morphogenesis in tetrapods: a growth-based morphogenesis with a higher proliferation rate at the distal tip of the limb bud than at the proximal side, and directed cell behaviors that include elongation, division and migration in a nonrandom manner. Based on quantitative experimental biological data at the level of individual cells in the whole developing organ, we test the conditions for the dynamics of pectoral fin early morphogenesis. Results We found that during the development of the zebrafish pectoral fin, cells have a preferential elongation axis that gradually aligns along the proximodistal axis (PD) of the organ. Based on these quantitative observations, we build a center-based cell model enhanced with a polarity term and cell proliferation to simulate fin growth. Our simulations resulted in 3D fins similar in shape to the observed ones, suggesting that the existence of a preferential axis of cell polarization is essential to drive fin morphogenesis in zebrafish, as observed in the development of limbs in the mouse, but distal tip-based expansion is not. Availability Upon publication, biological data will be available at http://bioemergences.eu/modelingFin, and source code at https://github.com/guijoe/MaSoFin. Supplementary information Supplementary data are included in this manuscript.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 36
    Publication Date: 2021-03-23
    Description: Human posture detection allows the capture of the kinematic parameters of the human body, which is important for many applications, such as assisted living, healthcare, physical exercising and rehabilitation. This task can greatly benefit from recent development in deep learning and computer vision. In this paper, we propose a novel deep recurrent hierarchical network (DRHN) model based on MobileNetV2 that allows for greater flexibility by reducing or eliminating posture detection problems related to a limited visibility human torso in the frame, i.e., the occlusion problem. The DRHN network accepts the RGB-Depth frame sequences and produces a representation of semantically related posture states. We achieved 91.47% accuracy at 10 fps rate for sitting posture recognition.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 37
    Publication Date: 2021-03-24
    Description: Motivation Knowledge manipulation of Gene Ontology (GO) and Gene Ontology Annotation (GOA) can be done primarily by using vector representation of GO terms and genes. Previous studies have represented GO terms and genes or gene products in Euclidean space to measure their semantic similarity using an embedding method such as the Word2Vec-based method to represent entities as numeric vectors. However, this method has the limitation that embedding large graph-structured data in the Euclidean space cannot prevent a loss of information of latent hierarchies, thus precluding the semantics of GO and GOA from being captured optimally. On the other hand, hyperbolic spaces such as the Poincaré balls are more suitable for modeling hierarchies, as they have a geometric property in which the distance increases exponentially as it nears the boundary because of negative curvature. Results In this paper, we propose hierarchical representations of GO and genes (HiG2Vec) by applying Poincaré embedding specialized in the representation of hierarchy through a two-step procedure: GO embedding and gene embedding. Through experiments, we show that our model represents the hierarchical structure better than other approaches and predicts the interaction of genes or gene products similar to or better than previous studies. The results indicate that HiG2Vec is superior to other methods in capturing the GO and gene semantics and in data utilization as well. It can be robustly applied to manipulate various biological knowledge. Availability https://github.com/JaesikKim/HiG2Vec Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 38
    Publication Date: 2021-02-01
    Description: It is important in software development to enforce proper restrictions on protected services and resources. Typically software services can be accessed through REST API endpoints where restrictions can be applied using the Role-Based Access Control (RBAC) model. However, RBAC policies can be inconsistent across services, and they require proper assessment. Currently, developers use penetration testing, which is a costly and cumbersome process for a large number of APIs. In addition, modern applications are split into individual microservices and lack a unified view in order to carry out automated RBAC assessment. Often, the process of constructing a centralized perspective of an application is done using Systematic Architecture Reconstruction (SAR). This article presents a novel approach to automated SAR to construct a centralized perspective for a microservice mesh based on their REST communication pattern. We utilize the generated views from SAR to propose an automated way to find RBAC inconsistencies.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 39
    Publication Date: 2021-03-23
    Description: Motivation Facing the increasing gap between high-throughput sequence data and limited functional insights, computational protein function annotation provides a high-throughput alternative to experimental approaches. However, current methods can have limited applicability while relying on protein data besides sequences, or lack generalizability to novel sequences, species and functions. Results To overcome aforementioned barriers in applicability and generalizability, we propose a novel deep learning model using only sequence information for proteins, named Transformer-based protein function Annotation through joint sequence–Label Embedding (TALE). For generalizability to novel sequences we use self attention-based transformers to capture global patterns in sequences. For generalizability to unseen or rarely seen functions (tail labels), we embed protein function labels (hierarchical GO terms on directed graphs) together with inputs/features (1D sequences) in a joint latent space. Combining TALE and a sequence similarity-based method, TALE+ outperformed competing methods when only sequence input is available. It even outperformed a state-of-the-art method using network information besides sequence, in two of the three gene ontologies. Furthermore, TALE and TALE+ showed superior generalizability to proteins of low similarity, new species, or rarely annotated functions compared to training data, revealing deep insights into the protein sequence–function relationship. Ablation studies elucidated contributions of algorithmic components toward the accuracy and the generalizability. Availability The data, source codes and models are available at https://github.com/Shen-Lab/TALE Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 40
    Publication Date: 2021-03-23
    Description: Motivation Random sampling of metabolic fluxes can provide a comprehensive description of the capabilities of a metabolic network. However, current sampling approaches do not model thermodynamics explicitly, leading to inaccurate predictions of an organism’s potential or actual metabolic operations. Results We present a probabilistic framework combining thermodynamic quantities with steady-state flux constraints to analyze the properties of a metabolic network. It includes methods for probabilistic metabolic optimization and for joint sampling of thermodynamic and flux spaces. Applied to a model of E. coli, we use the methods to reveal known and novel mechanisms of substrate channeling, and to accurately predict reaction directions and metabolite concentrations. Interestingly, predicted flux distributions are multimodal, leading to discrete hypotheses on E. coli’s metabolic capabilities. Availability Python and MATLAB packages available at https://gitlab.com/csb.ethz/pta. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 41
    Publication Date: 2021-03-26
    Description: Motivation Thanks to the increasing availability of drug-drug interactions (DDI) datasets and large biomedical knowledge graphs (KGs), accurate detection of adverse DDI using machine learning models becomes possible. However, it remains largely an open problem how to effectively utilize large and noisy biomedical KG for DDI detection. Due to its sheer size and amount of noise in KGs, it is often less beneficial to directly integrate KGs with other smaller but higher quality data (e.g., experimental data). Most of existing approaches ignore KGs altogether. Some tries to directly integrate KGs with other data via graph neural networks with limited success. Furthermore most previous works focus on binary DDI prediction whereas the multi-typed DDI pharmacological effect prediction is more meaningful but harder task. Results To fill the gaps, we propose a new method SumGNN: knowledge summarization graph neural network, which is enabled by a subgraph extraction module that can efficiently anchor on relevant subgraphs from a KG, a self-attention based subgraph summarization scheme to generate reasoning path within the subgraph, and a multi-channel knowledge and data integration module that utilizes massive external biomedical knowledge for significantly improved multi-typed DDI predictions. SumGNN outperforms the best baseline by up to 5.54%, and performance gain is particularly significant in low data relation types. In addition, SumGNN provides interpretable prediction via the generated reasoning paths for each prediction. Availability The code is available in the supplementary. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 42
    Publication Date: 2021-03-27
    Description: Motivation Most protein-structure superimposition tools consider only Cartesian coordinates. Yet, much of biology happens on the surface of proteins, which is why proteins with shared ancestry and similar function often have comparable surface shapes. Superposition of proteins based on surface shape can enable comparison of highly divergent proteins, identify convergent evolution and enable detailed comparison of surface features and binding sites. Results We present ZEAL, an interactive tool to superpose global and local protein structures based on their shape resemblance using 3D (Zernike-Canterakis) functions to represent the molecular surface. In a benchmark study of structures with the same fold, we show that ZEAL outperforms two other methods for shape-based superposition. In addition, alignments from ZEAL was of comparable quality to the coordinate-based superpositions provided by TM-align. For comparisons of proteins with limited sequence and backbone-fold similarity, where coordinate-based methods typically fail, ZEAL can often find alignments with substantial surface-shape correspondence. In combination with shape-based matching, ZEAL can be used as a general tool to study relationships between shape and protein function. We identify several categories of protein functions where global shape similarity is significantly more likely than expected by random chance, when comparing proteins with little similarity on the fold level. In particular, we find that global surface shape similarity is particular common among DNA binding proteins. Availability ZEAL can be used online at https://andrelab.org/zeal or as a standalone program with command line or graphical user interface. Source files and installers are available at https://github.com/Andre-lab/ZEAL Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 43
    Publication Date: 2021-03-25
    Description: A microarray is a revolutionary tool that generates vast volumes of data that describe the expression profiles of genes under investigation that can be qualified as Big Data. Hadoop and Spark are efficient frameworks, developed to store and analyze Big Data. Analyzing microarray data helps researchers to identify correlated genes. Clustering has been successfully applied to analyze microarray data by grouping genes with similar expression profiles into clusters. The complex nature of microarray data obligated clustering methods to employ multiple evaluation functions to ensure obtaining solutions with high quality. This transformed the clustering problem into a Multi-Objective Problem (MOP). A new and efficient hybrid Multi-Objective Whale Optimization Algorithm with Tabu Search (MOWOATS) was proposed to solve MOPs. In this article, MOWOATS is proposed to analyze massive microarray datasets. Three evaluation functions have been developed to ensure an effective assessment of solutions. MOWOATS has been adapted to run in parallel using Spark over Hadoop computing clusters. The quality of the generated solutions was evaluated based on different indices, such as Silhouette and Davies–Bouldin indices. The obtained clusters were very similar to the original classes. Regarding the scalability, the running time was inversely proportional to the number of computing nodes.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 44
    Publication Date: 2021-03-25
    Description: The interdisciplinary field of data science, which applies techniques from computer science and statistics to address questions across domains, has enjoyed recent considerable growth and interest. This emergence also extends to undergraduate education, whereby a growing number of institutions now offer degree programs in data science. However, there is considerable variation in what the field actually entails and, by extension, differences in how undergraduate programs prepare students for data-intensive careers. We used two seminal frameworks for data science education to evaluate undergraduate data science programs at a subset of 4-year institutions in the United States; developing and applying a rubric, we assessed how well each program met the guidelines of each of the frameworks. Most programs scored high in statistics and computer science and low in domain-specific education, ethics, and areas of communication. Moreover, the academic unit administering the degree program significantly influenced the course-load distribution of computer science and statistics/mathematics courses. We conclude that current data science undergraduate programs provide solid grounding in computational and statistical approaches, yet may not deliver sufficient context in terms of domain knowledge and ethical considerations necessary for appropriate data science applications. Additional refinement of the expectations for undergraduate data science education is warranted.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 45
    Publication Date: 2021-03-25
    Description: Urban expressways provide an effective solution to traffic congestion, and ramp signal optimization can ensure the efficiency of expressway traffic. The existing methods are mainly based on the static spatial distance between mainline and ramp to achieve multi-ramp coordinated signal optimization, which lacks the consideration of the dynamic traffic flow and lead to the long time-lag, thus affecting the efficiency. This article develops a coordinated ramp signal optimization framework based on mainline traffic states. The main contribution was traffic flow-series flux-correlation analysis based on cross-correlation, and development of a novel multifactorial matric that combines flow-correlation to assign the excess demand for mainline traffic. Besides, we used the GRU neural network for traffic flow prediction to ensure real-time optimization. To obtain a more accurate correlation between ramps and congested sections, we used gray correlation analysis to determine the percentage of each factor. We used the Simulation of Urban Mobility simulation platform to evaluate the performance of the proposed method under different traffic demand conditions, and the experimental results show that the proposed method can reduce the density of mainline bottlenecks and improve the efficiency of mainline traffic.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 46
    Publication Date: 2021-03-11
    Description: Investing in stocks is an important tool for modern people’s financial management, and how to forecast stock prices has become an important issue. In recent years, deep learning methods have successfully solved many forecast problems. In this paper, we utilized multiple factors for the stock price forecast. The news articles and PTT forum discussions are taken as the fundamental analysis, and the stock historical transaction information is treated as technical analysis. The state-of-the-art natural language processing tool BERT are used to recognize the sentiments of text, and the long short term memory neural network (LSTM), which is good at analyzing time series data, is applied to forecast the stock price with stock historical transaction information and text sentiments. According to experimental results using our proposed models, the average root mean square error (RMSE ) has 12.05 accuracy improvement.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 47
    Publication Date: 2021-03-17
    Description: Motivation For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. Results In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments of the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. Availability The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 48
    Publication Date: 2021-03-18
    Description: Emerging research shows that circular RNA (circRNA) plays a crucial role in the diagnosis, occurrence and prognosis of complex human diseases. Compared with traditional biological experiments, the computational method of fusing multi-source biological data to identify the association between circRNA and disease can effectively reduce cost and save time. Considering the limitations of existing computational models, we propose a semi-supervised generative adversarial network (GAN) model SGANRDA for predicting circRNA–disease association. This model first fused the natural language features of the circRNA sequence and the features of disease semantics, circRNA and disease Gaussian interaction profile kernel, and then used all circRNA–disease pairs to pre-train the GAN network, and fine-tune the network parameters through labeled samples. Finally, the extreme learning machine classifier is employed to obtain the prediction result. Compared with the previous supervision model, SGANRDA innovatively introduced circRNA sequences and utilized all the information of circRNA–disease pairs during the pre-training process. This step can increase the information content of the feature to some extent and reduce the impact of too few known associations on the model performance. SGANRDA obtained AUC scores of 0.9411 and 0.9223 in leave-one-out cross-validation and 5-fold cross-validation, respectively. Prediction results on the benchmark dataset show that SGANRDA outperforms other existing models. In addition, 25 of the top 30 circRNA–disease pairs with the highest scores of SGANRDA in case studies were verified by recent literature. These experimental results demonstrate that SGANRDA is a useful model to predict the circRNA–disease association and can provide reliable candidates for biological experiments.
    Print ISSN: 1467-5463
    Electronic ISSN: 1477-4054
    Topics: Biology , Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 49
    Publication Date: 2021-02-18
    Description: Increased interest in the use of word embeddings, such as word representation, for biomedical named entity recognition (BioNER) has highlighted the need for evaluations that aid in selecting the best word embedding to be used. One common criterion for selecting a word embedding is the type of source from which it is generated; that is, general (e.g., Wikipedia, Common Crawl), or specific (e.g., biomedical literature). Using specific word embeddings for the BioNER task has been strongly recommended, considering that they have provided better coverage and semantic relationships among medical entities. To the best of our knowledge, most studies have focused on improving BioNER task performance by, on the one hand, combining several features extracted from the text (for instance, linguistic, morphological, character embedding, and word embedding itself) and, on the other, testing several state-of-the-art named entity recognition algorithms. The latter, however, do not pay great attention to the influence of the word embeddings, and do not facilitate observing their real impact on the BioNER task. For this reason, the present study evaluates three well-known NER algorithms (CRF, BiLSTM, BiLSTM-CRF) with respect to two corpora (DrugBank and MedLine) using two classic word embeddings, GloVe Common Crawl (of the general type) and Pyysalo PM + PMC (specific), as unique features. Furthermore, three contextualized word embeddings (ELMo, Pooled Flair, and Transformer) are compared in their general and specific versions. The aim is to determine whether general embeddings can perform better than specialized ones on the BioNER task. To this end, four experiments were designed. In the first, we set out to identify the combination of classic word embedding, NER algorithm, and corpus that results in the best performance. The second evaluated the effect of the size of the corpus on performance. The third assessed the semantic cohesiveness of the classic word embeddings and their correlation with respect to several gold standards; while the fourth evaluates the performance of general and specific contextualized word embeddings on the BioNER task. Results show that the classic general word embedding GloVe Common Crawl performed better in the DrugBank corpus, despite having less word coverage and a lower internal semantic relationship than the classic specific word embedding, Pyysalo PM + PMC; while in the contextualized word embeddings the best results are presented in the specific ones. We conclude, therefore, when using classic word embeddings as features on the BioNER task, the general ones could be considered a good option. On the other hand, when using contextualized word embeddings, the specific ones are the best option.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 50
    Publication Date: 2021-02-03
    Description: Security analysis is an essential activity in security engineering to identify potential system vulnerabilities and specify security requirements in the early design phases. Due to the increasing complexity of modern systems, traditional approaches lack the power to identify insecure incidents caused by complex interactions among physical systems, human and social entities. By contrast, the System-Theoretic Process Analysis for Security (STPA-Sec) approach views losses as resulting from interactions, focuses on controlling system vulnerabilities instead of external threats, and is applicable for complex socio-technical systems. However, the STPA-Sec pays less attention to the non-safety but information-security issues (e.g., data confidentiality) and lacks efficient guidance for identifying information security concepts. In this article, we propose a data-flow-based adaption of the STPA-Sec (named STPA-DFSec) to overcome the mentioned limitations and elicit security constraints systematically. We use the STPA-DFSec and STPA-Sec to analyze a vehicle digital key system and investigate the relationship and differences between both approaches, their applicability, and highlights. To conclude, the proposed approach can identify information-related problems more directly from the data processing aspect. As an adaption of the STPA-Sec, it can be used with other STPA-based approaches to co-design systems in multi-disciplines under the unified STPA framework.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 51
    Publication Date: 2021-03-15
    Description: Motivation Ribosome Profiling (Ribo-seq) has revolutionized the study of RNA translation by providing information on ribosome positions across all translated RNAs with nucleotide-resolution. Yet several technical limitations restrict the sequencing depth of such experiments, the most common of which is the overabundance of rRNA fragments. Various strategies can be employed to tackle this issue, including the use of commercial rRNA depletion kits. However, as they are designed for more standardized RNAseq experiments, they may perform suboptimally in Ribo-seq. In order to overcome this, it is possible to use custom biotinylated oligos complementary to the most abundant rRNA fragments, however currently no computational framework exists to aid the design of optimal oligos. Results Here, we first show that a major confounding issue is that the rRNA fragments generated via Ribo-seq vary significantly with differing experimental conditions, suggesting that a “one-size-fits-all” approach may be inefficient. Therefore we developed Ribo-ODDR, an oligo design pipeline integrated with a user-friendly interface that assists in oligo selection for efficient experiment-specific rRNA depletion. Ribo-ODDR uses preliminary data to identify the most abundant rRNA fragments, and calculates the rRNA depletion efficiency of potential oligos. We experimentally show that Ribo-ODDR designed oligos outperform commercially available kits and lead to a significant increase in rRNA depletion in Ribo-seq. Availability Ribo-ODDR is freely accessible at https://github.com/fallerlab/Ribo-ODDR Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 52
    Publication Date: 2021-03-11
    Description: Keyword extraction is essential in determining influenced keywords from huge documents as the research repositories are becoming massive in volume day by day. The research community is drowning in data and starving for information. The keywords are the words that describe the theme of the whole document in a precise way by consisting of just a few words. Furthermore, many state-of-the-art approaches are available for keyword extraction from a huge collection of documents and are classified into three types, the statistical approaches, machine learning, and graph-based methods. The machine learning approaches require a large training dataset that needs to be developed manually by domain experts, which sometimes is difficult to produce while determining influenced keywords. However, this research focused on enhancing state-of-the-art graph-based methods to extract keywords when the training dataset is unavailable. This research first converted the handcrafted dataset, collected from impact factor journals into n-grams combinations, ranging from unigram to pentagram and also enhanced traditional graph-based approaches. The experiment was conducted on a handcrafted dataset, and all methods were applied on it. Domain experts performed the user study to evaluate the results. The results were observed from every method and were evaluated with the user study using precision, recall and f-measure as evaluation matrices. The results showed that the proposed method (FNG-IE) performed well and scored near the machine learning approaches score.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 53
    Publication Date: 2021-03-15
    Description: Summary Many experimental approaches have been developed to identify transcription start sites (TSS) from genomic scale data. However, experiment specific biases lead to large numbers of false-positive calls. Here, we present our integrative approach iTiSS, which is an accurate and generic TSS caller for any TSS profiling experiment in eukaryotes, and substantially reduces the number of false positives by a joint analysis of several complementary datasets. Availability and implementation iTiSS is platform independent and implemented in Java (v1.8) and is freely available at https://www.erhard-lab.de/software and https://github.com/erhard-lab/iTiSS. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 54
    Publication Date: 2021-03-15
    Description: Summary Once folded, natural protein molecules have few energetic conflicts within their polypeptide chains. Many protein structures do however contain regions where energetic conflicts remain after folding, i.e. they are highly frustrated. These regions, kept in place over evolutionary and physiological timescales, are related to several functional aspects of natural proteins such as protein–protein interactions, small ligand recognition, catalytic sites and allostery. Here, we present FrustratometeR, an R package that easily computes local energetic frustration on a personal computer or a cluster. This package facilitates large scale analysis of local frustration, point mutants and molecular dynamics (MD) trajectories, allowing straightforward integration of local frustration analysis into pipelines for protein structural analysis. Availability and implementation https://github.com/proteinphysiologylab/frustratometeR. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 55
    Publication Date: 2021-02-19
    Description: Summary MitoFlex is a linux-based mitochondrial genome analysis toolkit, which provides a complete workflow of raw data filtering, de novo assembly, mitochondrial genome identification and annotation for animal high throughput sequencing data. The overall performance was compared between MitoFlex and its analogue MitoZ, in terms of protein coding gene recovery, memory consumption and processing speed. Availability MitoFlex is available at https://github.com/Prunoideae/MitoFlex under GPLv3 license. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 56
    Publication Date: 2021-02-18
    Description: Cervical intraepithelial neoplasia (CIN) and cervical cancer are major health problems faced by women worldwide. The conventional Papanicolaou (Pap) smear analysis is an effective method to diagnose cervical pre-malignant and malignant conditions by analyzing swab images. Various computer vision techniques can be explored to identify potential precancerous and cancerous lesions by analyzing the Pap smear image. The majority of existing work cover binary classification approaches using various classifiers and Convolution Neural Networks. However, they suffer from inherent challenges for minute feature extraction and precise classification. We propose a novel methodology to carry out the multiclass classification of cervical cells from Whole Slide Images (WSI) with optimum feature extraction. The actualization of Conv Net with Transfer Learning technique substantiates meaningful Metamorphic Diagnosis of neoplastic and pre-neoplastic lesions. As the Progressive Resizing technique (an advanced method for training ConvNet) incorporates prior knowledge of the feature hierarchy and can reuse old computations while learning new ones, the model can carry forward the extracted morphological cell features to subsequent Neural Network layers iteratively for elusive learning. The Progressive Resizing technique superimposition in consultation with the Transfer Learning technique while training the Conv Net models has shown a substantial performance increase. The proposed binary and multiclass classification methodology succored in achieving benchmark scores on the Herlev Dataset. We achieved singular multiclass classification scores for WSI images of the SIPaKMed dataset, that is, accuracy (99.70%), precision (99.70%), recall (99.72%), F-Beta (99.63%), and Kappa scores (99.31%), which supersede the scores obtained through principal methodologies. GradCam based feature interpretation extends enhanced assimilation of the generated results, highlighting the pre-malignant and malignant lesions by visual localization in the images.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 57
    Publication Date: 2021-02-03
    Description: In sports competitions, depending on the conditions such as excitement, stress, fatigue, etc. during the match, negative situations such as disability or loss of life may occur for players and spectators. Therefore, it is extremely important to constantly check their health. In addition, some strategic analyzes are made during the match. According to the results of these analyzes, the technical team affects the course of the match. Effects can have positive and sometimes negative results. In this article, fog computing and an Internet of Things (IoT) based architecture are proposed to produce new technical strategies and to avoid disabilities. Players and spectators are monitored with sensors such as blood pressure, body temperature, heart rate, location etc. The data obtained from the sensors are processed in the fog layer and the resulting information is sent to the devices of the technical team and club doctors. In the architecture based on fog computing and IoT, priority processes are computed with low latency. For this, a task management algorithm based on priority queue and list of fog nodes is modified in the fog layer. Authentication and data confidentiality are provided with the Federated Lightweight Authentication of Things (FLAT) method used in the proposed model. In addition, using the Software Defined Network controller based on blockchain technology ensures data integrity.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 58
    Publication Date: 2021-02-03
    Description: Cryptocurrencies such as Bitcoin (BTC) have seen a surge in value in the recent past and appeared as a useful investment opportunity for traders. However, their short term profitability using algorithmic trading strategies remains unanswered. In this work, we focus on the short term profitability of BTC against the euro and the yen for an eight-year period using seven trading algorithms over trading periods of length 15 and 30 days. We use the classical buy and hold (BH) as a benchmark strategy. Rather surprisingly, we found that on average, the yen is more profitable than BTC and the euro; however the answer also depends on the choice of algorithm. Reservation price algorithms result in 7.5% and 10% of average returns over 15 and 30 days respectively which is the highest for all the algorithms for the three assets. For BTC, all algorithms outperform the BH strategy. We also analyze the effect of transaction fee on the profitability of algorithms for BTC and observe that for trading period of length 15 no trading strategy is profitable for BTC. For trading period of length 30, only two strategies are profitable.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 59
    Publication Date: 2021-03-29
    Description: The high volatility of an asset in financial markets is commonly seen as a negative factor. However short-term trades may entail high profits if traders open and close the correct positions. The high volatility of cryptocurrencies, and in particular of Bitcoin, is what made cryptocurrency trading so profitable in these last years. The main goal of this work is to compare several frameworks each other to predict the daily closing Bitcoin price, investigating those that provide the best performance, after a rigorous model selection by the so-called k-fold cross validation method. We evaluated the performance of one stage frameworks, based only on one machine learning technique, such as the Bayesian Neural Network, the Feed Forward and the Long Short Term Memory Neural Networks, and that of two stages frameworks formed by the neural networks just mentioned in cascade to Support Vector Regression. Results highlight higher performance of the two stages frameworks with respect to the correspondent one stage frameworks, but for the Bayesian Neural Network. The one stage framework based on Bayesian Neural Network has the highest performance and the order of magnitude of the mean absolute percentage error computed on the predicted price by this framework is in agreement with those reported in recent literature works.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 60
    Publication Date: 2021-03-19
    Description: Motivation Breast cancer is one of the leading causes of cancer deaths among women worldwide. It is necessary to develop new breast cancer drugs because of the shortcomings of existing therapies. The traditional discovery process is time-consuming and expensive. Repositioning of clinically approved drugs has emerged as a novel approach for breast cancer therapy. However, serendipitous or experiential repurposing cannot be used as a routine method. Results In this study, we proposed a graph neural network model GraphRepur based on GraphSAGE for drug repurposing against breast cancer. GraphRepur integrated two major classes of computational methods, drug network-based and drug signature-based. The differentially expressed genes of disease, drug-exposure gene expression data, and the drug-drug links information were collected. By extracting the drug signatures and topological structure information contained in the drug relationships, GraphRepur can predict new drugs for breast cancer, outperforming previous state-of-the-art approaches and some classic machine learning methods. The high-ranked drugs have indeed been reported as new uses for breast cancer treatment recently. Availability The source code of our model and datasets are available at: https://github.com/cckamy/GraphRepur and https://figshare.com/articles/software/GraphRepur_Breast_Cancer_Drug_Repurposing/14220050 Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 61
    Publication Date: 2021-03-18
    Description: Summary LocusZoom.js is a JavaScript library for creating interactive web-based visualizations of genetic association study results. It can display one or more traits in the context of relevant biological data (such as gene models and other genomic annotation), and allows interactive refinement of analysis models (by selecting linkage disequilibrium reference panels, identifying sets of likely causal variants, or comparisons to the GWAS catalog). It can be embedded in web pages to enable data sharing and exploration. Views can be customized and extended to display other data types such as phenome-wide association study (PheWAS) results, chromatin co-accessibility, or eQTL measurements. A new web upload service harmonizes datasets, adds annotations, and makes it easy to explore user-provided result sets. Availability and implementation LocusZoom.js is open-source software under a permissive MIT license. Code and documentation are available at: https://github.com/statgen/locuszoom/. Installable packages for all versions are also distributed via NPM. Additional features are provided as standalone libraries to promote reuse. Use with your own GWAS results at https://my.locuszoom.org/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 62
    Publication Date: 2021-03-19
    Description: Biometrics recognition takes advantage of feature extraction and pattern recognition to analyze the physical and behavioral characteristics of biological individuals to achieve the purpose of individual identification. As a typical biometric technology, palm print and palm vein have the characteristics of high recognition rate, stable features, easy location and good image quality, which have attracted the attention of researchers. This paper designs and develops a multispectral palm print and palm vein acquisition platform, which can quickly acquire palm spectrum and palm vein multispectral images with seven different wavelengths. We propose a multispectral palm print palmar vein recognition framework, and feature-level image fusion is performed after extracting features of palm print palmar vein images at different wavelengths. Through the multispectral palm print palm vein image fusion experiment, a more feasible multispectral palm print and palm vein image fusion scheme is proposed. Based on the results of image fusion, we further propose an improved convolutional neural network (CNN) for model training to achieve identity recognition based on multispectral palm print palm vein images. Finally, the effects of different CNN network structures and learning rates on the recognition results were analyzed and compared experimentally.
    Print ISSN: 0010-4620
    Electronic ISSN: 1460-2067
    Topics: Computer Science
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 63
    Publication Date: 2021-03-25
    Description: Firms face an increasingly complex economic and financial environment in which the access to international networks and markets is crucial. To be successful, companies need to understand the role of internationalization determinants such as bilateral psychic distance, experience, etc. Cutting-edge feature selection methods are applied in the present paper and compared to previous results to gain deep knowledge about strategies for Foreign Direct Investment. More precisely, evolutionary feature selection, addressed from the wrapper approach, is applied with two different classifiers as the fitness function: Bagged Trees and Extreme Learning Machines. The proposed intelligent system is validated when applied to real-life data from Spanish Multinational Enterprises (MNEs). These data were extracted from databases belonging to the Spanish Ministry of Industry, Tourism, and Trade. As a result, interesting conclusions are derived about the key features driving to the internationalization of the companies under study. This is the first time that such outcomes are obtained by an intelligent system on internationalization data.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 64
    Publication Date: 2021-03-31
    Description: In this article we prove the following results: (i) Every hemimaximal set has minimal $c_{1}$-degree, i.e. if $B$ is hemimaximal and $A$ is a c.e. set such that $A le _{c_{1}} B$ then either $B leq _{{c}_{1}} A$ or $A$ is computable. (ii) The $sQ$-degree of a c.e. set contains either only one or infinitely many c.e. $c$-degrees. (iii) If $A,B$ are c.e. cylinders in the same $sQ_{1}$-degree and $A
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 65
    Publication Date: 2021-03-17
    Description: As a promising next-generation network architecture, named data networking (NDN) supports name-based routing and in-network caching to retrieve content in an efficient, fast, and reliable manner. Most of the studies on NDN have proposed innovative and efficient caching mechanisms and retrieval of content via efficient routing. However, very few studies have targeted addressing the vulnerabilities in NDN architecture, which a malicious node can exploit to perform a content poisoning attack (CPA). This potentially results in polluting the in-network caches, the routing of content, and consequently isolates the legitimate content in the network. In the past, several efforts have been made to propose the mitigation strategies for the content poisoning attack, but to the best of our knowledge, no specific work has been done to address an emerging attack-surface in NDN, which we call an interest flooding attack. Handling this attack-surface can potentially make content poisoning attack mitigation schemes more effective, secure, and robust. Hence, in this article, we propose the addition of a security mechanism in the CPA mitigation scheme that is, Name-Key Based Forwarding and Multipath Forwarding Based Inband Probe, in which we block the malicious face of compromised consumers by monitoring the Cache-Miss Ratio values and the Queue Capacity at the Edge Routers. The malicious face is blocked when the cache-miss ratio hits the threshold value, which is adjusted dynamically through monitoring the cache-miss ratio and queue capacity values. The experimental results show that we are successful in mitigating the vulnerability of the CPA mitigation scheme by detecting and blocking the flooding interface, at the cost of very little verification overhead at the NDN Routers.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 66
    Publication Date: 2021-03-14
    Description: Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 67
    Publication Date: 2021-03-14
    Description: Summary In this article, we introduce a hierarchical clustering and Gaussian mixture model with expectation-maximization (EM) algorithm for detecting copy number variants (CNVs) using whole exome sequencing (WES) data. The R shiny package ‘HCMMCNVs’ is also developed for processing user-provided bam files, running CNVs detection algorithm and conducting visualization. Through applying our approach to 325 cancer cell lines in 22 tumor types from Cancer Cell Line Encyclopedia (CCLE), we show that our algorithm is competitive with other existing methods and feasible in using multiple cancer cell lines for CNVs estimation. In addition, by applying our approach to WES data of 120 oral squamous cell carcinoma (OSCC) samples, our algorithm, using the tumor sample only, exhibits more power in detecting CNVs as compared with the methods using both tumors and matched normal counterparts. Availability and implementation HCMMCNVs R shiny software is freely available at github repository https://github.com/lunching/HCMM_CNVs.and Zenodo https://doi.org/10.5281/zenodo.4593371. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 68
    Publication Date: 2021-03-14
    Description: Summary The need for an efficient and cost-effective method is compelling in biomolecular NMR. To tackle this problem, we have developed the Poky suite, the revolutionized platform with boundless possibilities for advancing research and technology development in signal detection, resonance assignment, structure calculation, and relaxation studies with the help of many automation and user interface tools. This software is extensible and scalable by scripting and batching as well as providing modern graphical user interfaces and a diverse range of modules right out of the box. Availability Poky is freely available to non-commercial users at https://poky.clas.ucdenver.edu. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 69
    Publication Date: 2021-03-17
    Description: Motivation Breast cancer is a very heterogeneous disease and there is an urgent need to design computational methods that can accurately predict the prognosis of breast cancer for appropriate therapeutic regime. Recently, deep learning-based methods have achieved great success in prognosis prediction, but many of them directly combine features from different modalities that may ignore the complex inter-modality relations. In addition, existing deep learning-based methods do not take intra-modality relations into consideration that are also beneficial to prognosis prediction. Therefore, it is of great importance to develop a deep learning-based method that can take advantage of the complementary information between intra-modality and inter-modality by integrating data from different modalities for more accurate prognosis prediction of breast cancer. Results We present a novel unified framework named genomic and pathological deep bilinear network (GPDBN) for prognosis prediction of breast cancer by effectively integrating both genomic data and pathological images. In GPDBN, an inter-modality bilinear feature encoding module is proposed to model complex inter-modality relations for fully exploiting intrinsic relationship of the features across different modalities. Meanwhile, intra-modality relations that are also beneficial to prognosis prediction, are captured by two intra-modality bilinear feature encoding modules. Moreover, to take advantage of the complementary information between inter-modality and intra-modality relations, GPDBN further combines the inter- and intra-modality bilinear features by using a multi-layer deep neural network for final prognosis prediction. Comprehensive experiment results demonstrate that the proposed GPDBN significantly improves the performance of breast cancer prognosis prediction and compares favorably with existing methods. Availabilityand implementation GPDBN is freely available at https://github.com/isfj/GPDBN. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 70
    Publication Date: 2021-03-17
    Description: Motivation Co-expression networks are a powerful gene expression analysis method to study how genes co-express together in clusters with functional coherence that usually resemble specific cell type behaviour for the genes involved. They can be applied to bulk-tissue gene expression profiling and assign function, and usually cell type specificity, to a high percentage of the gene pool used to construct the network. One of the limitations of this method is that each gene is predicted to play a role in a specific set of coherent functions in a single cell type (i.e. at most we get a single for each gene). We present here GMSCA (Gene Multifunctionality Secondary Co-expression Analysis), a software tool that exploits the co-expression paradigm to increase the number of functions and cell types ascribed to a gene in bulk-tissue co-expression networks. Results We applied GMSCA to 27 co-expression networks derived from bulk-tissue gene expression profiling of a variety of brain tissues. Neurons and glial cells (microglia, astrocytes and oligodendrocytes) were considered the main cell types. Applying this approach, we increase the overall number of predicted triplets by 46.73%. Moreover, GMSCA predicts that the SNCA gene, traditionally associated to work mainly in neurons, also plays a relevant function in oligodendrocytes. Availability The tool is available at GitHub, https://github.com/drlaguna/GMSCA as open-source software. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 71
    Publication Date: 2021-03-18
    Description: Summary MMseqs2 taxonomy is a new tool to assign taxonomic labels to metagenomic contigs. It extracts all possible protein fragments from each contig, quickly retains those that can contribute to taxonomic annotation, assigns them with robust labels and determines the contig’s taxonomic identity by weighted voting. Its fragment extraction step is suitable for the analysis of all domains of life. MMseqs2 taxonomy is 2–18× faster than state-of-the-art tools and also contains new modules for creating and manipulating taxonomic reference databases as well as reporting and visualizing taxonomic assignments. Availability and implementation MMseqs2 taxonomy is part of the MMseqs2 free open-source software package available for Linux, macOS and Windows at https://mmseqs.com. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 72
    Publication Date: 2021-03-16
    Description: Motivation The post-transcriptional epigenetic modification on mRNA is an emerging field to study the gene regulatory mechanism and their association with diseases. Recently developed high-throughput sequencing technology named Methylated RNA Immunoprecipitation Sequencing (MeRIP-seq) enables one to profile mRNA epigenetic modification transcriptome-wide. A few computational methods are available to identify transcriptome-wide mRNA modification, but they are either limited by over-simplified model ignoring the biological variance across replicates or suffer from low accuracy and efficiency. Results In this work, we develop a novel statistical method, based on an empirical Bayesian hierarchical model, to identify mRNA epigenetic modification regions from MeRIP-seq data. Our method accounts for various sources of variations in the data through rigorous modeling, and applies shrinkage estimation by borrowing informations from transcriptome-wide data to stabilize the parameter estimation. Simulation and real data analyses demonstrate that our method is more accurate, robust and efficient than the existing peak calling methods. Availability Our method TRES is implemented as an R package and is freely available on Github at https://github.com/ZhenxingGuo0015/TRES. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 73
    Publication Date: 2021-03-09
    Description: The Alternating Direction Method of Multipliers (ADMM) is a popular and promising distributed framework for solving large-scale machine learning problems. We consider decentralized consensus-based ADMM in which nodes may only communicate with one-hop neighbors. This may cause slow convergence. We investigate the impact of network topology on the performance of an ADMM-based learning of Support Vector Machine using expander, and mean-degree graphs, and additionally some of the common modern network topologies. In particular, we investigate to which degree the expansion property of the network influences the convergence in terms of iterations, training and communication time. We furthermore suggest which topology is preferable. Additionally, we provide an implementation that makes these theoretical advances easily available. The results show that the performance of decentralized ADMM-based learning of SVMs in terms of convergence is improved using graphs with large spectral gaps, higher and homogeneous degrees.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 74
    Publication Date: 2021-03-29
    Description: We propose an internal calculus to check the satisfiability of a set of formulas in ${ oldsymbol {S4}}$. Our calculus directly supports model extraction and is designed so to implement a forward proof-search strategy that can be understood as a top-down construction of a model. We prove that the extracted models have minimal height.
    Print ISSN: 0955-792X
    Electronic ISSN: 1465-363X
    Topics: Computer Science , Mathematics
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 75
    Publication Date: 2021-03-08
    Description: This article presents an approach to solve the inverse kinematics of cooperative mobile manipulators for coordinate manipulation tasks. A self-adaptive differential evolution algorithm is used to solve the inverse kinematics as a global constrained optimization problem. A kinematics model of the cooperative mobile manipulators system is proposed, considering a system with two omnidirectional platform manipulators with n DOF. An objective function is formulated based on the forward kinematics equations. Consequently, the proposed approach does not suffer from singularities because it does not require the inversion of any Jacobian matrix. The design of the objective function also contains penalty functions to handle the joint limits constraints. Simulation experiments are performed to test the proposed approach for solving coordinate path tracking tasks. The solutions of the inverse kinematics show precise and accurate results. The experimental setup considers two mobile manipulators based on the KUKA Youbot system to demonstrate the applicability of the proposed approach.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 76
    Publication Date: 2021-03-09
    Description: Due to the increasing size and complexity of many current software systems, the architectural design of these systems has become a considerately complicated task. In this scenario, reference architectures have already proven to be very relevant to support the architectural design of systems in diverse critical application domains, such as health, avionics, transportation, and the automotive sector. However, these architectures are described in many different approaches, such as using textual description, informal models, and even modeling languages as UML. Hence, practitioners are faced with a difficult decision of the better approaches to describing reference architectures. The main contribution of this work is to depict a detailed panorama containing the state of the art (from the literature) and state of the practice (based on existing reference architectures) of approaches for describing reference architectures. For this, we firstly examined the existing approaches (e.g., processes, methods, models, and modeling languages) and compared them concerning completeness and applicability. We also examined four well-known, successful reference architectures (AUTOSAR, ARC-IT, IIRA, and AXMEDIS) in view of the approaches used to describe them. As a result, there exists a misalignment between the state of the art and state of the practice, requiring an engagement of the software architecture community, through research collaboration of academia and industry, to propose more suitable means to describe reference architectures and, as a consequence, promoting the sustainability of these architectures.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 77
    Publication Date: 2021-03-02
    Description: Motivation Identifying meaningful cancer driver genes in a cohort of tumors is a challenging task in cancer genomics. Although existing studies have identified known cancer drivers, most of them focus on detecting coding drivers with mutations. It is acknowledged that non-coding drivers can regulate driver mutations to promote cancer growth. In this work, we propose a novel node importance based network analysis (NIBNA) framework to detect coding and non-coding cancer drivers. We hypothesize that cancer drivers are crucial to the formation of community structures in cancer network, and removing them from the network greatly perturbs the network structure thereby critically affecting the functioning of the network. NIBNA detects cancer drivers using a three-step process; first, a condition-specific network is built by incorporating gene expression data and gene networks, second, the community structures in the network are estimated and third, a centrality-based metric is applied to compute node importance. Results We apply NIBNA to the BRCA dataset and it outperforms existing state-of-art methods in detecting coding cancer drivers. NIBNA also predicts 265 miRNA drivers and majority of these drivers have been validated in literature. Further we apply NIBNA to detect cancer subtype-specific drivers and several predicted drivers have been validated to be associated with cancer subtypes. Lastly, we evaluate NIBNA’s performance in detecting epithelial-mesenchymal transition (EMT) drivers, and we confirmed 8 coding and 13 miRNA drivers in the list of known genes. Availability The source code can be accessed at: https://github.com/mandarsc/NIBNA. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 78
    Publication Date: 2021-03-08
    Description: Motivation A common way to summarize sequencing datasets is to quantify data lying within genes or other genomic intervals. This can be slow and can require different tools for different input file types. Results Megadepth is a fast tool for quantifying alignments and coverage for BigWig and BAM/CRAM input files, using substantially less memory than the next-fastest competitor. Megadepth can summarize coverage within all disjoint intervals of the Gencode V35 gene annotation for more than 19 000 GTExV8 BigWig files in approximately 1 h using 32 threads. Megadepth is available both as a command-line tool and as an R/Bioconductor package providing much faster quantification compared to the rtracklayer package. Availability and implementation https://github.com/ChristopherWilks/megadepth, https://bioconductor.org/packages/megadepth. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 79
    Publication Date: 2021-03-10
    Description: Motivation Bio-entity Coreference Resolution focuses on identifying the coreferential links in biomedical texts, which is crucial to complete bio-events’ attributes and interconnect events into bio-networks. Previously, as one of the most powerful tools, deep neural network-based general domain systems are applied to the biomedical domain with domain-specific information integration. However, such methods may raise much noise due to its insufficiency of combining context and complex domain-specific information. Results In this paper, we explore how to leverage the external knowledge base in a fine-grained way to better resolve coreference by introducing a knowledge-enhanced Long Short Term Memory network (LSTM), which is more flexible to encode the knowledge information inside the LSTM. Moreover, we further propose a knowledge attention module to extract informative knowledge effectively based on contexts. The experimental results on the BioNLP and CRAFT datasets achieve state-of-the-art performance, with a gain of 7.5 F1 on BioNLP and 10.6 F1 on CRAFT. Additional experiments also demonstrate superior performance on the cross-sentence coreferences. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 80
    Publication Date: 2021-03-15
    Description: Motivation Adverse drug–drug interactions (DDIs) are crucial for drug research and mainly cause morbidity and mortality. Thus, the identification of potential DDIs is essential for doctors, patients and the society. Existing traditional machine learning models rely heavily on handcraft features and lack generalization. Recently, the deep learning approaches that can automatically learn drug features from the molecular graph or drug-related network have improved the ability of computational models to predict unknown DDIs. However, previous works utilized large labeled data and merely considered the structure or sequence information of drugs without considering the relations or topological information between drug and other biomedical objects (e.g. gene, disease and pathway), or considered knowledge graph (KG) without considering the information from the drug molecular structure. Results Accordingly, to effectively explore the joint effect of drug molecular structure and semantic information of drugs in knowledge graph for DDI prediction, we propose a multi-scale feature fusion deep learning model named MUFFIN. MUFFIN can jointly learn the drug representation based on both the drug-self structure information and the KG with rich bio-medical information. In MUFFIN, we designed a bi-level cross strategy that includes cross- and scalar-level components to fuse multi-modal features well. MUFFIN can alleviate the restriction of limited labeled data on deep learning models by crossing the features learned from large-scale KG and drug molecular graph. We evaluated our approach on three datasets and three different tasks including binary-class, multi-class and multi-label DDI prediction tasks. The results showed that MUFFIN outperformed other state-of-the-art baselines. Availability and implementation The source code and data are available at https://github.com/xzenglab/MUFFIN.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 81
    Publication Date: 2021-03-12
    Description: Extreme learning machine (ELM) algorithm is widely used in regression and classification problems due to its advantages such as speed and high-performance rate. Different artificial intelligence-based optimization methods and chaotic systems have been proposed for the development of the ELM. However, a generalized solution method and success rate at the desired level could not be obtained. In this study, a new method is proposed as a result of developing the ELM algorithm used in regression problems with discrete-time chaotic systems. ELM algorithm has been improved by testing five different chaotic maps (Chebyshev, iterative, logistic, piecewise, tent) from chaotic systems. The proposed discrete-time chaotic systems based ELM (DCS-ELM) algorithm has been tested in steel fiber reinforced self-compacting concrete data sets and public four different datasets, and a result of its performance compared with the basic ELM algorithm, linear regression, support vector regression, kernel ELM algorithm and weighted ELM algorithm. It has been observed that it gives a better performance than other algorithms.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 82
    Publication Date: 2021-03-12
    Description: Breast cancer is one of the leading causes of death in the current age. It often results in subpar living conditions for a patient as they have to go through expensive and painful treatments to fight this cancer. One in eight women all over the world is affected by this disease. Almost half a million women annually do not survive this fight and die from this disease. Machine learning algorithms have proven to outperform all existing solutions for the prediction of breast cancer using models built on the previously available data. In this paper, a novel approach named BCD-WERT is proposed that utilizes the Extremely Randomized Tree and Whale Optimization Algorithm (WOA) for efficient feature selection and classification. WOA reduces the dimensionality of the dataset and extracts the relevant features for accurate classification. Experimental results on state-of-the-art comprehensive dataset demonstrated improved performance in comparison with eight other machine learning algorithms: Support Vector Machine (SVM), Random Forest, Kernel Support Vector Machine, Decision Tree, Logistic Regression, Stochastic Gradient Descent, Gaussian Naive Bayes and k-Nearest Neighbor. BCD-WERT outperformed all with the highest accuracy rate of 99.30% followed by SVM achieving 98.60% accuracy. Experimental results also reveal the effectiveness of feature selection techniques in improving prediction accuracy.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 83
    Publication Date: 2021-03-15
    Description: Motivation A major challenge in analyzing cancer patient transcriptomes is that the tumors are inherently heterogeneous and evolving. We analyzed 214 bulk RNA samples of a longitudinal, prospective ovarian cancer cohort and found that the sample composition changes systematically due to chemotherapy and between the anatomical sites, preventing direct comparison of treatment-naive and treated samples. Results To overcome this, we developed PRISM, a latent statistical framework to simultaneously extract the sample composition and cell-type-specific whole-transcriptome profiles adapted to each individual sample. Our results indicate that the PRISM-derived composition-free transcriptomic profiles and signatures derived from them predict the patient response better than the composite raw bulk data. We validated our findings in independent ovarian cancer and melanoma cohorts, and verified that PRISM accurately estimates the composition and cell-type-specific expression through whole-genome sequencing and RNA in situ hybridization experiments. Availabilityand implementation https://bitbucket.org/anthakki/prism. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 84
    Publication Date: 2021-03-08
    Description: Motivation Investigating the relationships between two sets of variables helps to understand their interactions and can be done with canonical correlation analysis (CCA). However, the correlation between the two sets can sometimes depend on a third set of covariates, often subject-related ones such as age, gender or other clinical measures. In this case, applying CCA to the whole population is not optimal and methods to estimate conditional CCA, given the covariates, can be useful. Results We propose a new method called Random Forest with Canonical Correlation Analysis (RFCCA) to estimate the conditional canonical correlations between two sets of variables given subject-related covariates. The individual trees in the forest are built with a splitting rule specifically designed to partition the data to maximize the canonical correlation heterogeneity between child nodes. We also propose a significance test to detect the global effect of the covariates on the relationship between two sets of variables. The performance of the proposed method and the global significance test is evaluated through simulation studies that show it provides accurate canonical correlation estimations and well-controlled Type-1 error. We also show an application of the proposed method with EEG data. Availability and implementation RFCCA is implemented in a freely available R package on CRAN (https://CRAN.R-project.org/package=RFCCA). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 85
    Publication Date: 2021-03-10
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 86
    Publication Date: 2021-03-14
    Description: Motivation Polypharmacy side effects should be carefully considered for new drug development. However, considering all the complex drug–drug interactions that cause polypharma-cy side effects is challenging. Recently, graph neural network (GNN) models have handled these complex interactions successfully and shown great predictive perfor-mance. Nevertheless, the GNN models have difficulty providing intelligible factors of the prediction for biomedical and pharmaceutical domain experts. Method A novel approach, graph feature attention network (GFAN), is presented for inter-pretable prediction of polypharmacy side effects by emphasizing target genes differ-ently. To artificially simulate polypharmacy situations, where two different drugs are taken together, we formulated a node classification problem by using the concept of line graph in graph theory. Results Experiments with benchmark datasets validated interpretability of the GFAN and demonstrated competitive performance with the graph attention network in a previous work. And the specific cases in the polypharmacy side effect prediction experiments showed that the GFAN model is capable of very sensitively extracting the target genes for each side effect prediction. Availability and implementation https://github.com/SunjooBang/Polypharmacy-side-effect-prediction
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 87
    Publication Date: 2021-03-26
    Description: This article proposes a novel network model to achieve better accurate residual binarized convolutional neural networks (CNNs), denoted as AresB-Net. Even though residual CNNs enhance the classification accuracy of binarized neural networks with increasing feature resolution, the degraded classification accuracy is still the primary concern compared with real-valued residual CNNs. AresB-Net consists of novel basic blocks to amortize the severe error from the binarization, suggesting a well-balanced pyramid structure without downsampling convolution. In each basic block, the shortcut is added to the convolution output and then concatenated, and then the expanded channels are shuffled for the next grouped convolution. In the downsampling when stride 〉1, our model adopts only the max-pooling layer for generating low-cost shortcut. This structure facilitates the feature reuse from the previous layers, thus alleviating the error from the binarized convolution and increasing the classification accuracy with reduced computational costs and small weight storage requirements. Despite low hardware costs from the binarized computations, the proposed model achieves remarkable classification accuracies on the CIFAR and ImageNet datasets.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 88
    Publication Date: 2021-03-18
    Description: Robotic systems are generally used for grasping, carrying, holding, and many similar operations, typically in industrial applications. One of the most important components of robotic systems is robot grippers for the aforementioned operations, which are not only mission-critical but also represent a significant operational cost due to the time and expense associated with replacement. Grasping operations require sensitive and dexterous manipulation ability. As a consequence, tactile materials and sensors are an essential element in effective robot grippers; however, to date, little effort has been invested in the optimization of these systems. This study has set out to develop inexpensive, easily replaced pads, testing two different chemical compositions that are used to produce a tactile material for robot grippers, with the objective of generating cost, time, and environmental savings. Each tactile material produced has its specific individual dimension and weight. First, each of the materials under construction was tested under different constant pressures, and its characteristics were analyzed. Second, each tactile material was mounted on a two-fingered robot gripper and its characteristics. Material characteristics were tested and analyzed as regards their ability to grasp different sizes and types of objects using the two-fingered robot gripper. Based on the analysis of the results the most sensitive and cost-effective material for industrial type multi-fingered grippers was identified.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 89
    Publication Date: 2021-02-19
    Description: Artificial intelligence techniques have been used in the industry to control complex systems; among these proposals, adaptive Proportional, Integrative, Derivative (PID) controllers are intelligent versions of the most used controller in the industry. This work presents an adaptive neuron PD controller and a multilayer neural PD controller for position tracking of a mobile manipulator. Both controllers are trained by an extended Kalman filter (EKF) algorithm. Neural networks trained with the EKF algorithm show faster learning speeds and convergence times than the training based on backpropagation. The integrative term in PID controllers eliminates the steady-state error, but it provokes oscillations and overshoot. Moreover, the cumulative error in the integral action may produce windup effects such as high settling time, poor performance, and instability. The proposed neural PD controllers adjust their gains dynamically, which eliminates the steady-state error. Then, the integrative term is not required, and oscillations and overshot are highly reduced. Removing the integral part also eliminates the need for anti-windup methodologies to deal with the windup effects. Mobile manipulators are popular due to their mobile capability combined with a dexterous manipulation capability, which gives them the potential for many industrial applications. Applicability of the proposed adaptive neural controllers is presented by simulating experimental results on a KUKA Youbot mobile manipulator, presenting different tests and comparisons with the conventional PID controller and an existing adaptive neuron PID controller.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 90
    Publication Date: 2021-02-10
    Description: Provocative heart disease is related to ventricular arrhythmias (VA). Ventricular tachyarrhythmia is an irregular and fast heart rhythm that emerges from inappropriate electrical impulses in the ventricles of the heart. Different types of arrhythmias are associated with different patterns, which can be identified. An electrocardiogram (ECG) is the major analytical tool used to interpret and record ECG signals. ECG signals are nonlinear and difficult to interpret and analyze. We propose a new deep learning approach for the detection of VA. Initially, the ECG signals are transformed into images that have not been done before. Later, these images are normalized and utilized to train the AlexNet, VGG-16 and Inception-v3 deep learning models. Transfer learning is performed to train a model and extract the deep features from different output layers. After that, the features are fused by a concatenation approach, and the best features are selected using a heuristic entropy calculation approach. Finally, supervised learning classifiers are utilized for final feature classification. The results are evaluated on the MIT-BIH dataset and achieved an accuracy of 97.6% (using Cubic Support Vector Machine as a final stage classifier).
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 91
    Publication Date: 2021-02-10
    Description: Background and Purpose COVID-19 is a new strain of viruses that causes life stoppage worldwide. At this time, the new coronavirus COVID-19 is spreading rapidly across the world and poses a threat to people’s health. Experimental medical tests and analysis have shown that the infection of lungs occurs in almost all COVID-19 patients. Although Computed Tomography of the chest is a useful imaging method for diagnosing diseases related to the lung, chest X-ray (CXR) is more widely available, mainly due to its lower price and results. Deep learning (DL), one of the significant popular artificial intelligence techniques, is an effective way to help doctors analyze how a large number of CXR images is crucial to performance. Materials and Methods In this article, we propose a novel perceptual two-layer image fusion using DL to obtain more informative CXR images for a COVID-19 dataset. To assess the proposed algorithm performance, the dataset used for this work includes 87 CXR images acquired from 25 cases, all of which were confirmed with COVID-19. The dataset preprocessing is needed to facilitate the role of convolutional neural networks (CNN). Thus, hybrid decomposition and fusion of Nonsubsampled Contourlet Transform (NSCT) and CNN_VGG19 as feature extractor was used. Results Our experimental results show that imbalanced COVID-19 datasets can be reliably generated by the algorithm established here. Compared to the COVID-19 dataset used, the fuzed images have more features and characteristics. In evaluation performance measures, six metrics are applied, such as QAB/F, QMI, PSNR, SSIM, SF, and STD, to determine the evaluation of various medical image fusion (MIF). In the QMI, PSNR, SSIM, the proposed algorithm NSCT + CNN_VGG19 achieves the greatest and the features characteristics found in the fuzed image is the largest. We can deduce that the proposed fusion algorithm is efficient enough to generate CXR COVID-19 images that are more useful for the examiner to explore patient status. Conclusions A novel image fusion algorithm using DL for an imbalanced COVID-19 dataset is the crucial contribution of this work. Extensive results of the experiment display that the proposed algorithm NSCT + CNN_VGG19 outperforms competitive image fusion algorithms.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 92
    Publication Date: 2021-02-09
    Description: A model regarding the lifetime of individual source code lines or tokens can estimate maintenance effort, guide preventive maintenance, and, more broadly, identify factors that can improve the efficiency of software development. We present methods and tools that allow tracking of each line’s or token’s birth and death. Through them, we analyze 3.3 billion source code element lifetime events in 89 revision control repositories. Statistical analysis shows that code lines are durable, with a median lifespan of about 2.4 years, and that young lines are more likely to be modified or deleted, following a Weibull distribution with the associated hazard rate decreasing over time. This behavior appears to be independent from specific characteristics of lines or tokens, as we could not determine factors that influence significantly their longevity across projects. The programing language, and developer tenure and experience were not found to be significantly correlated with line or token longevity, while project size and project age showed only a slight correlation.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 93
    Publication Date: 2021-02-09
    Description: Motivation The prediction performance of Cox proportional hazard model suffers when there are only few uncensored events in the training data. Results We propose a Sparse-Group regularized Cox regression method to improve the prediction performance of large-scale and high-dimensional survival data with few observed events. Our approach is applicable when there is one or more other survival responses that 1. has a large number of observed events; 2. share a common set of associated predictors with the rare event response. This scenario is common in the UK Biobank (Sudlow et al., 2015) dataset where records for a large number of common and less prevalent diseases of the same set of individuals are available. By analyzing these responses together, we hope to achieve higher prediction performance than when they are analyzed individually. To make this approach practical for large-scale data, we developed an accelerated proximal gradient optimization algorithm as well as a screening procedure inspired by Qian et al. (2020). Availability https://github.com/rivas-lab/multisnpnet-Cox Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 94
    Publication Date: 2021-02-09
    Description: Gene promoters are the key DNA regulatory elements positioned around the transcription start sites and are responsible for regulating gene transcription process. Various alignment-based, signal-based and content-based approaches are reported for the prediction of promoters. However, since all promoter sequences do not show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes viz. yeast (Saccharomyces cerevisiae), A. thaliana (plant) and human (Homo sapiens). We compared one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension reducing the training time without affecting the sensitivity and specificity of classification. We employed the deep learning techniques, mainly CNN and recurrent neural network with Long Short Term Memory (LSTM) and random forest (RF) classifier for promoter classification at k-mer sizes of 2, 4 and 8. We found CNN to be superior in classification of promoters from non-promoter sequences (binary classification) as well as species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 95
    Publication Date: 2021-02-09
    Description: As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL based schemes to tolerate up-to one/two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors arising due to faults in multiple devices and/or data-bus, address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD)—a novel error handling scheme to correct single-symbol errors and detect multi-symbol errors. Our scheme makes use of a hash in combination with Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). We develop a novel scheme that deploys 32-bit CRC along with Reed-Solomon code to implement SSCMSD for a ×4 based DDR4 system. Simulation based experiments show that our scheme effectively guards against device, data-bus and address-bus errors only limited by the aliasing probability of the hash. Our novel design enabled us to achieve this without introducing additional READ latency. We need 19 chips per rank, 76 data bus-lines and additional hash-logic at the memory controller.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 96
    Publication Date: 2021-02-10
    Description: We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are hardware accelerators for mixed-precision matrix multiplication available on the Volta, Turing, and Ampere microarchitectures. Using Volta V100, Turing T4, and Ampere A100 graphics cards, we determine what precision is used for the intermediate results, whether subnormal numbers are supported, what rounding mode is used, in which order the operations underlying the matrix multiplication are performed, and whether partial sums are normalized. These aspects are not documented by NVIDIA, and we gain insight by running carefully designed numerical experiments on these hardware units. Knowing the answers to these questions is important if one wishes to: (1) accurately simulate NVIDIA tensor cores on conventional hardware; (2) understand the differences between results produced by code that utilizes tensor cores and code that uses only IEEE 754-compliant arithmetic operations; and (3) build custom hardware whose behavior matches that of NVIDIA tensor cores. As part of this work we provide a test suite that can be easily adapted to test newer versions of the NVIDIA tensor cores as well as similar accelerators from other vendors, as they become available. Moreover, we identify a non-monotonicity issue affecting floating point multi-operand adders if the intermediate results are not normalized after each step.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 97
    Publication Date: 2021-02-12
    Description: Summary Searching for open reading frames is a routine task and a critical step prior to annotating protein coding regions in newly sequenced genomes or de novo transcriptome assemblies. With the tremendous increase in genomic and transcriptomic data, faster tools are needed to handle large input datasets. These tools should be versatile enough to fine-tune search criteria and allow efficient downstream analysis. Here we present a new python based tool, orfipy, which allows the user to flexibly search for open reading frames in genomic and transcriptomic sequences. The search is rapid and is fully customizable, with a choice of FASTA and BED output formats. Availability and implementation orfipy is implemented in python and is compatible with python v3.6 and higher. Source code: https://github.com/urmi-21/orfipy. Installation: from the source, or via PyPi (https://pypi.org/project/orfipy) or bioconda (https://anaconda.org/bioconda/orfipy). Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 98
    Publication Date: 2021-02-16
    Description: Steganalysis is the process of analyzing and predicting the presence of hidden information in images. Steganalysis would be most useful to predict whether the received images contain useful information. However, it is more difficult to predict the hidden information in images which is computationally difficult. In the existing research method, this is resolved by introducing the deep learning approach which attempts to perform steganalysis tasks in effectively. However, this research method does not concentrate the noises present in the images. It might increase the computational overhead where the error cost adjustment would require more iteration. This is resolved in the proposed research technique by introducing the novel research method called Non-Gaussian Noise Aware Auto Encoder Convolutional Neural Network (NGN-AEDNN). Classification technique provides a more flexible way for steganalysis where the multiple features present in the environment would lead to an inaccurate prediction rate. Here, learning accuracy is improved by introducing noise removal techniques before performing a learning task. Non-Gaussian Noise Removal technique is utilized to remove the noises before learning. Also, Gaussian noise removal is applied at every iteration of the neural network to adjust the error rate without the involvement of noisy features. This proposed work can ensure efficient steganalysis by accurate learning task. Matlab has been employed to implement the method by performing simulations from which it is proved that the proposed research technique NGN-AEDNN can ensure the efficient steganalysis outcome with the reduced computational overhead when compared with the existing methods.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 99
    Publication Date: 2021-02-17
    Description: Background Parkinson’s disease (PD) is a neurodegenerative condition of the central nervous system that causes motor and non-motor dysfunctions. The disease affects 1% of the world population over 60 years and remains cureless. Knowledge and monitoring of PD are essential to provide better living conditions for patients. Thus, diagnostic exams and monitoring of the disease can generate a large amount of data from a given patient. This study proposes the development and usability evaluation of an integrated system, which can be used in clinical and research settings to manage biomedical data collected from PD patients. Methods A system, so-called Sistema Integrado de Dados Biomédicos (SIDABI) (Integrated Biomedical Data System), was designed following the model-view-controller (MVC) standard. A modularized architecture was created in which all the other modules are connected to a central security module. Thirty-six examiners evaluated the system usability through the System Usability Scale (SUS). The agreement between examiners was measured by Kendall’s coefficient with a significance level of 1%. Results The free and open-source web-based system was implemented using modularized and responsive methods to adapt the system features on multiple platforms. The mean SUS score was 82.99 ± 13.97 points. The overall agreement was 70.2%, as measured by Kendall’s coefficient (p 〈 0.001). Conclusion According to the SUS scores, the developed system has good usability. The system proposed here can help researchers to organize and share information, avoiding data loss and fragmentation. Furthermore, it can help in the follow-up of PD patients, in the training of professionals involved in the treatment of the disorder, and in studies that aim to find hidden correlations in data.
    Electronic ISSN: 2376-5992
    Topics: Computer Science
    Published by PeerJ
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
  • 100
    Publication Date: 2021-02-15
    Description: Summary Despite the continuous discovery of new transcript isoforms, fueled by the recent increase in accessibility and accuracy of long-read RNA sequencing data, functional differences between isoforms originating from the same gene often remain obscure. To address this issue and enable researchers to assess potential functional consequences of transcript isoform variation on the proteome, we developed IsoTV. IsoTV is a versatile pipeline to process, predict, and visualize the functional features of translated transcript isoforms. Attributes such as gene and isoform expression, transcript composition, and functional features are summarized in an easy-to-interpret visualization. IsoTV is able to analyze a variety of data types from all eukaryotic organisms, including short- and long-read RNA-seq data. Using Oxford Nanopore long read data, we demonstrate that IsoTV facilitates the understanding of potential protein isoform function in different cancer cell types. Availability IsoTV is available at https://github.molgen.mpg.de/MayerGroup/IsoTV, with the corresponding documentation at https://isotv.readthedocs.io/. Supplementary information Supplementary data are available at Bioinformatics online.
    Print ISSN: 1367-4803
    Electronic ISSN: 1460-2059
    Topics: Biology , Computer Science , Medicine
    Location Call Number Expected Availability
    BibTip Others were also interested in ...
Close ⊗
This website uses cookies and the analysis tool Matomo. More information can be found here...