ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Discovering a sparse set of pairwise discriminating features in high-dimensional data (2020)

Melton, Samuel ; Ramanathan, Sharad

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Jul 30. doi: 10.1093/bioinformatics/btaa690. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-07-30

Description: Motivation Recent technological advances produce a wealth of high-dimensional descriptions of biological processes, yet extracting meaningful insight and mechanistic understanding from these data remains challenging. For example, in developmental biology, the dynamics of differentiation can now be mapped quantitatively using single-cell RNA sequencing, yet it is difficult to infer molecular regulators of developmental transitions. Here, we show that discovering informative features in the data is crucial for statistical analysis as well as making experimental predictions. Results We identify features based on their ability to discriminate between clusters of the data points. We define a class of problems in which linear separability of clusters is hidden in a low-dimensional space. We propose an unsupervised method to identify the subset of features that define a low-dimensional subspace in which clustering can be conducted. This is achieved by averaging over discriminators trained on an ensemble of proposed cluster configurations. We then apply our method to single-cell RNA-seq data from mouse gastrulation, and identify 27 key transcription factors (out of 409 total), 18 of which are known to define cell states through their expression levels. In this inferred subspace, we find clear signatures of known cell types that eluded classification prior to discovery of the correct low-dimensional subspace. Availability and implementation https://github.com/smelton/SMD. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Classification and review of free PCR primer design software (2020)

Guo, Jingwen ; Starr, David ; Guo, Huazhang

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Oct 26. doi: 10.1093/bioinformatics/btaa910. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-10-26

Description: Motivation Polymerase chain reaction (PCR) has been a revolutionary biomedical advancement. However, for PCR to be appropriately used, one must spend a significant amount of effort on PCR primer design. Carefully designed PCR primers not only increase sensitivity and specificity, but also decrease effort spent on experimental optimization. Computer software removes the human element by performing and automating the complex and rigorous calculations required in PCR primer design. Classification and review of the available software options and their capabilities should be a valuable resource for any PCR application. Results This paper focuses on currently available free PCR primer design software and their major functions (https://pcrprimerdesign.github.io/). The software are classified according to their PCR applications, such as Sanger sequencing, reverse transcription quantitative PCR, single nucleotide polymorphism detection, splicing variant detection, methylation detection, microsatellite detection, multiplex PCR and targeted next generation sequencing, and conserved/degenerate primers to clone orthologous genes from related species, new gene family members in the same species, or to detect a group of related pathogens. Each software is summarized to provide a technical review of their capabilities and utilities.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

The effect of statistical normalisation on network propagation scores (2020)

Picart-Armada, Sergio ; Thompson, Wesley K ; Buil, Alfonso ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Oct 18. doi: 10.1093/bioinformatics/btaa896. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-10-18

Description: Motivation Network diffusion and label propagation are fundamental tools in computational biology, with applications like gene-disease association, protein function prediction and module discovery. More recently, several publications have introduced a permutation analysis after the propagation process, due to concerns that network topology can bias diffusion scores. This opens the question of the statistical properties and the presence of bias of such diffusion processes in each of its applications. In this work, we characterised some common null models behind the permutation analysis and the statistical properties of the diffusion scores. We benchmarked seven diffusion scores on three case studies: synthetic signals on a yeast interactome, simulated differential gene expression on a protein-protein interaction network and prospective gene set prediction on another interaction network. For clarity, all the datasets were based on binary labels, but we also present theoretical results for quantitative labels. Results Diffusion scores starting from binary labels were affected by the label codification, and exhibited a problem-dependent topological bias that could be removed by the statistical normalisation. Parametric and non-parametric normalisation addressed both points by being codification-independent and by equalising the bias. We identified and quantified two sources of bias -mean value and variance- that yielded performance differences when normalising the scores. We provided closed formulae for both and showed how the null covariance is related to the spectral properties of the graph. Despite none of the proposed scores systematically outperformed the others, normalisation was preferred when the sought positive labels were not aligned with the bias. We conclude that the decision on bias removal should be problem and data-driven, i.e. based on a quantitative analysis of the bias and its relation to the positive entities. Availability The code is publicly available at https://github.com/b2slab/diffuBench Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

AEMDA: inferring miRNA–disease associations based on deep autoencoder (2020)

Ji, Cunmei ; Gao, Zhen ; Ma, Xu ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Jul 29. doi: 10.1093/bioinformatics/btaa670. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-07-29

Description: Motivation MicroRNAs (miRNAs) are a class of non-coding RNAs that play critical roles in various biological processes. Many studies have shown that miRNAs are closely related to the occurrence, development and diagnosis of human diseases. Traditional biological experiments are costly and time consuming. As a result, effective computational models have become increasingly popular for predicting associations between miRNAs and diseases, which could effectively boost human disease diagnosis and prevention. Results We propose a novel computational framework, called AEMDA, to identify associations between miRNAs and diseases. AEMDA applies a learning-based method to extract dense and high-dimensional representations of diseases and miRNAs from integrated disease semantic similarity, miRNA functional similarity and heterogeneous related interaction data. In addition, AEMDA adopts a deep autoencoder that does not need negative samples to retrieve the underlying associations between miRNAs and diseases. Furthermore, the reconstruction error is used as a measurement to predict disease-associated miRNAs. Our experimental results indicate that AEMDA can effectively predict disease-related miRNAs and outperforms state-of-the-art methods. Availability and implementation The source code and data are available at https://github.com/CunmeiJi/AEMDA. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

PASSION: an ensemble neural network approach for identifying the binding sites of RBPs on circRNAs (2020)

Jia, Cangzhi ; Bi, Yue ; Chen, Jinxiang ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; 36(15): 4276-4282. Published 2020 May 19. doi: 10.1093/bioinformatics/btaa522.

add to mindlist on the mindlist

Details

Publication Date: 2020-05-19

Description: Motivation Different from traditional linear RNAs (containing 5′ and 3′ ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. Results For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. Availability and implementation A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

COVID-19 Knowledge Graph: a computable, multi-modal, cause-and-effect knowledge model of COVID-19 pathophysiology (2020)

Domingo-Fernández, Daniel ; Baksi, Shounak ; Schultz, Bruce ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Sep 25. doi: 10.1093/bioinformatics/btaa834. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-09-25

Description: Summary The COVID-19 crisis has elicited a global response by the scientific community that has led to a burst of publications on the pathophysiology of the virus. However, without coordinated efforts to organize this knowledge, it can remain hidden away from individual research groups. By extracting and formalizing this knowledge in a structured and computable form, as in the form of a knowledge graph, researchers can readily reason and analyze this information on a much larger scale. Here, we present the COVID-19 Knowledge Graph, an expansive cause-and-effect network constructed from scientific literature on the new coronavirus that aims to provide a comprehensive view of its pathophysiology. To make this resource available to the research community and facilitate its exploration and analysis, we also implemented a web application and released the KG in multiple standard formats. Availability The COVID-19 Knowledge Graph is publicly available under CC-0 license at https://github.com/covid19kg and https://bikmi.covid19-knowledgespace.de. Supplementary information Supplementary data are available online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

DRIMC: an improved drug repositioning approach using Bayesian inductive matrix completion (2020)

Zhang, Wenjuan ; Xu, Hunan ; Li, Xiaozhong ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; 36(9): 2839-2847. Published 2020 Jan 30. doi: 10.1093/bioinformatics/btaa062.

add to mindlist on the mindlist

Details

Publication Date: 2020-01-30

Description: Motivation One of the most important problems in drug discovery research is to precisely predict a new indication for an existing drug, i.e. drug repositioning. Recent recommendation system-based methods have tackled this problem using matrix completion models. The models identify latent factors contributing to known drug-disease associations, and then infer novel drug-disease associations by the correlations between latent factors. However, these models have not fully considered the various drug data sources and the sparsity of the drug-disease association matrix. In addition, using the global structure of the drug-disease association data may introduce noise, and consequently limit the prediction power. Results In this work, we propose a novel drug repositioning approach by using Bayesian inductive matrix completion (DRIMC). First, we embed four drug data sources into a drug similarity matrix and two disease data sources in a disease similarity matrix. Then, for each drug or disease, its feature is described by similarity values between it and its nearest neighbors, and these features for drugs and diseases are mapped onto a shared latent space. We model the association probability for each drug-disease pair by inductive matrix completion, where the properties of drugs and diseases are represented by projections of drugs and diseases, respectively. As the known drug-disease associations have been manually verified, they are more trustworthy and important than the unknown pairs. We assign higher confidence levels to known association pairs compared with unknown pairs. We perform comprehensive experiments on three benchmark datasets, and DRIMC improves prediction accuracy compared with six stat-of-the-art approaches. Availability and implementation Source code and datasets are available at https://github.com/linwang1982/DRIMC. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

Using Drug Descriptions and Molecular Structures for Drug-Drug Interaction Extraction from Literature (2020)

Asada, Masaki ; Miwa, Makoto ; Sasaki, Yutaka

Oxford University Press

In: Bioinformatics. 2020; Published 2020 Oct 24. doi: 10.1093/bioinformatics/btaa907. [early online release]

add to mindlist on the mindlist

Details

Publication Date: 2020-10-24

Description: Motivation Neural methods to extract drug-drug interactions (DDIs) from literature require a large number of annotations. In this study, we propose a novel method to effectively utilize external drug database information as well as information from large-scale plain text for DDI extraction. Specifically, we focus on drug description and molecular structure information as the drug database information. Results We evaluated our approach on the DDIExtraction 2013 shared task data set. We obtained the following results. First, large-scale raw text information can greatly improve the performance of extracting DDIs when combined with the existing model and it shows the state-of-the-art performance. Second, each of drug description and molecular structure information is helpful to further improve the DDI performance for some specific DDI types. Finally, the simultaneous use of the drug description and molecular structure information can significantly improve the performance on all the DDI types. We showed that the plain text, the drug description information, and molecular structure information are complementary and their effective combination are essential for the improvement. Availability https://github.com/tticoin/DESC_MOL-DDIE

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

scHLAcount: allele-specific HLA expression from single-cell gene expression data (2020)

Darby, Charlotte A ; Stubbington, Michael J T ; Marks, Patrick J ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; 36(12): 3905-3906. Published 2020 Apr 24. doi: 10.1093/bioinformatics/btaa264.

add to mindlist on the mindlist

Details

Publication Date: 2020-04-24

Description: Summary Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample’s HLA genotypes. Availability and implementation scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

FUpred: detecting protein domains through deep-learning-based contact map prediction (2020)

Zheng, Wei ; Zhou, Xiaogen ; Wuyun, Qiqige ; [et al.]

Oxford University Press

In: Bioinformatics. 2020; 36(12): 3749-3757. Published 2020 Mar 30. doi: 10.1093/bioinformatics/btaa217.

add to mindlist on the mindlist

Details

Publication Date: 2020-03-30

Description: Motivation Protein domains are subunits that can fold and function independently. Correct domain boundary assignment is thus a critical step toward accurate protein structure and function analyses. There is, however, no efficient algorithm available for accurate domain prediction from sequence. The problem is particularly challenging for proteins with discontinuous domains, which consist of domain segments that are separated along the sequence. Results We developed a new algorithm, FUpred, which predicts protein domain boundaries utilizing contact maps created by deep residual neural networks coupled with coevolutionary precision matrices. The core idea of the algorithm is to retrieve domain boundary locations by maximizing the number of intra-domain contacts, while minimizing the number of inter-domain contacts from the contact maps. FUpred was tested on a large-scale dataset consisting of 2549 proteins and generated correct single- and multi-domain classifications with a Matthew’s correlation coefficient of 0.799, which was 19.1% (or 5.3%) higher than the best machine learning (or threading)-based method. For proteins with discontinuous domains, the domain boundary detection and normalized domain overlapping scores of FUpred were 0.788 and 0.521, respectively, which were 17.3% and 23.8% higher than the best control method. The results demonstrate a new avenue to accurately detect domain composition from sequence alone, especially for discontinuous, multi-domain proteins. Availability and implementation https://zhanglab.ccmb.med.umich.edu/FUpred. Supplementary information Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext