ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

FASTAFS: file system virtualisation of random access compressed FASTA files (2021)

Hoogstrate, Youri ; Jenster, Guido W. ; Werken, Harmen J. G. van de

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 535. Published 2021 Nov 01. doi: 10.1186/s12859-021-04455-3.

add to mindlist on the mindlist

Details

Publication Date: 2021-11-01

Description: Background The FASTA file format, used to store polymeric sequence data, has become a bioinformatics file standard used for decades. The relatively large files require additional files, beyond the scope of the original format, to identify sequences and to provide random access. Multiple compressors have been developed to archive FASTA files back and forth, but these lack direct access to targeted content or metadata of the archive. Moreover, these solutions are not directly backwards compatible to FASTA files, resulting in limited software integration. Results We designed a linux based toolkit that virtualises the content of DNA, RNA and protein FASTA archives into the filesystem by using filesystem in userspace. This guarantees in-sync virtualised metadata files and offers fast random-access decompression using bit encodings plus Zstandard (zstd). The toolkit, FASTAFS, can track all its system-wide running instances, allows file integrity verification and can provide, instantly, scriptable access to sequence files and is easy to use and deploy. The file compression ratios were comparable but not superior to other state of the art archival tools, despite the innovative random access feature implemented in FASTAFS. Conclusions FASTAFS is a user-friendly and easy to deploy backwards compatible generic purpose solution to store and access compressed FASTA files, since it offers file system access to FASTA files as well as in-sync metadata files through file virtualisation. Using virtual filesystems as in-between layer offers format conversion without the need to rewrite code into different programming languages while preserving compatibility.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Development of a prognostic signature of patients with esophagus adenocarcinoma by using immune-related genes (2021)

Zhang, Xiangxin ; Yang, Liu ; Kong, Ming ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 536. Published 2021 Nov 01. doi: 10.1186/s12859-021-04456-2.

add to mindlist on the mindlist

Details

Publication Date: 2021-11-01

Description: Background Esophageal adenocarcinoma (EAC) is an aggressive malignancy with a poor prognosis. The immune-related genes (IRGs) are crucial to immunocytes tumor infiltration. This study aimed to construct a IRG-related prediction signature in EAC. Methods The related data of EAC patients and IRGs were obtained from the TCGA and ImmPort database, respectively. The cox regression analysis constructed the prediction signature and explored the transcription factors regulatory network through the Cistrome database. TIMER database and CIBERSORT analytical tool were utilized to explore the immunocytes infiltration analysis. Results The prediction signature with 12 IRGs (ADRM1, CXCL1, SEMG1, CCL26, CCL24, AREG, IL23A, UCN2, FGFR4, IL17RB, TNFRSF11A, and TNFRSF21) was constructed. Overall survival (OS) curves indicate that the survival rate of the high-risk group is significantly shorter than the low-risk group (P = 7.26e−07), and the AUC of 1-, 3- and 5- year survival prediction rates is 0.871, 0.924, and 0.961, respectively. Compared with traditional features, the ROC curve of the risk score in the EAC patients (0.967) is significant than T (0.57), N (0.738), M (0.568), and Stage (0.768). Moreover, multivariate Cox analysis and Nomogram of risk score are indicated that the 1-year and 3-year survival rates of patients are accurate by the combined analysis of the risk score, Sex, M stage, and Stage (The AUC of 1- and 3-years are 0.911, and 0.853). Conclusion The 12 prognosis-related IRGs might be promising therapeutic targets for EAC.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

A drug repositioning algorithm based on a deep autoencoder and adaptive fusion (2021)

Chen, Peng ; Bao, Tianjiazhi ; Yu, Xiaosheng ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 532. Published 2021 Oct 30. doi: 10.1186/s12859-021-04406-y.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-30

Description: Background Drug repositioning has caught the attention of scholars at home and abroad due to its effective reduction of the development cost and time of new drugs. However, existing drug repositioning methods that are based on computational analysis are limited by sparse data and classic fusion methods; thus, we use autoencoders and adaptive fusion methods to calculate drug repositioning. Results In this study, a drug repositioning algorithm based on a deep autoencoder and adaptive fusion was proposed to mitigate the problems of decreased precision and low-efficiency multisource data fusion caused by data sparseness. Specifically, a drug is repositioned by fusing drug-disease associations, drug target proteins, drug chemical structures and drug side effects. First, drug feature data integrated by drug target proteins and chemical structures were processed with dimension reduction via a deep autoencoder to characterize feature representations more densely and abstractly. Then, disease similarity was computed using drug-disease association data, while drug similarity was calculated with drug feature and drug-side effect data. Predictions of drug-disease associations were also calculated using a top-k neighbor method that is commonly used in predictive drug repositioning studies. Finally, a predicted matrix for drug-disease associations was acquired after fusing a wide variety of data via adaptive fusion. Based on experimental results, the proposed algorithm achieves a higher precision and recall rate than the DRCFFS, SLAMS and BADR algorithms with the same dataset. Conclusion The proposed algorithm contributes to investigating the novel uses of drugs, as shown in a case study of Alzheimer's disease. Therefore, the proposed algorithm can provide an auxiliary effect for clinical trials of drug repositioning.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

4

Unknown

LongStitch: high-quality genome assembly correction and scaffolding using long reads (2021)

Coombe, Lauren ; Li, Janet X. ; Lo, Theodora ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 534. Published 2021 Oct 30. doi: 10.1186/s12859-021-04451-7.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-30

Description: Background Generating high-quality de novo genome assemblies is foundational to the genomics study of model and non-model organisms. In recent years, long-read sequencing has greatly benefited genome assembly and scaffolding, a process by which assembled sequences are ordered and oriented through the use of long-range information. Long reads are better able to span repetitive genomic regions compared to short reads, and thus have tremendous utility for resolving problematic regions and helping generate more complete draft assemblies. Here, we present LongStitch, a scalable pipeline that corrects and scaffolds draft genome assemblies exclusively using long reads. Results LongStitch incorporates multiple tools developed by our group and runs in up to three stages, which includes initial assembly correction (Tigmint-long), followed by two incremental scaffolding stages (ntLink and ARKS-long). Tigmint-long and ARKS-long are misassembly correction and scaffolding utilities, respectively, previously developed for linked reads, that we adapted for long reads. Here, we describe the LongStitch pipeline and introduce our new long-read scaffolder, ntLink, which utilizes lightweight minimizer mappings to join contigs. LongStitch was tested on short and long-read assemblies of Caenorhabditis elegans, Oryza sativa, and three different human individuals using corresponding nanopore long-read data, and improves the contiguity of each assembly from 1.2-fold up to 304.6-fold (as measured by NGA50 length). Furthermore, LongStitch generates more contiguous and correct assemblies compared to state-of-the-art long-read scaffolder LRScaf in most tests, and consistently improves upon human assemblies in under five hours using less than 23 GB of RAM. Conclusions Due to its effectiveness and efficiency in improving draft assemblies using long reads, we expect LongStitch to benefit a wide variety of de novo genome assembly projects. The LongStitch pipeline is freely available at https://github.com/bcgsc/longstitch.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

5

Unknown

Filling gaps of genome scaffolds via probabilistic searching optical maps against assembly graph (2021)

Huang, Bin ; Wei, Guozheng ; Wang, Bing ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 533. Published 2021 Oct 30. doi: 10.1186/s12859-021-04448-2.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-30

Description: Background Optical maps record locations of specific enzyme recognition sites within long genome fragments. This long-distance information enables aligning genome assembly contigs onto optical maps and ordering contigs into scaffolds. The generated scaffolds, however, often contain a large amount of gaps. To fill these gaps, a feasible way is to search genome assembly graph for the best-matching contig paths that connect boundary contigs of gaps. The combination of searching and evaluation procedures might be “searching followed by evaluation”, which is infeasible for long gaps, or “searching by evaluation”, which heavily relies on heuristics and thus usually yields unreliable contig paths. Results We here report an accurate and efficient approach to filling gaps of genome scaffolds with aids of optical maps. Using simulated data from 12 species and real data from 3 species, we demonstrate the successful application of our approach in gap filling with improved accuracy and completeness of genome scaffolds. Conclusion Our approach applies a sequential Bayesian updating technique to measure the similarity between optical maps and candidate contig paths. Using this similarity to guide path searching, our approach achieves higher accuracy than the existing “searching by evaluation” strategy that relies on heuristics. Furthermore, unlike the “searching followed by evaluation” strategy enumerating all possible paths, our approach prunes the unlikely sub-paths and extends the highly-probable ones only, thus significantly increasing searching efficiency.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

6

Unknown

Analytics and visualization tools to characterize single-cell stochasticity using bacterial single-cell movie cytometry data (2021)

Balomenos, Athanasios D. ; Stefanou, Victoria ; Manolakos, Elias S.

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 531. Published 2021 Oct 29. doi: 10.1186/s12859-021-04409-9.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-29

Description: Background Time-lapse microscopy live-cell imaging is essential for studying the evolution of bacterial communities at single-cell resolution. It allows capturing detailed information about the morphology, gene expression, and spatial characteristics of individual cells at every time instance of the imaging experiment. The image analysis of bacterial "single-cell movies" (videos) generates big data in the form of multidimensional time series of measured bacterial attributes. If properly analyzed, these datasets can help us decipher the bacterial communities' growth dynamics and identify the sources and potential functional role of intra- and inter-subpopulation heterogeneity. Recent research has highlighted the importance of investigating the role of biological "noise" in gene regulation, cell growth, cell division, etc. Single-cell analytics of complex single-cell movie datasets, capturing the interaction of multiple micro-colonies with thousands of cells, can shed light on essential phenomena for human health, such as the competition of pathogens and benign microbiome cells, the emergence of dormant cells (“persisters”), the formation of biofilms under different stress conditions, etc. However, highly accurate and automated bacterial bioimage analysis and single-cell analytics methods remain elusive, even though they are required before we can routinely exploit the plethora of data that single-cell movies generate. Results We present visualization and single-cell analytics using R (ViSCAR), a set of methods and corresponding functions, to visually explore and correlate single-cell attributes generated from the image processing of complex bacterial single-cell movies. They can be used to model and visualize the spatiotemporal evolution of attributes at different levels of the microbial community organization (i.e., cell population, colony, generation, etc.), to discover possible epigenetic information transfer across cell generations, infer mathematical and statistical models describing various stochastic phenomena (e.g., cell growth, cell division), and even identify and auto-correct errors introduced unavoidably during the bioimage analysis of a dense movie with thousands of overcrowded cells in the microscope's field of view. Conclusions ViSCAR empowers researchers to capture and characterize the stochasticity, uncover the mechanisms leading to cellular phenotypes of interest, and decipher a large heterogeneous microbial communities' dynamic behavior. ViSCAR source code is available from GitLab at https://gitlab.com/ManolakosLab/viscar.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

7

Unknown

Centrality of drug targets in protein networks (2021)

Viacava Follis, Ariele

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 527. Published 2021 Oct 29. doi: 10.1186/s12859-021-04342-x.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-29

Description: Background In the pharmaceutical industry, competing for few validated drug targets there is a drive to identify new ways of therapeutic intervention. Here, we attempted to define guidelines to evaluate a target’s ‘fitness’ based on its node characteristics within annotated protein functional networks to complement contingent therapeutic hypotheses. Results We observed that targets of approved, selective small molecule drugs exhibit high node centrality within protein networks relative to a broader set of investigational targets spanning various development stages. Targets of approved drugs also exhibit higher centrality than other proteins within their respective functional class. These findings expand on previous reports of drug targets’ network centrality by suggesting some centrality metrics such as low topological coefficient as inherent characteristics of a ‘good’ target, relative to other exploratory targets and regardless of its functional class. These centrality metrics could thus be indicators of an individual protein’s ‘fitness’ as potential drug target. Correlations between protein nodes’ network centrality and number of associated publications underscored the possibility of knowledge bias as an inherent limitation to such predictions. Conclusions Despite some entanglement with knowledge bias, like structure-oriented ‘druggability’ assessments of new protein targets, centrality metrics could assist early pharmaceutical discovery teams in evaluating potential targets with limited experimental proof of concept and help allocate resources for an effective drug discovery pipeline.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

8

Unknown

isoCNV: in silico optimization of copy number variant detection from targeted or exome sequencing data (2021)

Barcelona-Cabeza, Rosa ; Sanseverino, Walter ; Aiese Cigliano, Riccardo

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 530. Published 2021 Oct 29. doi: 10.1186/s12859-021-04452-6.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-29

Description: Background Accurate copy number variant (CNV) detection is especially challenging for both targeted sequencing (TS) and whole‐exome sequencing (WES) data. To maximize the performance, the parameters of the CNV calling algorithms should be optimized for each specific dataset. This requires obtaining validated CNV information using either multiplex ligation-dependent probe amplification (MLPA) or array comparative genomic hybridization (aCGH). They are gold standard but time-consuming and costly approaches. Results We present isoCNV which optimizes the parameters of DECoN algorithm using only NGS data. The parameter optimization process is performed using an in silico CNV validated dataset obtained from the overlapping calls of three algorithms: CNVkit, panelcn.MOPS and DECoN. We evaluated the performance of our tool and showed that increases the sensitivity in both TS and WES real datasets. Conclusions isoCNV provides an easy-to-use pipeline to optimize DECoN that allows the detection of analysis-ready CNV from a set of DNA alignments obtained under the same conditions. It increases the sensitivity of DECoN without the need for orthogonal methods. isoCNV is available at https://gitlab.com/sequentiateampublic/isocnv.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

9

Unknown

adabmDCA: adaptive Boltzmann machine learning for biological sequences (2021)

Muntoni, Anna Paola ; Pagnani, Andrea ; Weigt, Martin ; [et al.]

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 528. Published 2021 Oct 29. doi: 10.1186/s12859-021-04441-9.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-29

Description: Background Boltzmann machines are energy-based models that have been shown to provide an accurate statistical description of domains of evolutionary-related protein and RNA families. They are parametrized in terms of local biases accounting for residue conservation, and pairwise terms to model epistatic coevolution between residues. From the model parameters, it is possible to extract an accurate prediction of the three-dimensional contact map of the target domain. More recently, the accuracy of these models has been also assessed in terms of their ability in predicting mutational effects and generating in silico functional sequences. Results Our adaptive implementation of Boltzmann machine learning, , can be generally applied to both protein and RNA families and accomplishes several learning set-ups, depending on the complexity of the input data and on the user requirements. The code is fully available at https://github.com/anna-pa-m/adabmDCA. As an example, we have performed the learning of three Boltzmann machines modeling the Kunitz and Beta-lactamase2 protein domains and TPP-riboswitch RNA domain. Conclusions The models learned by are comparable to those obtained by state-of-the-art techniques for this task, in terms of the quality of the inferred contact map as well as of the synthetically generated sequences. In addition, the code implements both equilibrium and out-of-equilibrium learning, which allows for an accurate and lossless training when the equilibrium one is prohibitive in terms of computational time, and allows for pruning irrelevant parameters using an information-based criterion.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

10

Unknown

ANAT 3.0: a framework for elucidating functional protein subnetworks using graph-theoretic and machine learning approaches (2021)

Signorini, L. F. ; Almozlino, T. ; Sharan, R.

BioMed Central

In: BMC Bioinformatics. 2021; 22(1): 526. Published 2021 Oct 27. doi: 10.1186/s12859-021-04449-1.

add to mindlist on the mindlist

Details

Publication Date: 2021-10-27

Description: Background ANAT is a Cytoscape plugin for the inference of functional protein–protein interaction networks in yeast and human. It is a flexible graphical tool for scientists to explore and elucidate the protein–protein interaction pathways of a process under study. Results Here we present ANAT3.0, which comes with updated PPI network databases of 544,455 (human) and 155,504 (yeast) interactions, and a new machine-learning layer for refined network elucidation. Together they improve network reconstruction to more than twofold increase in the quality of reconstructing known signaling pathways from KEGG. Conclusions ANAT3.0 includes improved network reconstruction algorithms and more comprehensive protein–protein interaction networks than previous versions. ANAT is available for download on the Cytoscape Appstore and at https://www.cs.tau.ac.il/~bnet/ANAT/.

Electronic ISSN: 1471-2105

Topics: Biology , Computer Science

Published by BioMed Central

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext