ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

CLImAT: accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using whole-genome sequencing data (2014)

Yu, Z., Liu, Y., Shen, Y., Wang, M., Li, A.

Oxford University Press

In: Bioinformatics

Publication Date: 2014-09-06

Description: Motivation: Whole-genome sequencing of tumor samples has been demonstrated as an efficient approach for comprehensive analysis of genomic aberrations in cancer genome. Critical issues such as tumor impurity and aneuploidy, GC-content and mappability bias have been reported to complicate identification of copy number alteration and loss of heterozygosity in complex tumor samples. Therefore, efficient computational methods are required to address these issues. Results: We introduce CLImAT (CNA and LOH Assessment in Impure and Aneuploid Tumors), a bioinformatics tool for identification of genomic aberrations from tumor samples using whole-genome sequencing data. Without requiring a matched normal sample, CLImAT takes integrated analysis of read depth and allelic frequency and provides extensive data processing procedures including GC-content and mappability correction of read depth and quantile normalization of B-allele frequency. CLImAT accurately identifies copy number alteration and loss of heterozygosity even for highly impure tumor samples with aneuploidy. We evaluate CLImAT on both simulated and real DNA sequencing data to demonstrate its ability to infer tumor impurity and ploidy and identify genomic aberrations in complex tumor samples. Availability and implementation: The CLImAT software package can be freely downloaded at http://bioinformatics.ustc.edu.cn/CLImAT/. Contact: aoli@ustc.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

2

Accurate disulfide-bonding network predictions improve ab initio structure prediction of cysteine-rich proteins (2015)

Yang, J., He, B.-J., Jang, R., Zhang, Y., Shen, H.-B.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-11-21

Description: Motivation: Cysteine-rich proteins cover many important families in nature but there are currently no methods specifically designed for modeling the structure of these proteins. The accuracy of disulfide connectivity pattern prediction, particularly for the proteins of higher-order connections, e.g. >3 bonds, is too low to effectively assist structure assembly simulations. Results: We propose a new hierarchical order reduction protocol called Cyscon for disulfide-bonding prediction. The most confident disulfide bonds are first identified and bonding prediction is then focused on the remaining cysteine residues based on SVR training. Compared with purely machine learning-based approaches, Cyscon improved the average accuracy of connectivity pattern prediction by 21.9%. For proteins with more than 5 disulfide bonds, Cyscon improved the accuracy by 585% on the benchmark set of PDBCYS. When applied to 158 non-redundant cysteine-rich proteins, Cyscon predictions helped increase (or decrease) the TM-score (or RMSD) of the ab initio QUARK modeling by 12.1% (or 14.4%). This result demonstrates a new avenue to improve the ab initio structure modeling for cysteine-rich proteins. Availability and implementation: http://www.csbio.sjtu.edu.cn/bioinf/Cyscon/ Contact: zhng@umich.edu or hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

3

High-accuracy prediction of transmembrane inter-helix contacts and application to GPCR 3D structure modeling (2013)

Yang, J., Jang, R., Zhang, Y., Shen, H.-B.

Oxford University Press

In: Bioinformatics

Publication Date: 2013-10-04

Description: Motivation: Residue–residue contacts across the transmembrane helices dictate the three-dimensional topology of alpha-helical membrane proteins. However, contact determination through experiments is difficult because most transmembrane proteins are hard to crystallize. Results: We present a novel method (MemBrain) to derive transmembrane inter-helix contacts from amino acid sequences by combining correlated mutations and multiple machine learning classifiers. Tested on 60 non-redundant polytopic proteins using a strict leave-one-out cross-validation protocol, MemBrain achieves an average accuracy of 62%, which is 12.5% higher than the current best method from the literature. When applied to 13 recently solved G protein-coupled receptors, the MemBrain contact predictions helped increase the TM-score of the I-TASSER models by 37% in the transmembrane region. The number of foldable cases (TM-score >0.5) increased by 100%, where all G protein-coupled receptor templates and homologous templates with sequence identity >30% were excluded. These results demonstrate significant progress in contact prediction and a potential for contact-driven structure modeling of transmembrane proteins. Availability: www.csbio.sjtu.edu.cn/bioinf/MemBrain/ Contact: hbshen@sjtu.edu.cn or zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

4

Bioimaging-based detection of mislocalized proteins in human cancers by semi-supervised learning (2015)

Xu, Y.-Y., Yang, F., Zhang, Y., Shen, H.-B.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-04-03

Description: Motivation: There is a long-term interest in the challenging task of finding translocated and mislocated cancer biomarker proteins. Bioimages of subcellular protein distribution are new data sources which have attracted much attention in recent years because of their intuitive and detailed descriptions of protein distribution. However, automated methods in large-scale biomarker screening suffer significantly from the lack of subcellular location annotations for bioimages from cancer tissues. The transfer prediction idea of applying models trained on normal tissue proteins to predict the subcellular locations of cancerous ones is arbitrary because the protein distribution patterns may differ in normal and cancerous states. Results: We developed a new semi-supervised protocol that can use unlabeled cancer protein data in model construction by an iterative and incremental training strategy. Our approach enables us to selectively use the low-quality images in normal states to expand the training sample space and provides a general way for dealing with the small size of annotated images used together with large unannotated ones. Experiments demonstrate that the new semi-supervised protocol can result in improved accuracy and sensitivity of subcellular location difference detection. Availability and implementation: The data and code are available at: www.csbio.sjtu.edu.cn/bioinf/SemiBiomarker/. Contact: hbshen@sjtu.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

5

An image-based multi-label human protein subcellular localization predictor (iLocator) reveals protein mislocalizations in cancer tissues (2013)

Xu, Y.-Y., Yang, F., Zhang, Y., Shen, H.-B.

Oxford University Press

In: Bioinformatics

Publication Date: 2013-07-26

Description: Motivation: Human cells are organized into compartments of different biochemical cellular processes. Having proteins appear at the right time to the correct locations in the cellular compartments is required to conduct their functions in normal cells, whereas mislocalization of proteins can result in pathological diseases, including cancer. Results: To reveal the cancer-related protein mislocalizations, we developed an image-based multi-label subcellular location predictor, iLocator, which covers seven cellular localizations. The iLocator incorporates both global and local image descriptors and generates predictions by using an ensemble multi-label classifier. The algorithm has the ability to treat both single- and multiple-location proteins. We first trained and tested iLocator on 3240 normal human tissue images that have known subcellular location information from the human protein atlas. The iLocator was then used to generate protein localization predictions for 3696 protein images from seven cancer tissues that have no location annotations in the human protein atlas. By comparing the output data from normal and cancer tissues, we detected eight potential cancer biomarker proteins that have significant localization differences with P-value < 0.01. Availability: http://www.csbio.sjtu.edu.cn/bioinf/iLocator/ Contact: hbshen@sjtu.edu.cn or zhng@umich.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

6

ASSIGN: context-specific genomic profiling of multiple heterogeneous biological pathways (2015)

Shen, Y., Rahman, M., Piccolo, S. R., Gusenleitner, D., El-Chaar, N. N., Cheng, L., Monti, S., Bild, A. H., Johnson, W. E.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-05-27

Description: Motivation: Although gene-expression signature-based biomarkers are often developed for clinical diagnosis, many promising signatures fail to replicate during validation. One major challenge is that biological samples used to generate and validate the signature are often from heterogeneous biological contexts—controlled or in vitro samples may be used to generate the signature, but patient samples may be used for validation. In addition, systematic technical biases from multiple genome-profiling platforms often mask true biological variation. Addressing such challenges will enable us to better elucidate disease mechanisms and provide improved guidance for personalized therapeutics. Results: Here, we present a pathway profiling toolkit, Adaptive Signature Selection and InteGratioN (ASSIGN), which enables robust and context-specific pathway analyses by efficiently capturing pathway activity in heterogeneous sets of samples and across profiling technologies. The ASSIGN framework is based on a flexible Bayesian factor analysis approach that allows for simultaneous profiling of multiple correlated pathways and for the adaptation of pathway signatures into specific disease. We demonstrate the robustness and versatility of ASSIGN in estimating pathway activity in simulated data, cell lines perturbed pathways and in primary tissues samples including The Cancer Genome Atlas breast carcinoma samples and liver samples exposed to genotoxic carcinogens. Availability and implementation: Software for our approach is available for download at: http://www.bioconductor.org/packages/release/bioc/html/ASSIGN.html and https://github.com/wevanjohnson/ASSIGN. Contact: andreab@genetics.utah.edu or wej@bu.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

7

cNMA: a framework of encounter complex-based normal mode analysis to model conformational changes in protein interactions (2015)

Oliwa, T., Shen, Y.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-06-14

Description: Motivation: It remains both a fundamental and practical challenge to understand and anticipate motions and conformational changes of proteins during their associations. Conventional normal mode analysis (NMA) based on anisotropic network model (ANM) addresses the challenge by generating normal modes reflecting intrinsic flexibility of proteins, which follows a conformational selection model for protein–protein interactions. But earlier studies have also found cases where conformational selection alone could not adequately explain conformational changes and other models have been proposed. Moreover, there is a pressing demand of constructing a much reduced but still relevant subset of protein conformational space to improve computational efficiency and accuracy in protein docking, especially for the difficult cases with significant conformational changes. Method and results: With both conformational selection and induced fit models considered, we extend ANM to include concurrent but differentiated intra- and inter-molecular interactions and develop an encounter complex-based NMA (cNMA) framework. Theoretical analysis and empirical results over a large data set of significant conformational changes indicate that cNMA is capable of generating conformational vectors considerably better at approximating conformational changes with contributions from both intrinsic flexibility and inter-molecular interactions than conventional NMA only considering intrinsic flexibility does. The empirical results also indicate that a straightforward application of conventional NMA to an encounter complex often does not improve upon NMA for an individual protein under study and intra- and inter-molecular interactions need to be differentiated properly. 
Moreover, in addition to induced motions of a protein under study, the induced motions of its binding partner and the coupling between the two sets of protein motions present in a near-native encounter complex lead to the improved performance. A study to isolate and assess the sole contribution of intermolecular interactions toward improvements against conventional NMA further validates the additional benefit from induced-fit effects. Taken together, these results provide new insights into molecular mechanisms underlying protein interactions and new tools for dimensionality reduction for flexible protein docking. Availability and implementation: Source codes are available upon request. Contact: yshen@tamu.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

8

RPdb: a database of experimentally verified cellular reprogramming records (2015)

Shen, Y., Gao, F., Wang, M., Li, A.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-09-22

Description: Many cell lines can be reprogrammed to other cell lines by forced expression of a few transcription factors or by specifically designed culture methods, which have attracted a great interest in the field of regenerative medicine and stem cell research. Plenty of cell lines have been used to generate induced pluripotent stem cells (IPSCs) by expressing a group of genes and microRNAs. These IPSCs can differentiate into somatic cells to promote tissue regeneration. Similarly, many somatic cells can be directly reprogrammed to other cells without a stem cell state. All these findings are helpful in searching for new reprogramming methods and understanding the biological mechanism inside. However, to the best of our knowledge, there is still no database dedicated to integrating the reprogramming records. We built RPdb (cellular reprogramming database) to collect cellular reprogramming information and make it easy to access. All entries in RPdb are manually extracted from more than 2000 published articles, which is helpful for researchers in regenerative medicine and cell biology. Availability and Implementation: RPdb is freely available on the web at http://bioinformatics.ustc.edu.cn/rpdb with all major browsers supported. Contact: aoli@ustc.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

9

Group spike-and-slab lasso generalized linear models for disease prediction and associated genes detection by incorporating pathway information (2018)

Tang, Z., Shen, Y., Li, Y., et al.

Oxford University Press

In: Bioinformatics

Publication Date: 2018-03-14

Description: Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces self-adaptive shrinkage amount on different coefficients. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method on several simulated scenarios, by varying the overlap among groups, group size, number of non-null groups, and the correlation within group. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation: The methods have been implemented in a freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: nyi@uab.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

10

ORE identifies extreme expression effects enriched for rare variants (2019)

Richter, F., Hoffman, G. E., Manheimer, K. B., et al.

Oxford University Press

In: Bioinformatics. 2019; 35(20): 3906-3912. Published 2019 Mar 23. doi: 10.1093/bioinformatics/btz202.

Publication Date: 2019-03-23

Description: Motivation: Non-coding rare variants (RVs) may contribute to Mendelian disorders but have been challenging to study due to small sample sizes, genetic heterogeneity and uncertainty about relevant non-coding features. Previous studies identified RVs associated with expression outliers, but varying outlier definitions were employed and no comprehensive open-source software was developed. Results: We developed Outlier-RV Enrichment (ORE) to identify biologically-meaningful non-coding RVs. We implemented ORE combining whole-genome sequencing and cardiac RNAseq from congenital heart defect patients from the Pediatric Cardiac Genomics Consortium and deceased adults from Genotype-Tissue Expression. Use of rank-based outliers maximized sensitivity while a most extreme outlier approach maximized specificity. Rarer variants had stronger associations, suggesting they are under negative selective pressure and providing a basis for investigating their contribution to Mendelian disorders. Availability and implementation: ORE, source code, and documentation are available at https://pypi.python.org/pypi/ore under the MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press
