ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Filtering data from high-throughput experiments based on measurement reliability (2010)

Talloen, W. ; Hochreiter, S. ; Bijnens, L. ; [et al.]

National Academy of Sciences

In: PNAS - Proceedings of the National Academy of Sciences of the United States of America. 2010; 107(46): E173-E174. Published 2010 Nov 08. doi: 10.1073/pnas.1010604107.

Publication Date: 2010-11-08

Print ISSN: 0027-8424

Electronic ISSN: 1091-6490

Topics: Biology, Medicine, Natural Sciences in General

Published by National Academy of Sciences

2

Rchemcpp: a web service for structural analoging in ChEMBL, Drugbank and the Connectivity Map (2015)

Klambauer, G., Wischenbart, M., Mahr, M., Unterthiner, T., Mayr, A., Hochreiter, S.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-10-08

Description: We have developed Rchemcpp, a web service that identifies structurally similar compounds (structural analogs) in large-scale molecule databases. The service allows compounds to be queried in the widely used ChEMBL, DrugBank and the Connectivity Map databases. Rchemcpp utilizes the best performing similarity functions, i.e. molecule kernels, as measures for structural similarity. Molecule kernels have proven superior performance over other similarity measures and are currently excelling at machine learning challenges. To considerably reduce computational time, and thereby make it feasible as a web service, a novel efficient prefiltering strategy has been developed, which maintains the sensitivity of the method. By exploiting information contained in public databases, the web service facilitates many applications crucial for the drug development process, such as prioritizing compounds after screening or reducing adverse side effects during late phases. Rchemcpp was used in the DeepTox pipeline that has won the Tox21 Data Challenge and is frequently used by researchers in pharmaceutical companies. Availability and implementation: The web service and the R package are freely available via http://shiny.bioinf.jku.at/Analoging/ and via Bioconductor. Contact: hochreit@bioinf.jku.at Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology, Computer Science, Medicine

Published by Oxford University Press

3

msa: an R package for multiple sequence alignment (2015)

Bodenhofer, U., Bonatesta, E., Horejs-Kainrath, C., Hochreiter, S.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-12-10

Description: Although the R platform and the add-on packages of the Bioconductor project are widely used in bioinformatics, the standard task of multiple sequence alignment has been neglected so far. The msa package, for the first time, provides a unified R interface to the popular multiple sequence alignment algorithms ClustalW, ClustalOmega and MUSCLE. The package requires no additional software and runs on all major platforms. Moreover, the msa package provides an R interface to the powerful LaTeX package TeXshade, which allows for flexible and customizable plotting of multiple sequence alignments. Availability and implementation: msa is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/msa.html . Further information and the R code of the example presented in this paper are available at http://www.bioinf.jku.at/software/msa/ . Contact: bodenhofer@bioinf.jku.at or msa@bioinf.jku.at

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology, Computer Science, Medicine

Published by Oxford University Press

4

cn.MOPS: mixture of Poissons for discovering copy number variations in next-generation sequencing data with a low false discovery rate (2012)

Klambauer, G., Schwarzbauer, K., Mayr, A., Clevert, D.-A., Mitterecker, A., Bodenhofer, U., Hochreiter, S.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2012-05-13

Description: Quantitative analyses of next-generation sequencing (NGS) data, such as the detection of copy number variations (CNVs), remain challenging. Current methods detect CNVs as changes in the depth of coverage along chromosomes. Technological or genomic variations in the depth of coverage thus lead to a high false discovery rate (FDR), even upon correction for GC content. In the context of association studies between CNVs and disease, a high FDR means many false CNVs, thereby decreasing the discovery power of the study after correction for multiple testing. We propose ‘Copy Number estimation by a Mixture Of PoissonS’ (cn.MOPS), a data processing pipeline for CNV detection in NGS data. In contrast to previous approaches, cn.MOPS incorporates modeling of depths of coverage across samples at each genomic position. Therefore, cn.MOPS is not affected by read count variations along chromosomes. Using a Bayesian approach, cn.MOPS decomposes variations in the depth of coverage across samples into integer copy numbers and noise by means of its mixture components and Poisson distributions, respectively. The noise estimate allows for reducing the FDR by filtering out detections having high noise that are likely to be false detections. We compared cn.MOPS with the five most popular methods for CNV detection in NGS data using four benchmark datasets: (i) simulated data, (ii) NGS data from a male HapMap individual with implanted CNVs from the X chromosome, (iii) data from HapMap individuals with known CNVs, (iv) high coverage data from the 1000 Genomes Project. cn.MOPS outperformed its five competitors in terms of precision (1–FDR) and recall for both gains and losses in all benchmark data sets. The software cn.MOPS is publicly available as an R package at http://www.bioinf.jku.at/software/cnmops/ and at Bioconductor.

Keywords: Computational Methods, Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

5

HapFABIA: Identification of very short segments of identity by descent characterized by rare variants in large sequencing data (2013)

Hochreiter, S.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-12-07

Description: Identity by descent (IBD) can be reliably detected for long shared DNA segments, which are found in related individuals. However, many studies contain cohorts of unrelated individuals that share only short IBD segments. New sequencing technologies facilitate identification of short IBD segments through rare variants, which convey more information on IBD than common variants. Current IBD detection methods, however, are not designed to use rare variants for the detection of short IBD segments. Short IBD segments reveal genetic structures at high resolution. Therefore, they can help to improve imputation and phasing, to increase genotyping accuracy for low-coverage sequencing and to increase the power of association studies. Since short IBD segments are further assumed to be old, they can shed light on the evolutionary history of humans. We propose HapFABIA, a computational method that applies biclustering to identify very short IBD segments characterized by rare variants. HapFABIA is designed to detect short IBD segments in genotype data that were obtained from next-generation sequencing, but can also be applied to DNA microarray data. Especially in next-generation sequencing data, HapFABIA exploits rare variants for IBD detection. HapFABIA significantly outperformed competing algorithms at detecting short IBD segments on artificial and simulated data with rare variants. HapFABIA identified 160 588 different short IBD segments characterized by rare variants with a median length of 23 kb (mean 24 kb) in data for chromosome 1 of the 1000 Genomes Project. These short IBD segments contain 752 000 single nucleotide variants (SNVs), which account for 39% of the rare variants and 23.5% of all variants. The vast majority—152 000 IBD segments—are shared by Africans, while only 19 000 and 11 000 are shared by Europeans and Asians, respectively. 
IBD segments that match the Denisova or the Neandertal genome are found significantly more often in Asians and Europeans but also, in some cases exclusively, in Africans. The lengths of IBD segments and their sharing between continental populations indicate that many short IBD segments from chromosome 1 existed before humans migrated out of Africa. Thus, rare variants that tag these short IBD segments predate human migration from Africa. The software package HapFABIA is available from Bioconductor. All data sets, result files and programs for data simulation, preprocessing and evaluation are supplied at http://www.bioinf.jku.at/research/short-IBD .

Keywords: Genomics

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

6

DEXUS: identifying differential expression in RNA-Seq studies with unknown conditions (2013)

Klambauer, G., Unterthiner, T., Hochreiter, S.

Oxford University Press

In: Nucleic Acids Research

Publication Date: 2013-11-21

Description: Detection of differential expression in RNA-Seq data is currently limited to studies in which two or more sample conditions are known a priori. However, these biological conditions are typically unknown in cohort, cross-sectional and nonrandomized controlled studies such as the HapMap, the ENCODE or the 1000 Genomes project. We present DEXUS for detecting differential expression in RNA-Seq data for which the sample conditions are unknown. DEXUS models read counts as a finite mixture of negative binomial distributions in which each mixture component corresponds to a condition. A transcript is considered differentially expressed if modeling of its read counts requires more than one condition. DEXUS decomposes read count variation into variation due to noise and variation due to differential expression. Evidence of differential expression is measured by the informative/noninformative (I/NI) value, which allows differentially expressed transcripts to be extracted at a desired specificity (significance level) or sensitivity (power). DEXUS performed excellently in identifying differentially expressed transcripts in data with unknown conditions. On 2400 simulated data sets, I/NI value thresholds of 0.025, 0.05 and 0.1 yielded average specificities of 92, 97 and 99% at sensitivities of 76, 61 and 38%, respectively. On real-world data sets, DEXUS was able to detect differentially expressed transcripts related to sex, species, tissue, structural variants or quantitative trait loci. The DEXUS R package is publicly available from Bioconductor and the scripts for all experiments are available at http://www.bioinf.jku.at/software/dexus/ .

Keywords: Computational Methods, Massively Parallel (Deep) Sequencing

Print ISSN: 0305-1048

Electronic ISSN: 1362-4962

Topics: Biology

Published by Oxford University Press

7

KeBABS: an R package for kernel-based analysis of biological sequences (2015)

Palme, J., Hochreiter, S., Bodenhofer, U.

Oxford University Press

In: Bioinformatics

Publication Date: 2015-07-26

Description: KeBABS provides a powerful, flexible and easy to use framework for kernel-based analysis of biological sequences in R. It includes efficient implementations of the most important sequence kernels, also including variants that allow for taking sequence annotations and positional information into account. KeBABS seamlessly integrates three common support vector machine (SVM) implementations with a unified interface. It allows for hyperparameter selection by cross validation, nested cross validation and also features grouped cross validation. The biological interpretation of SVM models is supported by (1) the computation of weights of sequence patterns and (2) prediction profiles that highlight the contributions of individual sequence positions or sections. Availability and implementation: The R package kebabs is available via the Bioconductor project: http://bioconductor.org/packages/release/bioc/html/kebabs.html . Further information and the R code of the example in this paper are available at http://www.bioinf.jku.at/software/kebabs/ . Contact: kebabs@bioinf.jku.at or bodenhofer@bioinf.jku.at

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology, Computer Science, Medicine

Published by Oxford University Press
