ALBERT — All Library Books, journals and Electronic Records Telegrafenberg

1

Unknown

Positive and negative forms of replicability in gene network analysis (2016)

Verleyen, W., Ballouz, S., Gillis, J.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2016-03-26

Description: Motivation: Gene networks have become a central tool in the analysis of genomic data but are widely regarded as hard to interpret. This has motivated a great deal of comparative evaluation and research into best practices. We explore the possibility that this may lead to overfitting in the field as a whole. Results: We construct a model of ‘research communities’ sampling from real gene network data and machine learning methods to characterize performance trends. Our analysis reveals an important principle limiting the value of replication, namely that targeting it directly causes ‘easy’ or uninformative replication to dominate analyses. We find that when sampling across network data and algorithms with similar variability, the relationship between replicability and accuracy is positive (Spearman’s correlation, r s ~0.33) but where no such constraint is imposed, the relationship becomes negative for a given gene function ( r s ~ –0.13). We predict factors driving replicability in some prior analyses of gene networks and show that they are unconnected with the correctness of the original result, instead reflecting replicable biases. Without these biases, the original results also vanish replicably. We show these effects can occur quite far upstream in network data and that there is a strong tendency within protein–protein interaction data for highly replicable interactions to be associated with poor quality control. Availability and implementation: Algorithms, network data and a guide to the code available at: https://github.com/wimverleyen/AggregateGeneFunctionPrediction . Contact: jgillis@cshl.edu Supplementary information: Supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

2

Unknown

Guidance for RNA-seq co-expression network construction and analysis: safety in numbers (2015)

Ballouz, S., Verleyen, W., Gillis, J.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-06-27

Description: Motivation: RNA-seq co-expression analysis is in its infancy and reasonable practices remain poorly defined. We assessed a variety of RNA-seq expression data to determine factors affecting functional connectivity and topology in co-expression networks. Results: We examine RNA-seq co-expression data generated from 1970 RNA-seq samples using a Guilt-By-Association framework, in which genes are assessed for the tendency of co-expression to reflect shared function. Minimal experimental criteria to obtain performance on par with microarrays were 〉20 samples with read depth 〉10 M per sample. While the aggregate network constructed shows good performance (area under the receiver operator characteristic curve ~0.71), the dependency on number of experiments used is nearly identical to that present in microarrays, suggesting thousands of samples are required to obtain ‘gold-standard’ co-expression. We find a major topological difference between RNA-seq and microarray co-expression in the form of low overlaps between hub-like genes from each network due to changes in the correlation of expression noise within each technology. Contact: jgillis@cshl.edu or sballouz@cshl.edu Supplementary information: Networks are available at: http://gillislab.labsites.cshl.edu/supplements/rna-seq-networks/ and supplementary data are available at Bioinformatics online.

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink

	Location	Call Number	Expected	Availability

Others were also interested in ...

PAPER CURRENT

S·F·X

Fulltext

3

Unknown

Measuring the wisdom of the crowds in network-based gene function inference (2015)

Verleyen, W., Ballouz, S., Gillis, J.

Oxford University Press

In: Bioinformatics

add to mindlist on the mindlist

Details

Publication Date: 2015-02-27

Description: Motivation: Network-based gene function inference methods have proliferated in recent years, but measurable progress remains elusive. We wished to better explore performance trends by controlling data and algorithm implementation, with a particular focus on the performance of aggregate predictions. Results: Hypothesizing that popular methods would perform well without hand-tuning, we used well-characterized algorithms to produce verifiably ‘untweaked’ results. We find that most state-of-the-art machine learning methods obtain ‘gold standard’ performance as measured in critical assessments in defined tasks. Across a broad range of tests, we see close alignment in algorithm performances after controlling for the underlying data being used. We find that algorithm aggregation provides only modest benefits, with a 17% increase in area under the ROC (AUROC) above the mean AUROC. In contrast, data aggregation gains are enormous with an 88% improvement in mean AUROC. Altogether, we find substantial evidence to support the view that additional algorithm development has little to offer for gene function prediction. Availability and implementation: The supplementary information contains a description of the algorithms, the network data parsed from different biological data resources and a guide to the source code (available at: http://gillislab.cshl.edu/supplements/ ). Contact: jgillis@cshl.edu

Print ISSN: 1367-4803

Electronic ISSN: 1460-2059

Topics: Biology , Computer Science , Medicine

Published by Oxford University Press

Permalink