Reply to

Mootha, Vamsi K; Daly, Mark J; Patterson, Nick; Hirschhorn, Joel N; Groop, Leif C; Altshuler, David

doi:10.1038/ng0704-663b

Correspondence
Published: 01 July 2004

Reply to "Statistical concerns about the GSEA procedure"

Vamsi K Mootha¹,
Mark J Daly¹,
Nick Patterson¹,
Joel N Hirschhorn¹,
Leif C Groop² &
David Altshuler¹

Nature Genetics volume 36, page 663 (2004)

In reply

Our manuscript¹ described Gene Set Enrichment Analysis (GSEA) as “designed to detect subtle but coordinated differences in expression of a priori defined sets of functionally related genes.” The method requires two inputs: (i) a list of genes that have been ranked according to expression difference between two states and (ii) a priori defined gene sets (e.g., pathways), each consisting of members drawn from this list. A gene set then receives an enrichment score (ES) that is a measure of statistical evidence rejecting the null hypothesis that its members are randomly distributed in the ordered list. By definition, the ES is a function of the size of a gene set, the total number of genes in the entire list and the relative ranks of the members of the gene set.
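As a rough illustration of how such a score depends only on the set size, the list length and the ranks of the set members, the following minimal sketch computes a running-sum (Kolmogorov-Smirnov style) statistic. The step weights follow our reading of the procedure in ref. 1 and should be treated as an assumption; the function name `enrichment_score` and the toy gene labels are hypothetical.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Maximum of a running sum over the ranked list (assumed weighting)."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    if g == 0 or g == n:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    hit = math.sqrt((n - g) / g)    # step up when a set member is encountered
    miss = math.sqrt(g / (n - g))   # step down for every other gene
    running = 0.0
    best = 0.0                      # the running sum returns to zero at the end
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        best = max(best, running)
    return best

# Toy ranked list of 10 genes, best-ranked first (hypothetical labels).
ranked = [f"g{i}" for i in range(1, 11)]
es_top = enrichment_score(ranked, {"g1", "g2"})      # members at the top
es_bottom = enrichment_score(ranked, {"g9", "g10"})  # members at the bottom
print(es_top, es_bottom)
```

With these weights the hits and misses cancel over the full list, so a set whose members sit at the very bottom never rises above zero, whereas a set concentrated at the top accumulates a large positive maximum.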

Damian and Gorfine's first comment is that ES can be influenced by the size of a gene set. We completely agree, because in general, statistical significance is a function of two parameters: the estimated magnitude of an effect and the variance in this estimate. Because estimates based on larger numbers of measurements have lower variance than those based on fewer measurements, the ES (a measure of statistical significance) may be greater for a set of 100 genes than for a second set of only 5 genes. This can be true even if some or all of the 100 genes individually rank lower than the genes in the smaller set of 5. We note that scoring by statistical significance is common; for example, it is standard in genetic linkage analysis to rank regions based on the lod score, which is a measure of statistical significance (not effect size).
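A small numerical sketch can make the size effect concrete. It uses a running-sum statistic whose step weights are our assumption about the procedure in ref. 1, and the list length, rank positions and function name are hypothetical choices made only for illustration. A set of 100 genes spread over ranks 6 to 303 is compared with a set of 5 genes occupying ranks 1 to 5, so that every member of the larger set ranks below every member of the smaller one.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Maximum of a running sum over the ranked list (assumed weighting)."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    hit = math.sqrt((n - g) / g)    # step up for a set member
    miss = math.sqrt(g / (n - g))   # step down for a non-member
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        best = max(best, running)
    return best

# Genes are identified by their rank, 1 (top) to 1000 (bottom).
ranked = list(range(1, 1001))
small = set(range(1, 6))        # 5 genes at ranks 1..5
large = set(range(6, 304, 3))   # 100 genes at ranks 6, 9, ..., 303

es_small = enrichment_score(ranked, small)
es_large = enrichment_score(ranked, large)
print(f"ES(5 genes)   = {es_small:.2f}")
print(f"ES(100 genes) = {es_large:.2f}")
```

In this toy setting the larger set obtains the higher score although none of its members outranks the smaller set, which is the behavior expected of a measure of statistical significance rather than of effect size.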

In their second example, Damian and Gorfine show that by removing almost half of the lowest-ranking genes in their hypothetical experiment, the ES for gene set S2 falls. The ES falls not simply because of the definition of membership in gene sets (as they claim), but rather because of the selective removal of all genes ranked lower than those in S2. As the members of S2 are now relegated to the bottom of the list, rather than being near the top, this gene set must receive a lower ES. Contrary to Damian and Gorfine's correspondence, the mere presence or absence of gene sets (without changing the underlying list of genes) will not affect the ES of a defined gene set.
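The two claims in this paragraph can be checked on a toy example. The sketch below is not Damian and Gorfine's actual example; the list length, the positions of S2, the other set names and the running-sum weights (our reading of ref. 1) are all assumptions made for illustration.

```python
import math

def enrichment_score(ranked_genes, gene_set):
    """Maximum of a running sum over the ranked list (assumed weighting)."""
    n = len(ranked_genes)
    members = set(gene_set) & set(ranked_genes)
    g = len(members)
    hit = math.sqrt((n - g) / g)    # step up for a set member
    miss = math.sqrt(g / (n - g))   # step down for a non-member
    running, best = 0.0, 0.0
    for gene in ranked_genes:
        running += hit if gene in members else -miss
        best = max(best, running)
    return best

# 20 genes identified by rank; S2 sits in the upper half of the full list.
ranked = list(range(1, 21))
s2 = {9, 10}

# (a) Remove every gene ranked below S2: S2 is now at the bottom of the list.
truncated = ranked[:10]
es_full = enrichment_score(ranked, s2)
es_truncated = enrichment_score(truncated, s2)
print(es_full, es_truncated)

# (b) Change which other gene sets are tested, leaving the gene list unchanged.
collection_a = {"S2": s2}
collection_b = {"S1": {1, 2, 3}, "S2": s2, "S3": {15, 16}}
scores_a = {name: enrichment_score(ranked, s) for name, s in collection_a.items()}
scores_b = {name: enrichment_score(ranked, s) for name, s in collection_b.items()}
print(scores_a["S2"] == scores_b["S2"])
```

Case (a) lowers the score of S2 because the underlying ranked list has changed, while case (b) leaves it untouched because the score of a set is computed from the list and that set alone.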

Damian and Gorfine conclude by stating that GSEA is sensitive to “a priori definition of the hypotheses of interest.” We completely agree, as this is the desired behavior of “an analytic technique designed to test a priori defined gene sets”¹. Given that the explicit goal of GSEA is to combine information about functional relationships with measurements of gene expression, it would be quite surprising if the definition of the gene sets had no influence on the results.

References

1. Mootha, V.K. et al. Nat. Genet. 34, 267–273 (2003).

Author information

Authors and Affiliations

Broad Institute, Harvard University and Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
Vamsi K Mootha, Mark J Daly, Nick Patterson, Joel N Hirschhorn & David Altshuler
Department of Endocrinology, University Hospital MAS, Lund University, Malmo, Sweden
Leif C Groop

Corresponding authors

Correspondence to Vamsi K Mootha or David Altshuler.

About this article

Cite this article

Mootha, V., Daly, M., Patterson, N. et al. Reply to "Statistical concerns about the GSEA procedure". Nat Genet 36, 663 (2004). https://doi.org/10.1038/ng0704-663b

Issue Date: 01 July 2004
DOI: https://doi.org/10.1038/ng0704-663b

This article is cited by

Gene set enrichment analysis for non-monotone association and multiple experimental categories
- Rongheng Lin
- Shuangshuang Dai
- Leping Li
BMC Bioinformatics (2008)
