In reply

Our manuscript¹ described Gene Set Enrichment Analysis (GSEA) as “designed to detect subtle but coordinated differences in expression of a priori defined sets of functionally related genes.” The method requires two inputs: (i) a list of genes that have been ranked according to expression difference between two states and (ii) a priori defined gene sets (e.g., pathways), each consisting of members drawn from this list. A gene set then receives an enrichment score (ES) that measures the statistical evidence against the null hypothesis that its members are randomly distributed in the ordered list. By definition, the ES is a function of the size of a gene set, the total number of genes in the entire list and the relative ranks of the members of the gene set.
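The dependence of the ES on these three quantities can be made concrete with a minimal sketch. It assumes a Kolmogorov-Smirnov-style running sum in which each set member adds a positive step and each non-member a negative step, scaled so that the sum returns to zero at the end of the list. The step weights and the names `enrichment_score`, `n_genes` and `member_ranks` are illustrative assumptions, not a definition taken from this reply.

```python
import math


def enrichment_score(n_genes, member_ranks):
    """KS-style running-sum score (illustrative sketch).

    n_genes      -- total number of genes in the ranked list
    member_ranks -- 1-based ranks of the gene set's members in that list
    """
    members = set(member_ranks)
    set_size = len(members)
    if not 0 < set_size < n_genes:
        raise ValueError("gene set must be a non-empty proper subset of the list")
    if min(members) < 1 or max(members) > n_genes:
        raise ValueError("ranks must lie between 1 and n_genes")

    # Assumed step sizes: chosen so hits and misses cancel over the full list.
    hit = math.sqrt((n_genes - set_size) / set_size)
    miss = -math.sqrt(set_size / (n_genes - set_size))

    running = 0.0
    best = 0.0
    for rank in range(1, n_genes + 1):
        running += hit if rank in members else miss
        best = max(best, running)  # maximum positive deviation from zero
    return best


# Same set size and list length, different ranks: only the ranks change the score.
top_set = enrichment_score(100, [1, 2, 3, 4, 5])
bottom_set = enrichment_score(100, [96, 97, 98, 99, 100])
```

Under these assumptions the function takes nothing but the list length, the set size and the member ranks, so a set clustered at the top of the list scores high and the same set placed at the bottom scores approximately zero.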

Damian and Gorfine's first comment is that ES can be influenced by the size of a gene set. We completely agree, because in general, statistical significance is a function of two parameters: the estimated magnitude of an effect and the variance in this estimate. Because estimates based on larger numbers of measurements have lower variance than those based on fewer measurements, the ES (a measure of statistical significance) may be greater for a set of 100 genes than for a second set of only 5 genes. This can be true even if some or all of the 100 genes individually rank lower than the genes in the smaller set of 5. We note that scoring by statistical significance is common; for example, it is standard in genetic linkage analysis to rank regions based on the lod score, which is a measure of statistical significance (not effect size).
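As a rough illustration of why a larger set can score higher, consider a simplified setting that is an assumption for exposition and not the GSEA statistic itself: each gene in a set contributes one independent measurement with common standard deviation \sigma, and the effect is estimated by the mean of n such measurements. The symbols \hat{\delta}, \sigma, n and z are introduced only for this sketch.

```latex
\begin{align*}
\operatorname{Var}(\hat{\delta}) &= \frac{\sigma^{2}}{n},
&
z &= \frac{\hat{\delta}}{\sqrt{\operatorname{Var}(\hat{\delta})}}
   = \frac{\hat{\delta}\,\sqrt{n}}{\sigma} \\[4pt]
n = 100,\ \hat{\delta} = 0.5\,\sigma &\;\Rightarrow\; z = 5,
&
n = 5,\ \hat{\delta} = 1.0\,\sigma &\;\Rightarrow\; z = \sqrt{5} \approx 2.24
\end{align*}
```

In this toy case the larger set has half the per-gene effect yet more than twice the significance. Real expression measurements are not independent, so the numbers are indicative only.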

In their second example, Damian and Gorfine show that by removing almost half of the lowest-ranking genes in their hypothetical experiment, the ES for gene set S2 falls. The ES falls not simply because of the definition of membership in gene sets (as they claim), but rather because of the selective removal of all genes ranked lower than those in S2. As the members of S2 are now relegated to the bottom of the list, rather than being near the top, this gene set must receive a lower ES. Contrary to Damian and Gorfine's correspondence, the mere presence or absence of gene sets (without changing the underlying list of genes) will not affect the ES of a defined gene set.
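Both points in this paragraph can be checked on a toy example. The sketch below assumes a Kolmogorov-Smirnov-style running-sum score; the gene names, the sets `s1` and `s2`, the list length of 13 and the helper names `running_sum_es` and `score_all` are all hypothetical and are not taken from Damian and Gorfine's example.

```python
import math


def running_sum_es(ranked_genes, gene_set):
    """Maximum positive deviation of a KS-style running sum (assumed scoring rule)."""
    n = len(ranked_genes)
    is_member = [gene in gene_set for gene in ranked_genes]
    k = sum(is_member)
    if not 0 < k < n:
        raise ValueError("gene set must overlap the list without covering it")
    hit = math.sqrt((n - k) / k)    # step up for a set member
    miss = -math.sqrt(k / (n - k))  # step down for a non-member
    running = best = 0.0
    for member in is_member:
        running += hit if member else miss
        best = max(best, running)
    return best


def score_all(ranked_genes, gene_sets):
    """Score each named gene set independently against the same ranked list."""
    return {name: running_sum_es(ranked_genes, s) for name, s in gene_sets.items()}


# Hypothetical ranked list g1 (top) to g13 (bottom) and two hypothetical sets.
genes = [f"g{i}" for i in range(1, 14)]
s1 = {"g1", "g2"}
s2 = {"g5", "g6", "g7"}

# Removing every gene ranked below S2 pushes S2 to the bottom of the list.
full = running_sum_es(genes, s2)
truncated = running_sum_es(genes[:7], s2)

# Adding or dropping another gene set leaves the list, and S2's score, unchanged.
with_s1 = score_all(genes, {"S1": s1, "S2": s2})
without_s1 = score_all(genes, {"S2": s2})
```

With this toy list the score for `s2` drops from about 3.29 to approximately zero once the six lowest-ranked genes are removed, whereas including or omitting `s1` leaves the score for `s2` identical, because the scoring function never sees the other sets.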

Damian and Gorfine conclude by stating that GSEA is sensitive to “a priori definition of the hypotheses of interest.” We completely agree, as this is the desired behavior of “an analytic technique designed to test a priori defined gene sets”¹. Given that the explicit goal of GSEA is to combine information about functional relationships with measurements of gene expression, it would be quite surprising if the definition of the gene sets had no influence on the results.