Hypergeometric test
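The test asks how likely it is to see at least k successes when n items are drawn without replacement from a population of N items, K of which count as successes. That upper-tail probability, P(X ≥ k), is the one-sided p-value for enrichment. Below is a minimal sketch in Python that uses only the standard library. The function name `hypergeom_upper_tail` is our own and hypothetical; it is not taken from any particular package. The sketch counts favourable samples exactly as integers and divides once by the total number of possible samples, C(N, n), so no floating-point error builds up in the sum.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """One-sided p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: population size, K: successes in the population,
    n: sample size, k: observed successes in the sample.
    (Hypothetical helper written for illustration.)
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # Number of samples with exactly i successes and n - i failures,
    # summed over i = k .. min(K, n). math.comb returns 0 when its second
    # argument exceeds the first, so impossible values of i add nothing.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i)
        for i in range(max(k, 0), min(K, n) + 1)
    )
    # Exact integer arithmetic up to this single division by C(N, n).
    return favourable / comb(N, n)


# Numbers from the gene-expression example:
p = hypergeom_upper_tail(k=60, N=10_000, K=2_000, n=300)
print(f"P(X >= 60) = {p:.3f}")
```

With the example's numbers (N = 10,000, K = 2,000, n = 300, k = 60), the expected overlap is n·K/N = 300 × 2,000 / 10,000 = 60. That is exactly the observed count, so the upper-tail probability comes out near one half (about 0.52). If SciPy is available, `scipy.stats.hypergeom` gives the same quantity as `hypergeom.sf(k - 1, M, n, N)`. Note that SciPy names the parameters differently: M is the population size, n is the number of successes in the population, and N is the sample size.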
Example
Let's assume you've measured the expression of 10,000 genes under two different conditions, X and Y, and found 300 genes over-expressed in condition X. Of all 10,000 genes, 2,000 are known to be associated with a particular biological function F. You've noticed quite a few of these F-associated genes in your list of over-expressed genes, 60 to be precise. The hypergeometric test can help you assess whether this observation is statistically significant, i.e. whether function F is enriched in condition X beyond what would be expected by chance.

From our little story above we can extract the following numbers to feed into the test:

Population size: 10,000 (total number of genes)
Number of successes in population: 2,000 (all F-associated genes)
Sample size: 300 (over-expressed genes)
Number of successes in sample: 60 (F-associated genes in condition X)

This should result in a probability of p ~ 0.52 to draw 60 F-associ