Posts

Showing posts from June, 2017

Hypergeometric test

Example: Let's assume you've sampled gene expression for 10,000 genes in two different conditions X and Y and found 300 genes differentially over-expressed. In the entire gene set, 2,000 are known to be associated with a particular biological function F. You've noticed that there are quite a few of these F-associated genes in your list of differentially expressed genes, 60 to be precise. The hypergeometric test can help you assess whether this observation is statistically significant, i.e. whether function F is enriched in condition X beyond what might be expected by chance. From our little story above we can extract the following numbers to feed into the test:
Population size: 10,000 (total number of genes)
Number of successes in population: 2,000 (all F-associated genes)
Sample size: 300 (over-expressed genes)
Number of successes in sample: 60 (F-associated genes in condition X)
This should result in a probability of p ~ 0.52 to draw 60 F-associ...
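For readers who want to check the numbers themselves, here is a minimal sketch of the calculation using only the Python standard library. It assumes the test is one-sided (the chance of drawing at least 60 F-associated genes); the excerpt above is cut off before it says so, and the function name `hypergeom_upper_tail` is made up for this illustration. With the four numbers from the example the result should come out close to the p ~ 0.52 quoted above.

```python
from math import comb


def hypergeom_upper_tail(population, successes, sample, observed):
    """Return P(X >= observed) for a hypergeometric draw without replacement.

    population: total number of items (all genes)
    successes:  items in the population with the property (F-associated genes)
    sample:     number of items drawn (over-expressed genes)
    observed:   items with the property seen in the sample
    """
    failures = population - successes
    lowest = max(observed, 0)
    highest = min(sample, successes)
    # Count the ways to draw exactly k successes and (sample - k) failures,
    # summed over every k at or above the observed value.
    favourable = sum(
        comb(successes, k) * comb(failures, sample - k)
        for k in range(lowest, highest + 1)
    )
    # Exact integer arithmetic until the final division.
    return favourable / comb(population, sample)


# Numbers taken from the example above.
p = hypergeom_upper_tail(population=10_000, successes=2_000, sample=300, observed=60)
print(f"P(X >= 60) = {p:.3f}")
```

Summing exact binomial coefficients avoids rounding problems in the tail; for larger gene sets a statistics library would be the more practical choice.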

MeDIP-seq or BS-seq (WGBS)

It depends on what you want to know. We've switched between MeDIP and BS-Seq a few times and we're now coming to the conclusion that they're both useful and complementary techniques. MeDIP is great because you can effectively get coverage of a whole genome from a single lane of a GAIIx. Any region you're interested in will be covered and it's not too expensive to run. The downsides are that MeDIP doesn't do well at telling you if there is an overall change in methylation level between samples (i.e. the same basic pattern of methylation but with everything reduced). It also can't separate the influences of methylation in different contexts (CpG, CHG, CHH), so if these are being regulated independently you can't extract the separate changes. Bisulphite sequencing requires much more data than MeDIP to achieve the same level of coverage, and the large volumes of data are more difficult to handle, but once you have the data they're rea...