conceptual question about FDR, FDR adjusted p-value and q-value

October 04, 2016

Dear Jack,

The thing to understand is that terms like FDR and q-value were defined in 
specific ways by their original inventors but are used in more generic 
ways by later researchers who adapt, modify or use the ideas.

The term "false discovery rate (FDR)" was created by Benjamini and 
Hochberg in their 1995 paper.  They gave a particular definition of what 
they meant by FDR.  Their procedure accepted or rejected hypotheses, but 
did not produce adjusted p-values.
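
As a rough illustration of the accept/reject form of the procedure, here is a minimal Python sketch of the BH step-up rule. The function name bh_reject and the example p-values are made up for illustration; this is not code from the 1995 paper or from any package.

```python
def bh_reject(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up rule: True where the hypothesis is
    rejected when the FDR is to be controlled at level alpha."""
    n = len(pvalues)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / n.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / n:
            k_max = rank
    # Reject the k_max smallest p-values (none at all if k_max == 0).
    reject = [False] * n
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values, chosen to show the "step-up" behaviour.
pvals = [0.001, 0.03, 0.035, 0.04]
print(bh_reject(pvals, alpha=0.05))  # [True, True, True, True]
```

Note that 0.03 exceeds its own threshold (2 * 0.05 / 4 = 0.025) but is still rejected, because a larger p-value further up the list meets its threshold. The output is a list of decisions at one chosen alpha, not a list of adjusted p-values.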

Benjamini and Yekutieli presented another more conservative algorithm to 
control the FDR in a 2001 paper.  Same definition of FDR, but a different 
algorithm.
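
To make "more conservative" concrete, here is a small Python sketch of the BY step-up rule, assuming the usual form in which the BH thresholds are divided by the harmonic sum 1 + 1/2 + ... + 1/n. The name by_reject and the data are hypothetical.

```python
def by_reject(pvalues, alpha=0.05):
    """Benjamini-Yekutieli step-up rule: same shape as BH, but every
    threshold is shrunk by the harmonic sum c(n) = 1 + 1/2 + ... + 1/n."""
    n = len(pvalues)
    c_n = sum(1.0 / i for i in range(1, n + 1))
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / (n * c(n)).
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / (n * c_n):
            k_max = rank
    # Reject the k_max smallest p-values.
    reject = [False] * n
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


# Hypothetical p-values; BH at alpha = 0.05 would reject all four of these.
pvals = [0.001, 0.03, 0.035, 0.04]
print(by_reject(pvals, alpha=0.05))  # [True, False, False, False]
```

With four tests, c(n) is about 2.08, so each threshold is less than half of the corresponding BH threshold and only the smallest p-value survives.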

In 2002, I re-interpreted the Benjamini and Hochberg (BH) and Benjamini 
and Yekutieli (BY) procedures in terms of adjusted p-values.  I 
implemented the resulting algorithms in the function p.adjust() in the 
stats package, and used them in the limma package, and this led to the 
concept of an FDR adjusted p-value.  The terminology used by the 
p.adjust() function and the limma package has led people to refer to "BH 
adjusted p-values".
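
For readers who want to see the arithmetic, the following Python sketch computes BH adjusted p-values. It is intended to follow the same steps as p.adjust(p, method = "BH") in R, but it is a hand-written illustration (the name bh_adjust is made up), not a port of the R source.

```python
def bh_adjust(pvalues):
    """BH adjusted p-values: scale each p-value by n / rank, then take a
    running minimum from the largest p-value down to the smallest."""
    n = len(pvalues)
    # Indices of the p-values, ordered from largest to smallest p-value.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # adjusted p-values are never allowed to exceed 1
    for pos, idx in enumerate(order):
        rank = n - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values.
pvals = [0.001, 0.03, 0.035, 0.04]
print([round(a, 4) for a in bh_adjust(pvals)])  # [0.004, 0.04, 0.04, 0.04]
```

A hypothesis is rejected by the BH procedure at level alpha exactly when its adjusted p-value is at most alpha, so one set of adjusted p-values serves for every choice of alpha.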

The adjusted p-value definition that you give is essentially the same as 
the BH adjusted p-value, except that you omitted the last step in the 
procedure.  Your definition as it stands is not an increasing function of 
the original p-values.
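
The definition under discussion is not quoted in this reply, so the Python sketch below assumes it was the common form p * n / rank, and that the omitted last step is the running minimum taken from the largest p-value downwards. Both function names and the data are hypothetical.

```python
def scaled_only(pvalues):
    """Assumed incomplete definition: p * n / rank, with no final step."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = pvalues[idx] * n / rank
    return scaled


def enforce_monotone(pvalues, scaled):
    """Assumed last step: running minimum of the scaled values, taken
    from the largest p-value to the smallest, and capped at 1."""
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i], reverse=True)
    out = [0.0] * len(pvalues)
    running_min = 1.0
    for idx in order:
        running_min = min(running_min, scaled[idx])
        out[idx] = running_min
    return out


# Hypothetical p-values, already in increasing order.
pvals = [0.01, 0.02, 0.021, 0.5]
scaled = scaled_only(pvals)
print([round(s, 4) for s in scaled])  # [0.04, 0.04, 0.028, 0.5]
print([round(a, 4) for a in enforce_monotone(pvals, scaled)])  # [0.028, 0.028, 0.028, 0.5]
```

Without the last step, p = 0.02 receives a larger value (0.04) than p = 0.021 (0.028), so the result is not increasing in the original p-values. The running minimum removes that inversion.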

In 2002, John Storey created a new definition of "false discovery rate". 
Storey's definition is based on Benjamini and Hochberg's original idea, 
but is mathematically a bit more flexible.  John Storey also created the 
terminology "q-value" for a quantity that estimates his definition of 
FDR.  He implemented q-value estimation procedures in an R package called 
qvalue.

So, strictly speaking, the q-value and the FDR adjusted p-value are 
similar but not quite the same.  However the terms q-value and FDR 
adjusted p-value are often used generically by the Bioconductor community 
to refer to any quantity that controls or estimates any definition of the 
FDR.  In this general sense the terms are synonyms.

The lesson to draw from this is that different methods and different 
packages are trying to do slightly different things and give slightly 
different results, and you should always cite the specific software and 
method that you have used.

Best wishes
Gordon

========================================================

p-value = extremal probability for a test statistic under the null
hypothesis, not accounting for multiple comparisons
BH p-value, pBH = the same p-value after adjusting for multiple
comparisons, so that rejecting every test with pBH <= p upper-bounds the
overall false discovery rate at p
q-value = direct estimate of the FDR associated with pBH

See http://genomics.princeton.edu/storeylab/papers/directfdr.pdf for the
original, and quite well written, paper, which says on page 485:

The basic point that we make is that using the Benjamini and Hochberg
(1995) method to control FDR at level α/π0 is equivalent to (i.e. rejects
the same p-values as) using the proposed method to control FDR at level
α. The gain in power from our approach is clear--we control a smaller
error rate (α ≤ α/π0), yet reject the same number of tests.


q-values also depend on the estimated fraction of test p-values that
fall in the chance (uniform) component of the p-value distribution.
pi0 = estimated overall proportion of tests that are truly null (i.e.,
tests for which any rejection would be a false positive)
q-value = BH p-value * pi0 (estimated proportion of false positives
among the tests rejected at pBH)

So q = pBH * pi0 (++), as can be verified from the output, and q directly
estimates the pFDR for test t assuming independence among the tests.
The mathematical justification for this is given in the paper; the basic
machinery can be, and has been, extended to many other situations.

(++) If pi0 is estimated at or very near 1.0, then pBH and q will be the
same for any given test t, to the limit of machine precision (see paper).

At least that's how it appeared to be implemented the last time I looked
at the code and the paper :-)
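
The relation q = pBH * pi0 described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: pi0 is estimated with a single fixed cutoff lam = 0.5 (the proportion of p-values above lam, divided by 1 - lam), which is much cruder than what the qvalue package does by default, and the function names are made up.

```python
def bh_adjust(pvalues):
    """BH adjusted p-values (running minimum of p * n / rank)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for pos, idx in enumerate(order):
        rank = n - pos  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def estimate_pi0(pvalues, lam=0.5):
    """Crude fixed-cutoff estimate of the proportion of true nulls.
    Assumption: p-values above lam come almost entirely from true nulls,
    which are uniform, so a fraction (1 - lam) of them lands above lam."""
    n = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (n * (1.0 - lam)))


def q_values(pvalues, lam=0.5):
    """q = pi0 * pBH, following the relation described in the text."""
    pi0 = estimate_pi0(pvalues, lam)
    return [pi0 * a for a in bh_adjust(pvalues)], pi0


# Hypothetical p-values: two of the eight lie above lam = 0.5.
pvals = [0.001, 0.004, 0.02, 0.03, 0.2, 0.45, 0.6, 0.9]
q, pi0 = q_values(pvals)
print(pi0)  # 0.5
print(all(qi <= ai for qi, ai in zip(q, bh_adjust(pvals))))  # True
```

Because the pi0 estimate is capped at 1, each q-value in this sketch is never larger than the matching BH adjusted p-value, and the two coincide when pi0 is estimated as 1.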

References:

https://stat.ethz.ch/pipermail/bioconductor/2012-December/049902.html

https://www.biostars.org/p/128931/

https://support.bioconductor.org/p/49864/
