1 Q-Vals (and False Discovery Rates) Made Easy
Dennis Shasha
Based on the paper "Statistical significance for genomewide studies" by John Storey and Robert Tibshirani, PNAS, August 5, 2003, pp. 9440-9445

2 Challenge
You test plants/patients/… in two settings (or from two different populations). You want to know which genes are differentially expressed (alternate). You don’t want to make too many mistakes, that is, declare a gene alternate when in fact it is null (not differentially expressed).

3 First Idea
You compute the p-val of each gene's difference in expression. P-val(g) is the probability that, if g is null, it would show a difference at least this large. You choose a cutoff, say 0.05, and declare every gene with p-val <= 0.05 truly different. What’s the problem?

4 Thought Experiment
Suppose that no genes are truly differentially expressed. About 5% of the genes will still have p-val <= 0.05, so you will declare them different even though none of them really is. Your false discovery rate (number of nulls among those predicted to be alternate / number predicted to be alternate) = 100%. Bad.
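A quick simulation of this thought experiment, written as a minimal Python sketch. It assumes, as slide 5 explains, that a null gene's p-value is a uniform draw between 0 and 1; the gene count, the seed, and the helper name `naive_calls` are illustrative choices, not from the slides.

```python
import random

def naive_calls(pvals, cutoff=0.05):
    """Slide 3's rule: declare every gene with p-val <= cutoff truly different."""
    return [g for g, p in enumerate(pvals) if p <= cutoff]

rng = random.Random(0)
m = 10_000
# Thought experiment: every gene is null, so each p-value is uniform on [0, 1).
pvals = [rng.random() for _ in range(m)]

called = naive_calls(pvals)
# Because no gene is truly different, every call is a false discovery.
false_discoveries = len(called)
fdr = false_discoveries / len(called) if called else 0.0
print(f"called {len(called)} of {m} genes ({len(called) / m:.1%}); FDR = {fdr:.0%}")
```

Roughly 500 of the 10,000 genes get called, and every one of them is a false discovery.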

5 A Fundamental Insight
All truly null genes (i.e., not truly differentially expressed) are equally likely to have any p-val; in other words, null p-values are uniformly distributed between 0 and 1. That follows from how a p-val is constructed: under the null hypothesis, 1% of the genes will be in the top 1 percentile, 1% will be between the 89th and 90th percentiles, and so on. A p-val is just a way of stating a gene's percentile under the null condition.
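A minimal sketch that checks this insight by simulation. To stay within the Python standard library it assumes a two-sample z-test with a known noise level (sigma = 1) rather than the t-test a real study would more likely use; the helper `two_sided_pval`, the 4 replicates per setting, and the gene count are hypothetical choices.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def two_sided_pval(xs, ys, sigma=1.0):
    """Two-sample z-test p-value, assuming the noise level sigma is known."""
    se = sigma * (1 / len(xs) + 1 / len(ys)) ** 0.5
    z = (fmean(xs) - fmean(ys)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

rng = random.Random(1)
# 20,000 null genes: both settings draw expression from the same distribution.
pvals = []
for _ in range(20_000):
    xs = [rng.gauss(0, 1) for _ in range(4)]
    ys = [rng.gauss(0, 1) for _ in range(4)]
    pvals.append(two_sided_pval(xs, ys))

# Fraction of null p-values falling in each tenth of [0, 1].
deciles = [0] * 10
for p in pvals:
    deciles[min(int(p * 10), 9)] += 1
fractions = [c / len(pvals) for c in deciles]
print([f"{f:.3f}" for f in fractions])
```

Each tenth of the p-value range ends up holding close to 10% of the null genes, which is the uniformity the slide describes.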

6 What Do We Do With That?
Mixture model: imagine null genes as light blue marbles and truly different genes as red ones. If the assay is decent, the red marbles should be concentrated at low p-values.

7 [Figure: light blue and red marbles plotted by p-value; x-axis: Pval, from 0 to 1]
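The marble picture can be simulated as a sketch. The 80/20 split between null and alternate genes and the Beta(0.3, 4) shape used for the red marbles are arbitrary assumptions; any distribution piled up near 0 would do. Each `#` in the printed text histogram stands for 25 marbles.

```python
import random

BINS = 20

def histogram(pvals, bins=BINS):
    """Count the marbles landing in each of `bins` equal-width p-value bins."""
    counts = [0] * bins
    for p in pvals:
        counts[min(int(p * bins), bins - 1)] += 1
    return counts

rng = random.Random(2)
# Light blue (null) marbles: p-values uniform on [0, 1).
null_p = [rng.random() for _ in range(8_000)]
# Red (alternate) marbles: Beta(0.3, 4) is one arbitrary way to pile them up near 0.
alt_p = [rng.betavariate(0.3, 4) for _ in range(2_000)]

counts = histogram(null_p + alt_p)
for b, c in enumerate(counts):
    print(f"{b / BINS:.2f}-{(b + 1) / BINS:.2f} {'#' * (c // 25)}")
```

The leftmost bins tower over the rest, while the right-hand bins settle to a roughly constant height made almost entirely of light blue marbles.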

8 Method We Can Use
Of course, we don’t know the colors of the marbles, i.e., we don’t know which genes are true alternates. However, we know that null marbles are equally likely to have any p-value. So, from the p-value where the height of the marbles levels off all the way up to 1, we have primarily light blue marbles (null genes). Why?

9 [Figure: marbles by p-value; x-axis: Pval, from 0 to 1; labels: "Flat region starts here", "Level of flat region"]

10 Answer
Because if all genes (marbles) were null, the heights would be about uniform. Provided the red marbles are concentrated near the low p-vals, the flat region will be mostly light blue.
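One way to turn slides 8 to 10 into code, as a sketch: pick a point `lam` (λ) where the flat region is assumed to start, count the marbles to its right, and scale. `flat_level` gives the height of the flat region per histogram bin, and `estimate_pi0` gives the estimated fraction of all genes that are null. Both names, the hypothetical mixture, and the fixed choice lam = 0.5 are assumptions for illustration; this is a simplification of how Storey and Tibshirani's paper chooses λ.

```python
import random

def flat_level(pvals, lam=0.5, bins=20):
    """Average number of marbles per histogram bin to the right of lam (the assumed flat region)."""
    right = sum(p > lam for p in pvals)
    return right / ((1 - lam) * bins)

def estimate_pi0(pvals, lam=0.5):
    """Estimated fraction of null genes: the flat level extended across all of [0, 1]."""
    right = sum(p > lam for p in pvals)
    return min(1.0, right / (len(pvals) * (1 - lam)))

rng = random.Random(2)
# A hypothetical mixture: 8,000 null genes and 2,000 alternates piled up near 0.
pvals = [rng.random() for _ in range(8_000)] + [rng.betavariate(0.3, 4) for _ in range(2_000)]
level = flat_level(pvals)
pi0 = estimate_pi0(pvals)
print(f"flat level = {level:.0f} marbles per bin; estimated null fraction = {pi0:.2f}")
```

The two numbers are the same quantity on different scales: level × bins / m gives the null fraction (before the cap at 1). It comes out slightly above the true 0.8 here because a few red marbles still land in the flat region.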

11 Example: all null
Consider the all-null case: every marble is light blue. The false discovery rate in the region to the left of the flat region is estimated as (estimated number of light blue marbles there, based on the level of the flat region) / (number of marbles to the left of the flat region). This will be close to 100%.

12 [Figure: marbles by p-value; x-axis: Pval, from 0 to 1; labels: "Flat region starts here", "Level of flat region"]
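A sketch of the all-null example using the flat-region idea: the estimated number of nulls to the left of a threshold t is the flat level extended over [0, t], i.e. π0 · m · t, and the estimated FDR divides that by the number of marbles left of t. The helper `estimated_fdr`, the fixed lam = 0.5, and the threshold 0.05 are assumptions for illustration; the estimate is capped at 1.

```python
import random

def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated nulls left of threshold (flat level extended) / all marbles left of threshold."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    called = sum(p <= threshold for p in pvals)
    if called == 0:
        return 0.0  # nothing declared significant, so no false discoveries
    return min(1.0, pi0 * m * threshold / called)

rng = random.Random(3)
all_null = [rng.random() for _ in range(10_000)]  # every marble is light blue
fdr = estimated_fdr(all_null, threshold=0.05)
print(f"estimated FDR = {fdr:.0%}")
```

Because the whole histogram is flat, the extended flat line accounts for nearly every marble to the left of the threshold.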

13 Example: all non-null
Consider the all-non-null case: every marble is red, and their p-values are highly skewed toward 0. The level of the flat region is essentially zero. The false discovery rate in the region to the left of the flat region is estimated as before: (estimated number of light blue marbles, based on the flat region) / (number of marbles to the left of the flat region). This will be close to 0.

14 [Figure: marbles by p-value; x-axis: Pval, from 0 to 1; label: "Flat region starts here"]
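The all-non-null example as a sketch, under the same kind of assumptions: every p-value is drawn from a hypothetical Beta(0.2, 8) distribution that is heavily skewed toward 0, so almost nothing lands in the flat region. The helper `estimated_fdr`, lam = 0.5, and the threshold 0.05 are illustrative choices.

```python
import random

def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated nulls left of threshold (flat level extended) / all marbles left of threshold."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    called = sum(p <= threshold for p in pvals)
    if called == 0:
        return 0.0  # nothing declared significant, so no false discoveries
    return min(1.0, pi0 * m * threshold / called)

rng = random.Random(4)
# Every marble is red: Beta(0.2, 8) piles the p-values up near 0.
all_alt = [rng.betavariate(0.2, 8) for _ in range(10_000)]
fdr = estimated_fdr(all_alt, threshold=0.05)
print(f"estimated FDR = {fdr:.2%}")
```

With the flat level near zero, the extended flat line predicts almost no nulls to the left of the threshold, so the estimated FDR is close to 0.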

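15 Example: mixed case
Compute the distribution (histogram) of p-values and find the flat region. Estimate the number of nulls in the region left of the flat region by extending the flat line leftward. Dividing that estimate by the number of genes in the region gives the false discovery rate.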

16 [Figure: x-axis: Pval, from 0 to 1; y-axis: Number of genes having pval; labels: "Flat line; base level of nulls", "Possible p-value threshold"]
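Written as formulas, the estimate in slide 15 might look like the following sketch. The notation is assumed here, not taken from the slides: m is the number of genes, λ is where the flat region starts, t is the p-value threshold, and π̂0 is the estimated fraction of null genes (the flat level scaled to the whole 0-to-1 range). The numerator of the FDR is the flat line extended over [0, t]; the denominator counts all genes left of t. The worked numbers on the last line are hypothetical.

```latex
\hat{\pi}_0 = \frac{\#\{i : p_i > \lambda\}}{m\,(1-\lambda)},
\qquad
\widehat{\mathrm{FDR}}(t) = \frac{\hat{\pi}_0\, m\, t}{\#\{i : p_i \le t\}}

% Hypothetical example: m = 10{,}000,\ \hat{\pi}_0 = 0.8,\ t = 0.01,\ 300 genes with p_i \le 0.01
\widehat{\mathrm{FDR}}(0.01) = \frac{0.8 \times 10{,}000 \times 0.01}{300} = \frac{80}{300} \approx 0.27
```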

17 Example: mixed case
What would you estimate the false discovery rate to be if we declare the entire area to the left of the possible p-value threshold significant? 10%, 25%, 50%?

18 [Figure: x-axis: Pval, from 0 to 1; y-axis: Number of genes having pval; labels: "Flat line; base level of nulls", "Possible p-value threshold"]
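For the question on slide 17, a sketch that computes the estimated FDR at several candidate thresholds on a simulated mixture. The 8,000 null / 2,000 alternate split, the Beta(0.3, 4) alternates, lam = 0.5, and the helper `estimated_fdr` are assumptions, so the printed percentages illustrate the trend rather than answer the question for the slide's figure.

```python
import random

def estimated_fdr(pvals, threshold, lam=0.5):
    """Estimated nulls left of threshold (flat level extended) / all genes left of threshold."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    called = sum(p <= threshold for p in pvals)
    if called == 0:
        return 0.0  # nothing declared significant, so no false discoveries
    return min(1.0, pi0 * m * threshold / called)

rng = random.Random(5)
pvals = [rng.random() for _ in range(8_000)] + [rng.betavariate(0.3, 4) for _ in range(2_000)]
fdrs = {t: estimated_fdr(pvals, t) for t in (0.001, 0.01, 0.05, 0.2)}
for t, fdr in fdrs.items():
    print(f"threshold {t:<6} estimated FDR {fdr:.1%}")
```

Moving the threshold to the right lets in more genes, but the extended flat line grows faster than the red marbles do, so the estimated FDR rises.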

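19 Obtaining q-values from the False Discovery Rate
Suppose we order genes from least p-value to greatest; that ordering corresponds to one of these Cartesian graphs. The q-value of a gene having p-value p is the False Discovery Rate we would get if the declared significance region had threshold p. (Strictly, Storey and Tibshirani define it as the smallest estimated FDR over all thresholds >= p, which keeps q-values from decreasing as p-values increase.)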

20 [Figure: x-axis: Pval, from 0 to 1; y-axis: Number of genes having pval; labels: "Flat line; base level of nulls", "Q-value of a gene having this p-val is the FDR if this is the significance threshold."]

21 Lessons for Research
Mushy p-values (large error bars/few replicates) may force us far to the left in order to get a low False Discovery Rate, which may eliminate genes of interest. If testing out a gene is not too expensive, we can accept a higher False Discovery Rate; there is nothing magical about 0.01.

22 [Figure: x-axis: Pval, from 0 to 1; y-axis: Number of genes having pval; labels: "Flat line; base level of nulls", "Better p-values avoid loss of genes for a small False Discovery Rate."]
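To illustrate the lesson on slide 21, a sketch that simulates the same hypothetical experiment with 3 and with 12 replicates per setting and counts how many genes reach q <= 0.05. Everything here is an assumption made for illustration: a known-variance z-test, 200 alternate genes out of 2,000, a true shift of one noise unit, the 0.05 cutoff, and the helper names `z_test_pval`, `q_values`, and `simulate`.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()

def z_test_pval(xs, ys, sigma=1.0):
    """Two-sample z-test p-value, assuming the noise level sigma is known."""
    se = sigma * (1 / len(xs) + 1 / len(ys)) ** 0.5
    z = (fmean(xs) - fmean(ys)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def q_values(pvals, lam=0.5):
    """Smallest estimated FDR at a threshold >= each gene's p-value."""
    m = len(pvals)
    pi0 = min(1.0, sum(p > lam for p in pvals) / (m * (1 - lam)))
    order = sorted(range(m), key=lambda g: pvals[g])
    q, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        g = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[g] / rank)
        q[g] = running_min
    return q

def simulate(replicates, m_null=1_800, m_alt=200, shift=1.0, seed=5):
    """p-values for a hypothetical experiment; alternate genes differ by `shift` noise units."""
    rng = random.Random(seed)
    pvals = []
    for g in range(m_null + m_alt):
        delta = shift if g >= m_null else 0.0
        xs = [rng.gauss(delta, 1) for _ in range(replicates)]
        ys = [rng.gauss(0, 1) for _ in range(replicates)]
        pvals.append(z_test_pval(xs, ys))
    return pvals

counts = {n: sum(v <= 0.05 for v in q_values(simulate(n))) for n in (3, 12)}
for n, c in counts.items():
    print(f"{n:>2} replicates: {c} genes with q <= 0.05")
```

With fewer replicates the alternate genes' p-values are mushier, and far fewer of them survive at the same False Discovery Rate.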

