1 Significance Testing of Microarray Data BIOS 691 Fall 2008 Mark Reimers Dept. Biostatistics

2 Outline Multiple testing; family-wise error rates; false discovery rates; application to microarray data; practical issues – correlated errors; computing FDR by permutation procedures; conditioning t-scores.

3 Reality Check Goals of testing: to identify genes most likely to be changed or affected; to prioritize candidates for focused follow-up studies; to characterize functional changes consequent on changes in gene expression. So in practice we don’t need to be exact… but we do need to be principled!

4 Multiple comparisons Suppose no genes really changed (as if random samples from the same population), and there are 10,000 genes on a chip. Each gene has a 5% chance of exceeding the threshold at a p-value of .05 (Type I error), so the test statistics for about 500 genes should exceed the .05 threshold ‘by chance’.
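The arithmetic on this slide is easy to check by simulation. A minimal Python sketch (illustrative, not from the course materials): under the complete null, p-values are Uniform(0,1), so about 5% of 10,000 genes clear a .05 threshold purely by chance.

```python
# Simulate 10,000 null genes: each p-value is Uniform(0,1) under the null.
import random

random.seed(0)
N_GENES, ALPHA = 10_000, 0.05
p_values = [random.random() for _ in range(N_GENES)]
false_positives = sum(p < ALPHA for p in p_values)
# false_positives will be close to the expected count N * alpha = 500
```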

5 Distributions of p-values Random Data Real Microarray Data

6 When Might it Not be Uniform? When the actual distribution of the test statistic departs from the reference distribution. Outliers in the data may give rise to more extremes, hence more small p-values. Approximate tests are often conservative: p-values are larger than the true occurrence probability, so the distribution is shifted right.

7 Distribution of Numbers of p-values Each bin of width w contains a random number of p-values. Each p-value has probability w of lying in the bin, so the expected number is Nw and the count follows the Poisson law: SD ≈ √mean.
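A quick sketch of the claim above (assumptions: null p-values are i.i.d. Uniform(0,1); the bin count is then Binomial(N, w) ≈ Poisson(Nw), so its SD is roughly √(Nw)):

```python
# Count how many of N null p-values fall in a bin of width w, many times over,
# and compare the empirical SD of the counts to sqrt(mean).
import random, statistics

random.seed(7)
N, w, reps = 10_000, 0.01, 400
counts = [sum(random.random() < w for _ in range(N)) for _ in range(reps)]
mean = statistics.mean(counts)   # expected near N*w = 100
sd = statistics.stdev(counts)    # expected near sqrt(100) = 10
```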

8 Characterizing False Positives Family-Wide Error Rate (FWER): the probability of at least one false positive arising from the selection procedure. Strong control of FWER: the bound on FWER holds independent of the number of genes truly changed. False Discovery Rate (FDR): the proportion of false positives arising from the selection procedure – an ESTIMATE ONLY!

9 General Issues for Multiple Comparisons FWER vs FDR: are you willing to tolerate some false positives? FDR: control E(FDR) or P(FDR < Q)? The actual (random) FDR has a long-tailed distribution, but E(FDR) methods are simpler and cleaner. Correlations: many procedures surprise you when tests are correlated – always check the assumptions of the procedure! Models for the null distribution: a matter of art. Strong vs weak control: will the procedure work for any combination of true and false null hypotheses?

10 FWER – Setting a Higher Threshold Suppose we want to test N independent genes at overall level α. What level α* should each gene be tested at? We want to ensure P(any false positive) < α, i.e. 1 – α < P(all true nulls accepted) = P(a given null accepted)^N = (1 – α*)^N. Solve: α* = 1 – (1 – α)^(1/N).
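The solved threshold is easy to evaluate. A minimal sketch using the slide's symbols (α, N):

```python
# Per-gene level alpha* that keeps FWER at alpha for N independent tests.
N, alpha = 10_000, 0.05
alpha_star = 1 - (1 - alpha) ** (1 / N)  # Sidak solution, about 5.13e-6
bonferroni_level = alpha / N             # simpler bound, slightly smaller
```

Note that the exact (Sidak) level is only marginally more generous than the α/N bound derived on the next slide.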

11 Expectation Argument P(any false positive) ≤ E(# false positives) = N · P(a given test is a false positive) = N α*. So we set α* = α / N. NB: no assumptions about the joint distribution.

12 ‘Corrected’ p-Values for FWER Sidak (exact correction for independent tests): p_i* = 1 – (1 – p_i)^N if all p_i are independent; expanding, p_i* ≈ 1 – (1 – N p_i + …) gives Bonferroni. Bonferroni correction: p_i* = N p_i if N p_i < 1, otherwise 1 (the expectation argument); still conservative if genes are co-regulated (correlated). Both are too conservative for array use!
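Both corrections as one-line functions (an illustrative sketch; the function names are mine):

```python
def sidak_adjust(p, n):
    # Sidak-adjusted p-value: exact under independence.
    return 1 - (1 - p) ** n

def bonferroni_adjust(p, n):
    # First-order expansion of Sidak: min(n*p, 1).
    return min(n * p, 1.0)
```

As expected from the expansion, the Sidak value is always slightly below the Bonferroni value.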

13 Traditional Multiple Comparisons Methods Key idea: sequential testing. Order the p-values: p(1), p(2), … If p(1) is significant then test p(2), etc. Most methods are improvements on this simple idea, with complicated proofs.

14 Holm’s FWER Procedure Order the p-values: p(1), …, p(N). If p(1) < α/N, reject H(1); then if p(2) < α/(N–1), reject H(2); and so on. Let k be the largest index such that p(n) < α/(N–n+1) for all n ≤ k, and reject H(1) … H(k). Then P(at least one false positive) < α. The proof doesn’t depend on the distributions.
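The step-down loop above can be sketched in a few lines (illustrative; returns the indices of rejected hypotheses):

```python
def holm_reject(pvals, alpha=0.05):
    # Holm's step-down procedure: test sorted p-values against
    # alpha/N, alpha/(N-1), ... and stop at the first failure.
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    rejected = set()
    for rank, i in enumerate(order):          # rank = 0, 1, ...
        if pvals[i] < alpha / (n - rank):
            rejected.add(i)
        else:
            break                             # all later hypotheses kept
    return rejected
```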

15 Hochberg’s FWER Procedure Find the largest k such that p(k) < α/(N – k + 1), then select genes (1) to (k). More powerful than Holm’s procedure, but requires assumptions: independence or ‘positive dependence’. When there is one Type I error, there could be many.

16 Holm & Hochberg Adjusted P Order the p-values p(1), p(2), …, p(M). Holm (1979) step-down adjusted p-values: p(j)* = max over k = 1 to j of { min((M–k+1) p(k), 1) } – adjust out-of-order p-values in relation to those lower (‘step-down’). Hochberg (1988) step-up adjusted p-values: p(j)* = min over k = j to M of { min((M–k+1) p(k), 1) } – adjust out-of-order p-values in relation to those higher (‘step-up’).
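Both adjustment formulas translate directly to running-max / running-min passes over the sorted p-values (a sketch, mirroring the slide's formulas; results are returned in the original gene order):

```python
def holm_adjust(pvals):
    # Step-down: running max of (M-k+1)*p_(k), capped at 1.
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    adj, running = [0.0] * M, 0.0
    for k, i in enumerate(order):                  # k is 0-based rank
        running = max(running, min((M - k) * pvals[i], 1.0))
        adj[i] = running
    return adj

def hochberg_adjust(pvals):
    # Step-up: running min of (M-k+1)*p_(k), from the largest p downward.
    M = len(pvals)
    order = sorted(range(M), key=lambda i: pvals[i])
    adj, running = [0.0] * M, 1.0
    for k in range(M - 1, -1, -1):
        i = order[k]
        running = min(running, min((M - k) * pvals[i], 1.0))
        adj[i] = running
    return adj
```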

17 Simes’ Lemma Suppose we order the p-values from N independent tests on random (null) data: p(1), p(2), …, p(N), and pick a target threshold α. Then P( p(1) < α/N or p(2) < 2α/N or p(3) < 3α/N or … ) = α. For N = 2: P = P( min(p1, p2) < α/2 ) + P( min(p1, p2) ≥ α/2 and max(p1, p2) < α ) = (α – α²/4) + α²/4 = α. (Figure: the unit square of (p1, p2) with the rejection region shaded.)

18 Simes’ Test Pick a target threshold α and order the p-values: p(1), p(2), …, p(N). If for any k, p(k) < kα/N, select the corresponding genes (1) to (k). The test is valid against the complete null hypothesis if the tests are independent or ‘positively dependent’. It doesn’t give strong control, and is somewhat non-conservative if there are negative correlations among the tests.
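The global test can equivalently be expressed as a combined p-value: the complete null is rejected at level α exactly when min over k of N·p(k)/k is below α. A minimal sketch (the function name is mine):

```python
def simes_global_p(pvals):
    # Simes' combined p-value for the complete null hypothesis:
    # min over ranks k of N * p_(k) / k, with k = 1, ..., N.
    N = len(pvals)
    return min(N * p / k for k, p in enumerate(sorted(pvals), start=1))
```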

19 Correlated Tests and FWER Typically tests are correlated. Extreme case: all tests highly correlated – one test is a proxy for all, and ‘corrected’ p-values are the same as ‘uncorrected’. Intermediate case: some correlation – usually the probability of obtaining a p-value by chance is in between the Sidak and uncorrected values.

20 Symptoms of Correlated Tests P-value Histograms

21 Distributions of numbers of p-values below threshold 10,000 genes; 10,000 random drawings. L: uncorrelated; R: highly correlated.

22 Permutation Tests We don’t know the true distribution of gene expression measures within groups. We simulate the distribution of samples drawn from the same group by pooling the two groups and randomly selecting two pseudo-groups of the same sizes as those we are testing. Need at least 5 samples in each group to do this!

23 Permutation Tests – How To Suppose samples 1, 2, …, 10 are in group 1 and samples 11 – 20 are in group 2. Permute 1, 2, …, 20: say 13,4,7,20,9,11,17,3,8,19,2,5,16,14,6,18,12,15,10. Construct t-scores for each gene based on these pseudo-groups. Repeat many times to obtain a null distribution of t-scores. This will be a t-distribution provided the original distribution has no outliers.
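The recipe above, sketched for a single gene (illustrative; this uses an equal-variance two-sample t and a two-sided p-value, with an add-one correction so the permutation p is never exactly zero):

```python
import random, statistics

def t_score(a, b):
    # Equal-variance two-sample t-statistic.
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a) +
           (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1/na + 1/nb)) ** 0.5

def permutation_p(group1, group2, n_perm=2000, seed=0):
    # Pool the samples, reshuffle into pseudo-groups, and count how often
    # the permuted |t| matches or exceeds the observed |t|.
    rng = random.Random(seed)
    observed = abs(t_score(group1, group2))
    pooled, n1 = list(group1) + list(group2), len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(t_score(pooled[:n1], pooled[n1:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```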

24 Critiques of Permutations Variances of permuted values for really separate groups are inflated. Permuted t-scores for many genes may therefore be lower than those from random samples drawn from a single population, giving somewhat too conservative p-values for some genes.

25 Multivariate Permutation Tests We want a null distribution with the same correlation structure as the given data but no real differences between groups. Permute the group labels among samples, redo the tests with the pseudo-groups, and repeat many times (e.g. 10,000).

26 Westfall-Young Approach A procedure analogous to Holm’s, except that at each stage the smallest p-value is compared to the smallest p-value from an empirical null distribution of the hypotheses being tested: how often is the smallest p-value less than a given threshold if the tests are correlated to the same extent and all nulls are true? Construct permuted samples n = 1, …, N and determine p-values p_j[n] for each sample n.

27 Westfall-Young Approach – 2 Construct permuted samples n = 1, …, N and determine p-values p_j[n] for each sample n. To correct the i-th smallest p-value, drop the hypotheses already rejected (at a smaller level). The i-th adjusted p-value is forced to be no smaller than any of the previous adjusted p-values.
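A sketch of the single-step maxT variant of this idea (illustrative, not the course's or BRB's code; the full Westfall-Young procedure is step-down, which refines this by dropping already-rejected hypotheses at each stage): permute the group labels, record the maximum |t| across all genes per permutation, and adjust each observed |t| by how often the permutation maximum exceeds it.

```python
import numpy as np

def maxT_adjusted_p(X, labels, n_perm=500, seed=0):
    """X: genes x samples matrix; labels: boolean array, True = group 1."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)

    def tstats(lab):
        # Welch-style |t| for every gene at once.
        a, b = X[:, lab], X[:, ~lab]
        num = a.mean(1) - b.mean(1)
        den = np.sqrt(a.var(1, ddof=1) / a.shape[1] + b.var(1, ddof=1) / b.shape[1])
        return np.abs(num / den)

    observed = tstats(labels)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Permuting labels preserves the genes' correlation structure.
        max_null[i] = tstats(rng.permutation(labels)).max()
    return np.array([(max_null >= t).mean() for t in observed])
```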

28 Critiques of MV Permutation as Null The correlation structure of second-order statistics is not equivalent. E.g. we sometimes want to find significant correlations among genes, and the permutation distribution of correlations is NOT an adequate null distribution – why? Instead, use a bootstrap algorithm on centered variables; see the papers by Dudoit and van der Laan.

29 False Discovery Rate In genomic problems a few false positives are often acceptable; we want to trade off power vs. false positives. We could control: the expected number of false positives; the expected proportion of false positives (what to do with E(V/R) when R is 0?); or the actual proportion of false positives.

30 Truth vs. Decision

                      # not rejected   # rejected   totals
  # true null H             U            V (F+)       m0
  # non-true H            T (F-)           S          m1
  totals                  m - R            R           m

31 Catalog of Type I Error Rates Per-family error rate: PFER = E(V). Per-comparison error rate: PCER = E(V)/m. Family-wise error rate: FWER = P(V ≥ 1). False discovery rate: (i) FDR = E(Q), where Q = V/R if R > 0 and Q = 0 if R = 0 (Benjamini-Hochberg); (ii) pFDR = E(V/R | R > 0) (Storey).

32 Benjamini-Hochberg We can’t know what the FDR is for a particular sample; B-H suggest a procedure controlling the average FDR. Order the p-values: p(1), p(2), …, p(N). If any p(k) < kα/N, then select genes (1) to (k). q-value: the smallest FDR at which the gene becomes ‘significant’. NB: an acceptable FDR may be much larger than an acceptable p-value (e.g. 0.10).
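The B-H step-up selection in a few lines (a sketch; returns the indices of selected genes):

```python
def benjamini_hochberg(pvals, q=0.05):
    # Find the largest rank k with p_(k) <= k*q/N; select genes (1)..(k).
    N = len(pvals)
    order = sorted(range(N), key=lambda i: pvals[i])
    k_star = 0
    for k, i in enumerate(order, start=1):
        if pvals[i] <= k * q / N:
            k_star = k
    return set(order[:k_star])
```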

33 Argument for B-H Method If all null hypotheses hold (no true changes), then Q = 1 whenever anything is selected, and by Simes’ lemma that event has probability < α; otherwise Q = 0, so E(Q) < α. If no null hypotheses hold (all true changes), Q = 0 < α. Build the argument by induction from both ends, starting from N = 2.

34 Practical Issues The actual proportion of false positives varies from data set to data set. The mean FDR could be low and yet be high in your particular data set.

35 Distributions of numbers of p-values below threshold 10,000 genes; 10,000 random drawings. L: uncorrelated; R: highly correlated.

36 Controlling the Number of FP’s The B-H procedure only guarantees the long-term average value of E(V/R | R > 0) P(R > 0), which can be quite badly wrong in individual cases. Korn’s method gives a confidence bound on the individual case and also addresses the issue of correlations. It builds on the Westfall-Young approach to control the tail probability of the proportion of false positives (TPPFP).

37 Korn’s Procedure To guarantee no more than k false positives: construct the null distribution as in Westfall-Young; order the p-values p(1), …, p(M); reject H(1), …, H(k); for the next p-values, compare each p-value to the full null, continuing until one H is not rejected. N.B. this gives strong control.

38 Issues with Korn’s Procedure Valid if you select k first and then follow the procedure through, not if you try a number of different k and pick the one with most genes – as people actually proceed. Only approximate FDR. Computationally intensive. Available in BRB.

39 Storey’s pFDR Storey argues that E(Q | V > 0) is what most people think FDR means. It is sometimes quite different from the B-H FDR, especially if the number of rejected nulls needs to be quite small in order to get an acceptable FDR. E.g. if P(V = 0) = 1/2, then pFDR = 2 × FDR.

40 A Bayesian Interpretation Suppose nature generates true nulls with probability π0 and false nulls with probability π1. Then pFDR = P(H true | test statistic). Question: we rarely have an accurate prior idea about π0. Storey suggests estimating it.

41 Storey’s Procedure Estimate the proportion of true nulls (π0): count the number of p-values greater than ½, doubled. Fix a rejection region [0, p0] (or try several). Estimate the probability of a p-value for a true null falling in the rejection region. Form the ratio: 2 · #{p > ½} · p0 / #{p < p0}. Adjust for small numbers of #{p < p0}. Bootstrap the ratio to obtain a confidence interval for pFDR.
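The ratio on this slide, as a sketch (illustrative; it omits the small-count adjustment and the bootstrap step, and guards against an empty rejection region):

```python
def storey_pfdr(pvals, p0=0.05):
    # pi0_hat = 2 * #{p > 1/2} / N  (nulls are roughly uniform, so about
    # half of them land above 1/2); pFDR_hat = pi0_hat * N * p0 / #{p <= p0}.
    N = len(pvals)
    pi0_hat = 2 * sum(p > 0.5 for p in pvals) / N
    n_reject = max(sum(p <= p0 for p in pvals), 1)  # avoid division by zero
    return min(pi0_hat * N * p0 / n_reject, 1.0), pi0_hat
```

Sanity check: on perfectly uniform (all-null) p-values the estimate is close to 1, as it should be.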

42 Practical Issues Storey’s procedure may give reasonable estimates when π0 ~ O(1), but it can’t distinguish values of π1 that are very small. How much does the significance test depend on the choice of π0? Such differences may have a big impact on posterior probabilities.

43 Moderated Tests Many false positives with the t-test arise because of under-estimates of the variance. Most gene variances are comparable (but not equal). Can we use ‘pooled’ information about all genes to help test each one?

44 Stein’s Lemma Whenever you have multiple variables with comparable distributions, you can make a more efficient joint estimator by ‘shrinking’ the individual estimates toward the common mean. This can be formalized using Bayesian analysis: suppose the true values come from a prior distribution; then the mean of all the parameter estimates is a good estimate of the prior mean.

45 SAM Significance Analysis of Microarrays. Uses a ‘fudge factor’ s0 to shrink individual SD estimates toward a common value: d_i = (x1_i – x2_i) / (s_i + s0). Patented!
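A sketch of the score above (illustrative, not SAM itself: SAM chooses s0 to minimize the coefficient of variation of the d-scores, whereas this toy version just uses the median SD across genes):

```python
import statistics

def sam_scores(diffs, sds, s0=None):
    # d_i = (mean difference) / (s_i + s0); the fudge factor s0 keeps
    # genes with tiny SDs from producing huge scores.
    if s0 is None:
        s0 = statistics.median(sds)  # simple stand-in for SAM's choice
    return [d / (s + s0) for d, s in zip(diffs, sds)]
```

The effect: a gene with difference 1.0 and SD 0.01 would have an ordinary t-like score of 100, but its moderated score stays modest.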

46 limma An empirical Bayes formalism; depends on a prior estimate of the number of genes changed. Bioconductor’s approach – free!

47 limma Distribution Models Sample statistics: β̂_g | β_g, σ_g² ~ N(β_g, v_g σ_g²) for the coefficients, and s_g² | σ_g² ~ (σ_g²/d_g) χ²(d_g) for the sample variances. Priors – Coefficients: β_g is nonzero with some prior probability, and β_g | σ_g², β_g ≠ 0 ~ N(0, v_0 σ_g²). Variances: 1/σ_g² ~ (1/(d_0 s_0²)) χ²(d_0).

48 Moderated T Statistic Moderated variance estimate: s̃_g² = (d_0 s_0² + d_g s_g²) / (d_0 + d_g). Moderated t: t̃_g = β̂_g / (s̃_g √v_g). The moderated t has a t distribution on d_0 + d_g df.
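The moderated statistic in code (a sketch of the limma-style formula above; the argument names are mine, and v_g defaults to 1 for a simple two-group contrast scaling):

```python
def moderated_t(beta_hat, s2, d_g, d0, s0_2, v_g=1.0):
    # Posterior variance: degrees-of-freedom-weighted blend of the prior
    # value s0^2 (weight d0) and the gene's own s_g^2 (weight d_g).
    s2_tilde = (d0 * s0_2 + d_g * s2) / (d0 + d_g)
    # Moderated t, distributed as t on d0 + d_g df under the model.
    return beta_hat / (s2_tilde * v_g) ** 0.5
```

Note the shrinkage: a gene whose sample variance is well below the prior gets its denominator pulled up, taming spurious large t-scores.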

