
1 Multiple testing in large-scale gene expression experiments Statistics 246, Spring 2002 Week 8, Lecture 2

2 Outline: Motivation & examples; univariate hypothesis testing; multiple hypothesis testing; results for the two examples; discussion.

3 Motivation. SCIENTIFIC: to determine which genes are differentially expressed between two sources of mRNA (trt, ctl). STATISTICAL: to assign appropriately adjusted p-values to thousands of genes, and/or to make statements about false discovery rates.

4 Apo AI experiment (Matt Callow, LBNL). Goal: to identify genes with altered expression in the livers of Apo AI knock-out mice (T) compared with inbred C57Bl/6 control mice (C). Design: 8 treatment mice and 8 control mice; 16 hybridizations, in each of which liver mRNA from one of the 16 mice (T_i or C_i) is labelled with Cy5 while pooled liver mRNA from the control mice (C*) is labelled with Cy3. Probes: ~6,000 cDNAs (genes), including 200 related to lipid metabolism.

5

6 Golub et al (1999) experiments. Goal: to identify genes which are differentially expressed in acute lymphoblastic leukemia (ALL) tumours in comparison with acute myeloid leukemia (AML) tumours. 38 tumour samples: 27 ALL, 11 AML. Data from Affymetrix chips, after some pre-processing. Originally 6,817 genes; 3,051 after reduction. The data are therefore a 3,051 × 38 array of expression values.

7 Data. The gene expression data can be summarized as an m × n matrix X whose columns fall into a treatment block followed by a control block. Here x_ij is the (relative) expression value of gene i in sample j. The first n_1 columns are from the treatment (T); the remaining n_2 = n - n_1 columns are from the control (C).

8 Univariate hypothesis testing. Initially, focus on one gene only. We wish to test the null hypothesis H that the gene is not differentially expressed. To do so, we use a two-sample t-statistic comparing the treatment and control samples for that gene; a sketch of its form is given below.
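The formula on the original slide was an image and did not survive the transcript. As an assumption, here is the usual unpooled (Welch-type) two-sample t-statistic for gene i, one standard choice in this setting; the slide's exact pooled/unpooled choice is not recoverable.

```latex
% Assumed form of the per-gene two-sample t-statistic:
% xbar_{i,T}, s_{i,T}^2 are the mean and variance of gene i over the n_1 treatment
% samples; xbar_{i,C}, s_{i,C}^2 are the same over the n_2 control samples.
t_i \;=\;
  \frac{\bar{x}_{i,T} - \bar{x}_{i,C}}
       {\sqrt{\frac{s_{i,T}^{2}}{n_1} + \frac{s_{i,C}^{2}}{n_2}}}
```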

9 p-values The p-value or observed significance level p is the chance of getting a test statistic as or more extreme than the observed one, under the null hypothesis H of no differential expression.

10 Computing p-values by permutations. We focus on one gene only. For the bth iteration, b = 1, …, B: 1. Permute the n data points for the gene (x); the first n_1 are referred to as "treatments", the remaining n_2 as "controls". 2. Calculate the corresponding two-sample t-statistic, t_b. After all B permutations are done: 3. Put p = #{b: |t_b| ≥ |t_observed|}/B (p is lower if we use > rather than ≥). With all permutations of the Apo AI data, B = n!/(n_1! n_2!) = 12,870; for the leukemia data, B ≈ 1.2 × 10^9.
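A minimal sketch of this permutation p-value for a single gene, assuming an unpooled two-sample t-statistic; the function name and arguments are illustrative, not the course's actual code.

```python
import numpy as np
from itertools import combinations

def perm_pvalue(x, n1, use_all=True, B=10_000, seed=None):
    """Two-sided permutation p-value for one gene.

    x  : 1-D array of n expression values, the first n1 of which are 'treatment'.
    If use_all, enumerate all n!/(n1! n2!) relabellings; otherwise sample B of them.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)

    def tstat(idx_t):
        t_grp = x[list(idx_t)]
        c_grp = np.delete(x, list(idx_t))
        se = np.sqrt(t_grp.var(ddof=1) / len(t_grp) + c_grp.var(ddof=1) / len(c_grp))
        return (t_grp.mean() - c_grp.mean()) / se

    t_obs = tstat(range(n1))
    if use_all:
        stats = np.array([tstat(idx) for idx in combinations(range(n), n1)])
    else:
        rng = np.random.default_rng(seed)
        stats = np.array([tstat(rng.choice(n, n1, replace=False)) for _ in range(B)])
    return np.mean(np.abs(stats) >= abs(t_obs))   # p = #{b: |t_b| >= |t_obs|} / B
```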

11 Many tests: a simulation study. Simulation of this process for 6,000 genes with 8 treatments and 8 controls. All the gene expression values were simulated i.i.d. from an N(0,1) distribution, i.e. NOTHING is differentially expressed.
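A quick sketch of this kind of null simulation (not the original simulation code); it runs an ordinary two-sample t-test per gene and counts how many genes look "significant" at conventional thresholds purely by chance.

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)
m, n1, n2 = 6_000, 8, 8
x = rng.standard_normal((m, n1 + n2))           # every gene is null: pure N(0,1) noise

t, p = ttest_ind(x[:, :n1], x[:, n1:], axis=1)  # per-gene two-sample t-tests
print((p < 0.05).sum(), (p < 0.01).sum())       # roughly 300 and 60 'discoveries' by chance
```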

12 Unadjusted p-values

gene index   t value   p-value (unadj.)
2271          4.93     2 × 10^-4
5709          4.82     3 × 10^-4
5622         -4.62     4 × 10^-4
4521          4.34     7 × 10^-4
3156         -4.31     7 × 10^-4
5898         -4.29     7 × 10^-4
2164         -3.98     1.4 × 10^-3
5930          3.91     1.6 × 10^-3
2427         -3.90     1.6 × 10^-3
5694         -3.88     1.7 × 10^-3

Clearly we can't just use standard p-value thresholds (.05, .01).

13 Multiple hypothesis testing: counting errors. Assume we are testing H_1, H_2, …, H_m, with m_0 = # of true null hypotheses and R = # of rejected hypotheses.

                    # true null    # false null    total
# non-significant        U              T          m - R
# significant            V              S            R
                        m_0          m - m_0         m

V = # Type I errors [false positives]; T = # Type II errors [false negatives].

14 Type I error rates. Per-comparison error rate (PCER): the expected number of Type I errors divided by the number of hypotheses, PCER = E(V)/m. Per-family error rate (PFER): the expected number of Type I errors, PFER = E(V). Family-wise error rate (FWER): the probability of at least one Type I error, FWER = pr(V ≥ 1). False discovery rate (FDR): the expected proportion of Type I errors among the rejected hypotheses, FDR = E(V/R; R>0) = E(V/R | R>0) pr(R>0). Positive false discovery rate (pFDR): the rate at which discoveries are false, pFDR = E(V/R | R>0).

15 Two types of control of Type I error. Strong control: control of the Type I error rate whatever the configuration of true and false null hypotheses. For FWER, strong control means controlling max_{M_0} pr(V ≥ 1 | M_0) over all possible sets M_0 of true hypotheses (note |M_0| = m_0). Weak control: control of the Type I error rate only under the complete null hypothesis H_0^C = ∩_i H_i. For FWER, this is control of pr(V ≥ 1 | H_0^C).

16 Adjustments to p-values. For strong control of the FWER at some level α, there are procedures which will take m unadjusted p-values and modify them separately, so-called single-step procedures, the Bonferroni adjustment or correction being the simplest and best known. Another is due to Sidák. Other, more powerful procedures adjust sequentially, from the smallest to the largest, or vice versa; these are the step-up and step-down methods, and we'll meet a number of them, usually variations on single-step procedures. In all cases we'll denote adjusted p-values by π, usually with subscripts, and let the context define what type of adjustment has been made. Unadjusted p-values are denoted by p.

17 What should one look for in a multiple testing procedure? As we will see, there is a bewildering variety of multiple testing procedures. How can we choose which to use? There is no simple answer, but each can be judged according to a number of criteria. Interpretation: does the procedure answer a relevant question for you? Type of control: strong or weak? Validity: are the assumptions under which the procedure applies clear and definitely or plausibly true, or are they unclear and most probably not true? Computability: can the procedure's adjustments be computed accurately and straightforwardly, or is there possibly numerical or simulation uncertainty, or discreteness?

18 p-value adjustments: single-step. Define adjusted p-values π such that the FWER is controlled at level α when H_i is rejected whenever π_i ≤ α. Bonferroni: π_i = min(m p_i, 1). Sidák: π_i = 1 - (1 - p_i)^m. Bonferroni always gives strong control (proof next page). Sidák is less conservative than Bonferroni; when the genes are independent, it gives strong control exactly (FWER = α, proof later). It controls the FWER in many other cases, but is still conservative.
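A small sketch of the two single-step adjustments just defined; the function names are mine, and both act on a plain vector of unadjusted p-values.

```python
import numpy as np

def bonferroni(p):
    """Single-step Bonferroni adjusted p-values: pi_i = min(m * p_i, 1)."""
    p = np.asarray(p, dtype=float)
    return np.minimum(len(p) * p, 1.0)

def sidak(p):
    """Single-step Sidak adjusted p-values: pi_i = 1 - (1 - p_i)^m."""
    p = np.asarray(p, dtype=float)
    return 1.0 - (1.0 - p) ** len(p)

# Reject H_i at FWER level alpha when the adjusted value is <= alpha.
p = np.array([2e-4, 3e-4, 1e-2, 0.2])
print(bonferroni(p))
print(sidak(p))
```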

19 Proof for Bonferroni (single-step adjustment). pr(reject at least one H_i at level α | H_0^C) = pr(at least one π_i ≤ α | H_0^C) ≤ Σ_{i=1}^m pr(π_i ≤ α | H_0^C), by Boole's inequality, = Σ_{i=1}^m pr(P_i ≤ α/m | H_0^C), by definition of π_i, = m × α/m, assuming P_i ~ U[0,1], = α. Notes: 1. We are testing m genes; H_0^C is the complete null hypothesis, P_i is the unadjusted p-value for gene i, and π_i here is the Bonferroni adjusted p-value. 2. We use lower-case letters for observed p-values, and upper case for the corresponding random variables.

20 Proof for Sidák's method (single-step adjustment). pr(reject at least one H_i | H_0^C) = pr(at least one π_i ≤ α | H_0^C) = 1 - pr(all π_i > α | H_0^C) = 1 - ∏_{i=1}^m pr(π_i > α | H_0^C), assuming independence. Here π_i is the Sidák adjusted p-value, and so π_i > α if and only if P_i > 1 - (1-α)^{1/m} (check), giving 1 - ∏_{i=1}^m pr(π_i > α | H_0^C) = 1 - ∏_{i=1}^m pr(P_i > 1 - (1-α)^{1/m} | H_0^C) = 1 - {(1-α)^{1/m}}^m, since all P_i ~ U[0,1], = α.

21 Single-step adjustments (ctd). The minP method of Westfall and Young: π_i = pr(min_{1≤l≤m} P_l ≤ p_i | H_0^C). It is based on the joint distribution of the p-values {P_l}. This is the most powerful of the three single-step adjustments. If P_i ~ U[0,1], it gives a FWER exactly = α (see next page). It always confers weak control, and gives strong control under subset pivotality (definition two slides ahead).

22 Proof for (single-step) minP adjustment. Given level α, let c_α be such that pr(min_{1≤i≤m} P_i ≤ c_α | H_0^C) = α. Note that {π_i ≤ α} = {P_i ≤ c_α} for any i. Then pr(reject at least one H_i at level α | H_0^C) = pr(at least one π_i ≤ α | H_0^C) = pr(min_{1≤i≤m} π_i ≤ α | H_0^C) = pr(min_{1≤i≤m} P_i ≤ c_α | H_0^C) = α.

23 Strong control and subset pivotality. Note that the above proofs are under H_0^C, which is what we term weak control. In order to get strong control, we need the condition of subset pivotality. The distribution of the unadjusted p-values (P_1, P_2, …, P_m) is said to have the subset pivotality property if, for all subsets L ⊆ {1,…,m}, the distribution of the subvector {P_i : i ∈ L} is identical under the restrictions ∩{H_i : i ∈ L} and under H_0^C. Using this property, we can prove that for each adjustment, under its conditions, pr(reject at least one H_i at level α, i ∈ M_0 | ∩{H_i : i ∈ M_0}) = pr(reject at least one H_i at level α, i ∈ M_0 | H_0^C) ≤ pr(reject at least one H_i at level α, for any i | H_0^C) ≤ α. Therefore we have proved strong control for the previous three adjustments, assuming subset pivotality.

24 Permutation-based single-step minP adjustment of p-values. For the bth iteration, b = 1, …, B: 1. Permute the n columns of the data matrix X, obtaining a matrix X_b; the first n_1 columns are referred to as "treatments", the remaining n_2 columns as "controls". 2. For each gene, calculate the corresponding unadjusted p-values p_{i,b}, i = 1, 2, …, m (e.g. by further permutations) based on the permuted matrix X_b. After all B permutations are done: 3. Compute the adjusted p-values π_i = #{b: min_l p_{l,b} ≤ p_i}/B.
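Assuming the observed unadjusted p-values and the m × B matrix of permutation unadjusted p-values from steps 1-2 are already in hand, step 3 reduces to a couple of lines; this helper name is mine, not from the course code.

```python
import numpy as np

def minp_adjust(p_obs, p_perm):
    """Single-step minP adjusted p-values.

    p_obs  : length-m vector of observed unadjusted p-values.
    p_perm : (m, B) matrix; column b holds the m unadjusted p-values
             recomputed from the b-th permutation of the sample labels.
    Returns pi_i = #{b : min_l p_perm[l, b] <= p_obs[i]} / B.
    """
    col_min = p_perm.min(axis=0)                       # min over genes, per permutation
    return np.array([(col_min <= pi).mean() for pi in p_obs])
```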

25 The computing challenge: iterated permutations. The procedure is quite computationally intensive if B is very large (typically at least 10,000) and we estimate all unadjusted p-values by further permutations. Typical numbers: B = 10,000 permutations to compute each unadjusted p-value; B = 10,000 permutations over which the adjustment is computed; m = 6,000 genes. In general the run time is O(mB^2).

26 Avoiding the computational difficulty of single-step minP adjustment. The maxT method (Chapter 4 of Westfall and Young): π_i = pr(max_{1≤l≤m} |T_l| ≥ |t_i| | H_0^C). This needs only B = 10,000 permutations. However, if the distributions of the test statistics are not identical, it will give more weight to genes with heavy-tailed distributions (which tend to have larger t-values). There is also a fast algorithm which does the minP adjustment in O(mB log B + m log m) time.
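A permutation estimate of the maxT adjustment mirrors the minP one, replacing minima of p-values by maxima of |T|; the function name and arguments are again illustrative.

```python
import numpy as np

def maxt_adjust(t_obs, t_perm):
    """Single-step maxT adjusted p-values (Westfall & Young style).

    t_obs  : length-m vector of observed test statistics.
    t_perm : (m, B) matrix of statistics from B permutations of the labels.
    Returns pi_i = #{b : max_l |t_perm[l, b]| >= |t_obs[i]|} / B.
    """
    col_max = np.abs(t_perm).max(axis=0)               # max |T| over genes, per permutation
    return np.array([(col_max >= abs(ti)).mean() for ti in t_obs])
```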

27 Proof for the single-step maxT adjustment. Given level α, let c_α be such that pr(max_{1≤i≤m} |T_i| ≥ c_α | H_0^C) = α. Note that {P_i ≤ α} = {|T_i| ≥ c_α} for any i. Then we have (cf. minP) pr(reject at least one H_i at level α | H_0^C) = pr(at least one P_i ≤ α | H_0^C) = pr(min_{1≤i≤m} P_i ≤ α | H_0^C) = pr(max_{1≤i≤m} |T_i| ≥ c_α | H_0^C) = α. To simplify the notation we assumed a two-sided test based on the statistics |T_i|. We also assume P_i ~ U[0,1].

28 More powerful methods: step-down adjustments. The idea is S. Holm's modification of Bonferroni; it also applies to Sidák, maxT, and minP.

29 S. Holm's modification of Bonferroni. Order the unadjusted p-values such that p_{r_1} ≤ p_{r_2} ≤ … ≤ p_{r_m}; the indices r_1, r_2, r_3, … are fixed for the given data. For control of the FWER at level α, the step-down Holm adjusted p-values are π_{r_j} = max_{k ∈ {1,…,j}} min((m-k+1) p_{r_k}, 1). The point here is that we don't multiply every p_{r_k} by the same factor m, but only the smallest; the others are multiplied by successively smaller factors m-1, m-2, …, down to multiplying p_{r_m} by 1. By taking successive maxima of the first terms in the brackets, we get monotonicity of the adjusted p-values. Exercise: prove that Holm's adjusted p-values deliver strong control. Exercise: define step-down adjusted Sidák p-values analogously.
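A compact sketch of Holm's step-down adjustment as just described (sort, scale the k-th smallest p-value by m-k+1, take running maxima, restore the original order); the function name is mine.

```python
import numpy as np

def holm(p):
    """Holm step-down adjusted p-values (step-down Bonferroni)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)                                # r_1, ..., r_m
    adj = np.minimum((m - np.arange(m)) * p[order], 1)   # (m - k + 1) * p_{r_k}
    adj = np.maximum.accumulate(adj)                     # successive maxima => monotone
    out = np.empty(m)
    out[order] = adj                                     # back to the original gene order
    return out
```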

30 Step-down adjustment of minP. Order the unadjusted p-values such that p_{r_1} ≤ p_{r_2} ≤ … ≤ p_{r_m}. The step-down adjustment has a complicated formula (see below), but in effect it is: 1. Compare min{P_{r_1}, …, P_{r_m}} with p_{r_1}; 2. Compare min{P_{r_2}, …, P_{r_m}} with p_{r_2}; 3. Compare min{P_{r_3}, …, P_{r_m}} with p_{r_3}; … m. Compare P_{r_m} with p_{r_m}; then enforce monotonicity on the adjusted p_{r_i}. The formula is π_{r_j} = max_{k ∈ {1,…,j}} pr(min_{l ∈ {r_k,…,r_m}} P_l ≤ p_{r_k} | H_0^C).

31 Proof for step-down minP adjustment. Let M_0 be the set of true null hypotheses. Given level α, define g_K to be the α quantile of min_{t∈K} P_t under H_0^C, for any subset K, and define g_i = g_{S_i}, where S_i = {r_i, …, r_m}. Then {there is at least one Type I error} = {reject at least one H_t, t ∈ M_0} ⊆ {min_{t∈M_0} p_t ≤ g_j} (where j is defined by min_{t∈M_0} p_t = p_{r_j}) ⊆ {min_{t∈M_0} p_t ≤ g_{M_0}} (note g_j ≤ g_{M_0}). Therefore pr(at least one Type I error | H_t, t ∈ M_0) ≤ pr(min_{t∈M_0} p_t ≤ g_{M_0} | H_t, t ∈ M_0) = pr(min_{t∈M_0} p_t ≤ g_{M_0} | H_0^C) (using subset pivotality) = α (from the definition of g_K).

32 False discovery rate (Benjamini and Hochberg 1995). Definition: FDR = E(V/R | R>0) pr(R>0). Rank the p-values p_{r_1} ≤ p_{r_2} ≤ … ≤ p_{r_m}. The adjusted p-values that control the FDR when the P_i are independently distributed are given by the step-up formula π_{r_i} = min_{k ∈ {i,…,m}} min(m p_{r_k}/k, 1). We use this as follows: reject H_{r_1}, H_{r_2}, …, H_{r_{k*}}, where k* is the largest k such that p_{r_k} ≤ αk/m. This keeps the FDR ≤ α under independence (proof not given). Compare the above with Holm's adjustment controlling the FWER, the step-down version of Bonferroni, which is π_{r_i} = max_{k ∈ {1,…,i}} min((m-k+1) p_{r_k}, 1).
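A sketch of the Benjamini-Hochberg step-up adjustment just described (scale the k-th smallest p-value by m/k, then take running minima from the largest rank down); the function name is mine, and rejecting all genes with adjusted value ≤ α matches the k* rule on the slide.

```python
import numpy as np

def bh(p):
    """Benjamini-Hochberg step-up adjusted p-values (FDR control under independence)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)          # m * p_{r_k} / k
    adj = np.minimum.accumulate(scaled[::-1])[::-1]      # step-up: min over k >= i
    out = np.empty(m)
    out[order] = np.minimum(adj, 1)
    return out

# Reject the genes with bh(p) <= alpha; equivalently, reject p_{r_1}, ..., p_{r_k*}
# where k* is the largest k with p_{r_k} <= alpha * k / m.
```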

33 False discovery rate (Benjamini and Yekutieli 2001). This variant controls the FDR (1995 definition) for certain regular dependence structures, but it is mostly very conservative: π_{r_i} = min_{k ∈ {i,…,m}} min((m/k) p_{r_k} Σ_{h=1}^m (1/h), 1).

34 Positive false discovery rate (Storey, 2001, independent case). A new definition of FDR, called the positive false discovery rate (pFDR): pFDR = E(V/R | R > 0). The logic behind this is that in practice, at least one gene should be expected to be differentially expressed. The adjusted p-values (called q-values in Storey's paper) control the pFDR: π_{r_i} = min_{k ∈ {i,…,m}} (m/k) p_{r_k} π_0. Note π_0 = m_0/m can be estimated, for a suitable λ, by π_0 = #{i: p_i > λ}/{(1-λ) m}. One choice for λ is 1/2; another is the median of the observed (unadjusted) p-values.
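A sketch of the q-value computation in the independent case, using the π_0 estimate just described with λ = 1/2 by default; this follows the standard q-value construction (minimum over ranks k ≥ i) and the function name is mine.

```python
import numpy as np

def storey_qvalues(p, lam=0.5):
    """q-values with Storey's pi0 estimate (independent case, rough sketch)."""
    p = np.asarray(p, dtype=float)
    m = len(p)
    pi0 = min(1.0, (p > lam).sum() / ((1 - lam) * m))    # pi0 = #{p_i > lambda} / ((1-lambda) m)
    order = np.argsort(p)
    scaled = pi0 * m * p[order] / np.arange(1, m + 1)    # pi0 * m * p_{r_k} / k
    q = np.minimum.accumulate(scaled[::-1])[::-1]        # minimum over k >= i
    out = np.empty(m)
    out[order] = np.minimum(q, 1)
    return out
```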

35 Positive false discovery rate (Storey, 2001, dependent case). In order to incorporate dependence, we need to assume identical distributions. Specify δ_0 to be a small number, say 0.2, such that most null t-statistics fall within (-δ_0, δ_0), and δ to be a large number, say 3, such that we reject the hypotheses whose |t|-statistics exceed δ. For the original data, find W = #{i: |t_i| ≤ δ_0} and R = #{i: |t_i| ≥ δ}. We then do B permutations and for each one compute W_b = #{i: |t_i^b| ≤ δ_0} and R_b = #{i: |t_i^b| ≥ δ}, b = 1, …, B. The proportion of genes expected to be null is estimated by π_0 = W/{(W_1 + W_2 + … + W_B)/B}, and an estimate of the pFDR at the point δ is π_0 {(R_1 + R_2 + … + R_B)/B}/R. Further details can be found in the references.
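A sketch of this permutation estimate, assuming the observed t-statistics and an m × B matrix of permutation t-statistics are already available; the names and default thresholds are mine.

```python
import numpy as np

def pfdr_dependent(t_obs, t_perm, delta0=0.2, delta=3.0):
    """Permutation estimate of the pFDR at threshold delta (dependent-case sketch).

    t_obs  : length-m vector of observed t-statistics.
    t_perm : (m, B) matrix of t-statistics from B permutations of the labels.
    """
    W = (np.abs(t_obs) <= delta0).sum()                 # 'null-looking' genes, observed
    R = (np.abs(t_obs) >= delta).sum()                  # rejected genes, observed
    W_b = (np.abs(t_perm) <= delta0).sum(axis=0)        # same counts per permutation
    R_b = (np.abs(t_perm) >= delta).sum(axis=0)
    pi0 = W / W_b.mean()                                # estimated proportion of null genes
    return pi0 * R_b.mean() / max(R, 1)                 # estimated pFDR at delta
```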

36 Results

37

38

39

gene index   t statistic   unadj. p (× 10^4)   minP adjust.   plower      maxT
2139            -22              1.5               .53        8 × 10^-5   2 × 10^-4
4117            -13              1.5               .53        8 × 10^-5   5 × 10^-4
5330            -12              1.5               .53        8 × 10^-5   5 × 10^-4
1731            -11              1.5               .53        8 × 10^-5   5 × 10^-4
538             -11              1.5               .53        8 × 10^-5   5 × 10^-4
1489            -9.1             1.5               .53        8 × 10^-5   1 × 10^-3
2526            -8.3             1.5               .53        8 × 10^-5   3 × 10^-3
4916            -7.7             1.5               .53        8 × 10^-5   8 × 10^-3
941             -4.7             1.5               .53        8 × 10^-5   0.65
2000            +3.1             1.5               .53        8 × 10^-5   1.00
5867            -4.2             3.1               .76        0.54        0.90
4608            +4.8             6.2               .93        0.87        0.61
948             -4.7             7.8               .96        0.93        0.66
5577            -4.5             12                .99        0.93        0.74

40 The gene names

Index   Name
2139    Apo AI
4117    EST, weakly sim. to STEROL DESATURASE
5330    CATECHOL O-METHYLTRANSFERASE
1731    Apo CIII
538     EST, highly sim. to Apo AI
1489    EST
2526    Highly sim. to Apo CIII precursor
4916    similar to yeast sterol desaturase

41

42

43

44

45 Callow data with some FDR values included

46 Some pFDR figures. With the Apo AI data we find that π_0 = 0.8 when δ_0 = 0.2. For values of δ equal to 2, 3, 4, 5 and 10, we estimate the pFDR with 100 permutations to be 0.572, 0.425, 0.281, 0.082 and 0.0085 respectively. These values are fairly robust to the number of permutations.

47 Discussion. The minP adjustment seems more conservative than the maxT adjustment, but is essentially model-free. With the Callow data, we see that the adjusted minP values are very discrete; it seems that 12,870 permutations are not enough for 6,000 tests. With the Golub data, we see that the number of permutations matters. Discreteness is a real issue here too, but we do have enough permutations. The same ideas extend to other statistics: Wilcoxon, paired t, F, blocked F. The same speed-up works with the bootstrap.

48 Discussion, ctd. The major question in practice: control of the FWER or of some form of FDR? In the first case, use minP, maxT or something else? In the second case, FDR, pFDR or something else? If minP is to be used, we need guidelines for its use in terms of sample sizes and numbers of genes. Another approach: empirical Bayes; there are links with pFDR.

49 Selected references. Westfall, PH and SS Young (1993). Resampling-based multiple testing: examples and methods for p-value adjustment. John Wiley & Sons, Inc. Benjamini, Y and Y Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. JRSS B 57: 289-300. J Storey (2001): three papers (some with other authors), www-stat.stanford.edu/~jstorey/ : The positive false discovery rate: a Bayesian interpretation and the q-value; A direct approach to false discovery rates; Estimating false discovery rates under dependence, with applications to microarrays. Y Ge et al (2001). Fast algorithm for resampling based p-value adjustment for multiple testing.

50 Acknowledgements Yongchao Ge Yee Hwa (Jean) Yang Sandrine Dudoit Ben Bolstad

51 Software C and R code available for different tests: http://www.stat.berkeley.edu/users/terry

