Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests Peter H. Westfall Texas Tech University.


1 Strong Control of the Familywise Type I Error Rate in DNA Microarray Analysis Using Exact Step-Down Permutation Tests Peter H. Westfall Texas Tech University

2 Microarrays: glass (1 cm^2), ~6,500 genes, different cDNA sequences

3 Example
Group 1: Acute Myeloid Leukemia (AML), n_1 = 11
Group 2: Acute Lymphoblastic Leukemia (ALL), n_2 = 27
Data:
OBS  TYPE  G1  G2  G3  …  G7000
1    AML   (gene expression levels)
2    AML
…    …
11   AML
12   ALL
…    …
38   ALL

4 Testing for 7000 Gene Expression Levels
Goal: Test H_{0i}: F_{ALL,i} = F_{AML,i} for i = 1,…,7000. Here, “F” denotes the cdf.
Many choices for test statistics.
Multiplicity problem: If each test is done at α = .05 and there are 6600 equivalent genes, then on average .05 × 6600 = 330 of them will be determined “non-equivalent.”
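The arithmetic on this slide can be checked in a few lines. The sketch below is in Python rather than SAS; the independence assumption and the Bonferroni line are illustrative additions, not part of the slide (real gene expression levels are correlated).

```python
# Figures taken from the slide: 6600 truly equivalent genes out of 7000,
# each tested at alpha = .05.
alpha = 0.05
n_true_nulls = 6600
n_genes = 7000

# Expected number of equivalent genes wrongly declared "non-equivalent".
expected_false_positives = alpha * n_true_nulls

# Assumption for illustration only: the 6600 tests are independent.
# Under that assumption, at least one false positive is almost certain.
fwe_if_independent = 1 - (1 - alpha) ** n_true_nulls

# Bonferroni per-gene threshold, which bounds the familywise error rate
# by alpha without any independence assumption.
bonferroni_threshold = alpha / n_genes

print(f"expected false positives: {expected_false_positives:.0f}")
print(f"FWE under independence:   {fwe_if_independent:.6f}")
print(f"Bonferroni threshold:     {bonferroni_threshold:.2e}")
```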

5 Closed Testing to Control False Discoveries
Let S = {1,2,…,7000} (gene labels). Let K = {i_1,…,i_k} ⊆ S denote a particular subset.
The Closed Testing Procedure:
1. Test H_{0K}: F_{ALL,K} = F_{AML,K} for each K ⊆ S, using a valid α-level test for each.
2. Reject H_{0i}: F_{ALL,i} = F_{AML,i} if H_{0K} is rejected for all K ⊇ {i}.
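A brute-force sketch of the two steps, in Python, for a family small enough to enumerate. One substitution is made so the example is self-contained: each subset hypothesis is tested with a Bonferroni local test (reject H_{0K} when the smallest p-value in K is at most α/|K|), which is a valid α-level test but is not the permutation test used in this talk. The raw p-values are the four from the "Illustration with Four Genes" slide; the function name `closed_test` is hypothetical.

```python
from itertools import combinations


def closed_test(p_values, alpha=0.05):
    """Return the indices i whose H_0i is rejected by closed testing."""
    genes = range(len(p_values))
    all_subsets = [
        K
        for size in range(1, len(p_values) + 1)
        for K in combinations(genes, size)
    ]
    # Step 1: test every subset hypothesis H_0K at level alpha.
    # Assumed local test: Bonferroni on the p-values inside K.
    rejected_subsets = {
        K for K in all_subsets
        if min(p_values[i] for i in K) <= alpha / len(K)
    }
    # Step 2: reject H_0i only if every subset containing i was rejected.
    return [
        i for i in genes
        if all(K in rejected_subsets for K in all_subsets if i in K)
    ]


raw_p = [0.0121, 0.0142, 0.1986, 0.0191]  # genes 1..4 from the four-gene slide
rejected = closed_test(raw_p)
print("rejected genes:", [i + 1 for i in rejected])  # prints: rejected genes: [1, 2, 4]
```

With Bonferroni local tests, closed testing reduces to Holm's step-down method. At α = .05 it reaches the same decisions as the permutation-based adjusted p-values on the four-gene slide: genes 1, 2 and 4 are rejected and gene 3 is not.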

6 Theorem: CTP Strongly Controls FWE
Proof: Suppose H_{0j_1},…,H_{0j_m} all are true (unknown to you which ones). You may reject at least one only when you reject the intersection H_{0j_1} ∩ … ∩ H_{0j_m}. Thus,
FWE = P(reject at least one of H_{0j_1},…,H_{0j_m} | H_{0j_1},…,H_{0j_m} all are true)
    ≤ P(reject H_{0j_1} ∩ … ∩ H_{0j_m} | H_{0j_1},…,H_{0j_m} all are true) = α.

7 Exact Tests for Composite Hypotheses H_{0K}
Use the permutation distribution of min_{i∈K} p_i, where p_i = 2P(T_{38-2} > |t_i|) and t_i is the two-sample Student t statistic for gene i.
p-value = proportion of the 38!/(27!11!) permutations for which min_{i∈K} P_i* ≤ min_{i∈K} p_i.
Note: Exact despite “massively singular” covariance matrix!
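A minimal Python sketch of this subset test, enumerating every relabeling of a tiny data set. Two assumptions are made. First, the data (3 versus 4 samples, 3 genes) are invented purely for illustration. Second, the code compares max |t| instead of min p: because every p_i is the same decreasing function of |t_i| (same degrees of freedom for all genes), min_{i∈K} P_i* ≤ min_{i∈K} p_i holds exactly when max_{i∈K} |t_i*| ≥ max_{i∈K} |t_i|, so no t-distribution routine is needed. Function names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def pooled_t(x, y):
    """Two-sample Student t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    s2 = ss / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(s2 * (1 / n1 + 1 / n2))


def max_abs_t(data, group1, K):
    """max over genes in K of |t|, with rows in group1 forming the first group."""
    g1 = set(group1)
    stats = []
    for j in K:
        x = [row[j] for r, row in enumerate(data) if r in g1]
        y = [row[j] for r, row in enumerate(data) if r not in g1]
        stats.append(abs(pooled_t(x, y)))
    return max(stats)


def exact_subset_pvalue(data, group1, K):
    """Proportion of all relabelings whose max |t| over K is at least the observed one."""
    observed = max_abs_t(data, group1, K)
    splits = list(combinations(range(len(data)), len(group1)))
    # Small tolerance so the observed labeling always counts as a hit.
    hits = sum(max_abs_t(data, g, K) >= observed - 1e-12 for g in splits)
    return hits / len(splits)


# Invented toy data: rows 0-2 play the role of one group, rows 3-6 the other.
toy = [
    [5.1, 2.0, 3.3], [4.8, 2.4, 3.1], [5.5, 1.9, 2.8],
    [3.0, 2.2, 3.0], [3.4, 2.1, 3.6], [2.9, 2.5, 2.9], [3.2, 2.3, 3.4],
]
p_all = exact_subset_pvalue(toy, group1=(0, 1, 2), K=(0, 1, 2))
print("p-value for K = all three genes:", p_all)

# Number of distinct relabelings in the talk's design, 38!/(27!11!).
n_relabelings = comb(38, 11)
print("relabelings for n1 = 11, n2 = 27:", n_relabelings)
```

Full enumeration is feasible here because there are only 35 relabelings; for the 38-sample design the count is large enough that sampling from the permutation distribution is the practical route.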

8 A Slight Problem... There are 2^7000 − 1 subsets K to be tested. This might take a while...

9 A Fantastic Simplification
You need only test 7000 of the 2^7000 − 1 subsets!
Why? Because P(min_{i∈K} P_i* ≤ c) ≤ P(min_{i∈K'} P_i* ≤ c) when K ⊆ K'.
Significance for most lower order subsets is determined by significance of higher order subsets.
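The inequality and its consequence can be written out step by step. The derivation below is a sketch that assumes the joint permutation distribution of the P_i* does not depend on which subset hypothesis is being tested; the ordered notation p_(1) ≤ … ≤ p_(7000) and the sets K_j are introduced here for illustration.

```latex
\begin{align*}
K \subseteq K'
  &\;\Longrightarrow\; \min_{i \in K'} P_i^* \le \min_{i \in K} P_i^* \\
  &\;\Longrightarrow\; \Bigl\{ \min_{i \in K} P_i^* \le c \Bigr\}
       \subseteq \Bigl\{ \min_{i \in K'} P_i^* \le c \Bigr\} \\
  &\;\Longrightarrow\; P\Bigl( \min_{i \in K} P_i^* \le c \Bigr)
       \le P\Bigl( \min_{i \in K'} P_i^* \le c \Bigr).
\end{align*}
% Order the observed p-values p_{(1)} \le \dots \le p_{(7000)} and let
% K_j = \{(j), (j+1), \dots, (7000)\}. Any subset K whose smallest observed
% p-value is p_{(j)} satisfies K \subseteq K_j, so with c = p_{(j)}:
\begin{align*}
p_K = P\Bigl( \min_{i \in K} P_i^* \le p_{(j)} \Bigr)
    \le P\Bigl( \min_{i \in K_j} P_i^* \le p_{(j)} \Bigr) = p_{K_j}.
\end{align*}
% Hence only the 7000 nested subsets K_1 \supset K_2 \supset \dots \supset K_{7000}
% need to be tested, and the adjusted p-value for gene (j) is
\begin{align*}
\tilde p_{(j)} = \max_{l \le j} \, p_{K_l}.
\end{align*}
```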

10 Illustration with Four Genes
H_{1234}: min p = .0121, p_{1234} = .0379
H_{123}:  min p = .0121, p_{123} < .0379
H_{124}:  min p = .0121, p_{124} < .0379
H_{134}:  min p = .0121, p_{134} < .0379
H_{234}:  min p = .0142, p_{234} = .0351
H_{12}:   min p = .0121, p_{12} < .0379
H_{13}:   min p = .0121, p_{13} < .0379
H_{14}:   min p = .0121, p_{14} < .0379
H_{23}:   min p = .0142, p_{23} < .0351
H_{24}:   min p = .0142, p_{24} < .0351
H_{34}:   min p = .0191, p_{34} = .0355
H_1:      p_1 = 0.0121, p_{1} < .0379
H_2:      p_2 = 0.0142, p_{2} < .0351
H_3:      p_3 = 0.1986, p_{3} = .1991
H_4:      p_4 = 0.0191, p_{4} < .0355
(Start at bottom.)

11 MULTTEST PROCEDURE
Tests only the needed subsets (7000, not 2^7000 − 1).
Samples from the permutation distribution.
Only one sample is needed, not 7000 distinct samples: the joint distribution of minP is identical under H_K and H_S. (Called the “subset pivotality” condition by Westfall and Young, 1993.)
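The idea of reusing one permutation sample for all nested subsets can be sketched in Python. This is an illustration of the step-down logic under stated assumptions, not PROC MULTTEST's implementation: the toy data are invented, all 35 relabelings are enumerated instead of sampled, and max |t| stands in for min p (equivalent when every p_i comes from the same t reference distribution). Function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pooled_t(x, y):
    """Two-sample Student t statistic with pooled variance."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    ss = sum((v - m1) ** 2 for v in x) + sum((v - m2) ** 2 for v in y)
    return (m1 - m2) / sqrt(ss / (n1 + n2 - 2) * (1 / n1 + 1 / n2))


def abs_t_all_genes(data, group1):
    """|t| for every gene, with rows in group1 forming the first group."""
    g1 = set(group1)
    out = []
    for j in range(len(data[0])):
        x = [row[j] for r, row in enumerate(data) if r in g1]
        y = [row[j] for r, row in enumerate(data) if r not in g1]
        out.append(abs(pooled_t(x, y)))
    return out


def stepdown_adjusted_p(data, group1):
    """Step-down adjusted p-values computed from a single set of relabelings."""
    observed = abs_t_all_genes(data, group1)
    # Most significant gene first (largest |t| = smallest raw p).
    order = sorted(range(len(observed)), key=lambda j: observed[j], reverse=True)
    splits = list(combinations(range(len(data)), len(group1)))
    # The one permutation sample, reused at every step.
    perm_stats = [abs_t_all_genes(data, g) for g in splits]
    adjusted = [0.0] * len(observed)
    running_max = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]  # nested subset: this gene and all less significant ones
        hits = sum(
            max(stats[i] for i in remaining) >= observed[j] - 1e-12
            for stats in perm_stats
        )
        # Enforce monotonicity: a gene cannot be more significant than one ranked above it.
        running_max = max(running_max, hits / len(splits))
        adjusted[j] = running_max
    return adjusted


# Invented toy data: rows 0-2 play the role of one group, rows 3-6 the other.
toy = [
    [5.1, 2.0, 3.3], [4.8, 2.4, 3.1], [5.5, 1.9, 2.8],
    [3.0, 2.2, 3.0], [3.4, 2.1, 3.6], [2.9, 2.5, 2.9], [3.2, 2.3, 3.4],
]
adjusted_p = stepdown_adjusted_p(toy, group1=(0, 1, 2))
for gene, p in enumerate(adjusted_p, start=1):
    print(f"gene {gene}: adjusted p = {p:.4f}")
```

The loop visits as many subsets as there are genes, which is the reduction from 2^7000 − 1 tests to 7000.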

12 PROC MULTTEST code
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                      /* AML or ALL */
   test mean (gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
proc print;
   var _var_ raw_p stppermp;
run;

13 PROC MULTTEST Output (50 minutes for 200,000 samples)

14 Imbalance Issues
Use of Student t statistics does result in an exact, closed multiple testing procedure, but...
There is imbalance: less power for gene types that are highly kurtotic than for normally distributed types.
Solutions:
Use exact unadjusted p-values
  – Already available for binary data
  – Computational difficulties otherwise
Rank-transform the data prior to analysis

15 Rank Transform for Better Balance
proc rank;
   var gene1-gene7123;
run;
proc multtest noprint out=adjp holm hoc stepperm n=200000;
   class type;                      /* AML or ALL */
   test mean (gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005));
   by raw_p;
proc print;
   var _var_ raw_p stppermp;
run;
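For readers without SAS, the rank step can be sketched in Python. The function below replaces each value in one gene's column by its rank across all samples, giving tied values the average of their ranks; this mirrors what PROC RANK does by default, though the function name and the example values are made up for illustration.

```python
def midranks(values):
    """Ranks starting at 1, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Find the end of the run of values tied with the one at position pos.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for q in range(pos, end + 1):
            ranks[order[q]] = average_rank
        pos = end + 1
    return ranks


# Made-up expression values for one gene across four samples.
example_ranks = midranks([10, 20, 20, 5])
print(example_ranks)  # prints: [2.0, 3.5, 3.5, 1.0]
```

Applying this to every gene column before the permutation analysis puts all genes on the same scale, so a single extreme expression value no longer dominates a gene's t statistic.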

16 Rank Transformed Results

17 Comparing ALL and AML for Gene 6128 [Figure: GENE6128 expression level (axis ticks 0, 1000, 2000) plotted by TYPE (ALL, AML)]

18 Is Better Balance Good?
Maybe not: imbalance induces a more powerful multiple testing procedure
  – Bonferroni multiplier implicitly reduced through imbalance
  – Serendipity!

19 Summary
The Westfall-Young method is an exact, closed testing method, despite large p, small n
Detected genes are “honestly significant”
Robust (nonparametric)

