
1
**A Course in Multiple Comparisons and Multiple Tests**

Peter H. Westfall, Ph.D. Professor of Statistics, Department of Inf. Systems and Quant. Sci. Texas Tech University

2
**Learning Outcomes**
- Elucidate reasons that multiple comparisons procedures (MCPs) are used, as well as their controversial nature
- Know when and how to use classical interval-based MCPs, including Tukey, Dunnett, and Bonferroni
- Understand how MCPs affect power
- Elucidate the definition of closed testing procedures (CTPs)
- Understand specific types of CTPs, their benefits and drawbacks
- Distinguish false discovery rate (FDR) from familywise error rate (FWE)
- Understand general issues regarding Bayesian MCPs

3
**Outline of Material**
- Introduction. Overview of Problems, Issues, and Solutions; Regulatory and Ethical Perspectives; Families of Tests; Familywise Error Rate; Bonferroni. (pp. 5-21)
- Interval-Based Multiple Inferences in the standard linear models framework. One-way ANOVA and ANCOVA; Tukey, Dunnett, and Monte Carlo Methods; Adjusted p-values; general contrasts; Multivariate T distribution; Tight Confidence Bands; Treatment × Covariate Interaction; Subgroup Analysis. (pp )
- Power and Sample Size Determinations for multiple comparisons. (pp )
- Stepwise and Closed Testing Procedures I: P-value-Based Methods. Closure Method; Global Tests; Holm, Hommel, Hochberg, and Fisher combined methods for p-values. (pp )
- Stepwise and Closed Testing Procedures II: Fixed Sequences, Gatekeepers, and I-U Tests. Fixed sequence tests; gatekeeper procedures; multiple hypotheses in a gate; intersection-union tests; with application to dose response, primary and secondary endpoints, bioequivalence, and combination therapies. (pp )

4
**Outline (Continued)**
- Stepwise and Closed Testing Procedures III: Methods that use logical constraints and correlations. Lehmacher et al. method for multiple endpoints; range-based and F-based ANOVA tests; Fisher’s protected LSD; free and restricted combinations; Shaffer-type methods for dose comparisons and subgroup analysis. (pp )
- Multiple nonparametric and semiparametric tests: bootstrap and permutation-based closed testing. PROC MULTTEST; examples with multiple endpoints, genetic associations, gene expression, binary data, and adverse events. (pp )
- More complex models and FWE control: heteroscedasticity, repeated measures, and large-sample methods. Applications: multiple treatment comparisons, crossover designs, logistic regression of cure rates. (pp )
- False Discovery Rate: Benjamini and Hochberg’s method; comparison with FWE-controlling methods. (pp )
- Bayesian methods: simultaneous credible intervals; ranking probabilities and loss functions; PROC MIXED posterior sampling; Bayesian testing of multiple endpoints. (pp )
- Conclusion, discussion, references. (pp )

5
**Sources of Multiplicity**

- Multiple variables (endpoints)
- Multiple timepoints
- Subgroup analysis
- Multiple comparisons
- Multiple tests of the same hypothesis
- Variable and model selection
- Interim analysis
- Hidden multiplicity: file drawers, outliers

6
**The Problem: “Significant” results may fail to replicate.**

Documented cases: Ioannidis (JAMA 2005) Contradicted and Initially Stronger Effects in Highly Cited Clinical Research John P. A. Ioannidis, MD JAMA. 2005;294:

7
**An Example Phase III clinical trial Three arms – Placebo, AC, Drug**

Endpoints: signs and symptoms, measured at weekly visits; baseline covariates. In Phase III the issue is not so severe – especially with FDA oversight – but the problem still exists here. The problem is much greater in medical studies where such oversight does not exist.

8
**Example-Continued ‘Features’ displayed at trial conclusion: Trends**

- Baseline-adjusted comparisons of raw data
- Baseline-adjusted % changes
- Nonparametric and parametric tests
- Specific endpoints and combinations of endpoints
- Particular week results
- AC and Placebo comparisons

Fact: The features that “look the best” are biased.

9
**Example Continued – Feature Selection**

‘Effect Size’ is a feature. Effect size = (mean difference)/sd; dimensionless; .2 = ‘small’, .5 = ‘medium’, .8 = ‘large’. Estimated effect sizes: F1, F2, …, Fk. What if you select max{F1, F2, …, Fk} and publish it?

10
**The Scientific Concern**

11
**Feature Selection Model**

Clinical Trials Simulation: real data used – conservative! If you must know more: Fj = mj + ej, j = 1,…,20. Error terms ej are N(0, 0.2²). True effect sizes mj are N(0.3, 0.1²). Features Fj are highly correlated.
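A quick Monte Carlo sketch of this model (in Python rather than SAS; for simplicity the features are simulated as independent, which if anything overstates the selection effect relative to the highly correlated features described above):

```python
import random

def simulate_selection(n_trials=2000, k=20, seed=1):
    """Simulate F_j = m_j + e_j with m_j ~ N(0.3, 0.1^2), e_j ~ N(0, 0.2^2),
    and compare the average selected (maximum) feature to the average truth."""
    rng = random.Random(seed)
    max_f = 0.0   # running sum of the selected (largest) feature per trial
    mean_m = 0.0  # running sum of the average true effect size per trial
    for _ in range(n_trials):
        ms = [rng.gauss(0.3, 0.1) for _ in range(k)]
        fs = [m + rng.gauss(0.0, 0.2) for m in ms]
        max_f += max(fs)
        mean_m += sum(ms) / k
    return max_f / n_trials, mean_m / n_trials
```

The true effects average about 0.3, but the published (selected) feature averages far above that — the selection effect the slide is warning about.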

12
**Key Points: (i) Multiplicity invites Selection (ii) Selection has an EFFECT**

Just like effects due to treatment, confounding, learning, nonresponse, and placebo. This is not an easy message to get across to the non-statistically trained, and practitioners don’t want to hear it (my first consulting experience – alcoholism and genetics). One reason that multiplicity adjustment is unpopular is that it is defensive – you can negate things very easily with multiplicity considerations. But some bad science needs negating – the “bible codes” is one example. Consideration of multiplicity is simply GOOD SCIENCE. Even if you don’t use it explicitly, knowing about it will help you design studies carefully, define variables in advance, and interpret the results from exploratory studies more cautiously.

13
**Published Guidelines**

- ICH Guidelines
- CPMP Points to Consider
- CDRH Statistical Guidance
- ASA Ethical Guidelines

14
**Regulatory/Journal/Ethical/Professional Concerns**

Replicability (good science); fairness. Regulatory report: The drug company reported efficacy at p = .047. We repeated the analysis in several different ways that the company might have done. In 20 re-analyses of the data, 18 produced p-values greater than .047. Only one of the 20 re-analyses produced a p-value smaller than .047.

15
**Multiple Inferences: Notation**

There is a “family” of k inferences. Parameters are θ1, …, θk. Null hypotheses are H01: θ1 = 0, …, H0k: θk = 0.

16
**Comparisonwise Error Rate (CER)**

Intervals: CERj = P(Intervalj incorrect). Tests: CERj = P(Reject H0j | H0j is true). Usually CER = α = .05.

17
**Familywise Error Rate (FWE)**

Intervals: FWE = 1 - P(all intervals are correct) Tests: FWE = P(reject at least one true null)

18
False Discovery Rate FDR = E(proportion of rejections that are incorrect) Let R = total # of rejections Let V = # of erroneous rejections FDR = E(V/R) (0/0 defined as 0). FWE = P(V>0)
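Under the complete null, every rejection is an error, so V = R and V/R ∈ {0, 1}; FDR then equals P(V > 0) = FWE. A small stdlib-Python simulation of this special case (unadjusted independent tests, m = 10, α = .05; the function name and settings are illustrative, not from the course):

```python
import random

def fdr_and_fwe(m=10, alpha=0.05, n_sims=20000, seed=2):
    """Estimate FDR = E(V/R) and FWE = P(V > 0) when all m nulls are true."""
    rng = random.Random(seed)
    fdr_sum = 0.0
    fwe_count = 0
    for _ in range(n_sims):
        R = sum(rng.random() <= alpha for _ in range(m))  # total rejections
        V = R                                 # every rejection is erroneous here
        fdr_sum += (V / R) if R > 0 else 0.0  # 0/0 defined as 0
        fwe_count += (V > 0)
    return fdr_sum / n_sims, fwe_count / n_sims
```

Both estimates come out near 1 − .95¹⁰ ≈ 0.40, illustrating why FDR control and FWE control coincide under the global null but diverge when some nulls are false.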

19
**Bonferroni Method Identify Family of inferences**

Identify number of elements (k) in the family. Use α/k for all inferences. Ex: With k = 36, p-values must be less than 0.05/36 ≈ 0.0014 to be “significant.”
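The same rule, stated as adjusted p-values (a minimal Python sketch; the function name is mine):

```python
def bonferroni_adjusted(pvalues):
    """Bonferroni-adjusted p-values: p*_j = min(1, k * p_j).
    Reject H0j at FWE level alpha iff p*_j <= alpha."""
    k = len(pvalues)
    return [min(1.0, k * p) for p in pvalues]

# With k = 36, a raw p-value must fall below 0.05/36 ~ 0.0014
# for its adjusted value to stay below 0.05.
adj = bonferroni_adjusted([0.001] + [0.5] * 35)
```

Here the raw p = 0.001 adjusts to 36 × 0.001 = 0.036 (still significant at FWE .05), while the others adjust to 1.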

20
**FWE Control for Bonferroni**

P(p0j1 ≤ .05/36 or … or p0jm ≤ .05/36 | H0j1,…,H0jm true) ≤ P(p0j1 ≤ .05/36) + … + P(p0jm ≤ .05/36) = (.05)m/36 ≤ .05, since m ≤ 36. The key step is Boole’s inequality: P(A∪B) ≤ P(A) + P(B).

21
**“Families” in clinical trials1**

Efficacy:
- Main Interest – Primary & Secondary. Approval and labeling depend on these. Tight FWE control needed.
- Lesser Interest – Depending on goals and reviewers, FWE-controlling methods might be needed.
- Supportive Tests – mostly descriptive. FWE control not needed.
- Exploratory Tests – investigate new indications; future trials needed to confirm; do what makes sense.

Safety:
- Serious and known treatment-related AEs – FWE control not needed.
- All other AEs – reasonable to control FWE (or FDR).

1Westfall, P. and Bretz, F. (2003). Multiplicity in Clinical Trials. Encyclopedia of Biopharmaceutical Statistics, second edition, Shein-Chung Chow, ed., Marcel Decker Inc., New York, pp

22
**Classical Single-Step Testing and Interval Methods to Control FWE**

Simultaneous confidence intervals; Adjusted p-values Dunnett method Tukey’s method Simulation-based methods for general comparisons

23
**“Specificity” and “Sensitivity”**

If you want estimates of effect sizes & error margins, then use simultaneous confidence intervals.
If you want confident inequalities, then use stepwise or closed tests.
If you want an overall test, then use the F-test, O’Brien’s test, etc.

24
**The Model Y = Xβ + ε, where ε ~ N(0, σ²I)**

Includes ANOVA, ANCOVA, regression For group comparisons, covariate adjustment Not valid for survival analysis, binary data, multivariate data

25
**Example: Pairwise Comparisons**

against Control. Goal: Estimate all mean differences from control and provide simultaneous 95% error margins. What c_α to use?

/* Creating a data set with given means and SDs */
data z;
do g = 0 to 6;
do n = 1 to 4;
z = rannor(0); output;
end;
end;
run;
proc standard data=z out=z mean=0 std=1; var z; by g; run;
data toxfake; set z;
if g=0 then gain = *z;
if g=1 then gain = *z;
if g=2 then gain = *z;
if g=3 then gain = *z;
if g=4 then gain = *z;
if g=5 then gain = *z;
if g=6 then gain = *z;
run;
proc means data=toxfake; var z gain; run;
proc glm data=toxfake;
class g;
model gain = g;
means g / dunnett;
run; quit;

26
**Comparison of Critical Values**

data; c_alpha = probmc("DUNNETT2",.,.95,21,6); run; proc print; run;

27
**Results - Dunnett The GLM Procedure Dunnett's t Tests for gain**

NOTE: This test controls the Type I experimentwise error for comparisons of all treatments against a control. Alpha Error Degrees of Freedom Error Mean Square Critical Value of Dunnett's t Minimum Significant Difference Comparisons significant at the 0.05 level are indicated by ***. Difference Simultaneous g Between % Confidence Comparison Means Limits *** ***

28
**c_α is the 1−α quantile of the distribution of**

max_i |Z_i − Z_0| / (2χ²/df)^{1/2}, called Dunnett’s two-sided range distribution.
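The PROBMC value can be checked by brute force: simulate the statistic above and take its 0.95 quantile. A stdlib-Python sketch (k = 6 comparisons, df = 21, matching the PROBMC call on the earlier slide; the estimate carries Monte Carlo error):

```python
import random

def dunnett2_crit(k=6, df=21, alpha=0.05, n_sims=40000, seed=3):
    """Monte Carlo 1-alpha quantile of max_i |Z_i - Z_0| / sqrt(2*chi2/df)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        z0 = rng.gauss(0.0, 1.0)
        num = max(abs(rng.gauss(0.0, 1.0) - z0) for _ in range(k))
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        draws.append(num / (2.0 * chi2 / df) ** 0.5)
    draws.sort()
    return draws[int((1 - alpha) * n_sims)]
```

With these settings the estimate lands near 2.8, in the neighborhood of the exact Dunnett two-sided critical value for 6 comparisons and 21 error df.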

29
**Adjusted p-Values Definition: Adjusted p-value =**

smallest FWE at which the hypothesis is rejected. or The FWE for which the confidence interval has “0” as a boundary.

30
**Adjusted p-values for Dunnett**

proc glm data=tox; class g; model gain=g; lsmeans g/adjust=dunnett pdiff; run;

31
**Example: All Pairwise Comparisons**

Goal: Estimate all mean differences and provide simultaneous 95% error margins. What c_α to use?

32
**Comparison of Critical Values**

data; qval = probmc("RANGE",.,.95,21,7); c_alpha = qval/sqrt(2); run; proc print; run;

33
**Tukey Comparisons Alpha= 0.05 df= 21 MSE= 210.0048**

Critical Value of Studentized Range= 4.597 Minimum Significant Difference= Means with the same letter are not significantly different. Tukey Grouping Mean N G A A A A A A A A

34
**Tukey Adjusted p-Values**

General Linear Models Procedure Least Squares Means Adjustment for multiple comparisons: Tukey G GAIN Pr > |T| H0: LSMEAN(i)=LSMEAN(j) LSMEAN i/j

35
**Tukey Simultaneous Intervals**

Simultaneous Simultaneous Lower Difference Upper Confidence Between Confidence i j Limit Means Limit

36
**c_α is (1/√2) × {the 1−α quantile**

of the distribution of max_{i,i′} |Z_i − Z_{i′}| / (χ²/df)^{1/2}}, which is called the Studentized range distribution.
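The same Monte Carlo check works here: for 7 groups and 21 error df, the simulated 0.95 quantile of the Studentized range should land near the value 4.597 reported on the Tukey output slide (stdlib-Python sketch, subject to Monte Carlo error):

```python
import random

def studentized_range_crit(g=7, df=21, alpha=0.05, n_sims=40000, seed=4):
    """Monte Carlo 1-alpha quantile of max_{i,i'} |Z_i - Z_i'| / sqrt(chi2/df)."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        zs = [rng.gauss(0.0, 1.0) for _ in range(g)]
        rng_stat = max(zs) - min(zs)            # the range of g iid N(0,1) draws
        chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square with df degrees of freedom
        draws.append(rng_stat / (chi2 / df) ** 0.5)
    draws.sort()
    return draws[int((1 - alpha) * n_sims)]
```

Dividing the result by √2 gives c_α ≈ 3.25 for the pairwise-difference intervals.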

37
**Unbalanced Designs and/or Covariates**

Tukey method is conservative when the design is unbalanced and/or there are covariates; otherwise exact Dunnett method is conservative when there are covariates; otherwise exact “Conservative” means {True FWE} < {Nominal FWE} ; also means “less powerful”

38
**Tukey-Kramer Method for all pairwise comparisons**

Let c_α be the critical value for the balanced case using Tukey’s method and the correct df. The resulting intervals are conservative (Hayter, 1984, Annals).

39
**Exact Method for General Comparisons of Means**

Details: The model is Y = Xβ + ε. Linear functions are θ_i = c_i′β, i = 1,…,k (assumed estimable). The dispersion matrix is Corr(C_K′(X′X)⁻¹C_K), where C_K = [c_1 : … : c_k] (GINV can be used for non-full-rank parameterizations). df = n − rank(X).

40
**Multivariate T-Distribution Details**


41
**Calculation of “Exact” ca**

Edwards and Berry: Simple simulation. Hsu and Nelson: Factor analytic control variate (better). Genz and Bretz: Integration using lattice methods (best). Even with simple simulation, the value c_α can be obtained with reasonable precision. Edwards, D., and Berry, J. (1987) The efficiency of simulation-based multiple comparisons. Biometrics, 43, Hsu, J.C. and Nelson, B.L. (1998) Multiple comparisons in the general linear model. Journal of Computational and Graphical Statistics, 7, Genz, A. and Bretz, F. (1999), Numerical Computation of Multivariate t Probabilities with Application to Power Calculation of Multiple Contrasts, J. Stat. Comp. Simul. 63, pp

42
**Example: ANCOVA with two covariates**

Y = Diastolic BP Group = Therapy (Control, D1, D2, D3) X = Baseline Diastolic BP X = Baseline Systolic BP Goal: Compare all therapies, controlling for baseline proc glm data=research.bpr; class therapy; model dbp10 = therapy dbp7 sbp7; lsmeans therapy/pdiff cl adjust=simulate(nsamp= cvadjust seed= report); run; quit;

43
**Results From ANCOVA Note: “4” is control**

Source DF Type III SS Mean Square F Value Pr > F THERAPY DBP <.0001 SBP Least Squares Means for Effect THERAPY Difference Simultaneous 95% Between Confidence Limits for i j Means LSMean(i)-LSMean(j) Note: “4” is control

44
**Details for Quantile Simulation**

Random number seed Comparison type All Sample size Target alpha Accuracy radius (target) Accuracy radius (actual) E-7 Accuracy confidence % Simulation Results Estimated % Confidence Method % Quantile Alpha Limits Simulated Tukey-Kramer Bonferroni Sidak GT Scheffe T NOTE: PROCEDURE GLM used: real time seconds

45
**Results from ANCOVA-Dunnett**

H0:LSMean= Control THERAPY DBP10 LSMEAN Pr > |t| Dose Dose Dose Placebo Least Squares Means for Effect THERAPY Difference Simultaneous 95% Between Confidence Limits for i j Means LSMean(i)-LSMean(j) proc glm data=research.bpr; class therapy; model dbp10 = therapy dbp7 sbp7; lsmeans therapy/pdiff=control('Placebo') cl adjust=simulate(nsamp= cvadjust seed= report); run; quit;

46
**Details for Quantile Simulation-Dunnett**

Random number seed Comparison type Control, two-sided Sample size Target alpha Accuracy radius (target) Accuracy radius (actual) E-7 Accuracy confidence % Simulation Results Estimated % Confidence Method % Quantile Alpha Limits Simulated Dunnett-Hsu, two-sided Bonferroni Sidak GT Scheffe T NOTE: PROCEDURE GLM used: real time seconds

47
**More General Inferences**

Question: For what values of the covariate is treatment A better than treatment B?

48
**Discussion of (Treatment ´ Covariate) Interaction Example**

49
The GLIMMIX Procedure Computes MC-exact simultaneous confidence intervals and adjusted p-values for any set of linear functions in a linear model Documentation for GLIMMIX

50
**GLIMMIX syntax proc glimmix data=research.tire; class make;**

model cost = make mph make*mph; estimate "10" make 1 -1 make*mph , "15" make 1 -1 make*mph , "20" make 1 -1 make*mph , "25" make 1 -1 make*mph , "30" make 1 -1 make*mph , "35" make 1 -1 make*mph , "40" make 1 -1 make*mph , "45" make 1 -1 make*mph , "50" make 1 -1 make*mph , "55" make 1 -1 make*mph , "60" make 1 -1 make*mph , "65" make 1 -1 make*mph , "70" make 1 -1 make*mph /adjust=simulate(nsamp= report) cl; run; /* Program 6.12: Tire Wear Data */ data tire; input make$ mph cost datalines; A A A A A A A A A A B B B B B B B B B B ; run; /* Old %SimIntervals syntax */ %MakeGLMStats(dataset = tire , classvar = make , yvar = cost , model = make mph make*mph); %macro Contrasts; free c clab; do x = 10 to 70 by 5; c = c // (0 || 1 || -1 || 0 || x || -x); clab = clab // x ; end; c = c`; %mend; %SimIntervals(nsamp=100000, seed=121211); /* Complete GLIMMIX syntax */ ods output estimates=estimates simresults=simresults; proc glimmix data=research.tire; class make; model cost = make mph make*mph; estimate "10" make 1 -1 make*mph , "15" make 1 -1 make*mph , "20" make 1 -1 make*mph , "25" make 1 -1 make*mph , "30" make 1 -1 make*mph , "35" make 1 -1 make*mph , "40" make 1 -1 make*mph , "45" make 1 -1 make*mph , "50" make 1 -1 make*mph , "55" make 1 -1 make*mph , "60" make 1 -1 make*mph , "65" make 1 -1 make*mph , "70" make 1 -1 make*mph /adjust=simulate(nsamp= report) cl; /*Reshape output to plot with class var */ data reshape; set estimates; array v [3] estimate Adjlower Adjupper; do i = 1 to 3; if i = 1 then type= 'Estimate'; if i = 2 then type = 'AdjLower'; if i = 3 then type = 'AdjUpper'; cost = v[i] ; output; keep cost type label; /* The following code was mostly written by SAS/ANALYST */ title "Simultaneous Confidence Band for Difference between Makes A and B"; proc sort data=Work.Reshape out=WORK._stsrt_0; by LABEL; goptions ftext=SWISS ctext=BLACK htext=1 cells; axis1 width=1 offset=(3 pct) label=(a=90 r=0); axis2 width=1 offset=(3 pct); symbol1 v=SQUARE height=1 cells 
interpol=JOIN l=1 w=1 c=BLUE; symbol2 v=STAR height=1 cells interpol=JOIN l=1 w=1 c=BLUE; symbol3 v=CIRCLE height=1 cells interpol=JOIN l=1 w=1 c=BLUE; proc gplot data=WORK._stsrt_0 ; plot COST * LABEL = TYPE / name='SCAT' description="Scatter Plot of COST * LABEL" caxis = BLACK ctext = BLACK cframe = CXF7E1C2 hminor = 0 vminor = 0 vref = 0 vaxis = axis1 haxis = axis2 quit; goptions ftext= ctext= htext=; symbol1; axis1; data _null_; set simresults; if method='Simulated' then call symput("crit", put(quantile,6.3)); proc print data = estimates; title "Simultaneous intervals are Est +- &crit * S.E.(Est)"; var label estimate stderr tValue AdjLower AdjUpper;

51
**Output from PROC GLIMMIX**

Simultaneous intervals are Estimate ± c × StdErr. Label Estimate StdErr tValue AdjLower AdjUpper. Bonferroni – critical value is t_{16, .05/(2·13)} =

52
**Other Applications of Linear Combinations**

- Multiple trend tests: (0,1,2,3), (0,1,2,4), (0,4,6,7) (carcinogenicity); (0,0,1), (0,1,1), (0,1,2) (recessive/dominant/ordinal genotype effects)
- Subgroup analysis: subgroups define linear combinations (more on next slide)

53
**Subgroup Analysis Example**

Data: Y_ijkl, where i = Trt, Cntrl; j = Old, Yng; k = GoodInit, PoorInit. Model: Y_ijkl = μ_ijk + ε_ijkl, where μ_ijk = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk. Subgroup contrasts (each averages the Trt − Cntrl cell-mean differences over the cells in the subgroup):
- Overall: ¼ Σ_{j,k} (μ_1jk − μ_2jk)
- Older: ½ Σ_k (μ_11k − μ_21k); Younger: ½ Σ_k (μ_12k − μ_22k)
- GoodInit: ½ Σ_j (μ_1j1 − μ_2j1); PoorInit: ½ Σ_j (μ_1j2 − μ_2j2)
- OldGood: μ_111 − μ_211; OldPoor: μ_112 − μ_212; YoungGood: μ_121 − μ_221; YoungPoor: μ_122 − μ_222
Need to use frequency-based weights in case of extreme imbalance – example in SAS book.

54
**Subgroup Analysis Results**

Label Estimate StdErr tValue Probt Adjp AdjLower AdjUpper Overall I Older I Younger I GoodInitHealth I PoorInitHealth I OldGood I OldPoor I YoungGood I YoungPoor I ods output estimates=estimates_intervals; proc glimmix data=research.respiratory; class Treatment AgeGroup InitHealth; model score = Treatment AgeGroup InitHealth Treatment*AgeGroup Treatment*InitHealth AgeGroup*InitHealth; Estimate "Overall" treatment 4 -4 treatment*Agegroup treatment*InitHealth (divisor=4), "Older" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2), "Younger" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2), "GoodInitHealth" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2), "PoorInitHealth" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2), "OldGood" treatment 1 -1 treatment*Agegroup treatment*InitHealth , "OldPoor" treatment 1 -1 treatment*Agegroup treatment*InitHealth , "YoungGood" treatment 1 -1 treatment*Agegroup treatment*InitHealth , "YoungPoor" treatment 1 -1 treatment*Agegroup treatment*InitHealth /adjust=simulate(nsamp= report seed=12321) upper cl; run; proc print data=estimates_intervals noobs; title "Subgroup Analysis Results"; var label estimate Stderr tvalue probt Adjp AdjLower AdjUpper; (SAS code available upon request)

55
**Summary Include only comparisons of interest.**

Utilize correlations to be less conservative. The critical values can be computed exactly only in balanced ANOVA for all pairwise comparisons, or in unbalanced ANOVA for comparisons with control. Simulation-based methods are “exact” if you let the computer run for a while. This is my general recommendation.

56
**Power Analysis Sample size - Design of study**

Power is less when you use multiple comparisons ⇒ larger sample sizes. There are many power definitions. Bonferroni & independence are convenient (but conservative) starting points.

57
**Power Definitions “Complete Power” = P(Reject all H0i that are false)**

- “Minimal Power” = P(Reject at least one H0i that is false)
- “Individual Power” = P(Reject a particular H0i that is false)
- “Proportional Power” = average proportion of false H0i that are rejected

Aliases: Complete = “all pairs” power; Minimal = “any pair” power; Individual = “per pair” power; Proportional = “average” power.

58
Power Calculations. Example: H1 and H2 powered individually at 50%; H3 and H4 powered individually at 80%; all tests independent. Complete Power = P(reject H1 and H2 and H3 and H4) = .5 × .5 × .8 × .8 = 0.16. Minimal Power = P(reject H1 or H2 or H3 or H4) = 1 − P(“accept” H1 and H2 and H3 and H4) = 1 − (1−.5) × (1−.5) × (1−.8) × (1−.8) = 0.99. Individual Power = P(reject H3, say) = 0.80 (depends on the test). Proportional Power = (.5 + .5 + .8 + .8)/4 = 0.65.
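The arithmetic above, under the stated independence assumption, as a small Python sketch (function name is mine):

```python
def power_summary(individual_powers):
    """Complete, minimal, and proportional power for independent tests."""
    complete = 1.0   # probability all false nulls are rejected
    miss_all = 1.0   # probability no false null is rejected
    for p in individual_powers:
        complete *= p
        miss_all *= (1.0 - p)
    minimal = 1.0 - miss_all
    proportional = sum(individual_powers) / len(individual_powers)
    return complete, minimal, proportional

complete, minimal, proportional = power_summary([0.5, 0.5, 0.8, 0.8])
```

This reproduces the slide's numbers: 0.16, 0.99, and 0.65.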

59
**Sample Size for Adequate Individual Power - Conservative Estimate**

60
**Individual power of two-tail two-sample Bonferroni t-tests**

%let MuDiff = 5; /* Smallest meaningful difference MUx-MUy that you want to detect */ %let Sigma = 10.0 ; /* A guess of the population std. dev. */ %let alpha = .05 ; /* Familywise Type I error probability of the test */ %let k = 4; /* Number of tests */ options ls=76; data power; cer = &alpha/&k; do n = 2 to 100 by 2; *n=sample size for each group*; df = n + n - 2; ncp = (&Mudiff)/(&Sigma*sqrt(2/n)); * The noncentrality parameter *; tcrit = tinv(1-cer/2, df); * The critical t value * ; power = 1 - probt(tcrit, df, ncp) + probt(-tcrit,df,ncp) ; output; end; proc print data=power; run; proc plot data=power; plot power*n/vpos=30;
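A stdlib-Python analogue of the SAS program above, using a normal approximation in place of the noncentral t (so it slightly overstates power at small n; the helper names are mine, not from the macro):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_quantile(p, lo=-10.0, hi=10.0):
    """Inverse of norm_cdf by bisection (sufficient accuracy for power work)."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def approx_individual_power(n, mudiff=5.0, sigma=10.0, alpha=0.05, k=4):
    """Approximate power of a two-tailed two-sample test run at CER = alpha/k,
    n subjects per group, using the normal approximation to the noncentral t."""
    cer = alpha / k
    ncp = mudiff / (sigma * math.sqrt(2.0 / n))   # noncentrality parameter
    zcrit = norm_quantile(1.0 - cer / 2.0)        # Bonferroni critical value
    return 1.0 - norm_cdf(zcrit - ncp) + norm_cdf(-zcrit - ncp)
```

At n = 92 per group (the 80%-power point on the next slide) this approximation gives power near 0.8.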

61
**Graph of Power Function**

[Plot of power versus n for the Bonferroni-adjusted two-sample t-test: power increases smoothly with n, reaching 80% at n = 92 per group.]

62
**%IndividualPower macro***

Uses PROBMC and PROBT (noncentral) Assumes that you want to use the single-step (confidence interval based) Dunnett (one- or two-sided) or Range (two-sided) test Less conservative than Bonferroni Conservative compared to stepwise procedures %IndividualPower(MCP=DUNNETT2,g=4,d=5,s=10); *Westfall et al (1999), Multiple Comparisons and Multiple Tests Using SAS

63
**%IndividualPower Output**

64
**More general Power- Simulate!**

Invocation: %SimPower(method = dunnett , TrueMeans = (10, 10, 13, 15, 15) , s = , n = , seed= ); Output: Method=DUNNETT, Nominal FWE=0.05, nrep=1000 True means = (10, 10, 13, 15, 15), n=87, s=10 Quantity Estimate % CI---- Complete Power (0.260,0.316) Minimal Power (0.913,0.945) Proportional Power (0.633,0.669) True FWE (0.011,0.027) Directional FWE (0.011,0.027)

65
**Concluding Remarks - Power**

Need a bigger n Like to avoid bigger n (see sequential, gatekeepers methods later) Which definition? Bonferroni and independence useful Simulation useful – especially for the more complex methods that follow

66
**Closed and Stepwise Testing Methods I: Standard P-Value Based Methods**

If you want estimates of effect sizes & error margins, then use simultaneous confidence intervals.
If you want confident inequalities, then use stepwise or closed tests: Holm’s method, Hommel’s method, Hochberg’s method, the Fisher combination method.
If you want an overall test, then use the F-test, O’Brien’s test, etc.

67
**Closed Testing Method(s)**

Form the closure of the family by including all intersection hypotheses. Test every member of the closed family by a (suitable) α-level test. (Here, α refers to the comparison-wise error rate.) A hypothesis can be rejected provided that its corresponding test is significant at level α, and every other hypothesis in the family that implies it is rejected by its α-level test.

68
**Closed Testing – Multiple Endpoints**

H0: d1=d2=d3=d4 =0 H0: d1=d2=d3 =0 H0: d1=d2=d4 =0 H0: d1=d3=d4 =0 H0: d2=d3=d4 =0 H0: d1=d2 =0 H0: d1=d3 =0 H0: d1=d4 =0 H0: d2=d3 =0 H0: d2=d4 =0 H0: d3=d4 =0 H0: d1=0 p = H0: d2=0 p = H0: d3=0 p = H0: d4=0 p = Where dj = mean difference, treatment -control, endpoint j.

69
**Closed Testing – Multiple Comparisons**

m1=m2=m3=m4 m1=m2, m3=m4 m1=m3, m2=m4 m1=m4, m2=m3 m1=m2=m3 m1=m2=m4 m1=m3=m4 m2=m3=m4 m1=m2 m1=m3 m1=m4 m2=m3 m2=m4 m3=m4 Note: Logical implications imply that there are only 14 nodes, not 2⁶ − 1 = 63 nodes.

70
**Control of FWE with Closed Tests**

Suppose H0j1,…,H0jm all are true (unknown to you which ones). Then {Reject at least one of H0j1,…,H0jm using the CTP} ⊆ {Reject H0j1 ∩ … ∩ H0jm}. Thus, P(reject at least one of H0j1,…,H0jm | H0j1,…,H0jm all true) ≤ P(reject H0j1 ∩ … ∩ H0jm | H0j1,…,H0jm all true) = α. E.g., suppose H2 and H4 are true; consider P(reject H2 or H4). Using closure, you must reject H2 ∩ H4 to *have an opportunity* to reject H2 or H4, so the set of studies in which H2 or H4 is rejected is smaller. Draw a Venn diagram: A = {Reject H2 ∩ H4}, B = {Reject either H2 or H4 using closure}. Clearly B ⊆ A, therefore P(B) ≤ P(A).

71
**Examples of Closed Testing Methods**

When the composite test is … the closed method is …
- Bonferroni MinP → Holm’s method
- Resampling-based MinP → Westfall-Young method
- Simes → Hommel’s method
- O’Brien → Lehmacher’s method
- Simple or weighted test → fixed sequence test (a-priori ordered)

72
P-value Based Methods: Test global hypotheses using p-value combination tests. Benefit – fewer model assumptions: you only need the p-values to be valid. This allows models other than the homoscedastic normal linear model (like survival analysis).

73
**Holm’s Method is Closed Testing Using the Bonferroni MinP Test**

Reject H0j1 ∩ H0j2 ∩ … ∩ H0jm if Min(p0j1, p0j2, …, p0jm) ≤ α/m. Or: Reject H0j1 ∩ H0j2 ∩ … ∩ H0jm if p* = m × Min(p0j1, p0j2, …, p0jm) ≤ α. (Note that p* is a valid p-value for the joint null, comparable to the p-value for Hotelling’s T² test.)

74
**Holm’s Stepdown Method**

H0: d1=d2=d3=d4 =0 minp=0.0121 p*=0.0484 H0: d1=d3=d4 =0 minp=.0121 p*=0.0363 H0: d2=d3=d4 =0 minp=0.0142 p*=0.0426 H0: d1=d2=d3 =0 minp=0.0121 p*=0.0363 H0: d1=d2=d4 =0 minp=0.0121 p*=0.0363 H0: d1=d2 =0 minp=0.0121 p*=0.0242 H0: d1=d3 =0 minp=0.0121 p*=0.0242 H0: d1=d4 =0 minp=0.0121 p*=0.0242 H0: d2=d3 =0 minp=0.0142 p*=0.0284 H0: d2=d4 =0 minp=0.0142 p*=0.0284 H0: d3=d4 =0 minp=0.0191 p*=0.0382 H0: d1=0 p = H0: d2=0 p = H0: d3=0 p = H0: d4=0 p = Where dj = mean difference, treatment -control, endpoint j.

75
**Shortcut For Holm’s Method**

Let H(1),…,H(k) be the hypotheses corresponding to p(1) ≤ … ≤ p(k). If p(1) ≤ α/k, reject H(1) and continue, else stop and retain all of H(1),…,H(k). If p(2) ≤ α/(k−1), reject H(2) and continue, else stop and retain H(2),…,H(k). … If p(k) ≤ α, reject H(k).

76
**Adjusted p-values for Closed Tests**

The adjusted p-value for H0j is the maximum of all p-values over all relevant nodes. In the previous example, pA(1)=0.0484, pA(2)=0.0484, pA(3)=0.0484, pA(4)= General formula for Holm: pA(j) = max_{i≤j} (k−i+1)p(i).
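The shortcut formula pA(j) = max_{i≤j}(k−i+1)p(i) in stdlib Python, checked against the raw p-values 0.0121, 0.0142, 0.0191 visible in the tree above (the fourth endpoint's raw p-value is cut off in the slide, so 0.10 below is a stand-in):

```python
def holm_adjusted(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * k
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (k - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

holm = holm_adjusted([0.0121, 0.0142, 0.0191, 0.10])
```

The first three adjusted values come out 0.0484, matching pA(1) through pA(3) on the slide.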

77
**Worksheet For Holm’s Method**

78
**Simes’ Test for Global Hypotheses**

Uses all p-values p1, p2, …, pm, not just the MinP. Simes’ test rejects H01 ∩ H02 ∩ … ∩ H0m if p(j) ≤ jα/m for at least one j. ⇒ the p-value for the joint test is p* = min_j {(m/j)p(j)}. This is uniformly smaller than m × MinP. Type I error is at most α under independence or positive dependence of the p-values.
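The Simes combined p-value as a one-line Python sketch, reusing the running example's raw p-values (0.10 again stands in for the truncated fourth value; with any p(4) above ≈ .026 the minimum is (4/3)(.0191) ≈ .0255, which matches the top node of the Hommel tree two slides down):

```python
def simes_pvalue(pvalues):
    """Simes combined p-value for the global null: p* = min_j (m/j) * p_(j)."""
    m = len(pvalues)
    sorted_p = sorted(pvalues)
    return min((m / (j + 1)) * p for j, p in enumerate(sorted_p))

p_star = simes_pvalue([0.0121, 0.0142, 0.0191, 0.10])
```

Note p* = .0255 is smaller than the Bonferroni-MinP value 4 × .0121 = .0484, illustrating the "uniformly smaller" claim.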

79
[Figure: rejection regions of the two tests in the (p1, p2) unit square.] With m = 2 independent p-values: P(Simes rejects) = 1 − (1 − α/2)² + (α/2)² = α, while P(Bonferroni rejects) = 1 − (1 − α/2)² = α − (α/2)².

80
**Hommel’s Method (Closed Simes)**

H0: d1=d2=d3=d4 =0 p*=0.0255 H0: d1=d2=d3 =0 p*=0.0213 H0: d1=d2=d4 =0 p*=0.0191 H0: d1=d3=d4 =0 p*=0.0287 H0: d2=d3=d4 =0 p*=0.0287 H0: d1=d2 =0 p*=0.0142 H0: d1=d3 =0 p*=0.0242 H0: d1=d4 =0 p*=0.0191 H0: d2=d3 =0 p*=0.0284 H0: d2=d4 =0 p*=0.0191 H0: d3=d4 =0 p*=0.0382 H0: d1=0 p = H0: d2=0 p = H0: d3=0 p = H0: d4=0 p = Where dj = mean difference, treatment -control, endpoint j.

81
**Adjusted P-values for Hommel’s Method**

Again, take the maximum p-value over all hypotheses that imply the given one. In the previous example, the Hommel adjusted p-values are pA(1)=0.0287, pA(2)=0.0287, pA(3)=0.0382, pA(4)= These adjusted p-values are never larger than the Holm step-down adjusted p-values.


83
Hochberg’s Method A conservative but simpler approximation to Hommel’s method {Hommel adjusted p-value} ≤ {Hochberg adjusted p-value} ≤ {Holm adjusted p-value}

84
**Hochberg’s Shortcut Method**

Let H(1),…,H(k) be the hypotheses corresponding to p(1) ≤ … ≤ p(k). If p(k) ≤ α, reject all H(j) and stop, else retain H(k) and continue. If p(k−1) ≤ α/2, reject H(1),…,H(k−1) and stop, else retain H(k−1) and continue. … If p(1) ≤ α/k, reject H(1). Adjusted p-values are pA(j) = min_{i≥j} (k−i+1)p(i).
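The step-up adjusted p-value formula in stdlib Python, on the same example p-values as before (0.10 is still a stand-in fourth value):

```python
def hochberg_adjusted(pvalues):
    """Hochberg step-up adjusted p-values: pA(j) = min over i >= j of (k-i+1)p_(i)."""
    k = len(pvalues)
    order = sorted(range(k), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * k
    running_min = 1.0
    for rank in range(k - 1, -1, -1):        # sweep from the largest p downward
        i = order[rank]
        running_min = min(running_min, (k - rank) * pvalues[i])
        adjusted[i] = running_min
    return adjusted

hochberg = hochberg_adjusted([0.0121, 0.0142, 0.0191, 0.10])
```

Here the first three adjusted values are 0.0382, smaller than the Holm values (0.0484) for the same data, consistent with the ordering {Hommel} ≤ {Hochberg} ≤ {Holm} stated above.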

85
**Worksheet for Hochberg’s Method**

86
**Comparison of Adjusted P-Values**

Test: Raw, Stepdown Bonferroni, Hochberg, Hommel.

data pvals;
input test$ raw_p;
datalines;
;
proc multtest pdata=pvals holm hoc hommel out=stat;
run;
proc print data=stat; run;

87
**Fisher Combination Test for Independent p-Values**

Reject H01 ∩ H02 ∩ … ∩ H0m if −2Σ ln(pi) > χ²(1−α, 2m).
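In stdlib Python, with the χ² critical value 15.507 for 2m = 8 df at α = .05 hardcoded, and the running example's p-values reused (0.10 again a stand-in fourth value; remember this test assumes independent p-values):

```python
import math

def fisher_combination_stat(pvalues):
    """Fisher's combined statistic: -2 * sum(ln p_i), ~ chi-square(2m) under H0."""
    return -2.0 * sum(math.log(p) for p in pvalues)

stat = fisher_combination_stat([0.0121, 0.0142, 0.0191, 0.10])
CHI2_95_8DF = 15.507          # chi-square 0.95 quantile with 2m = 8 df
reject_global_null = stat > CHI2_95_8DF
```

The statistic is about 29.9, well beyond 15.507, so the global null is rejected.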

88
**Example: Non-Overlapping Subgroup* p-values**

The Multtest Procedure p-Values. Test: Raw, Stepdown Bonferroni, Hochberg, Hommel, Fisher Combination.

/* Conversely, show how bad things can be for the Fisher combination with other types of p-values */
data pvals;
input test$ raw_p;
datalines;
;
proc multtest pdata=pvals holm hoc hommel fisher_c;
run;

*Non-overlapping is required by the independence assumption.

89
Power Comparison. Liptak test statistic: T = Σ Φ⁻¹(pi) = Σ Zi.

90
Concluding Notes: Closed testing is more powerful than single-step (α/m rather than α/k). P-value based methods can be used whenever the p-values are valid. Dependence issues: MinP (Holm) is conservative; Simes (Hommel, Hochberg) is less conservative and rarely anti-conservative; the Fisher combination and Liptak tests require independence.

91
**Closed and Stepwise Testing Methods II: Fixed Sequences and Gatekeepers**

Methods Covered: Fixed Sequences (hierarchical endpoints, dose response, non-inferiority → superiority); Gatekeepers (primary and secondary analyses); Multiple Gatekeepers (multiple endpoints & multiple doses); Intersection-Union tests* (*doesn’t really belong in this section).

92
Fixed Sequence Tests Pre-specify H1, H2, …, Hk, and test in this sequence, stopping as soon as you fail to reject. No a-adjustment is necessary for individual tests. Applications: Dose response: High vs. Control, then Mid vs. Control, then Low vs. Control Primary endpoint, then Secondary endpoint

93
**Fixed Sequence as a Closed Procedure**

H123: d1=d2=d3=0, rej if p1 ≤ .05. H12: d1=d2=0, rej if p1 ≤ .05. H13: d1=d3=0, rej if p1 ≤ .05. H23: d2=d3=0, rej if p2 ≤ .05. H1: d1=0, rej if p1 ≤ .05. H2: d2=0, rej if p2 ≤ .05. H3: d3=0, rej if p3 ≤ .05. Net result: rej H1 if p1 ≤ .05; rej H2 if p1 ≤ .05 and p2 ≤ .05; rej H3 if p1 ≤ .05 and p2 ≤ .05 and p3 ≤ .05.

94
**A Seemingly Reasonable But Incorrect Protocol**

1. Test Dose 2 vs Pbo and Dose 3 vs Pbo using the Bonferroni method (0.025 level). 2. Test Dose 1 vs Pbo at the unadjusted 0.05 level, but only if at least one of the first two tests is significant at the 0.025 level.

95
The problem: FWE ≈ Moral: Caution needed when there are multiple hypotheses at some point in the sequence.
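A sketch of where the inflation comes from: suppose Dose 2 is truly effective (its test rejects essentially always) while Dose 3 and Dose 1 are truly null. The gate then opens almost surely, so a familywise error occurs if p3 ≤ .025 or p1 ≤ .05 — roughly .025 + .05 ≈ 0.075. Simulation with independent uniform null p-values (stdlib Python; independence is assumed here purely for illustration):

```python
import random

def protocol_fwe(n_sims=40000, seed=5):
    """FWE of the flawed protocol when Dose 2 works but Doses 1 and 3 are null."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_sims):
        p3 = rng.random()   # Dose 3 vs Pbo: true null, p ~ Uniform(0,1)
        p1 = rng.random()   # Dose 1 vs Pbo: true null, p ~ Uniform(0,1)
        p2 = 0.0001         # Dose 2 vs Pbo: overwhelmingly effective
        gate_open = (p2 <= 0.025) or (p3 <= 0.025)
        false_rejection = (p3 <= 0.025) or (gate_open and p1 <= 0.05)
        errors += false_rejection
    return errors / n_sims
```

The estimate lands near 0.074, far above the nominal 0.05.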

96
**Correcting the Incorrect Protocol: Use Closure**

Where pij = 2min(pi,pj)

97
**References –Fixed Sequence and Gatekeeper Tests**

Bauer, P (1991) Multiple Testing in Clinical Trials, Statistics in Medicine, 10, O’Neill RT. (1997) Secondary endpoints cannot be validly analyzed if the primary endpoint does not demonstrate clear statistical significance. Controlled Clinical Trials; 18:550 –556. D’Agostino RB. (2000) Controlling alpha in clinical trials: the case for secondary endpoints. Statistics in Medicine; 19:763–766. Chi GYH. (1998) Multiple testings: multiple comparisons and multiple endpoints. Drug Information Journal 32:1347S–1362S. Bauer P, Röhmel J, Maurer W, Hothorn L. (1998) Testing strategies in multi-dose experiments including active control. Statistics in Medicine; 17:2133 –2146. Westfall, P.H. and Krishen, A. (2001). Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures, Journal of Statistical Planning and Inference 99, Chi, G. “Clinical Benefits, Decision Rules, and Multiple Inferences,” Dmitrienko, A, Offen, W. and Westfall, P. (2003). Gatekeeping strategies for clinical trials that do not require all effects to be significant. Stat Med. 22: Chen X, Luo X, Capizzi T. (2005) The application of enhanced parallel gatekeeping strategies. Stat Med. 24: Alex Dmitrienko, Geert Molenberghs, Christy Chuang-Stein, and Walter Offen (2005), Analysis of Clinical Trials Using SAS: A Practical Guide, SAS Press. Wiens, B, and Dmitrienko, A. (2005). The fallback procedure for evaluating a single family of hypotheses. J Biopharm Stat.15(6): Dmitrienko, A., Wiens, B. and Westfall, P. (2006). Fallback Tests in Dose Response Clinical Trials, J Biopharm Stat, 16,

98
**Intersection-Union (IU) Tests**

Union-Intersection (UI): nulls are intersections, alternatives are unions.
H0: {d1 = 0 and d2 = 0} vs. H1: {d1 ≠ 0 or d2 ≠ 0}
Intersection-Union (IU): nulls are unions, alternatives are intersections.
H0: {d1 = 0 or d2 = 0} vs. H1: {d1 ≠ 0 and d2 ≠ 0}
An IU test is NOT a closed procedure; it is a single test of a different kind of null hypothesis.

99
**Applications of I-U: Bioequivalence (the “TOST” test) and Combination Therapy**

Bioequivalence (TOST):
Test 1. H01: d ≤ −d0 vs. HA1: d > −d0
Test 2. H02: d ≥ d0 vs. HA2: d < d0
Both can be tested at α = .05, but both must be rejected.
Combination therapy:
Test 1. H01: μ12 ≤ μ1 vs. HA1: μ12 > μ1
Test 2. H02: μ12 ≤ μ2 vs. HA2: μ12 > μ2
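A minimal sketch of TOST for two-sample means with a pooled-variance t statistic (the function name and the additive equivalence margin d0 are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy import stats

def tost_pvalue(x, y, d0):
    """Two one-sided tests: conclude equivalence only if BOTH
    H01: d <= -d0 and H02: d >= d0 are rejected, so the TOST
    p-value is the larger of the two one-sided p-values."""
    nx, ny = len(x), len(y)
    d = np.mean(x) - np.mean(y)
    sp2 = ((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    se = np.sqrt(sp2 * (1 / nx + 1 / ny))
    df = nx + ny - 2
    p_lo = 1 - stats.t.cdf((d + d0) / se, df)  # tests H01: d <= -d0
    p_hi = stats.t.cdf((d - d0) / se, df)      # tests H02: d >= +d0
    return max(p_lo, p_hi)

x = np.arange(10.0)  # two identical samples with a wide margin:
y = np.arange(10.0)  # clearly "equivalent"
print(tost_pvalue(x, y, d0=5.0) < 0.05)  # True
```

Note that no multiplicity adjustment is needed: both component tests run at the full α, consistent with the IU logic above.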

100
**Control of Type I Error for IU tests**

Suppose d1 = 0 or d2 = 0. Then
P(Type I error) = P(Reject H0)   (1)
= P(p1 ≤ .05 and p2 ≤ .05)   (2)
≤ min{P(p1 ≤ .05), P(p2 ≤ .05)}   (3)
≤ .05   (4)
Note: the inequality at (3) becomes an approximate equality when the test statistic for the nonzero di is extremely noncentral (very powerful).

101
**Concluding Notes: Fixed Sequences and Gatekeepers**

- Many times, no adjustment is necessary at all!
- Other times you can gain power by specifying gatekeeping sequences.
- However, you must clearly state the method and follow the rules.
- There are many “incorrect” no-adjustment methods: use caution.

102
**Closed and Stepwise Testing Methods III: Methods that Use Logical Constraints and Correlations**

| Method | Application |
|---|---|
| Lehmacher et al. | Multiple endpoints |
| Westfall-Tobias-Shaffer-Royen | General contrasts |

103
Lehmacher et al. Method
- Use the O'Brien test at each node (incorporates correlations)
- Do closed testing
Note: possibly no adjustment whatsoever; possibly a big adjustment.

104
**Calculations for Lehmacher’s Method**

```sas
data MultipleEndpoints;
   Treatment = 'Placebo';
   do Subject = 1 to 54;
      input Endpoint1-Endpoint4 @@; output;
   end;
   Treatment = 'Drug';
   do Subject = 54+1 to 54+57;
      input Endpoint1-Endpoint4 @@; output;
   end;
datalines;
;
data multend1; set MultipleEndpoints;
   Endpoint4 = -Endpoint4;  /* reverse sign so all endpoints point the same direction */
run;
proc standard data=multend1 mean=0 std=1 out=stdzd;
   var Endpoint1-Endpoint4;
run;
data combine; set stdzd;
   /* O'Brien sums for every intersection hypothesis in the closure tree */
   H1234 = Endpoint1+Endpoint2+Endpoint3+Endpoint4;
   H123  = Endpoint1+Endpoint2+Endpoint3;
   H124  = Endpoint1+Endpoint2+Endpoint4;
   H134  = Endpoint1+Endpoint3+Endpoint4;
   H234  = Endpoint2+Endpoint3+Endpoint4;
   H12   = Endpoint1+Endpoint2;
   H13   = Endpoint1+Endpoint3;
   H14   = Endpoint1+Endpoint4;
   H23   = Endpoint2+Endpoint3;
   H24   = Endpoint2+Endpoint4;
   H34   = Endpoint3+Endpoint4;
   H1    = Endpoint1;
   H2    = Endpoint2;
   H3    = Endpoint3;
   H4    = Endpoint4;
run;
proc ttest;
   class treatment;
   var H1234 H123 H124 H134 H234 H12 H13 H14 H23 H24 H34 H1 H2 H3 H4;
   ods output ttests=ttests;
run;
proc print data=ttests; where method='Pooled'; run;
```

105
**Output For Lehmacher’s Method**

Each row of the ttests output gives the pooled-t p-value for one combined hypothesis (H1234 down to H4). Closed testing then sets each endpoint's adjusted p-value to the maximum p-value over the 8 intersection hypotheses containing that endpoint:
pA1 = max(0.0121, …)
pA2 = max(0.0142, …)
pA3 = max(0.1986, …)
pA4 = max(0.0191, …)
(Remaining numeric output not shown.)

106
**Free and Restricted Combinations**

If the truth of some null hypotheses logically forces other nulls to be true, the hypotheses are restricted.
Examples:
- Multiple endpoints, one test per endpoint: free
- All pairwise comparisons: restricted

107
**Pairwise Comparisons, 3 Groups**

H0: μ1=μ2=μ3
H0: μ1=μ3, μ2=μ3   H0: μ1=μ2, μ1=μ3   H0: μ1=μ2, μ2=μ3
H0: μ1=μ2   H0: μ1=μ3   H0: μ2=μ3
Note: the entire middle layer is not needed, because any two equalities among three means imply the global null. Fisher's protected LSD is valid!

108
**Pairwise Comparisons, 4 Groups**

μ1=μ2=μ3=μ4
μ1=μ2, μ3=μ4   μ1=μ3, μ2=μ4   μ1=μ4, μ2=μ3   μ1=μ2=μ3   μ1=μ2=μ4   μ1=μ3=μ4   μ2=μ3=μ4
μ1=μ2   μ1=μ3   μ1=μ4   μ2=μ3   μ2=μ4   μ3=μ4
Note: logical implications leave only 14 nodes, not 2^6 − 1 = 63. Also, Fisher's protected LSD is not valid.

109
**Restricted Combinations Multipliers (Shaffer* Method 1; Modified Holm)**

*Shaffer, J.P. (1986). Modified sequentially rejective multiple test procedures. JASA 81, 826—831.

110
**Shaffer’s (1) Adjusted p-values**
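The formula did not survive extraction; the adjusted p-values are running maxima of multiplier-weighted ordered p-values, exactly as in Holm except that Holm's multipliers k − j + 1 are replaced by Shaffer's t_j. For all 6 pairwise comparisons of 4 groups, the standard Shaffer sequence is (6, 3, 3, 3, 2, 1). A sketch (the raw p-values below are illustrative):

```python
def shaffer_adjusted(pvals, multipliers):
    """Modified-Holm (Shaffer Method 1) adjusted p-values.
    pvals: raw p-values sorted ascending; multipliers: Shaffer t_j sequence."""
    adj, running = [], 0.0
    for p, t in zip(pvals, multipliers):
        running = max(running, min(1.0, t * p))  # enforce monotonicity
        adj.append(running)
    return adj

# 6 pairwise comparisons of 4 groups: Shaffer (6,3,3,3,2,1) vs Holm (6,5,4,3,2,1)
adj = shaffer_adjusted([0.001, 0.01, 0.02, 0.03, 0.04, 0.05],
                       [6, 3, 3, 3, 2, 1])
print([round(v, 4) for v in adj])  # [0.006, 0.03, 0.06, 0.09, 0.09, 0.09]
```

With Holm the second adjusted p-value would be 5 × 0.01 = 0.05; Shaffer's logical constraints shrink it to 0.03.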

111
**Westfall/Tobias/Shaffer/Royen* Method**

- Uses the actual distribution of MinP instead of the conservative Bonferroni approximation
- Closed testing incorporating logical constraints
- Hard-coded in PROC GLIMMIX
- Allows arbitrary linear functions
*Westfall, P.H. and Tobias, R.D. (2007). Multiple Testing of General Contrasts: Truncated Closure and the Extended Shaffer-Royen Method. Journal of the American Statistical Association 102.

112
**Application of Truncated Closed MinP to Subgroup Analysis**

Compare treatment with control as follows:
- Overall
- In the older patients subgroup
- In the younger patients subgroup
- In the better-initial-health subgroup
- In the poorer-initial-health subgroup
- In each of the four (old/young) × (better/poorer) subgroups
That is 9 tests overall (better viewed as 1 gatekeeper plus 8 follow-up tests).

113
**Analysis File ods output estimates=estimates_logicaltests;**

```sas
ods output estimates=estimates_logicaltests;
proc glimmix data=research.respiratory;
   class Treatment AgeGroup InitHealth;
   model score = Treatment AgeGroup InitHealth
                 Treatment*AgeGroup Treatment*InitHealth AgeGroup*InitHealth;
   /* interaction-term coefficients not shown on the slide */
   estimate "Overall"        treatment 4 -4 treatment*Agegroup treatment*InitHealth (divisor=4),
            "Older"          treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2),
            "Younger"        treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2),
            "GoodInitHealth" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2),
            "PoorInitHealth" treatment 2 -2 treatment*Agegroup treatment*InitHealth (divisor=2),
            "OldGood"        treatment 1 -1 treatment*Agegroup treatment*InitHealth,
            "OldPoor"        treatment 1 -1 treatment*Agegroup treatment*InitHealth,
            "YoungGood"      treatment 1 -1 treatment*Agegroup treatment*InitHealth,
            "YoungPoor"      treatment 1 -1 treatment*Agegroup treatment*InitHealth
            / adjust=simulate(nsamp= report seed=12321) upper stepdown(type=logical report);
run;
proc print data=estimates_logicaltests noobs;
   title "Subgroup Analysis Results - Truncated Closure";
   var label estimate Stderr tvalue probt Adjp;
run;

/* Old %SimTests code */
%MakeGLMStats(dataset=respiratory, classvar=Treatment AgeGroup InitHealth,
              yvar=score,
              model=Treatment AgeGroup InitHealth
                    Treatment*AgeGroup Treatment*InitHealth AgeGroup*InitHealth);
%macro Contrasts;
   C1 = { };           /* contrast coefficients not shown on the slide */
   C1 = C1/4;
   C2 = { , , , };
   C2 = C2/2;
   C3 = { , , , };
   C  = C1//C2//C3;  C = C`;
   Clab = {"Overall","Older","Younger","Good","Poor",
           "OldGood","OldPoor","YoungGood","YoungPoor"};
%mend;
%SimTests(nsamp=200000, seed=121211, type=LOGICAL, side=U);
```

114
**Results – Truncated Closure**

Subgroup Analysis Results: Label, Estimate, StdErr, tValue, Probt, adjp_logical, and adjp_interval for Overall, Older, Younger, GoodInitHealth, PoorInitHealth, OldGood, OldPoor, YoungGood, YoungPoor (numeric values not shown).

```sas
data combined;
   merge estimates_intervals(rename=(adjp=adjp_interval))
         estimates_logicaltests(rename=(adjp=adjp_logical));
proc print data=combined noobs;
   title "Subgroup Analysis Results";
   var label estimate Stderr tvalue probt Adjp_logical Adjp_interval;
run;
```

The adjusted p-values for the stepdown tests are mathematically smaller than those of the simultaneous interval-based tests.

115
**Example: Stepwise Pairwise vs. Control Testing**

- Teratology data set; observations are litters
- Response variable: litter weight
- Treatments (doses): 0, 5, 50, 500
- Covariates: litter size, gestation time

116
**Analysis File**

```sas
proc glimmix data=research.litter;
   class dose;
   model weight = dose gesttime number;
   /* contrast coefficients not shown on the slide */
   estimate "5 vs 0"   dose ,
            "50 vs 0"  dose ,
            "500 vs 0" dose
            / adjust=simulate(nsample= report) stepdown(type=logical);
run; quit;

/* Old code */
%MakeGLMStats(dataset=litter, yvar=weight, classvar=dose,
              model=dose gesttime number, contrasts=control(dose(1)));
%SimTests(seed=12111, nsamp=100000);
```

117
Results

Estimates with Simulated Adjustment: estimates, standard errors, DF, t values, raw and adjusted p-values for "5 vs 0", "50 vs 0", and "500 vs 0" (numeric values not shown). An alternative call:

```sas
/* estimate / adjust=simulate(nsamp= ); */
lsmeans dose/adjust=dunnett;
```

Note: some of these comparisons are not significant at .10 with regular Dunnett.

118
Concluding Notes:
- More power is available when combinations are restricted.
- Power of closed tests can be improved using correlation and other distributional characteristics.

119
**Nonparametric Multiple Testing Methods**

Overview: use nonparametric tests at each node of the closure tree
- Bootstrap tests
- Rank-based tests
- Tests for binary data

120
**Bootstrap MinP Test (Semi-Parametric Test)**

The composite hypothesis H1∩H2∩…∩Hk may be tested using the p-value
p* = P(MinP ≤ minp | H1∩H2∩…∩Hk).
Westfall and Young (1993) show:
- how to obtain p* by bootstrapping the residuals in a multivariate regression model;
- how to obtain all p*'s in the closure tree efficiently.
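A minimal sketch of the bootstrap MinP test for a two-group, k-endpoint problem, resampling rows of group-centered residuals. The function and data names are illustrative, and this is a simplified version of the Westfall-Young scheme, not their exact algorithm:

```python
import numpy as np
from scipy import stats

def minp_pvalue(x, y, n_boot=500, seed=0):
    """p* = P(MinP <= minp | global null), approximated by bootstrapping
    group-centered residuals. x, y: (n_i, k) arrays of k endpoints."""
    rng = np.random.default_rng(seed)
    k = x.shape[1]
    obs_min = min(stats.ttest_ind(x[:, j], y[:, j]).pvalue for j in range(k))
    # centering each group makes the global null hold in the resampling world
    resid = np.vstack([x - x.mean(axis=0), y - y.mean(axis=0)])
    n1, n = len(x), len(x) + len(y)
    hits = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)          # resample rows with replacement
        bx, by = resid[idx[:n1]], resid[idx[n1:]]
        bmin = min(stats.ttest_ind(bx[:, j], by[:, j]).pvalue for j in range(k))
        hits += bmin <= obs_min
    return hits / n_boot

rng = np.random.default_rng(42)
x = rng.normal(size=(30, 3)); x[:, 0] += 2.0   # strong effect on endpoint 1
y = rng.normal(size=(30, 3))
print(minp_pvalue(x, y))   # small: the global null is clearly rejected
```

Because rows (not individual columns) are resampled, the correlation among endpoints is preserved, which is where the power gain over Bonferroni comes from.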

121
**Multivariate Regression Model **

(Next Five slides are from Westfall and Young, 1993)

122
**Hypotheses and Test Statistics**

123
**Joint Distribution of the Test Statistics**

124
**Testing Subset Intersection Hypotheses Using the Extreme Pivotals**

125
**Exact Calculation of pK**

Bootstrap Approximation:

126
**Bootstrap Tests (PROC MULTTEST)**

H0: d1=d2=d3=d4=0   min p = .0121, p* = .0379
H0: d1=d3=d4=0   min p = .0121, p* < .0379
H0: d2=d3=d4=0   min p = .0142, p* = .0351
H0: d1=d2=d3=0   min p = .0121, p* < .0379
H0: d1=d2=d4=0   min p = .0121, p* < .0379
H0: d1=d2=0   min p = .0121, p* < .0379
H0: d1=d3=0   min p = .0121, p* < .0379
H0: d1=d4=0   min p = .0121, p* < .0379
H0: d2=d3=0   min p = .0142, p* < .0351
H0: d2=d4=0   min p = .0142, p* < .0351
H0: d3=d4=0   min p = .0191, p* = .0355
H0: d1=0   p* < .0379
H0: d2=0   p* < .0351
H0: d3=0   p* = .1991
H0: d4=0   p* < .0355

p* = P(MinP ≤ minp | H0), computed using bootstrap resampling. (Recall that for Bonferroni, p* = k·minp.)

```sas
proc multtest data=research.panic seed= stepboot n=100000;
   class tx;
   test mean(AASEVO PANTOTO PASEVO PHCGIMPO);
   contrast "t vs c" -1 1;
run;
```

127
**Permutation Tests for Composite Hypotheses H0K**

Joint p-value = proportion of the n!/(nT! nC!) permutations for which min_{i∈K} P*_i ≤ min_{i∈K} p_i.

128
**Problem; Simplification**

Problem: there are 2^k − 1 subsets K to be tested. This might take a while...
Simplification: you need only test k of the 2^k − 1 subsets!
Why? Because P(min_{i∈K} P*_i ≤ c) ≤ P(min_{i∈K′} P*_i ≤ c) when K ⊂ K′.
Significance for most lower-order subsets is determined by significance of higher-order subsets.

129
**MULTTEST PROCEDURE: Tests only the needed subsets (k, not 2^k − 1)**

Samples from the permutation distribution. Only one sample is needed, not k distinct samples, if the joint distribution of MinP is identical under HK and HS. (This is the “subset pivotality” condition of Westfall and Young, 1993, valid under location-shift and other models.)

130
**Great Savings are Possible with Exact Permutation Tests!**

Why? Suppose you test H12…k using MinP. The joint p-value is
p* = P(MinP ≤ minp) ≤ P(P1 ≤ minp) + P(P2 ≤ minp) + … + P(Pk ≤ minp).
Many summands can be zero, and others much less than minp.

131
**Multiple Binary Adverse Events**

Table: Variable (ae1-ae28), Contrast ("t vs c"), raw, stepdown Bonferroni, and stepdown permutation p-values (numeric values not shown).

```sas
proc multtest data=research.ae pvals n=200000 stepperm stepbon;
   class g;
   test ca(ae1-ae28/upper permutation=40);
   contrast "t vs c" 0 1;
run;
```

132
**Example: Genetic Associations**

Phenotype: 0/1 (diseased or not). Sample n1 from the diseased and n2 from the non-diseased. Compare hundreds of genotype frequencies (using dominant and recessive codings) between diseased and non-diseased using multiple Fisher exact tests.

133
**PROC MULTTEST Code**

```sas
proc multtest data=research.gen stepperm n=20000 out=pval hommel fdr;
   class y;
   test fisher(d1-d100 r1-r100);
   contrast "dis v nondis" -1 1;
run;
proc sort data=pval; by raw_p;
proc print data=pval;
   var _var_ raw_p stppermp hom_p;
   where raw_p < .05;
run;
```

134
**Results from PROC MULTTEST**

Eight markers (six recessive-coded, two dominant-coded) had raw_p < .05; the table lists raw, stepdown permutation, Hommel, and FDR-adjusted p-values for each (numeric values not shown).

135
**Application - Gene Expression**

Group 1: Acute Myeloid Leukemia (AML), n1 = 11
Group 2: Acute Lymphoblastic Leukemia (ALL), n2 = 27
Data layout (one row per subject; thousands of gene-expression columns):
OBS  TYPE  G1 G2 G3 … G7000
1    AML   (gene expression levels)
…
11   AML
12   ALL
…
38   ALL

136
**PROC MULTTEST code for exact* closed testing**

```sas
proc multtest data=research.leuk noprint out=adjp
              holm fdr stepperm n=1000;
   class type;   /* AML or ALL */
   test mean(gene1-gene7123);
   contrast 'AML vs ALL' -1 1;
run;
proc sort data=adjp(where=(raw_p le .0005)); by raw_p;
proc print;
   var _var_ raw_p stpbon_p fdr_p stppermp;
run;
```
* exact modulo Monte Carlo error

137
PROC MULTTEST Output (1 hour on a 2.8 GHz Xeon for 200,000 samples)

138
**Subset Pivotality, PROC MULTTEST**

MULTTEST requires the “subset pivotality” condition, which delineates the cases where resampling under the global null is valid.
Valid cases:
- Multivariate regression model (location shift)
- Multivariate permutation multiple comparisons, one test per variable, assuming a model with exchangeable subsets
Not valid with:
- Permutation multiple comparisons within a variable, with three or more groups
- Heteroscedasticity
Closed testing “by hand” works regardless.

139
**Summary: Nonparametric Closed Tests**

Nonparametric closed tests are simple, in principle. Robustness gains and power advantages are possible.

140
**Further Topics: More Complex Situations for FWE Control**

Heteroscedasticity Repeated Measures Large Sample Methods

141
**Heteroscedasticity in MCPs**

Extreme example:

```sas
data het;
   do g = 1 to 5;
      do rep = 1 to 10;
         input y @@; output;
      end;
   end;
datalines;
;
proc glm;
   class g;
   model y = g;
   lsmeans g / adjust=tukey pdiff;
run; quit;
```

142
**Bad Results from Heteroscedastic Data**

Group means and standard deviations (values not shown); RMSE = 6.17.
Least squares means for effect g, Pr > |t| for H0: LSMean(i) = LSMean(j), Tukey-Kramer adjustment: many entries are < .0001 (values not shown). Every comparison uses the single pooled RMSE of 6.17, which is badly misleading when the group variances differ this much.

143
**Approximate Solution for Heteroscedasticity Problem**

```sas
proc glimmix data=het;
   if (g > 3) then y2 = y/20; else y2 = y;  /* overcomes scaling problem */
   class g;
   model y2 = g / noint ddfm=satterth;
   random _residual_ / group=g;
   estimate '1 -2' g 1 -1  0   0   0 ,
            '1 -3' g 1  0 -1   0   0 ,
            '1 -4' g 1  0  0 -20   0 ,
            '1 -5' g 1  0  0   0 -20 ,
            '2 -3' g 0  1 -1   0   0 ,
            '2 -4' g 0  1  0 -20   0 ,
            '2 -5' g 0  1  0   0 -20 ,
            '3 -4' g 0  0  1 -20   0 ,
            '3 -5' g 0  0  1   0 -20 ,
            '4 -5' g 0  0  0  20 -20
            / adjust=simulate(nsamp= ) stepdown(type=logical) adjdfe=row;
run;

/* MCMT (1999) solution using macro code */
/* define all pairwise contrasts */
%MakeGLMStats(dataset=het, classvar=g, yvar=y, model=g, contrasts=all(g));
/* calculate and output estimates, covariance matrix, and
   Satterthwaite denominator df from the heteroscedastic fit */
ods output Tests3=Tests3 SolutionF=SolutionF CovB=CovB;
proc mixed data=het method=ml;
   class g;
   model y = g / ddfm=satterth solution covb;
   repeated / group=g;
run;
/* Prepare to use %SimTests */
%macro Estimates;
   use Tests3;    read all var {DenDf} into df;
   use CovB;      read all var ("Col1":"Col6") into cov;
   use SolutionF; read all var {Estimate} into EstPar;
%mend;
%SimTests(nsamp=100000, seed=121211, type=LOGICAL);
```

144
**Heteroscedastic Results**

Estimates with Simulated Adjustment: estimates, standard errors, Satterthwaite DF, t values, raw and adjusted p-values for the 10 pairwise comparisons (several adjusted p < .0001; remaining values not shown).
Notes:
- Approximation 1: the degrees of freedom
- Approximation 2: the covariance matrix involving all comparisons is approximate
Conclusion: groups 1, 2, and 3 differ; 4 and 5 do not (sensible).

145
**Repeated Measures and Multiple Comparisons**

Usually considered quite complicated (wave hands, use Bonferroni). PROC GLIMMIX provides a viable solution. The method is approximate because of its df approximation, and because it treats estimated variance ratios as known.

146
**Multiple Comparisons with Mixed Model**

Crossover study: dog heart rates.

```sas
data Halothane;
   do Dog = 1 to 19;
      do Treatment = 'HA','LA','HP','LP';
         input Rate @@; output;
      end;
   end;
datalines;
;
```

H, L = CO2 high/low; A, P = halothane absent/present.
Source: Johnson and Wichern, Applied Multivariate Statistical Analysis, 5th ed., Prentice Hall.

147
**GLIMMIX code for analyzing all pairwise comparisons, main effects, and interactions simultaneously**

```sas
proc glimmix data=halothane order=data;
   class treatment dog;
   model rate = treatment / ddfm=satterth;
   random treatment / subject=dog type=chol v=1 vcorr=1;
   /* contrast coefficients not shown on the slide */
   estimate 'HA - LA'     treatment ,
            'HA - HP'     treatment ,
            'HA - LP'     treatment ,
            'LA - HP'     treatment ,
            'LA - LP'     treatment ,
            'HP - LP'     treatment ,
            'CO2'         treatment (divisor=2),
            'Halothane'   treatment (divisor=2),
            'Interaction' treatment
            / adjust=simulate(nsamp= ) stepdown(type=logical) adjdfe=row;
run;

/* Old %SimTests code from MCMT */
proc mixed data=Halothane;
   class Dog Treatment;
   model Rate = Treatment / ddfm=satterth;
   repeated / type=un subject=Dog;
   lsmeans Treatment / cov;
   ods output LSmeans=LSmeans;
   ods output Tests3=Tests3;
run;
%macro Contrasts;
   C = { , , , , , , , , };   /* coefficients not shown on the slide */
   C = C`;
   Clab = {"HA-HP","HA-LA","HA-LP","HP-LA","HP-LP","LA-LP",
           "Halo","CO2","Interaction"};
%mend;
%macro Estimates;
   use Tests3;  read all var {DenDf} into df;
   use LSmeans; read all var {Cov1 Cov2 Cov3 Cov4} into cov;
                read all var {Estimate} into EstPar;
%mend;
%SimTests(nsamp=40000, seed=121211, type=LOGICAL);
```

148
**Results**

Estimates with Simulated Adjustment: Label, Estimate, StdErr, DF, t value, Pr > |t|, and Adj P for HA-LA, HA-HP, HA-LP, LA-HP, LA-LP, HP-LP, CO2, Halothane, and Interaction (most numeric values not shown). The halothane comparisons HA-HP, HA-LP, LA-LP and the Halothane main effect have Pr > |t| and adjusted p < .0001.

149
**Cure Rates Example: Multiple Comparisons of Odds**

Questions: (1) Multiple comparisons of cure rates for the Treatments (3 comparisons) (2) Comparison of cure rates for Complicated vs Uncomplicated Diagnosis.

150
**Method**

- Use the estimated parameter vector and the associated estimate of its covariance matrix from PROC GLIMMIX
- Treat the estimated (asymptotic) covariance matrix as known
- Simulate critical values and p-values (MinP-based) from the multivariate normal distribution instead of the multivariate t distribution
- Controls FWE asymptotically under a correct logit model

151
**Results**

Estimates with Simulated Adjustment: Label, Estimate, StdErr, DF (Infty), t value, Pr > |t|, and Adj P for A-B, A-C, B-C, and Comp-Uncomp (most numeric values not shown; B-C has Pr > |t| and adjusted p < .0001).

```sas
data uti;
   format diagnosis $13.;
   do Diagnosis = "complicated", "uncomplicated";
      do treatment = "A", "B", "C";
         input cured total @@; output;
      end;
   end;
datalines;
;
proc glimmix data=uti;
   class Diagnosis Treatment;
   model cured/total = treatment Diagnosis / solution ddfm=none;
   /* contrast coefficients not shown on the slide */
   estimate 'A-B' treatment ,
            'A-C' treatment ,
            'B-C' treatment ,
            'Comp-Uncomp' Diagnosis 1 -1
            / adjust=simulate(nsamp=200000) stepdown(type=logical);
run;
/* Data from Stokes, M.E., Davis, C.S., and Koch, G.G. (1995),
   Categorical Data Analysis Using the SAS System, Cary, NC: SAS Institute Inc. */
```

152
**Summary**

Classic, FWE-controlling MCPs that incorporate alternative covariance structures and non-normal distributions are easy using PROC GLIMMIX. However, be aware of the approximations:
- plug-in variance/covariance estimates
- degrees of freedom

153
**Further Topics: False Discovery Rate**

FDR = E(proportion of rejections that are incorrect).
Let R = total number of rejections and V = number of erroneous rejections. Then
FDR = E(V/R) (with 0/0 defined as 0), while FWE = P(V > 0).

154
**Example: 30 independent tests**

- 20 null hypotheses are true, with pj ~ U(0,1)
- 10 are extremely non-null, with pj = 0
Decision rule: reject H0j if pj ≤ 0.05. Then:
- CERj = P(reject H0j | H0j true) = 0.05
- FWE = P(reject one or more of the 20 true nulls) = 1 − (.95)^20 = 0.64
- FDR = E{V/(V+10)}, where V ~ Bin(20, .05), so FDR ≈ 0.084
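These numbers can be verified directly (assuming, as the slide does, that V ~ Binomial(20, .05) independently of the 10 certain rejections):

```python
from math import comb

cer = 0.05
fwe = 1 - (1 - cer) ** 20          # P(at least one of 20 true nulls rejected)
print(round(fwe, 2))               # 0.64, matching the slide

# FDR = E[V/(V+10)] with V ~ Binomial(20, 0.05)
fdr = sum(comb(20, v) * cer**v * (1 - cer)**(20 - v) * v / (v + 10)
          for v in range(21))
print(round(fdr, 3))               # about 0.084
```

The contrast is the point of the slide: the same decision rule has a 64% chance of at least one false rejection, but only about 8% of its rejections are expected to be false.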

155
**Benjamini and Hochberg’s FDR-Controlling Method**

Let H(1), …, H(k) be the hypotheses corresponding to p(1) ≤ … ≤ p(k).
1. If p(k) ≤ α, reject all H(j) and stop; else continue.
2. If p(k−1) ≤ (k−1)α/k, reject H(1), …, H(k−1) and stop; else continue.
…
k. If p(1) ≤ α/k, reject H(1).
Adjusted p-values: pA(j) = min_{j≤i} (k/i) p(i).
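A sketch of the adjusted p-value formula, stepping up from the largest p-value (the input p-values are illustrative):

```python
def bh_adjusted(pvals):
    """Benjamini-Hochberg FDR-adjusted p-values: pA(j) = min over i>=j of (k/i) p(i)."""
    k = len(pvals)
    order = sorted(range(k), key=lambda i: pvals[i])
    adj = [0.0] * k
    running = 1.0
    for rank in range(k - 1, -1, -1):            # step up from the largest p-value
        i = order[rank]
        running = min(running, pvals[i] * k / (rank + 1))
        adj[i] = running                          # running minimum enforces monotonicity
    return adj

print([round(p, 3) for p in bh_adjusted([0.01, 0.02, 0.04, 0.60])])
# [0.04, 0.04, 0.053, 0.6]
```

At FDR level .05, the first two hypotheses are rejected (adjusted p = .04 ≤ .05), while Bonferroni (4 × .01 = .04, 4 × .02 = .08) would reject only the first.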

156
**Comparison with Hochberg’s Method**

- A step-up procedure, like Hochberg's method; adjusted p-values are pA(j) = min_{j≤i} (k/i) p(i).
- Recall that for Hochberg's method, pA(j) = min_{j≤i} (k−i+1) p(i).
- The FDR-adjusted p-values are uniformly smaller, since k/i ≤ k−i+1.
- The B-H FDR method uses Simes' critical points.

157
**Critical Values – FDR vs FWE**

158
**Comments on FDR Control**

- Considered better for large numbers of tests, since FWE is inconsistent
- Is adaptive
- Has a loose Bayesian correspondence
- Easy to misinterpret the results: given 10 rejections with FDR < .10 in a given study, it is tempting to claim that only one can be in error (in an “average” sense); however, this is incorrect, since E(V/R | R > 0) > α.

159
**Further Topics: Bayesian Methods**

- Simultaneous credible intervals
- Probabilities of rankings
- Loss function approach
- Posterior probabilities of null hypotheses

160
**Bayes/Frequentist Comparisons**

161
**Simultaneous Credible Intervals**

Create intervals Ii for θi so that P(θi ∈ Ii for all i | Data) = .95.
Implementation in Westfall et al. (1999):
- assumes a variance-components model (includes the regular GLM and heteroscedastic GLM as special cases)
- Jeffreys' priors on variances (vague)
- flat priors on means (also vague)
- uses PROC MIXED to obtain an (assumed i.i.d.) sample from the posterior distribution
- uses %BayesIntervals to obtain simultaneous credible intervals

162
**Bayesian Simultaneous Conf. Band**

Simultaneous lower and upper bounds for diff10, diff15, …, diff70 (13 values of mph; numeric values not shown).

```sas
proc mixed data=research.tire;
   class make;
   model cost = make make*mph / noint solution;
   prior / out=sample seed= nsample=100000;
run; quit;

%macro diffs;
   %do mph = 10 %to 70 %by 5;
      diff&mph = (beta1-beta2) + (&mph)*(beta3-beta4);
   %end;
%mend;
%macro names;
   %do mph = 10 %to 70 %by 5;
      diff&mph
   %end;
%mend;
data post; set sample; %diffs; run;
%BayesIntervals(data=post, vars=%names);
```

163
**Bayes/Frequentist Correspondence**

From Westfall, P.H. (2005). Comment on Benjamini and Yekutieli, ‘False Discovery Rate Adjusted Confidence Intervals for Selected Parameters.’ Journal of the American Statistical Association 100.

164
**Bayesian Probabilities for Rankings**

Suppose you observe Ave1 > Ave2 > … > Avek. What is the probability that μ1 > μ2 > … > μk?
Bayesian solution: calculate the proportion of posterior samples for which the ranking holds.
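A sketch with a hypothetical posterior sample for three means (the normal posterior, its locations, and its scale are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical posterior: 100,000 draws of (mu1, mu2, mu3), centered to
# mimic Ave1 > Ave2 > Ave3 with overlapping posterior uncertainty.
draws = rng.normal(loc=[3.0, 2.0, 1.0], scale=1.0, size=(100_000, 3))
ordered = (draws[:, 0] > draws[:, 1]) & (draws[:, 1] > draws[:, 2])
print(round(ordered.mean(), 2))  # P(mu1 > mu2 > mu3 | data), roughly 0.55 here
```

Even with the observed means a full standard deviation apart, the probability that the observed ranking is the true ranking is only a little better than a coin flip, which is the cautionary point of this slide.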

165
**Results: Comparing Formulations**

Solution for fixed effects: estimates and standard errors for formulations A-E (values not shown).
PROC MEANS output: posterior probabilities for rank_observed_means, Mean5_best, Mean1_best, Mean2_best, Mean3_best (on the order of 1E-6), and Mean4_best (remaining values not shown).

```sas
proc mixed data=study;   /* see previous MCB example */
   class formulation;
   model efficacy = formulation / noint solution;
   prior / out=sample seed= nsample=400000;
run;
data comp; set sample;
   rank_observed_means = (beta5>beta1)*(beta1>beta2)*(beta2>beta3)*(beta3>beta4);
   Mean5_best = (beta5 > max(of beta1-beta4));
   Mean1_best = (beta1 > max(of beta2-beta5));
   Mean2_best = (beta2 > max(of beta1, beta3, beta4, beta5));
   Mean3_best = (beta3 > max(of beta1, beta2, beta4, beta5));
   Mean4_best = (beta4 > max(of beta1, beta2, beta3, beta5));
proc means n mean;
   var rank_observed_means Mean5_best Mean1_best Mean2_best Mean3_best Mean4_best;
run;
```

166
**Waller-Duncan Loss Function Approach**

Let dij = μi − μj.
- Li<j(dij): loss of declaring μi < μj
- Li>j(dij): loss of declaring μi > μj
- Li~j(dij): loss of declaring μi not significantly different from μj
W-D loss functions*:
- Li~j(dij) = |dij|
- Li<j(dij) = 0 if dij < 0; = k·dij otherwise
- Li>j(dij) = −k·dij if dij < 0; = 0 otherwise
See Pennello, G. The k-ratio multiple comparisons Bayes rule for the balanced two-way design. Journal of the American Statistical Association 92.
*Equivalent form; see Hochberg and Tamhane (1987).

168
**Implementation Waller – Duncan in PROC GLM**

More general approach: simulate from the posterior pdf of the dij, calculate all three losses for each pair, average over the posterior sample, and choose the decision with the smallest average loss.
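A sketch of that recipe for a single pair, using a made-up posterior for dij = μi − μj and the k-ratio of 100 from the slides:

```python
import numpy as np

rng = np.random.default_rng(3)
k_ratio = 100  # Waller-Duncan k-ratio: Type I vs Type II loss trade-off

# Hypothetical posterior draws of d_ij = mu_i - mu_j (location/scale made up)
delta = rng.normal(loc=1.0, scale=0.3, size=50_000)

loss_ns = np.mean(np.abs(delta))                  # declare "not sig. different"
loss_lt = np.mean(k_ratio * delta * (delta > 0))  # declare mu_i < mu_j
loss_gt = np.mean(-k_ratio * delta * (delta < 0)) # declare mu_i > mu_j

candidates = {"mu_i < mu_j": loss_lt, "mu_i ~ mu_j": loss_ns, "mu_i > mu_j": loss_gt}
decision = min(candidates, key=candidates.get)
print(decision)  # mu_i > mu_j: nearly all posterior mass lies above 0
```

With essentially all posterior mass above zero, declaring μi > μj is nearly costless, declaring μi < μj incurs the full k-weighted penalty, and declaring "no difference" pays |dij| on average.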

169
**Sample Output**

For each pair, PROC MEANS reports the posterior mean and standard error of the three losses (values not shown); the decision is the one with the smallest mean loss, e.g. “Decision: μ1 > μ3” and “Decision: μ1 ~ μ5”.

```sas
%let k = 100;   /* W-D k-ratio */
%macro computeLoss;
   data s; set sample;
      %do i = 1 %to 5;
         %do j = %eval(&i+1) %to 5;
            delta&i.&j  = beta&i - beta&j;
            Loss&i.NS&j = abs(delta&i.&j);
            Loss&i.LT&j =  &k * delta&i.&j * (delta&i.&j > 0);
            Loss&i.GT&j = -&k * delta&i.&j * (delta&i.&j < 0);
         %end;
      %end;
   run;
   %do i = 1 %to 5;
      %do j = %eval(&i+1) %to 5;
         proc means n mean stderr;
            title "Comparison of Mean&i with Mean&j";
            var Loss&i.LT&j Loss&i.NS&j Loss&i.GT&j;
         run;
      %end;
   %end;
%mend;
%computeLoss;
```

170
**Bayesian Multiple Testing**

- Frequentist univariate testing: calculate the p-value = P(data more extreme | H0).
- Bayesian univariate testing: calculate P(H0 is true | Data).
- Frequentist multiple testing: if H01, H02, …, H0k are all true (or if many are true), then we get a small p-value by chance alone, so use a more conservative rule.
- Bayesian multiple testing: express the doubt about many or all H0i being true through the prior distribution; use it to calculate the posterior probabilities P(H0i is true | Data).

171
**Bayesian Multiple Testing: Methodology**

Find the posterior probability of each of the 2^k models in which each θi is either = 0 or ≠ 0. Then
P(θi = 0 | Z) = (sum of posterior probabilities of the 2^(k−1) models with θi = 0) / (sum of posterior probabilities of all 2^k models).
- Gopalan, R. and Berry, D.A. (1998). Bayesian multiple comparisons using Dirichlet process priors. Journal of the American Statistical Association 93.
- Gönen, M., Westfall, P.H. and Johnson, W.O. (2003). Bayesian multiple testing for two-sample multivariate endpoints. Biometrics 59.
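A toy sketch of the 2^k enumeration for k = 3 independent z-statistics with a conjugate normal prior on nonzero effects (the z values, π0 = .5, and prior variance τ² are made-up assumptions, and unlike %BayesTests this ignores prior correlation among hypotheses):

```python
import itertools, math

def norm_pdf(z, var):
    return math.exp(-z * z / (2 * var)) / math.sqrt(2 * math.pi * var)

z = [0.5, 2.8, 3.1]    # hypothetical z-statistics for k = 3 endpoints
pi0, tau2 = 0.5, 4.0   # prior P(theta_i = 0); prior variance of a nonzero theta_i

# Posterior probability of each of the 2^k models (0 = theta_i is null)
post = {}
for model in itertools.product([0, 1], repeat=len(z)):
    prior = math.prod(pi0 if m == 0 else 1 - pi0 for m in model)
    # marginal likelihood: z_i ~ N(0, 1) under the null, N(0, 1 + tau2) otherwise
    lik = math.prod(norm_pdf(zi, 1.0 + (tau2 if m else 0.0))
                    for zi, m in zip(z, model))
    post[model] = prior * lik
total = sum(post.values())

p_nulls = [sum(v for m, v in post.items() if m[i] == 0) / total
           for i in range(len(z))]
print([round(p, 3) for p in p_nulls])  # the small z keeps H01 quite plausible
```

The large z-statistics drive their null probabilities well below .10, while z = 0.5 leaves its null more probable than not, the Bayesian analogue of a multiplicity-aware conclusion.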

172
**The %BayesTests Macro: Priors**

- You can specify your level of prior doubt about individual hypotheses.
- You can specify (i) P(H0i is true), (ii) P(H0i is true, all i), or (iii) the degree of prior correlation among the individual hypotheses: specify two of the three, and the third is determined by the other two.
- You can also specify prior expected effect sizes and prior variances of effect sizes (defaults: mean effect size 2.5, variance 2).

173
**The %BayesTests Macro: Data Assumptions, Inputs, and Outputs**

- Assumes: tests are free combinations (e.g., multiple endpoints); MANOVA; large samples.
- Inputs: t-statistics and their (conditional) large-sample correlation matrix (the partial correlation matrix in the case of multiple endpoints); priors.
- Outputs: posterior probabilities P(H0i is true | Data).

174
**%BayesTests Example: Multiple Endpoints in Panic Disorder Study**

```sas
proc glm data=research.panic;
   class TX;
   model AASEVO PANTOTO PASEVO PHCGIMPO = TX;
   estimate "Treatment vs Control" TX 1 -1;
   manova h=TX / printe;
   ods output Estimates=Estimates PartialCorr=PartialCorr;
run;
%macro Estimates;
   use Estimates;   read all var {tValue} into EstPar;
   use PartialCorr; read all var {AASEVO, PANTOTO, PASEVO, PHCGIMPO} into cov;
%mend;
%BayesTests(rho=.5, Pi0=.5);
```

175
**Output from %BayesTests**

176
**The Effect of Prior Correlation: Borrowing Strength**

177
**The Bayesian Multiplicity Effect**

If the multiple comparisons concern ("what if many or all nulls are true?") is valid, the Bayesian must attach a higher prior probability to P(H0i is true, all i). Here is the result of setting P(H0i is true, all i) = .5; this was the “right answer”! For “right” answers, see Westfall, P.H., Krishen, A., and Young, S.S. (1998). Using prior information to allocate significance levels for multiple endpoints. Statistics in Medicine 17.

178
**Summary: Bayesian Methods**

Several Bayesian MCPs are available:
- intervals
- tests
- rankings
- decision theory
Other current research:
- the FDR-Bayesian connection (genetics)
- mixture models and Bayesian MCPs (variable selection)

179
**Discussion**

Good methods and software are available, so you can't use the excuse "I don't have to use MCPs because there is no good method available." This brings us back to the $100,000,000 question: when should we use MCPs/MTPs?

180
**When Should You Adjust? A Scientific View**

- When there is substantial doubt concerning the collection of hypotheses tested
- When you data snoop
- When you play “pick the winner”
- When conclusions require joint validity
Doubt exists; otherwise, why do the trial?

181
**But What “Family” Should I Use?**

- The set over which you play “pick the winner”
- The set of conclusions requiring joint validity
- Not always well-defined
- Better to decide at the design stage, or simply to “frame the discussion”

182
**Multiplicity Invites Selection; Selection has an Effect**

Variability and probability theory are VERY relevant.

183
Final Words: α/k

184
References: Books
- Hochberg, Y. and Tamhane, A.C. (1987). Multiple Comparison Procedures. John Wiley, New York.
- Hsu, J.C. (1996). Multiple Comparisons: Theory and Methods. Chapman and Hall, London.
- Westfall, P.H. and Young, S.S. (1993). Resampling-Based Multiple Testing: Examples and Methods for P-Value Adjustment. Wiley, New York.
- Westfall, P.H., Tobias, R.D., Rom, D., Wolfinger, R.D., and Hochberg, Y. (1999). Multiple Comparisons and Multiple Tests Using the SAS® System. SAS Institute Inc., Cary, NC.
- Westfall, P.H. and Tobias, R. (2000). Exercises to Accompany "Multiple Comparisons and Multiple Tests Using the SAS® System". SAS Institute Inc., Cary, NC.

185
**References: Journal Articles**

- Bauer, P., Chi, G., Geller, N., Gould, A.L., Jordan, D., Mohanty, S., O'Neill, R., and Westfall, P.H. (2003). Industry, government, and academic panel discussion on multiple comparisons in a "real" phase three clinical trial. Journal of Biopharmaceutical Statistics 13(4).
- Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a new and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57.
- Berger, J.O. and Delampady, M. (1987). Testing precise hypotheses. Statistical Science 2.
- Cook, R.J. and Farewell, V.T. (1996). Multiplicity considerations in the design and analysis of clinical trials. Journal of the Royal Statistical Society, Series A 159.
- Dmitrienko, A., Offen, W., and Westfall, P. (2003). Gatekeeping strategies for clinical trials that do not require all primary effects to be significant. Statistics in Medicine 22.
- Gönen, M., Westfall, P.H., and Johnson, W.O. (2003). Bayesian multiple testing for two-sample multivariate endpoints. Biometrics 59.
- Hellmich, M. and Lehmacher, W. (2005). Closure procedures for monotone bi-factorial dose-response designs. Biometrics 61.
- Koyama, T. and Westfall, P.H. (2005). Decision-theoretic views on simultaneous testing of superiority and noninferiority. Journal of Biopharmaceutical Statistics 15.
- Lehmacher, W., Wassmer, G., and Reitmeir, P. (1991). Procedures for two-sample comparisons with multiple endpoints controlling the experimentwise error rate. Biometrics 47.
- Marcus, R., Peritz, E., and Gabriel, K.R. (1976). On closed testing procedures with special reference to ordered analysis of variance. Biometrika 63.
- Shaffer, J.P. (1986). Modified sequentially rejective multiple test procedures. Journal of the American Statistical Association 81, 826–831.
- Westfall, P.H. (1997). Multiple testing of general contrasts using logical constraints and correlations. Journal of the American Statistical Association 92.
- Westfall, P.H. and Wolfinger, R.D. (1997). Multiple tests with discrete distributions. The American Statistician 51, 3–8.
- Westfall, P.H., Johnson, W.O., and Utts, J.M. (1997). A Bayesian perspective on the Bonferroni adjustment. Biometrika 84.
- Westfall, P.H. and Wolfinger, R.D. (2000). Closed multiple testing procedures and PROC MULTTEST. SAS Observations, July 2000.
- Westfall, P.H., Ho, S.-Y., and Prillaman, B.A. (2001). Properties of multiple intersection-union tests for multiple endpoints in combination therapy trials. Journal of Biopharmaceutical Statistics 11.
- Westfall, P.H. and Krishen, A. (2001). Optimally weighted, fixed sequence, and gatekeeping multiple testing procedures. Journal of Statistical Planning and Inference 99.
- Westfall, P. and Bretz, F. (2003). Multiplicity in clinical trials. In: Encyclopedia of Biopharmaceutical Statistics, 2nd ed., S.-C. Chow, ed., Marcel Dekker, New York.
- Westfall, P.H., Zaykin, D.V., and Young, S.S. (2001). Multiple tests for genetic effects in association studies. In: Methods in Molecular Biology, vol. 184: Biostatistical Methods, S. Looney, ed., Humana Press, Totowa, NJ.
- Westfall, P.H. and Tobias, R.D. (2007). Multiple testing of general contrasts: truncated closure and the extended Shaffer-Royen method. Journal of the American Statistical Association 102.
