Decision-Theoretic Views on Switching Between Superiority and Non-Inferiority Testing. Peter Westfall, Director, Center for Advanced Analytics and Business Intelligence, Texas Tech University.


Decision-Theoretic Views on Switching Between Superiority and Non-Inferiority Testing
Peter Westfall, Director, Center for Advanced Analytics and Business Intelligence, Texas Tech University

Background
MCP2002 Conference in Bethesda, MD, August 2002
J. Biopharm. Stat. special issue, to appear
Articles:
– Ng, T.-H. "Issues of simultaneous tests for non-inferiority and superiority"
– Comment by G. Pennello
– Comment by W. Maurer
– Rejoinder by T.-H. Ng

Ng’s Arguments
No problem with control of Type I errors in switching from N.I. to Sup. tests.
However, it seems "sloppy":
– Loss of power in replication when there are two options
– It will allow "too many" drugs to be called "superior" that are not really superior.

Westfall interjects for the next few slides
Why does switching allow control of Type I errors? Three views:
– Closed Testing
– Partitioning Principle
– Confidence Intervals

Closed Testing Method(s)
– Form the closure of the family by including all intersection hypotheses.
– Test every member of the closed family by a (suitable) α-level test. (Here, α refers to the comparison-wise error rate.)
– A hypothesis can be rejected provided that
  – its corresponding test is significant at level α, and
  – every other hypothesis in the family that implies it is rejected by its α-level test.
Note: Closed testing is more powerful than (e.g.) Bonferroni.

Control of FWE with Closed Tests
Suppose H0j1, ..., H0jm are all true (unknown to you which ones).
You can reject one or more of these only when you reject the intersection H0j1 ∩ ... ∩ H0jm.
Thus,
P(reject at least one of H0j1, ..., H0jm | H0j1, ..., H0jm all true)
  ≤ P(reject H0j1 ∩ ... ∩ H0jm | H0j1, ..., H0jm all true) = α.

Closed Testing – Multiple Endpoints
Intersection of all four: H0: μ1 = μ2 = μ3 = μ4 = 0
Three-way intersections: H0: μ1 = μ2 = μ3 = 0; H0: μ1 = μ2 = μ4 = 0; H0: μ1 = μ3 = μ4 = 0; H0: μ2 = μ3 = μ4 = 0
Two-way intersections: H0: μ1 = μ2 = 0; H0: μ1 = μ3 = 0; H0: μ1 = μ4 = 0; H0: μ2 = μ3 = 0; H0: μ2 = μ4 = 0; H0: μ3 = μ4 = 0
Individual hypotheses: H0: μ1 = 0 (p = ); H0: μ2 = 0 (p = ); H0: μ3 = 0 (p = ); H0: μ4 = 0 (p = )
where μj = mean difference, treatment − control, for endpoint j.
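The closure tree above can be automated. Below is a minimal Python sketch (not from the slides) that applies the closure principle to m endpoints, using a Bonferroni test for each intersection hypothesis; the deck leaves the choice of "suitable α-level test" open, and the p-values in the example are hypothetical, since the slide's numeric values were not preserved in this transcript.

```python
# Minimal sketch of the closed testing principle for m individual hypotheses,
# using a Bonferroni test for each intersection (illustrative choice only).
from itertools import combinations

def closed_test(p_values, alpha=0.05):
    """Return the indices of individual hypotheses rejected by closed testing."""
    m = len(p_values)
    indices = range(m)

    def intersection_rejected(subset):
        # Bonferroni test of the intersection hypothesis over `subset`
        return min(p_values[j] for j in subset) <= alpha / len(subset)

    rejected = []
    for j in indices:
        # H_{0j} is rejected only if every intersection hypothesis
        # containing it is rejected at level alpha
        containing = [s for k in range(1, m + 1)
                      for s in combinations(indices, k) if j in s]
        if all(intersection_rejected(s) for s in containing):
            rejected.append(j)
    return rejected

# Hypothetical endpoint p-values (the slide's values were not reproduced here)
print(closed_test([0.012, 0.003, 0.21, 0.047]))
```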

Closed Testing – Superiority and Non-Inferiority
H0,NI: δ ≤ −δ0 (Null: Inf.; Alt: Non-Inf.)
H0,Sup: δ ≤ 0 (Null: not Sup.; Alt: Sup.)
Note: The intersection of the non-inferiority null and the superiority null is equal to the non-inferiority null:
H0,NI ∩ H0,Sup: δ ≤ −δ0 (Null: Inf.; Alt: Non-Inf.), the intersection of the two nulls.

Why there is no penalty from the closed testing standpoint
Reject H0,Sup only if
– H0,NI ∩ H0,Sup (= H0,NI) is rejected, and
– H0,Sup is rejected. (no additional penalty)
Reject H0,NI only if
– H0,NI ∩ H0,Sup (= H0,NI) is rejected, and
– H0,NI is rejected. (no additional penalty)
So both can be tested at 0.05; the sequence is irrelevant.

Why there is no need for multiplicity adjustment: The Partitioning View
Partitioning principle:
– Partition the parameter space into disjoint subsets of interest.
– Test each subset using an α-level test.
– Since the parameter may lie in only one subset, no multiplicity adjustment is needed.
Benefits:
– Can (rarely) be more powerful than closure
– Confidence set equivalence (invert the tests)

Partitioning Null Sets
Partition the null region into the disjoint sets {δ ≤ −δ0} and {−δ0 < δ ≤ 0}.
You may test both without multiplicity adjustment, since only one can be true.
The LFC (least favorable configuration) for {δ ≤ −δ0} is δ = −δ0; the LFC for {−δ0 < δ ≤ 0} is δ = 0.
Exactly equivalent to closed testing.

Confidence Interval Viewpoint
Construct a 1 − α lower confidence bound on δ; call it d_L.
If d_L > 0, conclude superiority.
If d_L > −δ0, conclude non-inferiority.
The testing and interval approaches are essentially equivalent, with possible minor differences where tests and intervals do not coincide (e.g., binomial tests).
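As a concrete illustration of the confidence-interval view, here is a short sketch assuming an approximately normal estimate; the function name, inputs, and numbers are illustrative rather than taken from the slides, and the one-sided α = 0.025 bound is chosen to match the Z > 1.96 rule used later in the deck.

```python
# Confidence-interval view of the switch: one lower bound answers both questions.
# Assumes an approximately normal estimate; names and numbers are illustrative.
from scipy.stats import norm

def ci_claims(delta_hat, se, delta0, alpha=0.025):
    """1 - alpha lower confidence bound d_L on delta, and the resulting claims."""
    d_L = delta_hat - norm.ppf(1 - alpha) * se   # lower bound on delta
    return {"d_L": d_L,
            "non_inferior": d_L > -delta0,       # d_L > -delta0 => non-inferiority
            "superior": d_L > 0}                 # d_L > 0       => superiority

print(ci_claims(delta_hat=1.2, se=0.9, delta0=3.24))
```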

Back to Ng: Ng’s Loss Function Approach
Ng does not disagree with the Type I error control.
However, he is concerned from a decision-theoretic standpoint.
So he compares the "loss" when allowing testing of:
– Only one, pre-defined hypothesis
– Both hypotheses

Ng’s Example
Situation 1: The company tests only one hypothesis, based on its preliminary assessment.
Situation 2: The company tests both hypotheses, regardless of the preliminary assessment.

Further Development of Ng
Out of the "next 2000" products,
– 1000 are truly as efficacious as the active control (A.C.), and
– 1000 are truly superior to the A.C.
Suppose further that the company either
– makes perfect preliminary assessments, or
– makes correct assessments 80% of the time.

Perfect Classification; One Test Only

80% Correct Classification; One Test Only

No Classification; Both Tests Performed
Ng’s concern: "Too many" Type I errors.

Westfall’s generalization of Ng
Three-decision problem:
– Superiority
– Non-Inferiority
– NS ("Inferiority")
Usual "test both" strategy:
– Claim Sup if 1.96 < Z
– Claim NonInf if 1.96 − δ0 < Z < 1.96
– Claim NS if Z < 1.96 − δ0

Further Development
Assume δ0 = 3.24 (this gives roughly 90% power to detect non-inferiority: in standard-error units, 3.24 = 1.96 + 1.28).
True states of nature:
– Inferiority: δ < −3.24
– Non-Inf: −3.24 < δ < 0
– Sup: 0 < δ
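With δ0 = 3.24, the "test both" rule from the previous slide amounts to cutpoints at 1.96 − 3.24 = −1.28 and 1.96. A minimal sketch of that rule, assuming (as the power statement implies, though the slides do not say so explicitly) that Z is the test statistic in standard-error units, so Z ~ N(δ, 1):

```python
# The usual "test both" three-decision rule from the preceding slide.
# Assumption: Z ~ N(delta, 1) in standard-error units (implied by delta0 = 3.24
# giving ~90% power), which is an interpretation, not stated on the slides.
def test_both_claim(z, delta0=3.24, z_crit=1.96):
    if z > z_crit:                  # 1.96 < Z
        return "Sup"
    if z > z_crit - delta0:         # -1.28 < Z < 1.96
        return "NonInf"
    return "NS"                     # Z < -1.28

for z in (-2.0, 0.5, 2.5):
    print(z, "->", test_both_claim(z))
```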

Loss Function

Nature \ Claim              NS      NonInf    Sup
Inf (δ ≤ −3.24)              0       L12      L13
NonInf (−3.24 < δ ≤ 0)      L21       0       L23
Sup (0 < δ)                 L31      L32       0

Prior Distribution – Normal + Equivalence Spike

Westfall’s Extension
Compare:
– Ng’s recommendation to "preclassify" drugs according to Non-Inf or Sup, and
– the "test both" recommendation.
Use % increase over minimum loss as the criterion.
The comparison will depend on the prior and the loss!

Probability of Selecting the "NonInf" Test
Probit function; the anchors are P(NonInf | δ = 0) = p_s and P(NonInf | δ = 3.24) = 1 − p_s.
Ng suggests p_s = .80.
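The two anchors pin down the probit parameters a and b in P(Select NonInf | δ) = Φ(a + bδ) (see the next slide). A small sketch of that back-calculation; the numerical values shown in the comment are my own arithmetic, not figures from the slides:

```python
# How the probit-selection parameters a, b follow from the two anchors above.
from scipy.stats import norm

p_s, delta0 = 0.80, 3.24
a = norm.ppf(p_s)                        # P(NonInf | delta=0)    = Phi(a)          = p_s
b = (norm.ppf(1 - p_s) - a) / delta0     # P(NonInf | delta=3.24) = Phi(a + b*3.24) = 1 - p_s
print(round(a, 3), round(b, 3))          # roughly 0.842 and -0.520
```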

Summary of Priors and Losses
– δ ~ p·I(δ = 0) + (1 − p)·N(δ; m_δ, s²)  (3 parameters)
– P(Select NonInf | δ) = Φ(a + bδ), where a, b are determined by p_s  (1 parameter; only for Ng)
– Loss matrix  (5 parameters)
Total: 3 or 4 prior parameters and 5 loss parameters. Not too bad!!!

Baseline Model
– δ ~ (.2)·I(δ = 0) + (.8)·N(δ; 1, 4²)
– P(Select NonInf | δ) = Φ(a + bδ) with p_s = .8
– Loss matrix (an attempt to quantify "loss to the patient population"), with Nature rows Inf (δ ≤ −3.24), NonInf (−3.24 < δ ≤ 0), Sup (0 < δ) and Claim columns NS/Inf, NonInf, Sup.

Consequence of Baseline Model
Optimal decisions (standard decision theory; see, e.g., Berger’s book):
– Classify to NS when z <
– Classify to NonInf when  < z < 2.20
– Classify to Sup when 2.20 < z
Ordinary rule: cutpoints are −1.28 and 1.96.
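To show the kind of calculation behind these optimal cutpoints, here is a sketch that computes the Bayes-optimal claim as a function of z under the baseline prior, assuming Z | δ ~ N(δ, 1). The loss-matrix entries in the sketch are placeholders (the baseline slide's numeric losses were not preserved in this transcript), so the cutpoints it implies will not exactly reproduce the 2.20 quoted above.

```python
# Bayes-optimal claims under the baseline prior, assuming Z | delta ~ N(delta, 1).
# The LOSS entries are HYPOTHETICAL placeholders; the slide's numbers were not
# preserved in this transcript, so the implied cutpoints will differ from above.
import numpy as np
from scipy import integrate
from scipy.stats import norm

P_SPIKE, M, S = 0.2, 1.0, 4.0      # prior: 0.2*I(delta=0) + 0.8*N(1, 4^2)
MARGIN = 3.24                      # non-inferiority margin delta0
CLAIMS = ["NS", "NonInf", "Sup"]
LOSS = np.array([[0, 10, 90],      # nature = Inf    (delta <= -3.24)   [placeholders]
                 [1,  0, 10],      # nature = NonInf (-3.24 < delta <= 0)
                 [20, 10, 0]])     # nature = Sup    (0 < delta)

def state_posteriors(z):
    """P(nature state | z): integrate the continuous prior over each delta range,
    putting the equivalence spike at delta = 0 into the NonInf state."""
    def piece(lo, hi):
        f = lambda d: norm.pdf(d, M, S) * norm.pdf(z, d, 1)
        return (1 - P_SPIKE) * integrate.quad(f, lo, hi)[0]
    w = np.array([piece(-np.inf, -MARGIN),
                  piece(-MARGIN, 0) + P_SPIKE * norm.pdf(z, 0, 1),
                  piece(0, np.inf)])
    return w / w.sum()

def bayes_claim(z):
    expected_loss = state_posteriors(z) @ LOSS   # expected loss of each claim
    return CLAIMS[int(np.argmin(expected_loss))]

for z in (-2.5, -1.0, 1.0, 2.5):
    print(z, "->", bayes_claim(z))
```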

Loss Matrix – Select and test only the NonInf hypothesis
Nature rows: Inf (δ ≤ −3.24), NonInf (−3.24 < δ ≤ 0), Sup (0 < δ)
Outcome columns: Z < −1.28, −1.28 < Z < 1.96, 1.96 < Z

Loss Matrix – Select and test only the Sup hypothesis
Nature rows: Inf (δ ≤ −3.24), NonInf (−3.24 < δ ≤ 0), Sup (0 < δ)
Outcome columns: Z < −1.28, −1.28 < Z < 1.96, 1.96 < Z

Deviation from Baseline: Effect of p

Deviation from Baseline: Effect of m

Deviation from Baseline: Effect of s

Deviation from Baseline: Effect of Correct Selection, p_s

Changing the Loss Function
Same structure as the baseline loss matrix (Nature rows: Inf (δ ≤ −3.24), NonInf (−3.24 < δ ≤ 0), Sup (0 < δ); Claim columns: NS/Inf, NonInf, Sup), but with the lower-left entries multiplied by c, c > 0.

Deviation from Baseline: Effect of c

Conclusions
The simultaneous testing procedure is generally more efficient (less loss) than Ng’s method, except:
– when Type II errors are not costly, or
– when a large percentage of products are equivalent.
A sidelight: the optimal rule itself is worth considering:
– Thresholds for Non-Inf are more liberal, which allows a more stringent definition of the non-inferiority margin.