Biostatistics Case Studies 2007

Presentation transcript:

Biostatistics Case Studies 2007 Session 5: Demonstrating Lack of Treatment Effect: Equivalence or Non-inferiority Peter D. Christenson Biostatistician http://gcrc.labiomed.org/Biostat

Terminology Superiority and/or Inferiority Study: Two or more treatments are assumed equal and the study is designed to find overwhelming evidence of a difference. Usually, one treatment is a control, sham, or placebo. Most common comparative study type. It is rare to assess only one of superiority or inferiority (“one-sided” statistical tests), unless there is biological impossibility of one of them.

Terminology Equivalence Study: Two treatments are assumed to differ and the study is designed to find overwhelming evidence that they are equal. Usually, the quantity of interest is a measure of biological activity or potency and “treatments” are drugs or lots or batches of drugs. Also known as bioequivalence. Sometimes used to compare clinical outcomes for two active treatments, e.g., statins or vaccines, if neither treatment can be considered standard or accepted. This usually requires large numbers of subjects.

Terminology Non-Inferiority Study: Usually a new treatment or regimen is compared with an accepted treatment or regimen or standard of care. The new treatment is assumed inferior to the standard and the study is designed to show overwhelming evidence that it is at least nearly as good, i.e., non-inferior. It may have other advantages, e.g., oral vs. injected administration. A negative inferiority study fails to detect inferiority, but does not necessarily give evidence for non-inferiority. The accepted treatment is usually known to be efficacious already, but an added placebo group may also be used. The distinguishing feature is an attempt to prove negativity, not the one-sidedness of the inference.

Case Study

pASA+PPI = 1.5%. Demonstrate: pclop – pASA+PPI ≤ 4%. N = 145/group. Power = 80% for what?

Typical Analysis: Inferiority or Superiority [Not used in this paper]
H0: pclop – pASA+PPI = 0%; H1: pclop – pASA+PPI ≠ 0% (H1 → therapies differ). α = 0.05. Power = 80% for Δ = |pclop – pASA+PPI| = ?
[Figure: 95% CIs for pclop – pASA+PPI illustrating three outcomes: clop inferior, clop superior, and no difference detected*.]
* And 80% chance that a Δ of (?) or more would be detected.

Typical Analysis: Inferiority or Superiority [Not used in this paper] H0: pclop – pASA+PPI = 0% H1: pclop – pASA+PPI ≠ 0% H1 → therapies differ α = 0.05 Power = 80% for Δ=|pclop - pASA+PPI| =? Detectable Δ = 5.5%-1.5%=4% So, N=331/group → 80% chance that a Δ of 4% or more would be detected.
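The N = 331/group figure can be approximated with the usual normal-approximation formula for comparing two proportions. The sketch below assumes that formula (pooled variance under H0, unpooled under H1, no continuity correction); the slide does not say which method or software was actually used, and the function name is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided z-test of p1 vs. p2.

    Assumption: normal approximation, pooled variance under H0,
    unpooled variance under H1, no continuity correction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p1 - p2) ** 2)


# Slide values: 1.5% on ASA+PPI vs. 5.5% on clopidogrel (a difference of 4%).
n_superiority = n_per_group_two_proportions(0.015, 0.055)
print(n_superiority)  # 331
```

Under these assumptions the result agrees with the slide; other approximations (e.g., with a continuity correction) would give a somewhat different N.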

Typical Analysis: Inferiority or Superiority [Not used in this paper]
H0: pclop – pASA+PPI = 0%; H1: pclop – pASA+PPI ≠ 0% (H1 → therapies differ). α = 0.05. Power = 80% for Δ = |pclop – pASA+PPI| = 4%.
Note that this could be formulated as two one-sided tests (TOST):
Test 1. H0: pclop – pASA+PPI ≤ 0%; H1: pclop – pASA+PPI > 0% (H1 → clop inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = 4%.
Test 2. H0: pclop – pASA+PPI ≥ 0%; H1: pclop – pASA+PPI < 0% (H1 → clop superior). α = 0.025. Power = 80% for pclop – pASA+PPI = –4%.

Demonstrating Equivalence [Not used in this paper]
H0: |pclop – pASA+PPI| ≥ E%; H1: |pclop – pASA+PPI| < E% (H1 → therapies “equivalent”, within E).
Note that this could be formulated as two one-sided tests (TOST), here with E = 4%:
Test 1. H0: pclop – pASA+PPI ≤ –4%; H1: pclop – pASA+PPI > –4% (H1 → clop non-superior). α = 0.025. Power = 80% for pclop – pASA+PPI = 0%.
Test 2. H0: pclop – pASA+PPI ≥ 4%; H1: pclop – pASA+PPI < 4% (H1 → clop non-inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = 0%.

Demonstrating Equivalence
H0: |pclop – pASA+PPI| ≥ 4%; H1: |pclop – pASA+PPI| < 4% (H1 → equivalence). α = 0.05. Power = 80% for pclop – pASA+PPI = 0.
[Figure: 95% CIs for pclop – pASA+PPI on an axis marked at –4 and 4, illustrating three outcomes: clop non-superior, clop non-inferior, and equivalence*.]
* Both non-superior and non-inferior.
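As a rough illustration of the two one-sided tests, the sketch below applies a Wald z-test to each one-sided hypothesis with a 4% margin. The event counts are hypothetical (they are not from the paper), the function name is made up for this example, and the Wald standard error is an assumption; it breaks down when an observed proportion is 0 or 1.

```python
from math import sqrt
from statistics import NormalDist


def tost_two_proportions(x_new, n_new, x_std, n_std, margin, alpha=0.025):
    """Two one-sided Wald z-tests for equivalence of two proportions.

    Assumption: unpooled (Wald) standard error; requires se > 0.
    """
    z = NormalDist()
    p_new, p_std = x_new / n_new, x_std / n_std
    diff = p_new - p_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    # H0: diff <= -margin  vs  H1: diff > -margin  (new non-superior)
    p_non_superior = 1 - z.cdf((diff + margin) / se)
    # H0: diff >= +margin  vs  H1: diff < +margin  (new non-inferior)
    p_non_inferior = z.cdf((diff - margin) / se)
    return {
        "diff": diff,
        "p_non_superior": p_non_superior,
        "p_non_inferior": p_non_inferior,
        "equivalent": p_non_superior < alpha and p_non_inferior < alpha,
    }


# Hypothetical counts: similar event rates in two large groups.
similar = tost_two_proportions(6, 400, 5, 400, margin=0.04)
# Hypothetical counts: a much higher event rate on the new therapy.
different = tost_two_proportions(12, 145, 1, 145, margin=0.04)
print(similar["equivalent"], different["equivalent"])
```

Equivalence is declared only when both one-sided tests reject, which matches the requirement that the 95% CI lie entirely within –4 and 4.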

This Paper: Inferiority and Non-Inferiority
Apparently, two one-sided tests (TOST), but only one explicitly powered:
Test 1. H0: pclop – pASA+PPI ≤ 0%; H1: pclop – pASA+PPI > 0% (H1 → clop inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = ?%
Test 2. H0: pclop – pASA+PPI ≥ 4%; H1: pclop – pASA+PPI < 4% (H1 → clop non-inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = 0%.
The authors chose E = 4% as the largest difference at which the therapies are still considered equivalent.

This Paper: Inferiority and Non-Inferiority
[Figure: 95% CIs for pclop – pASA+PPI on an axis marked at –4 and 4, illustrating three decisions: clop inferior, clop non-inferior, and “non-clinical” inferiority*.]
* Clop is statistically inferior, but not enough for clinical significance.
Observed results: pclop = 8.6%; pASA+PPI = 0.7%; 95% CI = 3.4 to 12.4 → clop inferior.
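The decision rules on this slide can be read off the confidence interval alone. The sketch below encodes one reading of them: the exact CI positions in the original figure are not recoverable here, so the boundaries are an interpretation, and the "inconclusive" label for a CI spanning both 0 and the margin is an addition not named on the slide.

```python
def classify_ci(lower, upper, margin=4.0):
    """Classify a 95% CI (in percentage points) for pclop - pASA+PPI.

    Assumed reading of the slide:
      - CI entirely above 0 and below the margin -> "non-clinical inferiority"
      - CI entirely above 0 otherwise            -> "inferior"
      - CI entirely below the margin             -> "non-inferior"
      - anything else                            -> "inconclusive" (label added here)
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > 0 and upper < margin:
        return "non-clinical inferiority"
    if lower > 0:
        return "inferior"
    if upper < margin:
        return "non-inferior"
    return "inconclusive"


# Observed result reported on the slide: 95% CI = 3.4 to 12.4.
observed_decision = classify_ci(3.4, 12.4)
print(observed_decision)  # inferior
```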

Power for Test of Clopidogrel Non-Inferiority
H0: pclop – pASA+PPI ≥ 4%; H1: pclop – pASA+PPI < 4% (H1 → clop non-inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = 0%.
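The N = 145/group quoted for the case study is consistent with a standard normal-approximation sample size for a non-inferiority test of two proportions. The sketch below assumes both true rates are 1.5% and uses the unpooled variance; the authors' actual method is not stated on the slides, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_noninferiority(p_std, p_new, margin, alpha=0.025, power=0.80):
    """Approximate n per group for a one-sided non-inferiority z-test.

    H0: p_new - p_std >= margin  vs  H1: p_new - p_std < margin.
    Assumption: normal approximation with unpooled variance.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)  # one-sided critical value
    z_beta = z.inv_cdf(power)
    variance = p_std * (1 - p_std) + p_new * (1 - p_new)
    # Distance between the assumed true difference and the margin.
    distance = margin - (p_new - p_std)
    return ceil((z_alpha + z_beta) ** 2 * variance / distance ** 2)


# Slide values: both rates 1.5% (true difference 0%), margin 4%.
n_noninferiority = n_per_group_noninferiority(0.015, 0.015, margin=0.04)
print(n_noninferiority)  # 145
```

Note how sensitive the result is to the margin: halving it to 2% roughly quadruples the required N.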

Power for Test of Clopidogrel Inferiority
H0: pclop – pASA+PPI ≤ 0%; H1: pclop – pASA+PPI > 0% (H1 → clop inferior). α = 0.025. Power = 80% for pclop – pASA+PPI = 7.3%.
Detectable Δ = 8.8% – 1.5% = 7.3%.
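To check that a difference of about 7.3% is what N = 145/group can detect, one can compute the approximate power of the one-sided inferiority test at 8.8% vs. 1.5%. This is a sketch under a normal approximation (pooled variance under H0, unpooled under H1); the slide's exact calculation method is unknown, so only rough agreement should be expected.

```python
from math import sqrt
from statistics import NormalDist


def power_inferiority(p_std, p_new, n, alpha=0.025):
    """Approximate power of a one-sided z-test of H0: p_new - p_std <= 0.

    Assumption: normal approximation, pooled SE under H0 and
    unpooled SE under the alternative; n is the size of each group.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    delta = p_new - p_std
    p_bar = (p_std + p_new) / 2
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n)
    se_alt = sqrt((p_std * (1 - p_std) + p_new * (1 - p_new)) / n)
    return z.cdf((delta - z_alpha * se_null) / se_alt)


# Slide values: 1.5% on ASA+PPI, 8.8% on clopidogrel, 145 per group.
power_at_145 = power_inferiority(0.015, 0.088, n=145)
print(round(power_at_145, 2))
```

The result is close to 80%. Applying the same function to the 4% difference (5.5% vs. 1.5%) at N = 145 gives a power well below 80%, in line with the remark that the study was only powered to detect a large inferiority.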

Conclusions: This Paper In this paper, clop was so inferior that the investigators were apparently lucky to have enough power to detect it. With this N, the CI was too wide to detect a smaller therapy difference. The investigators justify testing non-inferiority of clop only (and not of Aspirin + Nexium) by the lessened desirability of combination therapy (?). This would have been a good approach to size and power for a new therapy competing against a standard, if the N for clop inferiority had also been considered. Note that power calculations were based on actual %s of subjects, whereas cumulative 12-month incidence was used in the analysis. I know of no power calculations for equivalence tests using survival analysis.

Conclusions: General “Negligibly inferior” would be a better term than non-inferior. All inference can be based on confidence intervals. Pre-specify the comparisons to be made. Cannot test for both non-inferiority and superiority. A study can be powered for only one comparison or for several, e.g., non-inferiority and inferiority. Power can be different for different comparisons. Very careful consideration must be given to the choice of margin of equivalence (4% here). The study is worthless if others in the field would find your margin too large.

FDA Guidelines http://www.fda.gov/cder/guidance/4155fnl.pdf
FDA has at least 4 major concerns:
1. Need strong evidence that the standard treatment is effective.
2. Must have an acceptable margin of equivalence that is much smaller than the effect of the standard over placebo.
3. Trial design must be very close to that which established the effectiveness of the standard treatment.
4. Study conduct must be high quality. This sounds like business-speak about “excellence”, but it really refers to the fact that superiority studies are by nature conservative: e.g., non-compliance and misclassification bias the results toward no effect. The same flaws in a non-inferiority study bias the results in the same direction, toward no difference, making it easier to falsely prove the aim.

Appendix: Possible Errors in Study Conclusions
Typical study to demonstrate superiority/inferiority
Study claims | Truth: H0 (No Effect) | Truth: H1 (Effect)
No Effect    | Correct               | Error (Type II)
Effect       | Error (Type I)        | Correct
             | Specificity           | Sensitivity
Specificity: set α = 0.05, so specificity = 95%. Power (sensitivity): maximize; choose N for 80%.

Appendix: Graphical Representation of Power
Typical study to demonstrate superiority/inferiority. H0: true effect = 0; HA: true effect = 3. Effect in study = 1.13. N = 100 per group.
[Figure: sampling distributions of the effect (Group B mean – Group A mean) under H0 and HA; larger Ns give narrower curves. The \\\ area (5%) is the probability of concluding HA if H0 is true. The /// area (41%) is the probability of concluding H0 if HA is true.]
Power = 100 – 41 = 59%. Note greater power if larger N, and/or if true effect > 3, and/or less subject heterogeneity.
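The relationship between power, N, effect size, and subject heterogeneity can be sketched numerically for a two-group comparison of means. The slide does not give the standard deviation, so the value 9.7 below is purely a hypothetical choice that happens to yield roughly the 59% shown; the function assumes a two-sided z-test with known, equal SDs and ignores the negligible probability of rejecting in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist


def power_two_means(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means.

    Assumptions: known common SD, equal group sizes, and the far
    rejection tail (wrong direction) is ignored.
    """
    z = NormalDist()
    se = sd * sqrt(2 / n_per_group)  # SE of the difference in means
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(effect / se - z_alpha)


# True effect 3 and N = 100 per group are from the slide; sd = 9.7 is hypothetical.
base_power = power_two_means(effect=3, sd=9.7, n_per_group=100)
print(round(base_power, 2))

# Each of the slide's three routes to greater power:
print(round(power_two_means(3, 9.7, 200), 2))  # larger N
print(round(power_two_means(4, 9.7, 100), 2))  # larger true effect
print(round(power_two_means(3, 8.0, 100), 2))  # less heterogeneity
```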

Appendix: Online Study Size / Power Calculator
www.stat.uiowa.edu/~rlenth/Power
Does NOT include tests for equivalence, non-inferiority, or non-superiority.