The changing landscape of interim analyses for efficacy / futility


The changing landscape of interim analyses for efficacy / futility
Marc Buyse, ScD, IDDI, Louvain-la-Neuve, Belgium (marc.buyse@iddi.com)
Massachusetts Biotechnology Council, Cambridge, Mass, June 2, 2009
Stat 208: Statistical Thinking, Chapter 10

Reasons for Interim Analyses
Early stopping for:
– safety
– extreme efficacy
– futility
Adaptation of design based on observed data:
– to play the winner / drop the loser
– to maintain power
– to make any adaptation, for whatever reason and whether or not data-derived, whilst controlling α

Methods for Interim Analyses
Multi-stage designs / seamless transition designs
Group-sequential designs
Stochastic curtailment
Sample size adjustments
Adaptive ("flexible") designs

Early Stopping
Helsinki Declaration: "Physicians should cease any investigation if the hazards are found to outweigh the potential benefits." ("Primum non nocere")
Trials with serious, irreversible endpoints should be stopped if one treatment is "proven" to be superior, and such potential stopping should be formally pre-specified in the trial design.

The Cost of Delay
"Blockbusters" reach sales of over $500M a year (more than $1M a day).

Fixed Sample Size Trials…
1 – the sample size is calculated to detect a given difference at a given significance level and power
2 – the required number of patients is accrued
3 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

…vs (Group) Sequential Trials…
1 – the sample size is calculated to detect a given difference at a given significance level and power
2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place
3a – the trial is terminated early, or
3b – the trial continues unchanged
4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

…vs Adaptive Trials
1 – the sample size is calculated to detect a given difference at a given significance level and power
2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place
3a – the trial is terminated early, or
3b – the trial continues unchanged, or
3c – the trial continues with adaptations
4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified or modified number of events

Randomized phase II trial with continuation as phase III trial
Simultaneous screening of several treatment groups (Arm 1, Arm 2, Arm 3), with continuation as a phase III trial: one or more arms may be stopped early at the end of phase II, and the remaining arms are compared in phase III.

Phase III trial with interim analysis
Phase III trial with an interim look at the data: the arms are compared at the interim analysis, and again at the final analysis.

Seamless transition designs (e.g. for dose selection)
Designs can be operationally or inferentially seamless.

Group Sequential Trials
If several analyses are carried out, each at the target level of significance, the Type I error is inflated. The interim analyses must therefore use an adjusted level of significance so as to preserve the overall Type I error.

Inflation of α with multiple analyses
With 5 analyses performed at level 0.05, the overall Type I error is approximately 0.14.

Adjusting α for multiple analyses
The 5 analyses must each be performed at level 0.0158 in order to preserve an overall level of 0.05.
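The inflation and its correction can be checked with a short Monte Carlo sketch (my own illustration, not from the slides): five Z-statistics computed on accumulating data under H0 are correlated, so the overall error at a naive 0.05 per look is about 0.14, not 1 − 0.95⁵ ≈ 0.23; the Pocock-adjusted constant restores 0.05.

```python
import numpy as np

# Monte Carlo check of Type I error inflation with K = 5 looks.
# Z_1,...,Z_K are computed on accumulating data, hence correlated.
rng = np.random.default_rng(0)
K, n_sim = 5, 200_000
# Each stage contributes an N(0,1)-scaled increment; Z_k = S_k / sqrt(k).
increments = rng.standard_normal((n_sim, K))
z = increments.cumsum(axis=1) / np.sqrt(np.arange(1, K + 1))

def overall_level(c):
    """P(|Z_k| > c at any of the K looks) under H0."""
    return float((np.abs(z) > c).any(axis=1).mean())

print(overall_level(1.960))   # naive 0.05 at each look -> ~0.14 overall
print(overall_level(2.413))   # Pocock constant for K = 5 -> ~0.05 overall
```

Here 1.960 and 2.413 are the standard normal critical value for two-sided 0.05 and the Pocock constant for K = 5, hardcoded to keep the sketch dependency-free.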

Group sequential designs
Test H0: Δ = 0 vs. HA: Δ ≠ 0
m patients accrued to each arm between analyses
Use the standardized test statistics Zk, k = 1,...,K

Group-Sequential Designs – Type I Error
Probability of first rejecting H0 at analysis k (under H0):
PH0(|Z1| < c1, ..., |Zk−1| < ck−1, |Zk| ≥ ck) = πk, the "Type I error spent at stage k"
P(Type I error) = ∑k πk
Choose the ck's so that ∑k πk ≤ α

Group-Sequential Designs – Type II Error
The probability of a Type II error is 1 − PHA( ∪k {|Z1| < c1, ..., |Zk−1| < ck−1, |Zk| ≥ ck} )
It depends on K, α, β and the ck's. Given these values, the required sample size can be computed; it can be expressed as R × (fixed sample size).

Pocock Boundaries
Reject H0 if |Zk| > cP(K,α), with cP(K,α) chosen so that P(Type I error) = α
All analyses are carried out at the same adjusted significance level
The probability of early rejection is high, but the power at the final analysis may be compromised

Pocock Boundaries p-values for Zk (two-sided) per interim analysis (K=5)

O’Brien-Fleming Boundaries Reject H0 if | Zk | > cOBF(K,α)√(K / k) for k=K we get | ZK | > cOBF(K,α) cOBF(K,α) chosen so that P(Type I error) = α Early analyses are carried out at extreme adjusted significance levels The probability of early rejection is low but the power at the final analysis is almost unaffected

O’Brien-Fleming Boundaries p-values for Zk (two-sided) per interim analysis (K=5)

Wang & Tsiatis Boundaries
Reject H0 if |Zk| > cWT(K,α,θ)(k/K)^(θ − ½)
θ = 0.5 gives Pocock's test; θ = 0 gives O'Brien-Fleming
Implemented in some software (e.g. EaSt)
Can accommodate any intermediate choice between Pocock and O'Brien-Fleming

Wang & Tsiatis Boundaries p-values for Zk (two-sided) per interim analysis (K=5) with θ = 0.2

Haybittle & Peto Boundaries
Reject H0 if |Zk| > 3 for k = 1,...,K−1
Reject H0 if |ZK| > cHP(K,α) for k = K
|Zk| > 3 corresponds to using p < 0.0027
Early analyses are carried out at extreme, yet reasonable adjusted significance levels
Intuitive and easily implemented if the correction to the final significance level is ignored (pragmatic approach)

Haybittle & Peto Boundaries p-values for Zk (two-sided) per interim analysis (K=5)

Boundaries compared p-values for Zk (two-sided) per interim analysis (K=5)

Boundaries compared Zk per interim analysis (K=5)
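A minimal simulation sketch (my own illustration, assuming only the boundary shapes given above) recovers the Pocock and O'Brien-Fleming critical constants for K = 5 and two-sided α = 0.05 by bisection on simulated accumulating Z-statistics.

```python
import numpy as np

# Monte Carlo search for the Pocock and O'Brien-Fleming critical constants
# (K = 5, two-sided alpha = 0.05), matching the boundary shapes above.
rng = np.random.default_rng(1)
K, n_sim, alpha = 5, 400_000, 0.05
k = np.arange(1, K + 1)
z = rng.standard_normal((n_sim, K)).cumsum(axis=1) / np.sqrt(k)

def crossing_prob(bounds):
    """P(|Z_k| exceeds its boundary at any look) under H0."""
    return float((np.abs(z) > bounds).any(axis=1).mean())

def solve_c(shape, lo=1.5, hi=4.0):
    """Bisect for c so that P(cross c * shape) = alpha under H0."""
    for _ in range(40):
        mid = (lo + hi) / 2
        if crossing_prob(mid * shape) > alpha:
            lo = mid      # boundary too low: raise c
        else:
            hi = mid
    return (lo + hi) / 2

c_pocock = solve_c(np.ones(K))       # flat boundary, ~2.41
c_obf = solve_c(np.sqrt(K / k))      # boundary c*sqrt(K/k), ~2.04
print(round(c_pocock, 2), round(c_obf, 2))
```

The same search with shape (k/K)^(θ − ½) would give the Wang & Tsiatis constant for any intermediate θ.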

Potential savings / costs in using group sequential designs
Expected sample sizes for different designs (K = 5): outcomes normally distributed with σ = 2, α = 0.05, β = 0.1 at μA − μB = 1.

μA − μB    Fixed sample    Pocock    O'Brien-Fleming
0.0        170             205       179
0.5        170             182       168
1.0        170             117       130
1.5        170              70        94

Error-Spending Approach
Removes the requirement of a fixed number of equally-spaced analyses
Lan & DeMets (1983): two-sided tests "spending" Type I error
Maximum information design: the error-spending function defines the boundaries
Accept H0 if Imax is attained without rejecting the null

Error-Spending Approach
f(t) = min(2 − 2Φ(z1−α/2 / √t), α) yields ≈ O'Brien-Fleming boundaries
f(t) = min(α ln(1 + (e − 1)t), α) yields ≈ Pocock boundaries
f(t) = min(αt^θ, α): θ = 1 or θ = 3 corresponds approximately to Pocock and O'Brien-Fleming, respectively
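The three spending families above can be tabulated directly as functions of information time t; this standard-library sketch (critical value hardcoded) shows how little error the O'Brien-Fleming-like function spends early compared with the Pocock-like one.

```python
import math

# The three error-spending families (alpha = 0.05), as functions of
# information time t in (0, 1].
ALPHA = 0.05
Z_HALF = 1.959964  # z_{1-alpha/2}

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def f_obf(t):
    """Approximates O'Brien-Fleming spending."""
    return min(2 - 2 * phi(Z_HALF / math.sqrt(t)), ALPHA)

def f_pocock(t):
    """Approximates Pocock spending."""
    return min(ALPHA * math.log(1 + (math.e - 1) * t), ALPHA)

def f_power(t, theta):
    """Power family: theta = 1 ~ Pocock, theta = 3 ~ O'Brien-Fleming."""
    return min(ALPHA * t ** theta, ALPHA)

for t in (0.2, 0.5, 1.0):
    print(t, round(f_obf(t), 5), round(f_pocock(t), 5))
# The O'B-F-like function spends almost nothing at t = 0.2, while the
# Pocock-like one has already spent roughly 0.015 of the 0.05 budget.
```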

How Many Interim Analyses?
One or two interim analyses give most of the benefit in terms of reduction of the expected sample size
Not much is gained by going beyond 5 analyses

When to Conduct Interim Analyses?
With error-spending, full flexibility as to the number and timing of analyses
The first analysis should not be "too early" (often at around 50% of the information time)
Equally-spaced analyses are advisable
In principle, the strategy/timing should not be chosen based on the observed results

Who conducts interim analyses?
An Independent Data Monitoring Committee:
– experts from different disciplines (clinicians, statisticians, ethicists, patient advocates, …)
– reviews trial conduct, safety and efficacy data
– recommends stopping the trial, continuing the trial unchanged, or amending the trial

Sample Size Re-Estimation
Assume normally distributed endpoints; the sample size depends on σ²
If σ² is misspecified, the initial sample size nI can be too small
Idea: internal pilot study
– estimate σ² based on early observed data
– compute the new sample size nA
– if necessary, accrue extra patients above nI
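A sketch of the internal-pilot idea, under assumed illustrative values (planned σ = 2, δ = 1, two-sided α = 0.05, 90% power): the variance is re-estimated from the pilot data and the per-arm sample size n = 2σ²(z1−α/2 + z1−β)²/δ² is recomputed, never dropping below the planned size.

```python
import numpy as np

# Internal-pilot sketch: re-estimate sigma^2 at the interim and recompute
# the per-arm sample size n = 2*sigma^2*(z_{1-a/2} + z_{1-b})^2 / delta^2.
Z_ALPHA, Z_BETA = 1.959964, 1.281552   # alpha = 0.05 two-sided, power 0.90

def per_arm_n(sigma2, delta):
    return int(np.ceil(2 * sigma2 * (Z_ALPHA + Z_BETA) ** 2 / delta ** 2))

n_planned = per_arm_n(sigma2=4.0, delta=1.0)   # planning value sigma = 2

# Interim: pooled variance estimate from the internal pilot data
# (simulated here with a true sigma larger than planned).
rng = np.random.default_rng(2)
pilot_a = rng.normal(0.0, 2.5, size=30)
pilot_b = rng.normal(1.0, 2.5, size=30)
s2 = (pilot_a.var(ddof=1) + pilot_b.var(ddof=1)) / 2

# Revised size: never decrease below the planned n.
n_revised = max(per_arm_n(s2, delta=1.0), n_planned)
print(n_planned, n_revised)
```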

Early Stopping for Efficacy or Futility
Stopping to reject H0 of no treatment difference:
– avoids exposing further patients to the inferior treatment
– appropriate if no further checks are needed on, e.g., treatment safety or long-term effects
Stopping to accept H0 of no treatment difference:
– stopping "for futility" or "abandoning a lost cause"
– saves time and effort when a study is unlikely to lead to a positive conclusion

Two-Sided Test

Stochastic Curtailment
Idea: terminate the trial for efficacy if there is a high probability of rejecting the null at the end, given the current data and assuming the null is true for future patients
Conversely, terminate the trial for futility if there is a low probability of rejecting the null at the end, given the current data and assuming the alternative is true for future patients

Conditional Power
At interim analysis k, define pk(Δ) = PΔ(the test will reject H0 at the end of the trial | current data)
A high value of pk(0) suggests the test will reject H0:
– terminate the trial & reject H0 if pk(0) > ξ
– terminate the trial & accept H0 if 1 − pk(Δ) > ξ′
The (1-sided) probabilities of error satisfy Type I ≤ α/ξ and Type II ≤ β/ξ′
Note: typically ξ, ξ′ ≈ 0.8

Conditional Power
Unconditional power of 0.9 for α = 0.05 and β = 0.1 at Δ = 0.2
Conditional power at a mid-trial analysis with an estimate of Δ of 0.1: the probability of rejecting the null at the end of the trial has been reduced from 0.9 to about 0.1

Conditional Power
The B-value: B(t) = Z(t)·t^(1/2), with E[B(t)] = θt under drift θ
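Conditional power is easy to compute from the B-value; the numbers below are illustrative (90% planned power, interim at t = 0.5, observed effect at half the planned size) and are not meant to reproduce the slide's exact figure.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def conditional_power(z_t, t, theta, z_crit=1.959964):
    """P(Z(1) > z_crit | interim Z(t), future drift theta), one-sided."""
    b = z_t * math.sqrt(t)              # B-value at information time t
    return 1 - phi((z_crit - b - theta * (1 - t)) / math.sqrt(1 - t))

# Drift giving 90% unconditional power: theta = z_{1-a/2} + z_{1-b}.
theta_planned = 1.959964 + 1.281552
# Mid-trial (t = 0.5), observed effect at half the planned size:
theta_hat = theta_planned / 2
z_half = theta_hat * math.sqrt(0.5)     # Z(t) if the data track that trend

cp_trend = conditional_power(z_half, 0.5, theta_hat)      # future = current trend
cp_plan = conditional_power(z_half, 0.5, theta_planned)   # future = planned effect
print(round(cp_trend, 3), round(cp_plan, 3))
```

The slope assumed for the remaining fraction (1 − t) of the trial is exactly the "assumed treatment effect in future patients" of the next slide: under the current trend the conditional power is far below 0.9, while still assuming the planned effect it remains much higher.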

Conditional Power
Slope = assumed treatment effect in future patients

Crosshatched area = conditional power

Predictive Power
Problem with the conditional power approach: it is computed assuming a value of Δ that may not be supported by the current data
A solution: average across the values of Δ ("predictive power"):
Pk = ∫ pk(Δ) π(Δ | data) dΔ, where π(Δ | data) is the posterior density
Terminate against H0 if Pk > ξ, etc.
What prior?
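A predictive-power sketch under an assumed normal prior on the drift θ: since B(t) ~ N(θt, t), a normal prior gives a normal posterior, and the conditional power is averaged over it by numerical integration. All numerical values (interim Z, prior mean and spread) are illustrative assumptions.

```python
import numpy as np
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def conditional_power(b_t, t, theta, z_crit=1.959964):
    """P(Z(1) > z_crit | B(t) = b_t, future drift theta), one-sided."""
    return 1 - phi((z_crit - b_t - theta * (1 - t)) / sqrt(1 - t))

t, z_t = 0.5, 1.1                        # interim information time and Z-value
b_t = z_t * sqrt(t)

# B(t) ~ N(theta*t, t): with a N(mu0, tau0^2) prior on theta, the posterior
# of theta is normal with precision 1/tau0^2 + t.
mu0, tau0 = 2.0, 2.0                     # weakly informative prior (assumed)
post_var = 1 / (1 / tau0**2 + t)
post_mean = post_var * (mu0 / tau0**2 + b_t)

# Average the conditional power over the posterior (numerical integration).
grid = np.linspace(post_mean - 6 * sqrt(post_var),
                   post_mean + 6 * sqrt(post_var), 2001)
dens = np.exp(-(grid - post_mean) ** 2 / (2 * post_var))
dens /= dens.sum()
pred_power = sum(w * conditional_power(b_t, t, th) for th, w in zip(grid, dens))
print(round(pred_power, 3))
```

The choice of prior (here a loose N(2, 4)) drives the answer, which is exactly the "What prior?" question on the slide.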

Futility guidelines
Less indicated:
– Controversial intervention requiring large randomized evidence (e.g. drug-eluting stents)
– Intervention in current use
– Learning curve by investigators (e.g. mechanical heart valves)
– Late effects suspected
– Safety expected to be an issue (e.g. COX-2 inhibitors)
More indicated:
– Time-to-event endpoints with rapid enrollment (e.g. cholesterol-lowering drugs)
– Approved competitive products (e.g. drugs for allergic rhinitis)
– Long pipeline of alternative drugs (e.g. oncology)
– Short-term outcomes (e.g. 30-day mortality in sepsis)

Overruling futility boundaries
No stopping when the boundary is crossed:
– Time trends
– Baseline imbalances
– Major problems with the quality of the data
– Considerable imputation of missing data
– Important secondary endpoints showing benefit
– External information on benefit of similar therapies
Stopping when the boundary is not crossed:
– Benefit/risk ratio unlikely to be good enough to adopt the experimental treatment
– All endpoints showing consistent trends against the experimental treatment
– External information on lack of effect of similar therapies

Adaptive Designs
Based on combining p-values from different analyses
Allow for flexible designs:
– sample size re-calculation
– any changes to the design (including endpoint, test, etc.!)

Adaptive Designs
Lehmacher and Wassmer (1999): at stage k, combine the one-sided p-values p1,...,pk:
Lk = k^(−1/2) ∑i=1,...,k Φ^(−1)(1 − pi)
Use any group sequential design for Lk
Slight power loss as compared to a group-sequential plan
Flexibility as to design modifications: OK for control of the Type I error, BUT…
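The inverse-normal combination above is a one-liner once Φ⁻¹ is available; this sketch implements it with a standard-library bisection, using illustrative stage-wise p-values.

```python
import math

# Lehmacher-Wassmer inverse-normal combination of stage-wise one-sided
# p-values: L_k = k^{-1/2} * sum_{i<=k} Phi^{-1}(1 - p_i).
def inv_phi(p):
    """Standard normal quantile (bisection; avoids external dependencies)."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def lw_statistic(p_values):
    """Combined statistic L_k from the stage-wise one-sided p-values."""
    k = len(p_values)
    return sum(inv_phi(1 - p) for p in p_values) / math.sqrt(k)

# Two stages with p1 = 0.10, p2 = 0.04 (illustrative): each L_k is compared
# against the chosen group-sequential boundary at stage k.
print(round(lw_statistic([0.10]), 3))        # stage 1
print(round(lw_statistic([0.10, 0.04]), 3))  # stage 2
```

Because each stage enters only through its own p-value, the design of later stages (sample size, even the test) can be modified after the interim while the Type I error of the combination test is preserved.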

Potential concerns with adaptive designs
Major changes between cohorts make clinical interpretation difficult: if the eligibility criteria or the endpoint are changed, what is the adequate label?
Temporal trends
Operational bias
Less efficient than group-sequential designs for sample size adjustments
Modest gains (in general), high risks