The changing landscape of interim analyses for efficacy / futility

1 The changing landscape of interim analyses for efficacy / futility
Marc Buyse, ScD, IDDI, Louvain-la-Neuve, Belgium
Massachusetts Biotechnology Council, Cambridge, Mass., June 2, 2009

2 Reasons for Interim Analyses
Early stopping for:
- safety
- extreme efficacy
- futility
Adaptation of the design based on observed data:
- to play the winner / drop the loser
- to maintain power
- to make any adaptation, for whatever reason and whether or not data-derived, while controlling the overall type I error α

3 Methods for Interim Analyses
- Multi-stage designs / seamless transition designs
- Group-sequential designs
- Stochastic curtailment
- Sample size adjustments
- Adaptive (« flexible ») designs

4 Early Stopping
Helsinki Declaration: “The physician should cease any investigation if the hazards are found to outweigh the potential benefits.” (« Primum non nocere »)
Trials with serious, irreversible endpoints should be stopped if one treatment is “proven” to be superior, and such potential stopping should be formally pre-specified in the trial design.

5 The Cost of Delay « Blockbusters » reach sales > 500 M$ a year (> 1 M$ a day)

6 Fixed Sample Size Trials…
1 – the sample size is calculated to detect a given difference at given significance and power
2 – the required number of patients is accrued
3 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

7 …vs (Group) Sequential Trials…
1 – the sample size is calculated to detect a given difference at given significance and power
2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place
3a – the trial is terminated early, or
3b – the trial continues unchanged
4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified number of events

8 …vs Adaptive Trials
1 – the sample size is calculated to detect a given difference at given significance and power
2 – patients are accrued until a pre-planned interim analysis of patient outcomes takes place
3a – the trial is terminated early, or
3b – the trial continues unchanged, or
3c – the trial continues with adaptations
4 – patient outcomes are analyzed at the end of the trial, after observation of the pre-specified or modified number of events

9 Randomized phase II trial with continuation as phase III trial
Simultaneous screening of several treatment arms in phase II, with early stopping of one or more arms, followed by comparison of the remaining arms in phase III.
[Diagram: Arms 1–3 through PHASE II, then PHASE III]

10 Phase III trial with interim analysis
Phase III trial with an interim look at the data: an interim comparison of the arms during phase III, followed by the final comparison of the arms.
[Diagram: Arms 1–3 through PHASE III, with an INTERIM analysis]

11 Seamless transition designs (e.g. for dose selection)
Designs can be operationally seamless (the phase II and phase III stages are run as a single trial, but the final analysis uses only the phase III data) or inferentially seamless (the data from both stages are combined in the final analysis).

12 Group Sequential Trials
If several analyses are each carried out at the target significance level, the type I error is inflated. The interim analyses must therefore use an adjusted significance level so as to preserve the overall type I error.

13 Inflation of α with multiple analyses
With 5 analyses each performed at the 0.05 level, the overall type I error is approximately 0.14.
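The figure behind this slide is not reproduced here, but the inflation is easy to check by simulation. The sketch below is not part of the presentation; all settings are illustrative. It tests a zero-mean normal sample at 5 equally spaced looks, each at the two-sided 0.05 level, and counts how often at least one look rejects.

```python
# Monte Carlo illustration (not from the slides): type I error when 5 looks
# are each tested at the two-sided 0.05 level, under H0 with normal outcomes.
import numpy as np

rng = np.random.default_rng(1)
n_trials, looks, n_per_look, z_crit = 100_000, 5, 20, 1.96

rejected = 0
for _ in range(n_trials):
    x = rng.standard_normal(looks * n_per_look)   # H0: mean 0, sd 1
    for k in range(1, looks + 1):
        n = k * n_per_look
        z = x[:n].mean() * np.sqrt(n)             # z-statistic at look k
        if abs(z) > z_crit:                       # each look tested at 0.05
            rejected += 1
            break

print(f"Overall type I error with 5 looks at 0.05 each: {rejected / n_trials:.3f}")
# Expected: roughly 0.14, far above the nominal 0.05.
```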

14 Adjusting α for multiple analyses
The 5 analyses must each be performed at an adjusted (lower) significance level in order to preserve an overall level of 0.05.
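As a companion sketch (again illustrative, not from the slides), the common adjusted level can be found numerically: simulate the joint distribution of the interim z-statistics under H0 and take the constant critical value whose crossing probability is 0.05.

```python
# Sketch (assumption: 5 equally spaced looks, normal outcomes): find the common
# critical value c such that testing every look at |Z| > c keeps overall alpha at 0.05.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
looks, sims = 5, 200_000

# Cumulative z-process at equally spaced looks: Z_k = S_k / sqrt(k),
# with S_k the cumulative sum of k iid N(0,1) block contributions.
s = np.cumsum(rng.standard_normal((sims, looks)), axis=1)
z = s / np.sqrt(np.arange(1, looks + 1))
max_abs_z = np.abs(z).max(axis=1)

c = np.quantile(max_abs_z, 0.95)          # constant (Pocock-type) boundary
print(f"Common critical value: {c:.2f} -> nominal two-sided level {2*norm.sf(c):.4f}")
# For K = 5 this lands near the tabulated Pocock value (c about 2.41, level about 0.016).
```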

15 Group sequential designs
Test H0: Δ = 0 vs. HA: Δ ≠ 0
m patients accrued to each arm between analyses
Use the standardized test statistic Zk, k = 1, ..., K

16 Group-Sequential Designs – Type I Error
Probability of wrongly stopping/rejecting H0 at analysis k:
PH0(|Z1| < c1, ..., |Zk−1| < ck−1, |Zk| ≥ ck) = πk, the “type I error spent at stage k”
P(type I error) = ∑ πk
Choose the ck’s so that ∑ πk ≤ α
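A small sketch (illustrative settings, not from the slides) of how the stage-wise spent error πk can be estimated by simulation, here for a constant boundary at K = 5 equally spaced looks.

```python
# Sketch: estimate the type I error "spent" at each stage, pi_k, by simulation,
# using an illustrative constant boundary c = 2.41 at K = 5 equally spaced looks.
import numpy as np

rng = np.random.default_rng(3)
K, sims, c = 5, 200_000, 2.41

s = np.cumsum(rng.standard_normal((sims, K)), axis=1)
z = np.abs(s / np.sqrt(np.arange(1, K + 1)))

crossed = z > c
any_cross = crossed.any(axis=1)
first_cross = np.full(sims, -1)
first_cross[any_cross] = crossed[any_cross].argmax(axis=1)   # index of first crossing

for k in range(K):
    print(f"pi_{k+1} = {np.mean(first_cross == k):.4f}")
# Total spent error; close to 0.05 because 2.41 is roughly the Pocock constant for K = 5.
print(f"sum pi_k = {np.mean(any_cross):.4f}")
```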

17 Group-Sequential Designs – Type II Error
Probability of type II error is 1 − PHA( ∪k {|Z1| < c1, ..., |Zk−1| < ck−1, |Zk| ≥ ck} )
Depends on K, α, β and the ck’s. Given these values, the required sample size can be computed; it can be expressed as R × (fixed sample size).

18 Pocock Boundaries
Reject H0 if |Zk| > cP(K, α)
cP(K, α) chosen so that P(type I error) = α
All analyses are carried out at the same adjusted significance level
The probability of early rejection is high, but the power at the final analysis may be compromised

19 Pocock Boundaries p-values for Zk (two-sided) per interim analysis (K=5)

20 O’Brien-Fleming Boundaries
Reject H0 if |Zk| > cOBF(K, α) √(K / k); for k = K we get |ZK| > cOBF(K, α)
cOBF(K, α) chosen so that P(type I error) = α
Early analyses are carried out at extreme adjusted significance levels
The probability of early rejection is low, but the power at the final analysis is almost unaffected

21 O’Brien-Fleming Boundaries
p-values for Zk (two-sided) per interim analysis (K=5)

22 Wang & Tsiatis Boundaries
Reject H0 if |Zk| > cWT(K, α, θ) (k / K)^(θ − ½)
θ = 0.5 gives Pocock’s test; θ = 0 gives O’Brien-Fleming
Can accommodate any intermediate choice between Pocock and O’Brien-Fleming
Implemented in some software (e.g. EaSt)
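The sketch below is not from the presentation; the calibration-by-simulation approach and all settings are my own illustration. It computes Wang & Tsiatis boundaries for K = 5 and shows how θ = 0.5 gives a flat Pocock-type boundary and θ = 0 the steep O’Brien-Fleming shape.

```python
# Sketch: Wang & Tsiatis boundaries |Z_k| > c * (k/K)**(theta - 0.5) for K = 5,
# with the constant c calibrated by simulation so that the overall alpha is 0.05.
# theta = 0.5 reproduces Pocock (flat), theta = 0 reproduces O'Brien-Fleming.
import numpy as np

rng = np.random.default_rng(4)
K, sims, alpha = 5, 200_000, 0.05
k = np.arange(1, K + 1)

s = np.cumsum(rng.standard_normal((sims, K)), axis=1)
z = np.abs(s / np.sqrt(k))

for theta in (0.5, 0.2, 0.0):                    # Pocock, intermediate, O'B-F
    shape = (k / K) ** (theta - 0.5)             # boundary shape across looks
    # smallest c such that the boundary c*shape is crossed in only alpha of trials
    c = np.quantile((z / shape).max(axis=1), 1 - alpha)
    print(f"theta = {theta}: boundaries = {np.round(c * shape, 2)}")
```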

23 Wang & Tsiatis Boundaries
p-values for Zk (two-sided) per interim analysis (K=5) with θ = 0.2

24 Haybittle & Peto Boundaries
Reject H0 if |Zk| > 3 for k = 1, ..., K−1
Reject H0 if |ZK| > cHP(K, α) for k = K
|Zk| > 3 corresponds to using p < 0.0027 (two-sided)
Early analyses are carried out at extreme, yet reasonable, adjusted significance levels
Intuitive and easily implemented if the correction to the final significance level is ignored (pragmatic approach)
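A one-line check (not from the slides) of the nominal level implied by the |Zk| > 3 rule.

```python
# Quick check of the Haybittle-Peto rule: |Z_k| > 3 at interim looks corresponds
# to a two-sided nominal p-value of about 0.0027.
from scipy.stats import norm

print(f"two-sided p for |Z| = 3: {2 * norm.sf(3):.4f}")   # ~0.0027
# With so little alpha spent early, the final analysis can, pragmatically,
# be carried out at close to the unadjusted 0.05 level.
```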

25 Haybittle & Peto Boundaries
p-values for Zk (two-sided) per interim analysis (K=5)

26 Boundaries compared p-values for Zk (two-sided) per interim analysis (K=5)

27 Boundaries compared Zk per interim analysis (K=5)

28 Potential savings / costs in using group sequential designs
Expected sample sizes for different designs (K=5), outcomes normally distributed with σ = 2, α = β = 0.1, for μA − μB = 1:

μA − μB | Fixed sample | Pocock | O’Brien-Fleming
0.0     | 170          | 205    | 179
0.5     | 170          | 182    | 168
1.0     | 170          | 117    | 130
1.5     | 170          | 70     | 94

29 Error-Spending Approach
Removes the requirement of a fixed number of equally-spaced analyses
Lan & DeMets (1983): two-sided tests “spending” the type I error
Maximum information design: the error-spending function f(t) defines the boundaries as a function of the information time t
Accept H0 if Imax is attained without rejecting the null

30 Error-Spending Approach
f(t) = min(2 − 2Φ(z1−α/2 / √t), α) yields ≈ O’Brien-Fleming boundaries
f(t) = min(α ln(1 + (e − 1)t), α) yields ≈ Pocock boundaries
f(t) = min(α tθ, α): θ = 1 or 3 corresponds to Pocock and O’Brien-Fleming, respectively
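The three spending families above can be compared directly; the sketch below (illustrative, not from the slides) evaluates them at five equally spaced information times.

```python
# Sketch: the three error-spending families from the slide, evaluated over
# information time t for alpha = 0.05.
import numpy as np
from scipy.stats import norm

alpha = 0.05
t = np.linspace(0.2, 1.0, 5)          # e.g. five equally spaced looks

f_obf = np.minimum(2 - 2 * norm.cdf(norm.ppf(1 - alpha / 2) / np.sqrt(t)), alpha)
f_pocock = np.minimum(alpha * np.log(1 + (np.e - 1) * t), alpha)
f_power = lambda theta: np.minimum(alpha * t ** theta, alpha)

print("O'B-F-like :", np.round(f_obf, 4))
print("Pocock-like:", np.round(f_pocock, 4))
print("alpha*t^1  :", np.round(f_power(1), 4))   # close to the Pocock-type spending
print("alpha*t^3  :", np.round(f_power(3), 4))   # close to the O'B-F-type spending
```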

31 How Many Interim Analyses?
One or two interim analyses give most of the benefit in terms of reducing the expected sample size
Not much gain from going beyond 5 analyses

32 When to Conduct Interim Analyses?
With error-spending, there is full flexibility as to the number and timing of analyses
The first analysis should not be “too early” (often at 50% of the information time or later)
Equally-spaced analyses are advisable
In principle, the strategy/timing should not be chosen based on the observed results

33 Who conducts interim analyses?
An independent Data Monitoring Committee:
- experts from different disciplines (clinicians, statisticians, ethicists, patient advocates, …)
- reviews trial conduct, safety and efficacy data
- recommends stopping the trial, continuing the trial unchanged, or amending the trial

34 Sample Size Re-Estimation
Assume normally distributed endpoints; the sample size depends on σ²
If σ² is misspecified, the initial sample size nI can be too small
Idea: internal pilot study
- estimate σ² from the early observed data
- compute a new sample size nA
- if necessary, accrue extra patients above nI
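A minimal sketch of the internal-pilot idea (my own illustration, not the speaker's algorithm; the sample-size formula is the standard two-sample normal approximation and the pilot data are simulated).

```python
# Minimal sketch of internal-pilot sample size re-estimation (illustrative only):
# re-estimate sigma^2 at the interim and recompute the per-arm sample size.
import numpy as np
from scipy.stats import norm

def n_per_arm(sigma2, delta, alpha=0.05, beta=0.10):
    """Per-arm sample size for a two-sided two-sample z-test."""
    return int(np.ceil(2 * sigma2 * (norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)) ** 2
                       / delta ** 2))

delta = 1.0                                           # clinically relevant difference
n_initial = n_per_arm(sigma2=4.0, delta=delta)        # planning assumption: sigma = 2

# Internal pilot: pooled variance estimate from the first patients (toy data here)
rng = np.random.default_rng(5)
pilot_a, pilot_b = rng.normal(0, 2.5, 40), rng.normal(1, 2.5, 40)
sigma2_hat = (np.var(pilot_a, ddof=1) + np.var(pilot_b, ddof=1)) / 2

n_new = n_per_arm(sigma2_hat, delta)
print(f"initial n/arm = {n_initial}, re-estimated n/arm = {max(n_new, n_initial)}")
# A common rule is to revise only upward, i.e. never below the initially planned n.
```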

35 Early Stopping for Futility
Stopping to reject H0 of no treatment difference:
- avoids exposing further patients to the inferior treatment
- appropriate if no further checks are needed on, e.g., treatment safety or long-term effects
Stopping to accept H0 of no treatment difference:
- stopping “for futility” or “abandoning a lost cause”
- saves time and effort when a study is unlikely to lead to a positive conclusion

36 Two-Sided Test

37 Stochastic Curtailment
Idea: terminate the trial for efficacy if there is a high probability of rejecting the null at the end, given the current data, even assuming the null is true for future patients.
Conversely, terminate the trial for futility if there is a low probability of rejecting the null at the end, given the current data, even assuming the alternative is true for future patients.

38 Conditional Power
At interim analysis k, define pk(Δ) = PΔ(the test will reject H0 at the end | current data)
A high value of pk(0) suggests the trial will reject H0:
- terminate the trial & reject H0 if pk(0) > ξ
- terminate the trial & accept H0 if 1 − pk(Δ) > ξ’
(1-sided) probabilities of error: type I ≤ α / ξ, type II ≤ β / ξ’
Note: ξ and ξ’ ≈ 0.8
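A sketch of how pk(Δ) can be computed (my own illustration using the B-value formulation often attributed to Lan & Wittes; the function name and inputs are assumptions, not from the slides).

```python
# Sketch of conditional power via the B-value formulation, for a one-sided test
# at level alpha; theta is the drift assumed for the remainder of the trial
# (theta = 0 gives the "under H0" version p_k(0)).
import numpy as np
from scipy.stats import norm

def conditional_power(z_k, t, theta, alpha=0.025):
    """P(reject at the end | Z = z_k at information fraction t, future drift theta)."""
    b_t = z_k * np.sqrt(t)                       # current B-value
    z_final = norm.ppf(1 - alpha)                # final critical value (no interim adjustment)
    return norm.sf((z_final - b_t - theta * (1 - t)) / np.sqrt(1 - t))

# Drift implied by a design with one-sided alpha = 0.025 and power 0.90:
theta_design = norm.ppf(1 - 0.025) + norm.ppf(0.90)

print("CP under H0 (theta = 0):     ", round(conditional_power(1.5, 0.5, 0.0), 3))
print("CP under design alternative: ", round(conditional_power(1.5, 0.5, theta_design), 3))
```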

39 Conditional Power
Unconditional power for α = 0.05 and β = 0.1 at Δ = 0.2
Conditional power for a mid-trial analysis with an estimate of Δ of 0.1: the probability of rejecting the null at the end of the trial has been reduced from 0.9 to 0.1

40 Conditional Power
B(t) = Z(t)·√t, with E[B(t)] = θ·t

41 Conditional Power
Slope = assumed treatment effect in future patients

42 Crosshatched area = conditional power

43 Predictive Power
Problem with the conditional power approach: it is computed assuming a value of Δ that may not be supported by the current data.
A solution: average across the values of Δ, giving the “predictive power” Pk = ∫ pk(Δ) π(Δ | data) dΔ, where π(Δ | data) is the posterior density
Termination against H0 if Pk > ξ, etc.
What prior?
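One possible answer, sketched below (illustrative only; the normal prior, its parameters and the function name are assumptions): put a normal prior on the drift, update it with the interim estimate, and average the conditional power over posterior draws.

```python
# Sketch of predictive power with a normal prior on the drift theta (hypothetical
# choices throughout): average the conditional power over the posterior of theta
# given the interim data, instead of fixing theta at 0 or at the design value.
import numpy as np
from scipy.stats import norm

def predictive_power(z_k, t, prior_mean, prior_var, alpha=0.025, draws=100_000):
    # Interim estimate of the drift and its (approximate) sampling variance:
    theta_hat, var_hat = z_k / np.sqrt(t), 1.0 / t
    # Normal prior x normal likelihood -> normal posterior for theta
    post_var = 1.0 / (1.0 / prior_var + 1.0 / var_hat)
    post_mean = post_var * (prior_mean / prior_var + theta_hat / var_hat)

    rng = np.random.default_rng(6)
    theta = rng.normal(post_mean, np.sqrt(post_var), draws)
    b_t, z_final = z_k * np.sqrt(t), norm.ppf(1 - alpha)
    cp = norm.sf((z_final - b_t - theta * (1 - t)) / np.sqrt(1 - t))
    return cp.mean()

# Vague prior centred at no effect:
print(round(predictive_power(z_k=1.5, t=0.5, prior_mean=0.0, prior_var=10.0), 3))
```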

44 Futility guidelines
Futility stopping less indicated:
- Controversial intervention requiring large randomized evidence (e.g. drug-eluting stents)
- Time-to-event endpoints with rapid enrollment (e.g. cholesterol-lowering drugs)
- Intervention in current use
- Learning curve by investigators (e.g. mechanical heart valves)
- Late effects suspected
Futility stopping more indicated:
- Safety expected to be an issue (e.g. COX-2 inhibitors)
- Approved competitive products (e.g. drugs for allergic rhinitis)
- Long pipeline of alternative drugs (e.g. oncology)
- Short-term outcomes (e.g. 30-day mortality in sepsis)

45 Overruling futility boundaries
Reasons for not stopping when the boundary is crossed:
- Time trends
- Baseline imbalances
- Major problems with quality of data
- Considerable imputation of missing data
- Important secondary endpoints showing benefit
- External information on benefit of similar therapies
Reasons for stopping when the boundary is not crossed:
- Benefit/risk ratio unlikely to be good enough to adopt the experimental treatment
- All endpoints showing consistent trends against the experimental treatment
- External information on lack of effect of similar therapies

46 Adaptive Designs
Based on combining p-values from different analyses
Allow for flexible designs:
- sample size re-calculation
- any changes to the design (including endpoint, test, etc.!)

47 Adaptive Designs
Lehmacher and Wassmer (1999): at stage k, combine the one-sided p-values p1, ..., pk into L = k−1/2 ∑i≤k Φ−1(1 − pi)
Use any group-sequential design for L
Slight power loss as compared to a group-sequential plan
Flexibility as to design modifications: OK for control of the type I error, BUT…
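A minimal sketch of the inverse-normal combination (illustrative; the p-values and the reference to a stage boundary are hypothetical).

```python
# Sketch of the inverse-normal combination statistic at stage k:
# L_k = k**(-1/2) * sum_{i<=k} Phi^{-1}(1 - p_i), to be compared with the
# boundary of whichever group-sequential plan has been chosen.
import numpy as np
from scipy.stats import norm

def inverse_normal_combination(p_values):
    """Combine one-sided stage-wise p-values p_1, ..., p_k into L_k."""
    z = norm.ppf(1 - np.asarray(p_values))
    return z.sum() / np.sqrt(len(z))

stage_p = [0.10, 0.03]                 # hypothetical one-sided p-values per stage
L = inverse_normal_combination(stage_p)
print(f"L_2 = {L:.2f} (compare to the stage-2 boundary of the chosen group-sequential plan)")
```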

48 Potential concerns with adaptive designs
- Major changes between cohorts make clinical interpretation difficult
- If eligibility / endpoint are changed, what is the adequate label?
- Temporal trends
- Operational bias
- Less efficient than group-sequential designs for sample size adjustments
- Modest gains (in general), high risks

