Interim Analysis in Clinical Trials: A Bayesian Approach in the Regulatory Setting Telba Z. Irony, Ph.D. and Gene Pennello, Ph.D. Division of Biostatistics Office of Surveillance and Biometrics Center for Devices and Radiological Health, FDA No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred.
2 The Frequentist Approach to Interim Analyses Trial: 200 patients. Several interim analyses are planned. If statistical significance is found at any of the looks, the trial stops and is successful. To keep the overall significance level at 0.05, the level used at each possible stopping point must be smaller than 0.05. There are infinitely many ways to distribute the 0.05 level among the possible stopping points.
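The need for a smaller level at each look can be checked by simulation. The sketch below is illustrative only: it assumes 4 equally spaced looks, normally distributed outcomes with known unit variance, and a two-sided z-test at a nominal 0.05 at every look. None of these choices come from the slides, which only say "several" looks.

```python
import random

def overall_type1_error(n_patients=200, n_looks=4, z_crit=1.96,
                        n_trials=2000, seed=1):
    """Monte Carlo estimate of the chance of a false positive at ANY look.

    Assumptions (not from the slides): equally spaced looks, N(0, 1)
    outcomes, and an unadjusted two-sided z-test at each look.
    """
    rng = random.Random(seed)
    look_sizes = [n_patients * (i + 1) // n_looks for i in range(n_looks)]
    rejections = 0
    for _ in range(n_trials):
        total, n_seen = 0.0, 0
        for n_at_look in look_sizes:
            # Enroll patients up to this look; the true mean is 0, so H0 holds.
            while n_seen < n_at_look:
                total += rng.gauss(0.0, 1.0)
                n_seen += 1
            z = total / n_seen ** 0.5
            if abs(z) > z_crit:
                # Stop at the first "significant" look: a type I error.
                rejections += 1
                break
    return rejections / n_trials

if __name__ == "__main__":
    print("one look  :", overall_type1_error(n_looks=1))
    print("four looks:", overall_type1_error(n_looks=4))
```

With a single final look the estimate should sit near the nominal 0.05, while unadjusted repeated looks push it noticeably higher, which is why each look has to be tested at a stricter level.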
4 Other Ways to Penalize Multiple Looks The alpha spending function is a version of the stopping boundary that is a continuous function of the percentage of the study completed. There are many different boundaries, techniques, and software packages (PEST, EAST) for controlling type I error while performing interim looks in a clinical trial. You could create (and publish) your own boundary and develop your own software.
5 Two companies come with the same data on 200 patients. Both obtain the same p-value at the end (0.045). One looked once at the data during the trial with the intention of stopping but didn't => does not reach significance (required p-value = 0.041) => not successful! The competitor did not look => reaches significance (required p-value = 0.05) => successful! Moreover, reaching significance or not depends on whose boundary you choose => you have to say in advance which one you want and cannot change your mind! That approach violates the Likelihood Principle!
6 Why do frequentist and Bayesian approaches differ? Frequentist inferences are based on p-values; probabilities are on the sample space. Estimation: P(data | parameter). Hypothesis testing: P(data | H). Bayesian inferences are based on posterior distributions; probabilities are on the parameter space. Estimation: P(parameter | data). Hypothesis testing: P(H | data). The Likelihood Principle prevails.
7 The Bayesian Approach to Interim Analyses No adjustments are made for interim looks or modifications of trials in midcourse. In fact, the decision of continuing the study or not should be based on potential costs and benefits weighed by the current posterior distribution of the unknowns.
8 Example 1: Curtailment of the trial via predictive distribution Clinical trial. p: chance of patient success. Interim look: 190 successes out of 200 observed patients. Remaining: 80 patients. How many successes among the next 80 patients? Could we stop the trial and make a decision already? Predictive distribution: P(future observation(s) | prior, data).
9 Predictive probability of success for the next 80 patients (based on the posterior distribution for p) Make sure that the remaining patients are exchangeable with the observed patients.
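A minimal sketch of such a predictive calculation follows. It assumes a uniform Beta(1, 1) prior on p, which gives a beta-binomial predictive distribution for the number of successes among the next 80 patients. The slides do not state the prior, and the threshold of 72 successes is purely hypothetical.

```python
import math

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def predictive_pmf(k, m, successes, failures, prior_a=1.0, prior_b=1.0):
    """P(k successes among m future patients | prior, data), beta-binomial.

    The Beta(prior_a, prior_b) prior is an assumption made for illustration.
    """
    a = prior_a + successes   # posterior Beta parameters
    b = prior_b + failures
    log_choose = (math.lgamma(m + 1) - math.lgamma(k + 1)
                  - math.lgamma(m - k + 1))
    return math.exp(log_choose + log_beta(a + k, b + m - k) - log_beta(a, b))

# Interim data from the example: 190 successes, 10 failures, 80 patients to go.
pmf = [predictive_pmf(k, 80, 190, 10) for k in range(81)]
predictive_mean = sum(k * p for k, p in enumerate(pmf))
# Hypothetical criterion: at least 72 successes among the remaining 80.
prob_at_least_72 = sum(pmf[72:])

if __name__ == "__main__":
    print("predictive mean:", round(predictive_mean, 2))
    print("P(at least 72 of 80):", round(prob_at_least_72, 4))
```

If the predictive probability of meeting whatever success criterion the protocol defines is already very high (or very low), that is the argument for curtailing the trial.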
10 2. Interim Analyses: Multiple Looks When we know enough we should stop the trial. We collect data to learn about an endpoint. Stop when the credible interval is small enough. Stop when there is reasonable assurance that the hypothesis is true (or false), or that the device is safe and effective (or is not).
11 Example: A totally Bayesian approach Planned ahead => no penalty for multiple looks! New treatment. Interest: θ - rate of adverse effect - endocarditis. Prior: P(θ) - hierarchical model - used old results. Posterior: P(θ | data). Want θ to be small. How small?
12 Target: θ = 0.1. If there is a good chance that the rate θ < target => success. If there is a good chance that θ > target => failure. Pre-defined criterion: Look at every 100 patient years. Stop and approve if P(θ < target | data) > 0.99. Stop and don't approve if P(θ > target | data) > 0.80. Minimum sample size: 300 patient years (hierarchical model). Maximum sample size: 800 patient years (practical reasons). The company could in fact go on forever (!!)
13 Start with 300 patient years (data1). If P(θ < target | data1) > 99%, stop and approve. If P(θ > target | data1) > 80%, stop and cut losses. If neither of the above, continue sampling.
14 Sample 100 patient years more (data2). If P(θ < target | data1 + data2) > 99%, stop and approve. If P(θ > target | data1 + data2) > 80%, stop and cut losses. If neither of the above, continue sampling.
15 ..... Sample 100 more (data i). If P(θ < target | data1 + data2 + ... + data i) > 99%, stop and approve. If P(θ > target | data1 + data2 + ... + data i) > 80%, stop and cut losses. Approved!
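The look-by-look procedure on these slides can be sketched as a small loop. The function `monitor` and its `prob_below_target` argument are hypothetical names introduced here; the callable stands in for whatever model (hierarchical, in the presentation) supplies the posterior probability that the rate is below the target given the data so far.

```python
def monitor(looks, prob_below_target, approve_at=0.99, futility_at=0.80):
    """Apply the pre-defined stopping rule at each interim look.

    looks: cumulative data summaries, one per look (e.g. patient years).
    prob_below_target: hypothetical callable returning
        P(rate < target | data so far) for a given summary.
    Returns (look number, decision).
    """
    look = 0
    for look, data in enumerate(looks, start=1):
        p_below = prob_below_target(data)
        if p_below > approve_at:
            return look, "stop and approve"
        if 1.0 - p_below > futility_at:
            return look, "stop and cut losses"
        # Neither criterion met: continue sampling to the next look.
    return look, "maximum sample size reached"

if __name__ == "__main__":
    # Made-up posterior probabilities, keyed by cumulative patient years.
    made_up = {300: 0.60, 400: 0.93, 500: 0.995}
    print(monitor([300, 400, 500], made_up.get))
```

Keeping the thresholds as parameters makes it easy to explore designs that are more restrictive at early looks.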
16 Problems Frequentists believe one may sample to a foregone conclusion: one may stop as soon as one gets significance; or, by repeatedly testing, it is possible to reject H0 with probability as close to 1 as desired. (Probabilities of hypotheses are usually martingales - D. Berry, 1987.) It takes an infinite amount of time, though. Controlling the overall type I error is a critical concern in monitoring clinical trials - Regulators. Some Bayesians (perhaps inspired by O'Brien and Fleming) believe that one needs to be more restrictive in early stages of the trial, requiring higher posterior probabilities for termination at the beginning….
17 More Problems Normal distribution paradox (D. Rubin): Two companies, Frequentist and Bayesian. Both perform interim looks. The Bayesian uses a non-informative prior and stops when the posterior probability of the alternative, P(H1 | data), exceeds 95%. The Frequentist uses a nominal significance level of 5%. In the Normal case with a non-informative prior, the posterior probability P(H1 | data) is numerically equal to 1 - (one-sided p-value). The Frequentist pays a penalty for the looks and the Bayesian doesn't. The Frequentist may be unsuccessful and the Bayesian may be successful with the same data!
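The numerical identity behind the paradox can be checked directly. This sketch assumes a normal mean with known standard deviation, a flat prior, and a one-sided test of mean <= 0 against mean > 0; the sample values passed in at the bottom are made up for illustration.

```python
from statistics import NormalDist

def one_sided_p_value(xbar, sigma, n):
    """Frequentist p-value for H0: mu <= 0 versus H1: mu > 0 (known sigma)."""
    z = xbar / (sigma / n ** 0.5)
    return 1.0 - NormalDist().cdf(z)

def posterior_prob_alternative(xbar, sigma, n):
    """P(mu > 0 | data) under a flat prior: mu | data ~ N(xbar, sigma^2 / n)."""
    posterior = NormalDist(mu=xbar, sigma=sigma / n ** 0.5)
    return 1.0 - posterior.cdf(0.0)

if __name__ == "__main__":
    # Made-up summary statistics for a 200-patient trial.
    p = one_sided_p_value(xbar=0.12, sigma=1.0, n=200)
    post = posterior_prob_alternative(xbar=0.12, sigma=1.0, n=200)
    print("p-value:", p, " posterior:", post, " sum:", p + post)
```

Because the two numbers always sum to 1 in this setting, a fixed 95% posterior threshold applied at every look behaves like an unadjusted 5% one-sided test at every look.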
18 A Regulatory Solution To illustrate what would happen in terms of type I and II errors in a Bayesian trial, we request simulations at the design stage. If the rate were actually below the target, what would happen? How often would the trial stop for futility? (type II error) If the rate were actually above the target, what would happen? How often would the device be approved? (type I error) Whenever the type I error rate is too high, we modify the design!
19 Evaluating the experimental design – Heart Valve For each rate, 1000 trials were simulated.
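A design-stage simulation of this kind might look like the sketch below. It is not the model used in the presentation: the hierarchical prior is replaced by a simple Gamma(1, 1) prior with Poisson event counts (an assumption), the target of 0.1 is read as events per patient year (also an assumption), and all function names are hypothetical.

```python
import math
import random

def prob_rate_below(target, events, exposure, prior_shape=1, prior_rate=1.0):
    """P(rate < target | data) for a Gamma prior and Poisson counts.

    The posterior is Gamma(prior_shape + events, prior_rate + exposure).
    With an integer shape k, P(Gamma(k, b) < x) = P(Poisson(b * x) >= k).
    """
    shape = prior_shape + events
    mean = (prior_rate + exposure) * target
    term, poisson_cdf = math.exp(-mean), 0.0
    for j in range(shape):          # accumulate P(Poisson(mean) <= shape - 1)
        poisson_cdf += term
        term *= mean / (j + 1)
    return 1.0 - poisson_cdf

def poisson_draw(rng, mean):
    """Poisson variate: count unit-rate exponential arrivals before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count

def simulate_trial(rng, true_rate, target=0.1):
    """One trial: first look at 300 patient years, then every 100, up to 800."""
    events, exposure = 0, 0
    for look_at in range(300, 801, 100):
        events += poisson_draw(rng, true_rate * (look_at - exposure))
        exposure = look_at
        p_below = prob_rate_below(target, events, exposure)
        if p_below > 0.99:
            return "approve"
        if 1.0 - p_below > 0.80:
            return "futility"
    return "inconclusive"

def operating_characteristics(true_rate, n_trials=1000, seed=0):
    """Fraction of simulated trials ending in each outcome."""
    rng = random.Random(seed)
    counts = {"approve": 0, "futility": 0, "inconclusive": 0}
    for _ in range(n_trials):
        counts[simulate_trial(rng, true_rate)] += 1
    return {outcome: c / n_trials for outcome, c in counts.items()}

if __name__ == "__main__":
    for rate in (0.05, 0.10, 0.15, 0.20):
        print(rate, operating_characteristics(rate))
```

For true rates above the target, the "approve" fraction estimates the type I error; for true rates below it, the "futility" fraction estimates the type II error. If the former is too high, the thresholds or the look schedule can be changed and the simulation rerun.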