Mentor: Dr. Kathryn Chaloner Iowa Summer Institute in Biostatistics

Presentation transcript:

Evaluating an Adaptive Clinical Trial with Quantitative Endpoints, Sample Size Re-estimation, Sequential Monitoring for Efficacy, and Monitoring for Futility
By: Harrison Reeder
Mentor: Dr. Kathryn Chaloner
Iowa Summer Institute in Biostatistics

Outline
What exactly does that title mean?
Basic Clinical Trial design
Interim Monitoring for Efficacy
  3 schemes for interim monitoring for efficacy
Interim monitoring for futility
Adaptive sample size re-estimation
Simulation Study of Design Performance
Conclusion
KP: The title describes a novel clinical trial design with several key features, so I will describe those and present the results of the study we performed to test their utility. We will end by concluding whether this design is an improvement over a simpler traditional design.

Clinical Trial Design: The Basic Case
The most basic element of clinical trial design is determining an adequate sample size
Calculating sample size requires specifying:
  approximate variance of outcomes
  the desired Type I error rate
  minimum clinically meaningful treatment effect
  desired power to detect that effect
Code in R: power.t.test()
Kamrine Elisa Poels:
  Type I error: rejecting the null when the null is true (finding an effect when there is no effect)
  Power: probability of rejecting the null given that some alternative is true
Taken from Introduction to Randomized Controlled Clinical Trials by John Matthews
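
As a rough companion to the R call named on the slide, the four inputs can be combined with a normal approximation. This is a sketch, not power.t.test() itself, which works with the t distribution and so returns slightly larger values. The function name n_per_group and the 90% power target are assumptions made for illustration; the effect of 32 and standard deviation of 36 are the design assumptions stated later in the talk.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.90):
    """Per-group sample size for a two-sided, two-sample comparison of means.

    Normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2
    The 90% default power is an assumption, not a value from the talk.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the Type I error rate
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


n = n_per_group(delta=32, sd=36)
print(n)
```

Because the standard deviation enters squared, doubling it roughly quadruples the required sample size, which is why a poor early variance estimate matters so much in this design.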

Motivating Clinical Trial Example
Treatment of insomnia in children with autism
Phase II Clinical Trial
Only previous study was small and on adult subjects
In general, poor understanding of responses motivates need for adaptive trial
Better performance with incorrect estimates?

Interim Monitoring for Efficacy
Why use interim monitoring?
Complications of interim monitoring
  Interim monitoring inflates Type I error
  Solution: Change boundary of significance
KP
Taken from Introduction to Randomized Controlled Clinical Trials by John Matthews

Schemes for Interim Efficacy Monitoring
Pocock "constant" boundaries
  set a constant p-value boundary to use at every monitoring point
  Earlier rejection is easier, but final test is stringent
O'Brien-Fleming boundaries
  make rejection harder at earlier points and easier as the trial progresses
Fleming-Harrington-O'Brien boundaries
  middle ground between the above strategies
KP

Boundary   First Interim   Second Interim   Third Interim   Final point
Pocock     0.0182          0.0182           0.0182          0.0182
O-F        0.00005         0.0039           0.0184          0.0412
F-H-O      0.0067          0.0083           0.0103          0.0403
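
The inflation problem and the effect of the adjusted boundaries can be checked with a small Monte Carlo sketch. It assumes four looks equally spaced in information and no true treatment effect; the talk's looks (9, 18, 28, 35 per group) are only approximately equally spaced, so the numbers are illustrative. The function name overall_type1_error is hypothetical, and the thresholds passed in are the Pocock and O-F p-value boundaries listed on the slide.

```python
import math
import random
from statistics import NormalDist


def overall_type1_error(nominal_p, n_trials=40000, seed=1):
    """Monte Carlo estimate of the chance of at least one false rejection.

    nominal_p holds the two-sided p-value threshold used at each look.
    Looks are assumed equally spaced in information, under the null.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Convert each p-value threshold to a critical value for |Z|.
    crit = [std_normal.inv_cdf(1 - p / 2) for p in nominal_p]
    rejections = 0
    for _ in range(n_trials):
        score = 0.0  # running sum of independent N(0, 1) increments
        for look, c in enumerate(crit, start=1):
            score += rng.gauss(0.0, 1.0)
            z = score / math.sqrt(look)  # standardized statistic at this look
            if abs(z) > c:
                rejections += 1
                break  # the trial stops at the first rejection
    return rejections / n_trials


naive = overall_type1_error([0.05] * 4)
pocock = overall_type1_error([0.0182] * 4)
obf = overall_type1_error([0.00005, 0.0039, 0.0184, 0.0412])
print(f"naive={naive:.3f} pocock={pocock:.3f} obf={obf:.3f}")
```

Testing at 0.05 on every look should give an overall error rate well above 0.05, while both adjusted boundaries should land close to it.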

Interim Monitoring for Futility
Why monitor for futility?
Conditional power
  Estimates probability of having significant results given observed data and (design) assumptions
  If probability is lower than a specified threshold, then trial is stopped
KP
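
One common closed form for conditional power uses the B-value formulation, sketched below. It is not necessarily how the authors computed it (the seed listed later in the talk suggests they may have used simulation). The sketch assumes the remainder of the trial follows the design effect and standard deviation, uses an unadjusted two-sided 0.05 critical value at the final look, and ignores rejection in the wrong direction. The function name and the 0.20 futility threshold are hypothetical.

```python
import math
from statistics import NormalDist


def conditional_power(z_interim, n_interim, n_final, delta, sd, alpha=0.05):
    """Probability of a significant final result given the interim Z statistic.

    Assumes the rest of the trial follows the design effect `delta` and
    standard deviation `sd`; sample sizes are per group.
    """
    std_normal = NormalDist()
    t = n_interim / n_final                        # information fraction
    drift = delta / (sd * math.sqrt(2 / n_final))  # expected final Z under the design
    b_now = z_interim * math.sqrt(t)               # B-value at the interim look
    z_crit = std_normal.inv_cdf(1 - alpha / 2)     # unadjusted final critical value
    # B(1) given B(t) is Normal with mean b_now + drift * (1 - t), variance 1 - t.
    return 1 - std_normal.cdf((z_crit - b_now - drift * (1 - t)) / math.sqrt(1 - t))


FUTILITY_THRESHOLD = 0.20  # hypothetical; the talk does not state its threshold

for z_obs in (-1.0, 0.0, 1.0, 2.0):
    cp = conditional_power(z_obs, n_interim=18, n_final=35, delta=32, sd=36)
    print(f"interim Z={z_obs:+.1f}  conditional power={cp:.3f}  "
          f"stop for futility={cp < FUTILITY_THRESHOLD}")
```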

Adaptive Sample Size Recalculation
Early estimate of response variance is difficult
To account for the difference between the estimate and the true value, this design uses the variance observed halfway into the trial to re-estimate the sample size
Investigators can set a maximum sample size for each group
Harrison Thomas Reeder: In earlier-phase trials or trials of novel interventions, it may be difficult for investigators to make precise estimates of treatment effect or variance, which are key components of estimating sample size. For this reason, some "adaptive" designs re-estimate the sample size needed for the trial partway through, using the observed variance from early responses. If, for example, we had a trial that estimated 35 patients per group, we might re-estimate after the first 18 patients in each group and get a new estimated sample size.
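
The re-estimation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it uses a pooled standard deviation from the two groups, a normal-approximation sample-size formula with an assumed 90% power, and it never returns fewer patients than are already enrolled. The function names are hypothetical; the cap of 50 per group and the interim size of 18 come from the talk.

```python
import math
from statistics import NormalDist, variance


def pooled_sd(group_a, group_b):
    """Pooled standard deviation of two samples."""
    df_a, df_b = len(group_a) - 1, len(group_b) - 1
    pooled_var = (df_a * variance(group_a) + df_b * variance(group_b)) / (df_a + df_b)
    return math.sqrt(pooled_var)


def reestimate_n(observed_sd, delta, n_enrolled, n_max, alpha=0.05, power=0.90):
    """Re-estimated per-group sample size from the interim variance estimate.

    The 90% power target is an assumption. The result is floored at the
    number already enrolled and capped at the investigators' maximum.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    n_needed = math.ceil(2 * z_sum ** 2 * observed_sd ** 2 / delta ** 2)
    return min(max(n_needed, n_enrolled), n_max)


# Design assumed SD = 36; suppose the first 18 patients per group show SD = 45.
print(reestimate_n(observed_sd=45, delta=32, n_enrolled=18, n_max=50))
```

A larger observed variance pushes the final sample size up toward the cap, and a smaller one pulls it down toward the number already enrolled.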

Research Question: How does our design perform?
Using simulation, we compare the design to designs without the features described
We also compare the merits of the three interim monitoring schemes
Values of interest:
  Bias of final treatment effect estimate
  True confidence of nominal 95% Confidence Interval
  True Type I error
  True power
  Distribution of stopping points
HR: So how does our design, with all of these features, compare to simpler designs? To find out, we evaluate the performance of designs with various combinations of these features using a simulation study. In these simulations we also tested all three interim monitoring schemes (Pocock, O-F, and F-H-O) and compared the performance of the designs when investigator assumptions were accurate or inaccurate. We evaluated the designs on the criteria listed.
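
To make the values of interest concrete, here is a minimal sketch that estimates bias, confidence-interval coverage, and rejection rate for the simplest comparator only: a fixed design with 35 patients per group and no interim looks. It uses a normal critical value in place of the t quantile, which is an assumption that makes coverage run slightly below nominal. The function name and the number of simulated trials are hypothetical, and the results will not reproduce the authors' figures.

```python
import math
import random
from statistics import NormalDist, mean, variance


def simulate_fixed_design(true_effect, true_sd, n=35, n_trials=2000, seed=1):
    """Simulate a fixed two-arm trial and summarize its operating characteristics."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(0.975)  # normal approximation to the t quantile
    estimates, covered, rejected = [], 0, 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, true_sd) for _ in range(n)]
        treated = [rng.gauss(true_effect, true_sd) for _ in range(n)]
        diff = mean(treated) - mean(control)
        se = math.sqrt((variance(treated) + variance(control)) / n)
        estimates.append(diff)
        covered += abs(diff - true_effect) <= z_crit * se  # CI contains the truth
        rejected += abs(diff) > z_crit * se                # null hypothesis rejected
    return {
        "bias": mean(estimates) - true_effect,
        "coverage": covered / n_trials,
        "rejection_rate": rejected / n_trials,  # Type I error if true_effect == 0, else power
    }


under_null = simulate_fixed_design(true_effect=0, true_sd=36)
under_design = simulate_fixed_design(true_effect=32, true_sd=36)
print("Type I error:", under_null["rejection_rate"])
print("Power:", under_design["rejection_rate"])
print("Bias:", round(under_design["bias"], 2), "Coverage:", under_design["coverage"])
```

Adding early stopping to a loop like this is what introduces bias and changes coverage, which is why those criteria are tracked alongside Type I error and power.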

Designing the Simulation
First Interim
  Sample size is 9
  Check for efficacy
Second Interim
  Sample size is 18
  Check for futility
  Final sample size is recalculated
Third Interim
  Sample size is (Final + 18)/2 if recalculated
  Without sample size recalculation, size is 28
Final Point
  Sample size is Final ≤ 50 if recalculated
  Without sample size recalculation, size is 35
Motivating Study: Effect of Sleeping Drug in Adolescents and Young Adults with Autism Spectrum Disorder
Design assumptions:
  Mean treatment effect: 32
  Response standard deviation: 36
HR: For our simulation, then, we used a monitoring design similar to the one proposed for the motivating study. We had four monitoring points (three interim and one final), at sample sizes of 9, 18, 28, and 35 per group; with re-estimation, the last two were set in that process.
Simulation seed: 42
Conditional power seed: 123
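
The monitoring schedule on this slide can be written as a small helper. The function name is hypothetical, and rounding the third look up when (Final + 18)/2 is not a whole number is an assumption, since the slide does not say how fractions are handled. The sketch also assumes the recalculated final size is at least 18.

```python
import math


def monitoring_schedule(recalculated_final=None, n_max=50):
    """Per-group sample sizes at the three interim looks and the final look."""
    if recalculated_final is None:
        return [9, 18, 28, 35]              # fixed schedule without recalculation
    final = min(recalculated_final, n_max)  # slide: Final <= 50
    third = math.ceil((final + 18) / 2)     # slide: (Final + 18)/2; rounding up is assumed
    return [9, 18, third, final]


print(monitoring_schedule())    # no recalculation
print(monitoring_schedule(42))  # recalculated final size of 42
print(monitoring_schedule(60))  # recalculated size above the cap
```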

Comparison of Boundary Types
Pocock
  Highest Type I error
  Highest bias
  Lowest power
  Smallest sample size (i.e., best chance of finding efficacy early)
O'Brien-Fleming and Fleming-Harrington-O'Brien
  Similar results across measures and assumptions
  O'Brien-Fleming boundary is more commonly used
HR: But first, to simplify, we will draw a simple conclusion concerning the three boundary types we simulated. We found that the Pocock constant-value boundary had many features that did not suit our application, including the highest Type I error, the highest estimate bias, and the lowest power. It does offer the best chance of stopping for efficacy early, but the power diminishes as the trial continues. The other two efficacy boundaries, by contrast, performed similarly, and because the O-F boundary is somewhat more common, we opt to use that boundary for our final comparison.

Effect of Interim Monitoring for Efficacy (Without Sample Size Re-estimation or Futility Monitoring)
Ending sample size < 35 per group because we can stop at earlier interim points when results are significant
KP

Effects of Interim Monitoring for Futility (Without Sample Size Re-estimation)
Large drop in true Type I error from ~0.05 to ~0.01 (more opportunities to stop an ineffective trial before it runs to the end and reaches significance by chance)
Smaller stopping point sample size when response variance is larger than expected
Chance of stopping early for futility, even if the alternative is true, explains a slight drop in true power
KP

Effects of Interim Monitoring for Futility (Without Sample Size Re-estimation)


Effects of Sample Size Recalculation
If the initial estimate of treatment response variance was too high, recalculation tends to decrease the ending sample size
Conversely, an underestimated variance leads to a larger required sample size
Recalculation also maintains the power of the trial better when the initial estimate of variance is too low
KP

Effects of Sample Size Re-estimation (Without Futility Monitoring)


Overall Evaluation of Our Design
These characteristics show the design's potential value in Phase II trials:
  Minimizes Type I error rate
  Maintains power when variance estimate is too low
  May decrease sample size required to reach a conclusion
Limitations:
  Sample size re-estimation potentially increases cost
HR: The effects of the design features that Kamrine described hint at our design's potential value in Phase II trials like the study motivating our project. Futility monitoring tends to minimize the Type I error rate, while sample size re-estimation helps to maintain the trial's power even if the initial estimate of variance is incorrect. All of these features also offer the possibility of reaching a conclusion sooner, which is especially important. So far we have seen several limitations to our design, however: re-estimation of sample size may increase the sample size needed to reach a conclusion, which potentially increases costs, and the nature of all of these design features is to induce a positive bias in the estimated treatment effect. To get a better sense of the value of our trial design, we will compare a few example scenarios.

How Does Our Design Compare to Interim Monitoring for Efficacy Alone?
If assumptions are accurate, with our design:
  Median ending sample size is smaller
  Power is slightly lower, but comparable
  Type I error rate is lower (important for Phase II trials)
If assumptions are inaccurate (overestimated effect size and underestimated variance):
  Ending sample size tends to be larger (more expensive)
  Power is higher (though overall both are much lower)
  Type I error rate is lower
HR: So, with these ideas in mind, we will compare our design (with efficacy and futility monitoring, as well as sample size re-estimation) to a simple design with just interim monitoring for efficacy, and evaluate their performance when the initial design assumptions are correct and when they are incorrect. I will highlight a few points to look for, and then you'll see them graphically. When the assumptions are met, our design performs at least as well as, or better than, the simpler method: ending sample size is often lower, power is comparable, and the Type I error rate is much lower. When the assumptions are incorrect, our design still fares better in many respects. It maintains a somewhat higher power and a lower Type I error rate, though it tends to require a larger sample size to reach a conclusion.

HR: Here we have the results of the two designs when the design assumptions of treatment effect and response variance are accurate. Our design tends to stop earlier, with about half of the trials stopping by the second interim monitoring point, and has a lower Type I error rate and only slightly lower power.

O'Brien-Fleming Graphs
HR: Now we compare the designs in the case that the investigator makes incorrect estimates of both the effect and the response variance. This combination would be most detrimental to the outcome of the trial, because it means the trial is trying to detect an effect that is smaller than expected and also less predictable than expected. In this case, our design adapts by tending to re-estimate higher sample sizes, as shown on the left, which keeps the true power somewhat higher than in the simpler design. Furthermore, the monitoring for futility keeps the Type I error very low, which is valuable in this type of trial. Of course, the limitation is that over half the trials have stopping points higher than the standard of 35, with at least a quarter reaching the absolute max of 50, but .

Conclusion: "Is our design better for the motivating study?"
Yes!
Minimizing Type I errors is important in Phase II trials, and our design achieves this
Treatment effect and response variance are not easily estimated in the motivating study
Our design's ability to maintain power and keep error rates low even with inaccurate design assumptions is beneficial
Limitation: Potential for higher re-estimated sample size may increase cost of trial
HR

Acknowledgements
The Iowa Summer Institute in Biostatistics program
Dr. Kathryn Chaloner and the entire University of Iowa Biostatistics department
My research partner Kamrine Poels at the University of Arizona
The Carleton Math Department