• Determine if a new agent or a new treatment regimen appears sufficiently efficacious to be worth further investigation
  ◦ Not attempting to prove or establish that the new agent improves outcome
• Verify the safety of the therapy
• Provide statistical rigor (formal evaluation), context, and a targeted patient population

• Often formulated as a test of a null hypothesis against an alternative
  ◦ E.g. H0: p_r = 0.05 vs. Ha: p_r = 0.20, where p_r is the true proportion of patients who will respond to the new agent
• Consequence of a type I error (α): an ineffective agent will be studied further
  ◦ Use α = 0.10 (one-sided)
  ◦ Larger than in phase III studies

• Consequence of a type II error (β): an effective agent will not be studied further
  ◦ β should be < 0.10
• In practice, multiple phase II studies tend to be performed in multiple diseases, so the overall chance of missing an effective treatment is lower
• Selection of therapies for phase III testing is based on all available data, not on a single phase II study

• Single arm with a single analysis (one protocol can contain multiple single-arm studies)
• Single arm with interim stopping rules (usually with suspension of accrual)
• Randomized selection designs (pick-the-winner)
• Comparative randomized control
• Randomized discontinuation designs

• Patients refractory to standard therapy
• If some patients improve, the agent must have some activity
• Often use H0: p_r = 0.05 vs. Ha: p_r = 0.20
• Simon's (1989) optimal two-stage designs minimize the expected sample size under H0

• Simon's optimal design for p_r = 0.05 vs. 0.20:
  ◦ 1st stage: treat 12 patients; stop if there are no responses
  ◦ 2nd stage: treat 25 more patients; conclude the agent is inactive if < 4/37 (11%) respond
• CTEP/IDB has been promoting this design for new agents in diseases without prior evidence of activity
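The operating characteristics of a two-stage design like the one above can be checked with exact binomial sums. The following is a minimal Python sketch; the function names are hypothetical, and it assumes the usual Simon convention of stopping after stage 1 when at most r1 of n1 respond and declaring the agent active only when more than r of all n patients respond.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_stage_oc(n1: int, r1: int, n: int, r: int, p: float) -> dict:
    """Operating characteristics of a single-arm two-stage design.

    Stage 1: treat n1 patients; stop (declare inactive) if <= r1 respond.
    Stage 2: treat n - n1 more; declare active only if > r of all n respond.
    """
    n2 = n - n1
    # Probability of early termination after stage 1
    pet = sum(binom_pmf(x1, n1, p) for x1 in range(r1 + 1))
    p_reject = 0.0
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2; need more than r - x1 further responses
        need = max(r - x1 + 1, 0)
        tail = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        p_reject += binom_pmf(x1, n1, p) * tail
    return {"p_reject": p_reject, "pet": pet, "expected_n": n1 + (1 - pet) * n2}


# Design from the slide: stop if 0/12 respond; inactive if < 4/37 (i.e. <= 3) respond
null = two_stage_oc(n1=12, r1=0, n=37, r=3, p=0.05)  # p_reject = type I error
alt = two_stage_oc(n1=12, r1=0, n=37, r=3, p=0.20)   # p_reject = power
print(f"alpha = {null['p_reject']:.3f}, power = {alt['p_reject']:.3f}")
print(f"PET under H0 = {null['pet']:.3f}, E[N] under H0 = {null['expected_n']:.1f}")
```

Under H0 the design stops after the first 12 patients with probability 0.95^12 ≈ 0.54, which is what keeps its expected sample size well below 37.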

• Single-arm two-stage designs are inefficient for multicenter studies
  ◦ Time and effort needed to develop the protocol and CRFs and to set up the database
  ◦ Cost of activation at institutions
• Prefer settings where single-stage designs are appropriate, or studies with multiple strata and/or multiple arms

• Might be appropriate
  ◦ If there is some prior evidence of activity
  ◦ For combinations of new drugs with standard treatments
• Example: H0: p_r = 0.20 vs. Ha: p_r = 0.37 (the null rate depends on the level of activity of standard therapy)
  ◦ 1-stage: 45 patients; reject H0 if > 12/45 (27%) respond
  ◦ 2-stage: conclude inactive if < 5/25 (20%) respond in the 1st stage or < 13/50 (26%) respond overall
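The 1-stage rule in this example can be checked directly: the type I error is the upper binomial tail at the null rate and the power is the same tail at the alternative rate. A minimal sketch; `upper_tail` is a hypothetical helper name, and only the single-stage rule from the slide is evaluated here.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Bin(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


# Single-stage rule from the slide: 45 patients, reject H0 if more than 12 respond
n, cutoff = 45, 12
alpha = upper_tail(cutoff + 1, n, 0.20)  # P(reject | H0: p_r = 0.20)
power = upper_tail(cutoff + 1, n, 0.37)  # P(reject | Ha: p_r = 0.37)
print(f"alpha = {alpha:.3f}, power = {power:.3f}")
```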

• Cytostatic agents might improve disease stabilization rates rather than response rates
• Test for improvement in disease stabilization rates; e.g. H0: p_s = 0.30 vs. Ha: p_s = 0.50, where p_s is the proportion of patients stable or responding (free of progression) at x months (e.g. x = 4)
• Calculations are the same as for response
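Because the stabilization endpoint is treated as binomial, a single-stage design can be found with the same exact calculations as for response. The sketch below searches for the smallest n, with cutoff c, such that rejecting H0 when more than c patients are stable or responding gives α ≤ 0.10 and power ≥ 0.90. The function names and the 0.10/0.10 error rates are assumptions for illustration (they mirror the response-rate designs above), not a design taken from the slides.

```python
from math import comb


def upper_tail(c: int, n: int, p: float) -> float:
    """P(X >= c) for X ~ Bin(n, p), summed exactly."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def smallest_single_stage(p0: float, p1: float, alpha: float = 0.10,
                          power: float = 0.90, n_max: int = 200):
    """Smallest n (and cutoff c) such that rejecting H0 when X > c gives
    P(reject | p0) <= alpha and P(reject | p1) >= power."""
    for n in range(1, n_max + 1):
        for c in range(n + 1):
            # Smallest cutoff that controls the type I error at this n
            if upper_tail(c + 1, n, p0) <= alpha:
                if upper_tail(c + 1, n, p1) >= power:
                    return n, c
                break  # a larger cutoff only lowers power, so try the next n
    return None


design = smallest_single_stage(0.30, 0.50)
print(design)  # (n, c): reject H0 if more than c of n are stable or responding
```

Because exact binomial error rates are not monotone in n, a slightly larger n can occasionally fail the constraints that a smaller n meets; the search simply reports the first n that works.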

• Multinomial: test e.g. H0: p_r = 0.05 and p_s = 0.30 vs. Ha: p_r > 0.05 or p_s > 0.30
  ◦ Less efficient than binomial
  ◦ May be more difficult to interpret
• TTP or PFS
  ◦ Kaplan-Meier estimate at a single time point, or another nonparametric test
  ◦ Parametric (e.g. exponential) models can be slightly more efficient
• Survival is generally not appropriate

• Test e.g. H0: p_r = 0.05 and p_s = 0.30 vs. Ha: p_r > 0.05 or p_s > 0.30
• Need to consider power against multiple alternative values; e.g.
  ◦ Ha1: p_r = 0.20, p_s = 0.30
  ◦ Ha2: p_r = 0.14, p_s = 0.40
  ◦ Ha3: p_r = 0.05, p_s = 0.50
• 1-stage: n = 46; reject H0 if > 6 responses or > 20 cases responding or stable
  ◦ α = 0.09; power = 0.92 for Ha1, Ha2, and Ha3
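The rejection probability of this multinomial rule can be computed exactly by summing the trinomial distribution over the acceptance region (R ≤ 6 and R + S ≤ 20). The slide does not spell out how p_s is defined here; this sketch assumes p_s is the probability of stable disease without a response, so that "responding or stable" has probability p_r + p_s. That reading is an assumption, chosen because it is consistent with the slide's α ≈ 0.09. The function name `p_reject` is hypothetical.

```python
from math import comb


def p_reject(n, p_resp, p_stable, max_resp=6, max_resp_or_stable=20):
    """P(R > max_resp or R + S > max_resp_or_stable) for
    (R, S, other) ~ Multinomial(n; p_resp, p_stable, 1 - p_resp - p_stable).
    ASSUMPTION: S counts stable patients who did not respond."""
    p_other = 1.0 - p_resp - p_stable
    p_accept = 0.0
    # Sum over the acceptance region: R <= max_resp and R + S <= max_resp_or_stable
    for r in range(max_resp + 1):
        for s in range(max_resp_or_stable - r + 1):
            o = n - r - s
            if o < 0:
                continue
            # Multinomial coefficient n! / (r! s! o!) = C(n, r) * C(n - r, s)
            p_accept += (comb(n, r) * comb(n - r, s)
                         * p_resp**r * p_stable**s * p_other**o)
    return 1.0 - p_accept


scenarios = {
    "H0": (0.05, 0.30),
    "Ha1": (0.20, 0.30),
    "Ha2": (0.14, 0.40),
    "Ha3": (0.05, 0.50),
}
results = {name: p_reject(46, pr, ps) for name, (pr, ps) in scenarios.items()}
for name, prob in results.items():
    print(f"{name}: P(reject H0) = {prob:.3f}")
```

The H0 row is the type I error; the three alternative rows are the powers that the design has to keep high simultaneously.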

• Separate evaluation of each arm
  ◦ Each arm is evaluated in a similar population
• Selection designs: select the 'best' arm for further study
• Comparative randomized control
• Randomized discontinuation
• Randomized designs are larger and more complex; each arm needs to be explained to patients

• Concern about selection bias in studies without a simultaneous control group
  ◦ Studies can enroll different patient groups even with the same nominal population
  ◦ Population drift and stage migration
• Control groups are more appropriate for evaluating a contribution to a combination or an effect on progression than for determining whether there is any response activity
• Comparing studies from different groups

• Often not needed because
  ◦ Phase II studies can only detect fairly large effects, so biases would need to be large
  ◦ The consequence of a false positive is further testing of an inactive drug
  ◦ Cooperative group or other studies conducted in the same network with central data review produce fairly consistent results
• Control groups increase the time and expense of phase II evaluation

• Selection designs (Simon, Wittes and Ellenberg, 1985) randomize between 2 or more experimental arms (no control arm)
  ◦ In a sense, the least efficacious arm is a control for the others
• Select the best arm for further evaluation
• Usually define 'best' to be the arm with the best outcome, no matter how small the difference

• With two arms, α = 0.50
  ◦ Rationale: it doesn't matter which arm is selected if they are nearly equivalent
• Often a separate efficacy test for each arm, too
  ◦ 1-stage or 2-stage
• Usually prefer randomizing over a series of separate studies
  ◦ Facilitates (informal) comparisons
  ◦ Guards against sampling bias

Schema: RANDOMIZE to RX1, RX2, …, RXk, with estimated response rates R1/n1, R2/n2, …, Rk/nk. RXj is 'best' if Rj/nj > Ri/ni for all i ≠ j. Other endpoints can be used.

• Example: Simon's optimal 2-stage design for H0: p_r = 0.20 vs. Ha: p_r = 0.40 (α = β = 0.10) enrolls 17 patients in the 1st stage and 20 in the 2nd
• Apply this design to each arm in a 2-arm randomized selection design
[Table: probability that each arm is the winner (columns: p_r1, p_r2, RX1, RX2, Neither)]

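• The probability of selecting the best arm declines as the number of arms increases. With independent arms,

$$
\begin{aligned}
P\{X_1 > \max(X_2,\ldots,X_k)\} &= \sum_x P(X_1 = x)\,P(X_2 < x, X_3 < x, \ldots, X_k < x \mid X_1 = x) \\
&= \sum_x P(X_1 = x)\,P(X_2 < x)\,P(X_3 < x)\cdots P(X_k < x) \\
&= \sum_x P(X_1 = x)\,P(X_2 < x)^{k-1}
\end{aligned}
$$

where the last step holds if X_2, …, X_k have the same distribution.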

• X_1 ~ Bin(50, 0.32) and X_2, …, X_k ~ Bin(50, 0.20) give P{X_1 > max(X_2, …, X_k)} = 0.90 for k = 2 and 0.72 for k = 6
• Advanced renal trial of several targeted agents: 6 arms, n = 55/arm
  ◦ TTP compared via Cox model
  ◦ If one arm has a median TTP of 7.2 months and the other 5 have a median TTP of 4.8 months (50% improvement), the probability of selecting the best arm is 0.87
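The last line of the selection-probability formula can be evaluated exactly for binomial response counts. A minimal sketch, assuming independent arms and counting a tie for the top as a failure to select arm 1 (the strict inequality in the formula); the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Bin(n, p); zero when k < 0."""
    return sum(binom_pmf(j, n, p) for j in range(0, k + 1))


def prob_best_wins(n, p_best, p_other, k):
    """P{X_1 > max(X_2, ..., X_k)} with X_1 ~ Bin(n, p_best) and
    X_2..X_k i.i.d. Bin(n, p_other), all arms independent.
    Implements sum_x P(X_1 = x) * P(X_2 < x)^(k - 1)."""
    return sum(binom_pmf(x, n, p_best) * binom_cdf(x - 1, n, p_other) ** (k - 1)
               for x in range(n + 1))


for k in (2, 4, 6):
    print(k, round(prob_best_wins(50, 0.32, 0.20, k), 3))
```

With 50 patients per arm and response rates of 0.32 vs. 0.20, this should give values close to the 0.90 (k = 2) and 0.72 (k = 6) quoted above.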

• Discussed for evaluating cytostatic agents in Korn et al. (2001)
• Randomize experimental vs. standard and formally compare the arms
• Appropriate if there is no reasonable prior estimate of expected control-arm outcomes
• Endpoint could be any of the standard phase II endpoints (e.g. TTP, response)
• Might target larger differences than a phase III study

• Test could be a definitive (phase III) evaluation with a small, phase III-level α (one-sided)
  ◦ If there is little prior phase II efficacy data, early stopping rules for lack of benefit are needed
  ◦ Might not be appropriate if a second phase III study evaluating survival would be needed

• Test could be a suggestive (phase II) evaluation with a larger α (e.g. up to 0.20)
  ◦ Appropriate for screening new agents
  ◦ If positive, it still needs to be followed by a definitive phase III study
  ◦ Korn et al. suggest using α = 0.20, because the sample size with α = 0.10 is large enough that it might be better to go directly to the definitive study

• 3-arm comparison of TTP (placebo and two dose levels of bevacizumab), targeting a large difference (100% improvement in median TTP) but designed to be definitive (Yang, 2003)
  ◦ Overall α = 0.05 (two-sided), β = 0.20
  ◦ Each comparison tested one-sided
  ◦ Needed about 50 patients/arm (stopped early because of highly significant results)
  ◦ Crossover from placebo to low-dose drug

 Was overall  =.05 appropriate? ◦ A second, larger study is still needed for survival ◦ Could have identified drug as promising with even fewer patients (larger  )

• Was a placebo needed?
  ◦ Evaluation bias should be much smaller than a doubling of TTP
  ◦ May not be needed to identify promising drugs
  ◦ FDA tends to require a placebo for TTP
• Was the control arm needed?
  ◦ Would results from a single-arm, single-institution study have been convincing?

• Cisplatin + C225 vs. cisplatin + placebo
• Designed to have 90% power to detect an improvement in median PFS from 2 months to 4 months (100% improvement) with a one-sided α
• With allowance for non-compliance, required 54 eligible patients/arm
• Final accrual was 117 eligible patients
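For a time-to-event comparison like this one, a common way to size the study is Schoenfeld's approximation for the number of events in a 1:1 log-rank comparison, d = 4(z_{1-α} + z_{1-β})² / (ln HR)². The sketch below assumes exponential PFS, so doubling the median from 2 to 4 months corresponds to a hazard ratio of 2. The slide does not state the one-sided α, so it is left as a parameter; this is a standard textbook technique, not necessarily the calculation the trial used, and it gives events rather than patients, so it does not attempt to reproduce the 54 patients/arm (which also includes the non-compliance allowance).

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio: float, alpha_one_sided: float, power: float) -> int:
    """Approximate total events for a 1:1 randomized log-rank comparison
    (Schoenfeld): d = 4 * (z_{1-alpha} + z_{power})^2 / (ln HR)^2."""
    z = NormalDist().inv_cdf
    d = 4 * (z(1 - alpha_one_sided) + z(power)) ** 2 / log(hazard_ratio) ** 2
    return ceil(d)


# ASSUMPTION: exponential PFS, so median 2 -> 4 months means HR = 4 / 2 = 2
for a in (0.05, 0.10, 0.20):
    print(f"one-sided alpha = {a:.2f}: {schoenfeld_events(2.0, a, 0.90)} events")
```

Running it for several α values shows how strongly the required number of events depends on the choice between a definitive and a suggestive (phase II) significance level.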

                    Cisplatin + C225   Cisplatin + Placebo   P-value (one-sided)
Response rate       26%                10%                   0.02
Median PFS          4.2 mos            2.7 mos               0.09
Median survival     9.2 mos            8.0 mos               0.21

Hazard ratios (placebo/C225) and 95% CIs:
• PFS: 1.31 (0.91, 1.89)
• Survival: 1.16 (0.80, 1.69)

• The study is not definitive: it is underpowered for both PFS and survival
• Is it promising? Should a follow-up study of C225 be done?
• Would a better strategy have been a single-arm phase II with a response endpoint, followed by a definitive phase III based on the 'promising' response rate of 26%?

• PFS (p = 0.09) reaches the one-sided α = 0.10 cutoff for a 'promising' phase II result
• Survival would not have been an appropriate endpoint
  ◦ Estimated improvement is 16%
  ◦ The confidence interval is consistent with anything from a 20% decrease to a 69% increase
  ◦ Phase II sample sizes are not adequate to detect realistic survival effects

• An enrichment strategy based on randomizing patients who appear to be doing well on the treatment (Rosner, Stadler, Ratain, 2002)
• Initially all patients are treated; patients free of progression for some period of time are randomized between continuing treatment and placebo, with crossover from placebo to treatment at progression or after a specified PFI
• Complex design with a blinded randomization and 3 registration points

Schema: REGISTER, then initial RX (run-in), then REASSESS: PD → off study; response → continue RX; SD → RANDOMIZE to RX or placebo, with crossover at PD or after a specified PFI.

• Usefulness depends on how successful the run-in is in selecting patients who benefit from treatment
  ◦ TTP is highly variable in most diseases, so the randomized population will be a mixture
  ◦ Korn et al. (2001) and Capra (2004) suggest it is often less efficient than a standard RCT
• A carry-over effect could dilute the difference between the randomized arms
• Requires a much larger sample size

• CALGB (CAI in RCC)
• Randomize patients if stable after 16 weeks
• Enrolled 374 patients; randomized 65 eligible patients (17%)
  ◦ The enrichment strategy was not successful, but does CAI have any activity?
  ◦ Did they learn any more from 374 patients than ECOG did from 57 patients in a more traditional two-stage phase II design (E4896)?

• In many settings, conventional phase II designs may still be appropriate
• Start-up costs for single-arm two-stage designs are a concern
• Randomized phase II studies allow evaluation of multiple agents or schedules and protect against sampling bias
• Selection designs are useful for informal comparison and for identifying promising agents

• Control arms should not ordinarily be needed, but can be effective in some settings
• Survival is seldom (never?) the best phase II endpoint
• Randomized discontinuation designs may not be appropriate and need to be strongly justified

Capra WB (2004). Comparing the power of the discontinuation design to that of the classic randomized design on time-to-event endpoints. Controlled Clinical Trials 25:
Freidlin B, Dancey J, Korn EL, Zee B, Eisenhauer E (2002). Multinomial phase II trial designs (letter to the editor). Journal of Clinical Oncology 20:599.
Korn EL, Arbuck SG, Pluda JM, Simon R, Kaplan RS, Christian MC (2001). Clinical trial designs for cytostatic agents: are new approaches needed? Journal of Clinical Oncology 19:
Rosner GL, Stadler W, Ratain MJ (2002). Randomized discontinuation design: application to cytostatic antineoplastic agents. Journal of Clinical Oncology 20:
Simon R (1989). Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10:1-10.
Simon R, Wittes RE, Ellenberg SS (1985). Randomized phase II clinical trials. Cancer Treatment Reports 12:
Yang JC et al. (2003). A randomized trial of bevacizumab, an anti-vascular endothelial growth factor antibody, for metastatic renal cancer. New England Journal of Medicine 349: