Statistical issues: Designing your trial for success ELIZABETH GARRETT-MAYER, PHD PROFESSOR OF BIOSTATISTICS HOLLINGS CANCER CENTER, MEDICAL UNIVERSITY.

1 Statistical issues: Designing your trial for success ELIZABETH GARRETT-MAYER, PHD PROFESSOR OF BIOSTATISTICS HOLLINGS CANCER CENTER, MEDICAL UNIVERSITY OF SOUTH CAROLINA

2 Golden rule of clinical trials “Perform a study that will answer an important clinical question with reasonable certainty and with respect for patients.” A “negative result” is still a success (statistically). The goal is clear and interpretable results that either support or reject your scientific hypothesis.

3 Five keys to statistical success in cancer clinical trials 1. Clearly written objectives 2. Well-defined endpoints 3. A rigorous study design that addresses the objectives 4. An appropriate statistical analysis plan 5. A well-justified sample size Cursory attention to any of the above could lead to a trial with flawed or uninterpretable results.

4 Highly dependent: Objectives, Endpoints, Sample size, Study Design, Analysis Plan

5 Five keys to statistical success in cancer clinical trials 1. Clearly written objectives 2. Well-defined endpoints 3. A rigorous study design that addresses the objectives 4. An appropriate statistical analysis plan 5. A well-justified sample size

6 Objectives Different phases of research have different types of primary objectives. All objectives (but especially the primary objective) need to be clearly stated, including the intended patient population for the study.

7 Phase I objectives Typically the primary objective is to identify an optimal dose, and summarize the toxicities observed. “To determine the maximum tolerated dose of MEDI-573* in patients with advanced solid tumors.” “To determine the optimal biologic dose of MEDI-573 in patients with advanced solid tumors.” “To determine the recommended phase II dose of MEDI-573 in patients with advanced solid tumors.” * Haluska et al. Clinical Cancer Research, 2014; 20:4747-57.

8 Phase II objectives More varied types of objectives. Historically, to evaluate preliminary efficacy to help (a) determine if there is enough activity to warrant a phase III study and (b) obtain clinical efficacy estimates to help plan the phase III trial. Further explore safety and toxicity of the drug. Sometimes phase II studies are randomized; sometimes not. “To determine the progression-free survival in patients with metastatic castration-resistant prostate cancer treated with abituzumab.”

9 Phase III objectives Comparative objective: head to head comparison of two regimens. Often standard of care versus a new regimen. Often evaluating if adding something to a standard regimen improves outcomes. “To compare overall survival in patients treated with apatinib or placebo in patients with refractory advanced metastatic gastric cancer*” Li et al, JCO, May 1 2016.

10 Endpoints and Patient Populations Each phase has a common set of endpoints. What is an endpoint? A clinical endpoint in cancer research generally refers to a measure of disease status, symptom, or laboratory value that constitutes one of the target outcomes of the trial. The endpoints are what you measure on each person. Examples: Disease response at 8 weeks; time from enrollment to death; occurrence of grade 3 or 4 gastrointestinal toxicity. Must be clearly defined and “measurable” using an objective approach.

11 Endpoints and Patient Populations The patient population will also affect the endpoints of interest: Phase I: Historically, patients who have exhausted other forms of therapy; usually metastatic or advanced disease. Phase II: Usually, patients have already been treated with at least one line of treatment. Phase III: can be newly diagnosed or not; post-surgery (i.e. disease-free) or not. Why is this important? ◦“disease-free survival” is only meaningful in patients without evidence of disease ◦“progression-free survival” is only meaningful in patients who have cancer at the onset of the study.

12 Choosing the patient population: Homogeneity vs. generalizability Heterogeneous patient population: ◦More variability ◦Larger sample size to see a clinical effect ◦Easier to accrue to ◦Can generalize to more patients Homogeneous patient population: ◦Less variability ◦Smaller sample size to see a clinical effect ◦Harder to accrue to ◦Cannot generalize to many groups of patients. Example: Metastatic breast cancer Triple negative (ER-/PR-/HER2-) Previously treated or newly diagnosed Example: Metastatic non-small cell lung cancer EGFR mutation Previously treated with a TKI

13 Common endpoints Phase I: dose-limiting toxicities (DLTs). ◦These are pre-defined toxicities that are considered related to drug and acceptable in only a relatively small fraction of patients (e.g. 20%). ◦For each patient, we evaluate whether or not they had a DLT within a pre-specified time frame. ◦The DLT rate is used to determine the maximum tolerated dose. Phase II: clinical efficacy outcomes ◦Response: Clinical response measured by shrinkage of tumor burden by some metric (e.g. RECIST) ◦Time to Progression, or Progression-Free Survival: Measured by a clinically significant increase in tumor burden by some metric. ◦Time to Relapse: recurrence of disease Phase III: “gold standard” efficacy outcome ◦Time to death (aka Overall Survival) ◦Progression free survival ◦Quality of life measures

14 Three main categories Binary: yes vs. no ◦Patient’s tumor responded vs. patient’s tumor did not respond. ◦Patient had a DLT or did not have a DLT in cycle 1. Time to event: The amount of time from study start until the event occurs ◦The number of months from randomization until death. ◦Tricky because some patients never have the event Continuous: a numeric score ◦Quality of life, measured as a numeric score, often at multiple time points ◦PSA (prostate specific antigen), used to measure prostate cancer recurrence

15 Choosing the endpoint Why not always use overall survival? It takes too long to be practical in phase II (and sometimes in phase III) Most other clinical outcomes, such as response and progression-free survival, are considered surrogate outcomes. For example, we have good reason to believe that if we can get a tumor to shrink, or delay time to progression, that we will prolong the life of the patient. (As it turns out, there is substantial literature suggesting that neither is a good surrogate for overall survival in a number of settings)

16 Designing your trial Many, many options; too many to discuss. Key principle: the design should allow you to make valid, unbiased inferences. Threats? ◦Biases ◦Wrong endpoint ◦Poor measurement (i.e. measurement error) ◦Low accrual ◦Inconsistency between objectives and design: ◦Example: Goal is to find the MTD, but the sample size is 500? ◦Example: Goal is to compare two agents, but the sample size is only 30?

17 Sample size and Power Sample size: the number of patients you plan to enroll Power: The probability you will declare that your drug is effective if it really is effective (in the context of the primary objective).

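18 Truth vs. inference TRUTH: drug does not work, or drug works. INFERENCE: conclude drug doesn’t work, or conclude drug does work. Drug does not work, but we conclude drug does work: Type I error, or alpha level. Drug works, and we conclude drug does work: Power.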
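
The two labelled cells of the truth-vs-inference table can be computed exactly for a simple single-arm design. The following is a minimal sketch, not the presenter's method: it assumes a one-sided exact binomial test, and the sample size (40), the null and alternative response rates (0.40 and 0.60), and the function names are all illustrative assumptions.

```python
from math import comb


def upper_tail(r, n, p):
    """P(X >= r) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(r, n + 1))


def operating_characteristics(n, p_null, p_alt, alpha=0.05):
    """Return (cutoff, type I error, power) for a one-sided exact binomial test.

    The cutoff is the smallest number of responses whose upper-tail
    probability under the null rate does not exceed alpha.
    """
    for cutoff in range(n + 2):
        if upper_tail(cutoff, n, p_null) <= alpha:
            break
    type_i_error = upper_tail(cutoff, n, p_null)  # drug does not work, we say it does
    power = upper_tail(cutoff, n, p_alt)          # drug works, we say it does
    return cutoff, type_i_error, power


# Hypothetical design: 40 patients, null rate 0.40, alternative rate 0.60.
cutoff, type_i_error, power = operating_characteristics(40, 0.40, 0.60)
print(f"declare success at >= {cutoff} responses")
print(f"type I error = {type_i_error:.3f}, power = {power:.3f}")
```

Because the number of responses is discrete, the achieved type I error is usually somewhat below the nominal alpha rather than equal to it.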

19 What’s a p-value? Hypothesis testing framework: ◦We have two hypotheses to consider: the drug works vs. the drug doesn’t work ◦Which hypothesis is correct? Traditional hypothesis testing: ◦Assume your new treatment does not work. ◦If the drug really doesn’t work, how likely is it that we would observe data like those seen in the trial? Example: ◦Treatment A has a response rate of 40% ◦Is Treatment A + B better than Treatment A alone? ◦If Treatment A + B has a 60% or greater response rate, we would consider that a clinically meaningful improvement in response.

20 What’s a p-value? Statistically: ◦Null hypothesis H0: p = 0.40 ◦Alternative hypothesis H1: p = 0.60 The p-value: What is the probability of observing a result (i.e. data) as or more extreme than we’ve seen in this study if the null hypothesis is true? If the p-value is small, it means that data like ours would be unlikely if the null hypothesis were true, which counts as evidence against the null. If the p-value is large, it means that the data are consistent with the null hypothesis. Significance: We say a result is “statistically significant” if the p-value is small (e.g. <0.05). We want to pick a sample size that makes us likely to pick the correct hypothesis.
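
The phrase “as or more extreme” maps directly onto a tail probability. The sketch below is one possible illustration, assuming a one-sided exact binomial test against the null response rate of 0.40; the data (12 responses in 20 patients) and the function name are hypothetical and do not come from the slides.

```python
from math import comb


def binomial_tail_p_value(responses, n, p_null):
    """One-sided p-value: P(X >= responses) when X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, k) * p_null**k * (1 - p_null) ** (n - k)
        for k in range(responses, n + 1)
    )


# Hypothetical data (not from the slides): 12 responses among 20 patients,
# tested against the null response rate of 0.40.
p_value = binomial_tail_p_value(12, 20, 0.40)
print(f"one-sided p-value = {p_value:.4f}")
```

The sum runs over every outcome at least as favorable as the one observed, which is exactly the “as or more extreme” set when the alternative is a higher response rate.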

21 Goldilocks analogy Sample size too small: “Underpowered study” ◦Your study does not enroll enough patients to clearly determine if the drug works or it doesn’t work, in the context of your primary objective. ◦You may see a clinically meaningful difference, but a statistically non-significant result. Sample size too big: “Overpowered study” ◦Your study enrolls more patients than you need to make inferences about the effectiveness of the drug. ◦You may conclude that the drug works due to statistical significance, but the clinical effect size is too small to be meaningful. Overpowered and Underpowered Studies: ◦Both waste time and resources! Sample size just right: ◦At the end of the study, your inferences will make sense! ◦A significant p-value will imply that your drug had a clinically meaningful impact on the outcome of interest.

22 Goldilocks: Too small Let’s use our example with a sample size of 16. What if we see 9 responses in 16 patients? Observed response rate (9/16) = 0.56. What does that look like statistically? P-value = 0.09 Too much overlap in the distributions. Why? Sample size is too small. UNDERPOWERED

23 Goldilocks: Too big What if N=100? What if we see 56 responses in 100 patients? Observed response rate = 0.56. (same response rate as previous example). P-value <0.0001 No overlap in the distributions. Why? Sample size is too large. OVERPOWERED
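
The contrast between the two sample sizes can be reproduced in a few lines. The slides do not say which test produced their p-values, so this sketch assumes a one-sided normal-approximation (z) test against the null rate of 0.40; its output need not match the slide figures exactly, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_z_p_value(responses, n, p_null):
    """One-sided p-value from the normal approximation to the binomial."""
    p_hat = responses / n
    se_null = sqrt(p_null * (1 - p_null) / n)  # standard error under the null
    z = (p_hat - p_null) / se_null
    return 1 - NormalDist().cdf(z)


# Same observed response rate (about 0.56), two different sample sizes.
p_small = one_sided_z_p_value(9, 16, 0.40)
p_large = one_sided_z_p_value(56, 100, 0.40)
print(f"N = 16:  p = {p_small:.4f}")
print(f"N = 100: p = {p_large:.4f}")
```

The observed rate barely changes between the two calls; only the standard error shrinks, which is what drives the p-value down as N grows.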

24 Goldilocks: Just right Based on a sample size calculator, N = 53 should be “just right.” 30 responses in 53 patients yields a response rate of 0.56. P-value = 0.01 Some overlap in distributions. Sample size is appropriate with 90% power
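
A sample size calculator of the kind mentioned here can be approximated with a textbook formula. This is a sketch under stated assumptions: a single-arm design, a one-sided alpha of 0.05, 90% power, and the normal approximation. The slide does not say which calculator or settings were used, so the result should land close to, but not necessarily on, N = 53.

```python
from math import ceil, sqrt
from statistics import NormalDist


def one_sample_sample_size(p_null, p_alt, alpha=0.05, power=0.90):
    """Approximate N for a one-sided, one-sample test of a proportion."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for type I error
    z_beta = NormalDist().inv_cdf(power)       # quantile for the desired power
    numerator = (
        z_alpha * sqrt(p_null * (1 - p_null))
        + z_beta * sqrt(p_alt * (1 - p_alt))
    )
    return ceil((numerator / (p_alt - p_null)) ** 2)


# Null response rate 0.40 vs. clinically meaningful rate 0.60.
n_required = one_sample_sample_size(0.40, 0.60)
print(f"approximate sample size: {n_required}")
```

Exact binomial calculators, two-sided tests, or continuity corrections would each shift the answer by a few patients, which is one reason to state the method and settings in the protocol.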

25 Avoiding the p-value trap Be sure to focus on the “effect size” ◦In all examples, the response rate is 0.56. ◦This is quite high given our expectations, regardless of sample size. P-values have become exceedingly overemphasized. Look for confidence intervals to help interpret the precision of inferences. Most common are 95% confidence intervals: “We are 95% confident that the true value of the response rate lies within this interval.”

Sample size | Effect size | 95% Confidence Interval | Width of 95% Confidence Interval
16 | 0.56 | (0.30, 0.80) | 0.50
53 | 0.56 | (0.42, 0.70) | 0.28
100 | 0.56 | (0.46, 0.66) | 0.20
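
Intervals like those in the table can be computed without special software. The slides do not name the interval method, so this sketch assumes the exact (Clopper-Pearson) interval, found here by bisection on the binomial tail probabilities; the function names are illustrative.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def find_root(f, lo=0.0, hi=1.0, iterations=60):
    """Bisection for an increasing function f with f(lo) < 0 < f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, confidence=0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    half_alpha = (1 - confidence) / 2
    # Lower limit: the rate at which P(X >= x) equals half_alpha.
    lower = 0.0 if x == 0 else find_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - half_alpha
    )
    # Upper limit: the rate at which P(X <= x) equals half_alpha.
    upper = 1.0 if x == n else find_root(
        lambda p: half_alpha - binom_cdf(x, n, p)
    )
    return lower, upper


for responses, n in [(9, 16), (30, 53), (56, 100)]:
    low, high = clopper_pearson(responses, n)
    print(f"N = {n:3d}: ({low:.2f}, {high:.2f}), width {high - low:.2f}")
```

The point estimate is nearly the same in all three rows, so the narrowing of the interval reflects only the gain in precision from a larger sample.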

26 Clinically meaningful?

27 What is a meaningful improvement?

28 Triple negative breast cancer trials * And don’t forget quality of life!

29 Advocacy take-home points Clinical trials require rigorously stated objectives and clearly defined endpoints that are appropriate for the phase of study and the patient population. Sample size is an important consideration for both scientific and ethical reasons. P-values are only part of the story: always consider clinical effect size, sample size and precision when planning trials and interpreting results. Only well-designed studies are ethical: studies that are poorly designed may lead to uninterpretable, biased or useless results. Transparency is key in both clinical trial design and presentation of results.

