Presentation on theme: "The Role of the Statistician in Clinical Research Teams"— Presentation transcript:
1The Role of the Statistician in Clinical Research Teams Elizabeth Garrett-Mayer, Ph.D.Division of BiostatisticsDepartment of Oncology
2Statistics Statistics is the art/science of summarizing data Better yet…summarizing data so that non-statisticians can understand itClinical investigations usually involve collecting a lot of data.But, at the end of your trial, what you really want is a “punch-line:”Did the new treatment work?Are the two groups being compared the same or different?Is the new method more precise than the old method?Statistical inference is the answer!
3Do you need a statistician as part of your clinical research team? YES!Simplest reasons: s/he will help to optimizeDesignAnalysisInterpretation of resultsConclusions
4What if I already know how to calculate sample size and perform a t-test? Statisticians might know a better approachTrained more formally in design optionsMore “bang for your buck”Tend to be less biasedAdds credibility to your applicationUse resources that are available to you
5Different Roles Very collaborative Consultants Active co-investigator Helps develop aims and designBrought in early in planningContinues to input throughout trial planning and while study continuesConsultantsInactive co-investigatorOften not brought in until eitherYou need a sample size calculation several days before submissionTrial has been criticized/rejected for lack of statistical inputYou’ve finally collected all of the data and don’t know what to do next.Only involved sparsely for planning or for analysis.
6Find a statistician early Your trial can only benefit from inclusion of a statisticianStatisticians cannot rescue a poorly designed trial after the trial has begun.“Statistical adjustment” in analysis plan does not always work.Ignorance is not bliss:Some clinical investigators are trained in statisticsBut usually not all aspects!Despite inclination to choose a particular design or analysis method, there might be better ways.
7Example: Breast Cancer Prevention in Mice NOT clinical study (thank goodness!)Mouse study, comparing 4 types of treatmentsHer-2/neu Transgenic MousePrevention: which treatment is associated with the longest time to tumor formation?A intra-ductalB intra-ductalNo treatmentA intravenous
8Example: Breast Cancer Prevention in Mice Each oval represents a mouse and dots represent ductsLab researchers developed designVarying numbers of mice for each of four treatment groupsMultiple dose levels, not explicitly shown hereA intra-ductalB intra-ductalNo treatmentA intravenous
9How to analyze? Researchers had created their own survival curves. Wanted “p-values”Some design issues:What is the unit of analysis?How do we handle the imbalance?Treatment B has no “control” side
10Our approach Statistical adjustments used Treat “side of mouse” as unit of analysisCannot use mouse: multiple treatments per mouse in many cases. Too many categories (once dose was accounted for)Cannot use duct: hard to model the “dependence” within side and within mice.Still need to include dependence within mice.Need to adjust for treatment on “other side”Problems due to imbalance of dosesCould only adjust for “active” versus “inactive” treatment on other side.
11Our approach We “rescued” this study! Not optimal, but “best” given the designAdjustments might not be appropriateWe “modeled” the data—we made assumptionsWe could not implement all the adjustments we would have liked to.Better approach?Start with a good design!We would have suggested balanceWould NOT have given multiple treatments within mice.
12Statisticians: Specific Responsibilities DesignChoose most efficient designConsider all aims of the studyParticular designs that might be usefulCross-overPre-postFactorialSample size considerationsInterim monitoring plan
13Statisticians: Specific Responsibilities Assistance in endpoint selectionSubjective vs. objectiveMeasurement issuesIs there measurement error that should be considered?What if you are measuring pain? QOL?Multiple endpoints (e.g. safety AND efficacy)Patient benefit versus biologic/PK endpointPrimary versus secondaryContinuous versus categorical outcomes
14Statisticians: Specific Responsibilities Analysis PlanStatistical method for EACH aimAccount for type I and type II errorsStratifications or adjustments are included if necessarySimpler is often betterLoss to follow-up: missing data?
15Sample Size and Power The most common reason we get contacted Sample size is contingent on design, analysis plan, and outcomeWith the wrong sample size, you will eitherNot be able to make conclusions because the study is “underpowered”Waste time and money because your study is larger than it needed to be to answer the question of interestAnd, with wrong sample size, you might have problems interpreting your result:Did I not find a significant result because the treatment does not work, or because my sample size is too small?Did the treatment REALLY work, or is the effect I saw too small to warrant further consideration of this treatment?This is an issue of CLINICAL versus STATISTICAL signficance
16Sample Size and PowerSample size ALWAYS requires the investigator to make some assumptionsHow much better do you expect the experimental therapy group to perform than the standard therapy groups?How much variability do we expect in measurements?What would be a clinically relevant improvement?The statistician CANNOT tell you what these numbers should be (unless you provide data)It is the responsibility of the clinical investigator to define these parameters
17Sample Size and Power Review of power Power = The probability of concluding that the new treatment is effective if it truly is effectiveType I error = The probability of concluding that the new treatment is effective if it truly is NOT effective(Type I error = alpha level of the test)(Type II error = 1 – power)When your study is too small, it is hard to conclude that your treatment is effective
18Example: sample size in multiple myeloma study Phase II study in multiple myeloma patients (Borello)Primary Aim: Assess clinical efficacy of activated marrow infiltrating lymphocytes (aMILs) + GM-CSF-based tumor vaccines in the autologous transplant setting for patients with multiple myeloma.If the clinical response rate is ≤ 0.25, then the investigators are not interested in pursuing aMILs + vaccines further.If the clinical response rate is ≥ 0.40, they would be interested in pursuing further.
19Example: sample size in multiple myeloma study H0: p = 0.25H1: p = 0.40We want to know what sample size we need to have large power and small type I error.If the treatment DOES work, then we want to have a high probability of concluding that H1 is “true.”If the treatment DOES NOT work, then we want a low probability of concluding that H1 is “true.”
20Sample size = 10; Power = 0.17 Distribution of H0: p = 0.25 Probability distributionProportion of respondersVertical line defines“rejection region”
21Sample size = 30; Power = 0.48 Distribution of H0: p = 0.25 Probability distributionProportion of respondersVertical line defines“rejection region”
22Sample size = 75; Power = 0.79 Distribution of H0: p = 0.25 Probability distributionProportion of respondersVertical line defines“rejection region”
23Sample size = 120; Power = 0.92 Distribution of H0: p = 0.25 Probability distributionProportion of respondersVertical line defines“rejection region”
24Not always so easyMore complex designs require more complex calculationsUsually also require more assumptionsExamples:Longitudinal studiesCross-over studiesCorrelation of outcomesOften, “simulations” are required to get a sample size estimate.
25Most common problems seen in study proposals when a statistician is not involved Outcomes are not clearly definedThere is not an analysis plan for secondary aims of the studySample size calculation is too simplistic or absentAssumptions of statistical methods are not appropriate
26Examples: trials with additional statistical needs Major clinical trials (e.g. Phase III studies)Continual reassessment method (CRM) studies)Longitudinal studiesStudy of natural history of disease/disorderStudies with ‘non-random’ missing data
27Major trialsMajor trials usually are monitored periodically for safety and ethical concerns.Monitoring board: Data (Safety and) Monitoring Committee (DMC or DSMC)In these, trials, ideally you would have three statisticians (Pocock, 2004, Statistics in Medicine)Study statisticianDMC statisticianIndependent statisticianWhy?Interim analyses require “unbiased” analysis and interpretation of study data.Industry- versus investigator-initiated trials …differences?
28Study Statistician Overall statistical responsibility Actively engaged in design, conduct, final analysisNot involved in interim analysesWant them to remain ‘blinded’ until the study is complete
29DMC Statistician Experienced trialist Evaluate interim results Decide (along with rest of DMC) whether trial continuesNo conflict of interest
30Independent Statistician Performs interim analysisWrites report of interim analysisNo conflict of interestOnly person to have full access to “unblinded” data until trial completion
31High-maintenance trials Some trials require statistical decision-making during the trialSimon “two-stage” design:Stage 1: Treat about half the patients.Stage 2: If efficacy at stage 1 meets some standard, then enroll the remainder of patients“Adaptive” and “Sequential” trials: final sample size is determined somewhere in the middle of the trialContinual Reassessment Method…
32Continual Reassessment Method Phase I trial design“Standard” Phase I trials (in oncology) use what is often called the ‘3+3’ designMaximum tolerated dose (MTD) is considered highest dose at which 1 or 0 out of six patients experiences DLT.Doses need to be pre-specifiedConfidence in MTD is usually poor.Treat 3 patients at dose KIf 0 patients experience dose-limiting toxicity (DLT), escalate to dose K+1If 2 or more patients experience DLT, de-escalate to level K-1If 1 patient experiences DLT, treat 3 more patients at dose level KIf 1 of 6 experiences DLT, escalate to dose level K+1If 2 or more of 6 experiences DLT, de-escalate to level K-1
33Continual Reassessment Method Allows statistical modeling of optimal dose: dose-response relationship is assumed to behave in a certain wayCan be based on “safety” or “efficacy” outcome (or both).Design searches for best dose given a desired toxicity or efficacy level and does so in an efficient way.This design REALLY requires a statistician throughout the trial.Example: Phase I/II trial of Samarium 153 in High Risk Osteogenic Sarcoma (Schwartz)
34CRM history in briefOriginally devised by O’Quigley, Pepe and Fisher (1990) where dose for next patient was determined based on responses of patients previously treated in the trialDue to safety concerns, several authors developed variantsModified CRM (Goodman et al. 1995)Extended CRM [2 stage] (Moller, 1995)Restricted CRM (Moller, 1995)and others….
36Modified CRM (Goodman, Zahurak, and Piantadosi, Statistics in Medicine, 1995) Carry-overs from standard CRMMathematical dose-toxicity model must be assumedTo do this, need to think about the dose-response curve and get preliminary model.We CHOOSE the level of toxicity that we desire for the MTD (p = 0.30)At end of trial, we can estimate dose response curve.‘prior distribution’ (mathematical subtlety)
37Modifications by Goodman et al. Modified CRM by Goodman, Zahurak, and Piantadosi (Statistics in Medicine, 1995)Modifications by Goodman et al.Use ‘standard’ dose escalation model until first toxicity is observed:Choose cohort sizes of 1, 2, or 3Use standard ‘3+3’ design (or, in this case, ‘2+2’)Upon first toxicity, fit the dose-response model using observed dataEstimate Find dose that is closest to toxicity of 0.3.Does not allow escalation to increase by more than one dose level.De-escalation can occur by more than one dose level.Dose levels are discrete: need to round to closest level
38Example Samarium study with cohorts of size 2: 2 patients treated at dose 1 with 0 toxicities2 patient treated at dose 2 with 1 toxicity Fit CRM using equation belowEstimated = 0.77Estimated dose fornext patient is 2.0Use dose level 2for next cohort.
39Example Samarium study with cohorts of size 2: 2 patients treated at dose 1 with no toxicities4 patients treated at dose 2 with 1 toxicity Fit CRM using equation on earlier slideEstimated = 0.88Estimated dose for nextpatient is 2.54Round up to doselevel 3 for next cohort.
40Example Samarium study with cohorts of size 2: 2 patients treated at dose 1 with no toxicities4 patients treated at dose 2 with 1 toxicity2 patients treated at dose 3 with 1 toxicity Fit CRM using equation on earlier slideEstimated = 0.85Estimated dose for nextpatient is 2.47Use dose level 2for next cohort.
41Example Samarium study with cohorts of size 2: 2 patients treated at dose 1 with no toxicities6 patient treated at dose 2 with 1 toxicity2 patients treated at dose 3 with 1 toxicity Fit CRM using equation on earlier slideEstimated = 0.91Estimated dose for nextpatient is 2.8Use dose level 3for next cohort.Etc.
42Longitudinal Studies Multiple observations per individual over time Sample size calculations are HARDBut, analysis is also complexStandard assumption of “basic” statistical methods and models is that observations are independentWith longitudinal data, we have “correlated” measures within individuals
43Study of Autism in Young Children (Landa) Autism usually diagnosed at age 3.But, there is evidence that there are earlier symptoms that are indicative of autismChildren at high risk of autism (kids with older autism siblings), and “controls” were observed at 6 months, 14 months, and 24 months for symptoms (Mullen)Prospective studyChildren were diagnosed at 36 months into three groups.ASD (autism-spectrum disorder)LD (learning disabled)UnaffectedEarlier symptoms were compared to see if certain symptoms could predict diagnosis.
44P-values were included in original table! ASD (n=23)LD (n=11)Unaffected (n=53)Mullen SubscalesMean(SD)6 MonthsGross Motor51.50(7.69)53.86(6.31)49.35(10.35)Visual Reception52.58(8.94)47.43(10.75)55.18(10.66)Fine Motor46.92(11.86)36.86(7.45)49.54(10.76)Receptive Language50.75(8.21)44.86(8.95)50.18(7.26)Expressive Language47.25(10.14)44.57(4.57)43.98(6.70)Early Learning Composite100.67(12.75)87.29(9.16)99.59(10.10)14 Months46.91(12.34)52.91(10.38)58.16(10.52)48.39(10.95)51.00(9.32)54.73(9.03)50.48(10.44)52.82(9.40)57.41(7.28)34.70(13.38)39.64(5.95)52.59(12.26)39.04(15.14)47.00(7.64)52.02(11.33)87.39(19.97)95.55(10.68)108.27(13.89)24 Months35.43(8.69)49.18(11.04)52.20(10.97)43.26(10.98)48.91(11.59)56.73(10.48)36.04(14.17)(7.97)52.78(11.07)35.74(15.25)42.73(11.31)59.22(10.74)36.65(15.31)45.27(12.03)60.14(12.15)78.43(21.68)93.73(14.86)114.98(15.89)P-values were included in original table!
45Statistical modeling can help! Previous table was hard to make conclusions fromEach time point was analyzed separately, and within time points, groups were compared.Use some reasonable assumptions to help interpretationKids have “growth trajectories” that are continuous and smoothObservations from within the same child are correlated.
47Conclusions are much easier The models that were used to make the graphs as somewhat complicatedBut, they are “behind the scenes”The important information is presented clearly and succintlyANOVA approach does not “summarize” data
48Concluding RemarksGet your statistician involved as soon as you begin to plan your studyThings statisticians do not like:Being contacted several days before grant/protocol/proposal is dueRewriting inappropriate statistical sectionsAnalyzing data that has arisen from a poorly designed trialStatisticians have a lot to add“fresh” perspective on your studyStudy will be more efficient!