Presentation on theme: "The Application of Propensity Score Analysis to Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Lilly Yue, Ph.D.* CDRH, FDA,"— Presentation transcript:
1. The Application of Propensity Score Analysis to Non-randomized Medical Device Clinical Studies: A Regulatory Perspective
Lilly Yue, Ph.D.*
CDRH, FDA, Rockville MD 20850
*No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred.
2. Outline
- Randomized clinical trials
- Non-randomized studies and a potential problem
- Propensity score methods for bias reduction
- Practical issues with the application of propensity score methodology
- Limitations of propensity score methods
- Conclusions
3. Randomized Trials
- All patients have a specified chance of receiving each treatment.
- Treatments are concurrent.
- Data collection is concurrent, uniform, and high quality.
- All patient covariates, measured or unmeasured (e.g., age, gender, duration of disease), are expected to be balanced between the two treatment groups.
4. Randomized Trials
- Assumptions underlying statistical comparison tests are met.
- So, the two trt groups are comparable, and the observed treatment difference is an unbiased estimate of the true treatment difference.
- But the above advantages are not guaranteed for small, poorly designed, or poorly conducted randomized trials.
5. Non-randomized Studies and a Potential Problem
- None of the advantages provided by randomized trials is available in non-randomized studies.
- A potential problem: the two treatment groups were not comparable before the start of treatment, i.e., covariates were imbalanced between the two groups.
- In that case, direct treatment comparisons are invalid.
6. Adjustments for Covariates
Three common methods of adjusting for confounding covariates:
- Matching
- Subclassification (stratification)
- Regression (covariate) adjustment
7. Question: What if there are many confounding covariates to adjust for, e.g., age, gender, …?
- Matching based on many covariates is not practical.
- Subclassification is difficult: as the number of covariates increases, the number of subclasses grows exponentially. With 2 categories per covariate, 5 covariates already give 2^5 = 32 subclasses.
- Regression adjustment may not be possible. Potential problem: over-fitting.
8. Propensity Score Methodology
- Replace the collection of confounding covariates with one scalar function of these covariates: the propensity score.
- Age, gender, duration, … are replaced by 1 composite covariate: the propensity score, which is a balancing score.
9. Propensity Score Methodology (cont.)
- Propensity score (PS): the conditional probability of receiving Trt A rather than Trt B, given a collection of observed covariates.
- Purpose: simultaneously balance many covariates in the two trt groups and thus reduce the bias.
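In symbols, the definition above can be sketched as follows. The notation is an assumption for illustration and does not appear on the slides: T is the treatment received, X is the vector of observed covariates, and e(x) is the propensity score.

```latex
e(x) = \Pr(T = A \mid X = x)
```

Under this notation, the balancing purpose stated on the slide corresponds to the covariates X having the same distribution in both treatment groups among patients who share the same value of e(X).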
10. Propensity Score Construction
- Statistical modeling of the relationship between treatment membership and covariates
- Statistical methods: multiple logistic regression or others
- Outcome (the modeled event): actual trt membership, A or B
- Predictor variables: all measured covariates, plus some interaction or squared terms, e.g., age, gender, duration of disease, …, age*duration, …
11. Propensity Score Construction
- The clinical outcome variable, e.g., major complication event, is NOT involved in the modeling.
- No concern about over-fitting
- Obtain a propensity score model, a mathematical equation: PS = f(age, gender, …)
- Calculate estimated propensity scores for all patients.
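The construction steps above can be sketched in code. This is a minimal illustration, not the presenter's implementation: it fits a plain logistic regression by gradient ascent using only the Python standard library. The simulated patients, the two covariates (standardized age and a 0/1 gender indicator), the true coefficients, and the function names are all hypothetical.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_propensity_model(X, t, lr=0.5, n_iter=2000):
    """Logistic regression of treatment membership t (1 = Trt A, 0 = Trt B)
    on covariate rows X, fitted by batch gradient ascent.
    Returns [intercept, coef_1, coef_2, ...]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    # Estimated probability of receiving Trt A for every patient.
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

# Hypothetical data: treatment assignment depends on age and gender.
# The clinical outcome is deliberately absent from this step.
random.seed(0)
X, t = [], []
for _ in range(200):
    age = random.gauss(0.0, 1.0)            # standardized age
    female = float(random.random() < 0.5)   # 0/1 indicator
    p_true = sigmoid(-0.3 + 1.0 * age + 0.5 * female)
    X.append([age, female])
    t.append(1 if random.random() < p_true else 0)

beta = fit_propensity_model(X, t)
ps = propensity_scores(X, beta)
```

In practice a statistics package would normally be used for the fit, and interaction or squared terms would be added as extra columns of X.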
12. Properties of Propensity Scores
- Patients with the same propensity score are equally likely to have been assigned to trt A.
- Within a group of patients with the same propensity score, e.g., 0.7, some patients actually got trt A and some got trt B, just as if they had been randomly allocated to whichever trt they actually received.
14. When the propensity scores are balanced across the two treatment groups, the distributions of all the covariates are balanced in expectation across the two groups. Use the propensity scores as a diagnostic tool to measure treatment group comparability. If the two treatment groups overlap well enough in terms of the propensity scores, we compare the two treatment groups adjusting for the PS.
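One crude way to look at overlap is to compare the range of estimated scores in each group. The slides do not say how overlap was assessed, so the range-based check, the function names, and the scores below are assumptions for illustration; side-by-side histograms or box plots are common alternatives.

```python
def common_support(ps_a, ps_b):
    """Return (low, high): the PS range covered by both treatment groups."""
    return max(min(ps_a), min(ps_b)), min(max(ps_a), max(ps_b))

def fraction_in_support(ps, low, high):
    """Share of one group's patients whose PS lies inside [low, high]."""
    return sum(low <= s <= high for s in ps) / len(ps)

# Hypothetical estimated propensity scores.
ps_a = [0.35, 0.50, 0.62, 0.70, 0.81, 0.90]  # patients who received Trt A
ps_b = [0.10, 0.22, 0.30, 0.41, 0.55, 0.66]  # patients who received Trt B

low, high = common_support(ps_a, ps_b)
frac_a = fraction_in_support(ps_a, low, high)
frac_b = fraction_in_support(ps_b, low, high)
```

A small fraction inside the common range in either group suggests that many patients have no comparable counterpart in the other group.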
15. Compare Treatments Adjusting for Propensity Score
- Matching
- Subclassification (stratification)
- Regression (covariate) adjustment
16. Matching Based on Propensity Scores (PS)
- Pair each Trt A patient with a Trt B patient who has a similar PS (PS1, PS2, …, PSm).
- Compare treatments based on the matched pairs.
- Problem: may exclude unmatched patients.
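The slide does not name a matching algorithm. As one possible sketch, the code below does greedy 1:1 nearest-neighbor matching without replacement, with a caliper on the PS difference. The caliper value, function name, and scores are hypothetical.

```python
def greedy_match(ps_a, ps_b, caliper=0.05):
    """Greedy 1:1 nearest-neighbor matching on the propensity score,
    without replacement. Returns a list of (index_in_a, index_in_b) pairs.
    Patients with no partner within the caliper stay unmatched."""
    available = set(range(len(ps_b)))
    pairs = []
    for i, score in enumerate(ps_a):
        if not available:
            break
        j = min(available, key=lambda k: abs(ps_b[k] - score))
        if abs(ps_b[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs

# Hypothetical estimated propensity scores.
ps_a = [0.30, 0.52, 0.71, 0.95]
ps_b = [0.28, 0.50, 0.55, 0.74, 0.10]

pairs = greedy_match(ps_a, ps_b)
matched_a = {a for a, _ in pairs}
unmatched_a = [i for i in range(len(ps_a)) if i not in matched_a]
```

Here the Trt A patient with PS 0.95 has no Trt B patient within the caliper and is dropped, which is the exclusion problem noted on the slide.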
17. Stratification
- All patients are sorted by propensity score and divided into equal-sized subclasses (e.g., 5 subclasses: 1, 2, …, 5).
- Compare the two trts within each subclass, as in a randomized trial; then estimate the overall trt effect as a weighted average.
- It is intended to use all patients.
- But if the trial size is small, some subclasses may contain patients from only one treatment group.
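A minimal sketch of the stratified comparison follows. It assumes a binary outcome, a difference in event rates as the effect measure, and subclass-size weights; the slide says only "weighted average", so the weighting is an assumption. The function name and data are hypothetical.

```python
def stratified_effect(ps, treated, outcome, n_strata=5):
    """Sort patients by PS, split them into equal-sized subclasses, and return
    the subclass-size-weighted average of the within-subclass difference in
    outcome rates (Trt A minus Trt B).
    Raises ValueError if a subclass contains only one treatment group."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    n = len(order)
    total, weight_sum = 0.0, 0
    for s in range(n_strata):
        idx = order[s * n // n_strata:(s + 1) * n // n_strata]
        a = [outcome[i] for i in idx if treated[i] == 1]
        b = [outcome[i] for i in idx if treated[i] == 0]
        if not a or not b:
            raise ValueError(f"subclass {s + 1} has patients from only one group")
        total += len(idx) * (sum(a) / len(a) - sum(b) / len(b))
        weight_sum += len(idx)
    return total / weight_sum

# Hypothetical data: 20 patients, so each of the 5 subclasses holds 4.
ps = [(i + 1) / 21 for i in range(20)]
treated = [1, 0, 1, 0] * 5
outcome = [1, 0, 0, 0,  1, 0, 1, 0,  0, 0, 1, 0,  1, 1, 1, 0,  1, 0, 1, 1]

effect = stratified_effect(ps, treated, outcome)
```

The explicit error mirrors the small-sample caveat on the slide: a subclass with patients from only one group cannot contribute a within-subclass comparison.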
18. Regression (Covariate) Adjustment
- Fit a treatment effect estimation model: the relationship between clinical outcome and treatment
- Outcome: clinical outcome, e.g., adverse events
- Predictor variables: trt received, propensity score, a subset of important covariates
- Statistical method: e.g., regression or logistic regression
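For a binary clinical outcome, one possible form of such a model is sketched below. The symbols are assumptions for illustration, not taken from the slides: Y is the outcome, T indicates the treatment received, ê(X) is the estimated propensity score, and Z is the chosen subset of important covariates.

```latex
\operatorname{logit}\,\Pr(Y = 1 \mid T, X)
  = \beta_0 + \beta_1 T + \beta_2\,\hat{e}(X) + \gamma^{\top} Z
```

In this sketch, the coefficient on T would carry the adjusted treatment effect on the log-odds scale.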
19. Propensity Score Methods Summary
- Fit the propensity score (PS) model using all measured covariates.
- Estimate the PS for all patients using the PS model.
- Compare treatments, adjusting for propensity scores.
20. Practical Issues
- Issues in propensity score estimation
  - How to handle missing baseline covariate values
  - Which covariate terms should be included
- Evaluation of treatment group comparability
  - Assessment of the resulting balance of the distributions of covariates
- Issues in treatment comparison
  - Which method: matching, stratification, or regression
- Issues in study design with PS analysis
  - Pre-specified vs. post hoc PS analysis
  - Pre-specify the covariates to be collected in the study and then included in PS estimation
  - Sample size estimation adjusting for the propensity scores
21. Example: Device A
- Non-concurrent, two-arm, multi-center study
- Control: medical treatment without device, N = 65, hospital record collection
- Treatment: Device A, N = 130
- Primary effectiveness endpoint: treatment success
- Hypothesis testing: superiority in success rate
- 20 imbalanced, clinically important baseline covariates, e.g., prior cardiac surgery
- 22% of patients had missing baseline covariate values
23. Two Treatment Groups Are Not Comparable
- Imbalance in multiple baseline covariates
- Imbalance in the time of enrollment
- So, any direct treatment comparisons on the effectiveness endpoint are inappropriate.
- And p-values from direct treatment comparisons are un-interpretable.
- What about treatment comparisons adjusting for the imbalanced covariates?
  - Traditional covariate analysis
  - Propensity score analysis
24. Performed Propensity Score (PS) Analysis
- Handled missing values
  - MI (multiple imputation): generate multiple data sets for PS analysis
  - Generate one data set: generalized PS analysis
  - Others
- Included all statistically significant and/or clinically important baseline covariates in PS modeling.
- Checked comparability of the two trt groups through the estimated propensity score distributions.
- Found that the two trt groups did not overlap well.
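28. Treatment Success
[Table of treatment success by group (Ctl, Trt) with totals; cell values are not preserved in this transcript.]
- Tried a Cochran-Mantel-Haenszel test controlling for PS quintile, and logistic regression using PS as a continuous covariate.
- However, the significant p-values are un-interpretable.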
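For reference, the Cochran-Mantel-Haenszel statistic across PS quintiles can be sketched as below, using the usual continuity-corrected chi-square with 1 degree of freedom. The slide's table values are not available, so the counts here are hypothetical, as are the function and variable names.

```python
import math

def cmh_test(tables):
    """Cochran-Mantel-Haenszel test for a list of 2x2 tables, one per stratum.
    Each table is (a, b, c, d) = (trt success, trt failure,
                                  ctl success, ctl failure).
    Returns (continuity-corrected chi-square statistic, p-value on 1 df)."""
    diff, var = 0.0, 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        diff += a - row1 * col1 / n                       # observed minus expected
        var += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    stat = (abs(diff) - 0.5) ** 2 / var
    p_value = math.erfc(math.sqrt(stat / 2.0))            # chi-square tail, 1 df
    return stat, p_value

# Hypothetical counts for the five PS quintiles (not the study's data).
quintile_tables = [
    (3, 7, 1, 9),
    (5, 9, 2, 8),
    (9, 11, 3, 5),
    (14, 10, 4, 4),
    (20, 8, 3, 2),
]
stat, p_value = cmh_test(quintile_tables)
```

As the slide cautions, a small p-value from such a test does not make the comparison interpretable when the two groups overlap poorly on the PS.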
29. Conclusion
- The two treatment groups did not overlap enough to allow a sensible treatment comparison.
- So, any treatment comparisons adjusting for imbalanced covariates are problematic.
30. Example: Device B
- New vs. control in a non-randomized study
- Primary endpoint: MACE incidence rate at 6 months after treatment
- Non-inferiority margin: 7% in this study
- Sample size: new, 290; control, 560
- 14 covariates were considered.
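The slide gives only the margin. Assuming it applies to the absolute difference in 6-month MACE rates, with lower rates being better, the non-inferiority hypotheses could be written as follows. The symbols are assumptions for illustration: p_new and p_control are the 6-month MACE rates in the two groups.

```latex
H_0:\; p_{\text{new}} - p_{\text{control}} \ge 0.07
\qquad \text{vs.} \qquad
H_1:\; p_{\text{new}} - p_{\text{control}} < 0.07
```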
31. Covariate balance checking before and after propensity score stratification adjustment
[Table: for each covariate (Mi, Diab, CCS, Lesleng, Preref, Presten), the mean in the New and Control groups and the p-value before and after adjustment; cell values are not preserved in this transcript.]
32. Model Building
- The PS is the conditional probability that a patient would have been assigned to the new device, based on his or her baseline covariates.
- A hierarchical logistic regression model with a stepwise selection process was used to build the propensity score model.
- The final propensity score model includes all covariates as well as a quadratic term.
33. Table 2. Distribution of patients across the five strata
[Table columns: Subclass, Control, New, Total, with a Total row; cell values are not preserved in this transcript.]
35. Covariate balance checking before and after propensity score stratification adjustment
[Table: for each covariate (Mi, Diab, CCS, Lesleng, Preref, Presten), the mean in the New and Control groups and the p-value before and after adjustment; cell values are not preserved in this transcript.]
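36. Balance check after adjustment: prior Mi rate
[Tables: overall, the % of patients with prior Mi in the New and Control groups and their difference; after adjustment, the same rates by PS quintile; cell values are not preserved in this transcript.]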
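A within-quintile check of this kind can be sketched as follows. The function name, the two-stratum example, and all values are hypothetical; the slide's actual rates are not available.

```python
def rate_by_stratum(stratum, is_new, has_covariate):
    """Return {stratum: (rate_new, rate_control)} for a binary covariate
    such as prior Mi. A rate is None when the stratum has no patients
    from that group."""
    counts = {}
    for s, g, x in zip(stratum, is_new, has_covariate):
        n_and_yes = counts.setdefault(s, {True: [0, 0], False: [0, 0]})[bool(g)]
        n_and_yes[0] += 1   # patients in this stratum and group
        n_and_yes[1] += x   # of those, patients with the covariate

    def rate(pair):
        return pair[1] / pair[0] if pair[0] else None

    return {s: (rate(c[True]), rate(c[False])) for s, c in sorted(counts.items())}

# Hypothetical data: two strata, four patients each.
stratum  = [1, 1, 1, 1, 2, 2, 2, 2]
is_new   = [1, 1, 0, 0, 1, 1, 0, 0]
prior_mi = [1, 0, 1, 0, 0, 0, 0, 1]

rates = rate_by_stratum(stratum, is_new, prior_mi)
```

Similar rates for the two groups within each stratum would indicate that stratification has balanced the covariate; large within-stratum gaps would indicate residual imbalance.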
39. Study Design
- Plan in advance.
- Pre-specify clinically relevant baseline covariates: as many as possible.
- Sample size estimation: ignore the propensity score adjustment? That could be inappropriate.
40. Limitations
- Propensity score methods can only adjust for observed confounding covariates, not for unobserved ones.
- Propensity score performance is seriously degraded when important variables influencing selection have not been collected.
- Propensity score methods may not eliminate all selection bias.
41. Limitations
- Propensity score methods work better in larger samples.
- The propensity score is not the only way of adjusting for covariates, and it may or may not be helpful in a particular comparison study.
- Randomized trials are considered the highest level of evidence for trt comparison.
- Propensity score methods lack the discipline and rigor of randomized trials and are not as definitive.
42. Conclusions
- Propensity score methods generalize the technique of adjusting for one confounding covariate to allow simultaneous adjustment for many covariates, and thus reduce bias.
- Propensity score methodology is an addition to, not a substitute for, traditional covariate adjustment methods.
- Plan ahead and carefully consider the practical issues discussed above.
- Randomized studies are still preferred and strongly encouraged whenever possible!