Presentation on theme: "Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Sep. 16, 2005 Lilly Yue, Ph.D.* CDRH, FDA, Rockville MD 20850 * No official support."— Presentation transcript:
Non-randomized Medical Device Clinical Studies: A Regulatory Perspective Sep. 16, 2005 Lilly Yue, Ph.D.* CDRH, FDA, Rockville MD 20850 * No official support or endorsement by the Food and Drug Administration of this presentation is intended or should be inferred.
2 Acknowledgements Thanks to my colleagues in the Cardiovascular and Ophthalmic Devices Branch, CDRH, FDA, for their help with this presentation.
3 Outline 1. Types of non-randomized studies in medical devices 2. Why non-randomized? 3. Major concerns with non-randomized studies 4. Conclusions
4 Types of Non-randomized Studies Concurrent 2-arm non-randomized study. One-arm study: (a) with comparison to a historical control, where patient-level data from the historical control are available and used in the treatment comparison (pseudo 2-arm comparative study); (b) with comparison to a fixed target value obtained from multiple historical trials (OPC: objective performance criterion).
5 Why Non-Randomized? RCT is sometimes not ethical or practical. Sample size determination based on a one-sample hypothesis test: smaller sample size? May save time and money. Least burdensome? To keep up with the rapid pace of new technology development.
6 Example 1: Coronary artery bare metal stent. In 1994, two superiority RCTs compared the novel Palmaz-Schatz (P-S) stent with standard balloon angioplasty. Subsequently, randomized non-inferiority trials compared several new stents with the P-S stent. Over 30 different coronary stents were developed for use in the USA or Europe over the past 10 years. Design changes: minor modifications resulting in small or local effects on patient outcome. Approx. 1.5 million patients per year world-wide undergo catheter-based coronary treatment. Stent life cycle < 2 yrs, required for randomized trial. Some non-randomized studies were conducted to keep up with the rapid pace of new stent development.
7 Major Concerns with Non-randomized Studies In a 2-arm comparative study: –Selection bias and comparability of treatment groups In a pseudo 2-arm comparative study: –Is the historical active control good? –Bio-creep –Are the test and historical control groups comparable in patient population? In a one-arm study with OPC: –Shares all the problems associated with historical controls –Problems with the validity of its determination –Problems with the appropriateness of its use
8 Comparability of Treatment Groups In an RCT, we expect all patient covariates, measured or unmeasured, to be balanced between the two treatment groups. So the two treatment groups are comparable, and the observed treatment difference is an unbiased estimate of the true treatment difference. None of the advantages provided by randomized trials is available in non-randomized studies. A potential problem: the two treatment groups were not comparable before the start of treatment, i.e., not comparable due to imbalanced covariates between the two treatment groups. So direct treatment comparisons are invalid.
9 Traditional Adjustments for Covariates Three common methods of adjusting for confounding covariates: –Matching –Subclassification (stratification) –Regression (Covariate) adjustment
10 Propensity Score Methodology Replace the collection of confounding covariates with one scalar function of these covariates: the propensity score. (Figure: age, gender, duration, ... collapse into one composite covariate, the propensity score, which is a balancing score.)
11 Propensity Score Methodology (cont.) Propensity score (PS): conditional prob. of receiving treatment A rather than treatment B, given a collection of observed covariates. Purpose: simultaneously balance many covariates in the two treatment groups and thus reduce the bias. PS construction: multiple logistic regression model based on patient data of all measured covariates and actual treatment received.
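The PS construction step can be illustrated with a minimal sketch, assuming two made-up covariates (a standardized age and a diabetes indicator) and simulated treatment assignment. The function names `fit_logistic` and `propensity_scores`, the simulated data, and the use of plain gradient ascent are all assumptions for illustration; the slides do not say which software or fitting routine was used, and a real analysis would rely on a vetted statistics package.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, t, lr=0.5, n_iter=500):
    """Fit P(treatment = 1 | covariates) by gradient ascent on the log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # intercept followed by one coefficient per covariate
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            resid = ti - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def propensity_scores(X, beta):
    """Estimated probability of receiving the new treatment for each patient."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi))) for xi in X]

# Hypothetical data: older and diabetic patients are more likely to get the new device.
random.seed(0)
X, treat = [], []
for _ in range(300):
    age_std = random.gauss(0.0, 1.0)
    diabetes = 1.0 if random.random() < 0.3 else 0.0
    p_new = sigmoid(-0.5 + 1.0 * age_std + 0.8 * diabetes)
    X.append([age_std, diabetes])
    treat.append(1 if random.random() < p_new else 0)

beta = fit_logistic(X, treat)
ps = propensity_scores(X, beta)

n_new = sum(treat)
mean_ps_new = sum(p for p, t in zip(ps, treat) if t == 1) / n_new
mean_ps_ctl = sum(p for p, t in zip(ps, treat) if t == 0) / (len(treat) - n_new)
print(f"mean PS, new: {mean_ps_new:.2f}; mean PS, control: {mean_ps_ctl:.2f}")
```

Note that the outcome never enters the model: only the covariates and the treatment actually received are used, as on the slide.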
12 Properties of propensity scores –A group of patients with the same propensity score are equally likely to have been assigned to trt A. –Within a group of patients with the same propensity score, e.g., 0.7, some patients actually got trt A and some got trt B, just as if they had been randomly allocated to whichever trt they actually received: "randomized after the fact." (Figure: patients with PS = 0.7 split into Trt A and Trt B.)
13 –When the propensity scores are balanced across the two treatment groups, the distributions of all the observed covariates are balanced in expectation across the two groups. –Use the propensity scores as a diagnostic tool to measure treatment group comparability. –If the two treatment groups overlap well enough in terms of the propensity scores, we compare the two treatment groups adjusting for the PS. Compare treatments adjusting for propensity score: –Matching –Subclassification (stratification) –Regression (covariate) adjustment
14 Stratification –All patients are sorted by propensity score. –Divide into equal-sized subclasses. –Compare the two trts within each subclass, as in a randomized trial; then estimate the overall trt effect as a weighted average. –It is intended to use all patients. –But if the trial size is small, some subclasses may contain patients from only one treatment group. (Figure: patients ordered by PS and divided into subclasses 1, 2, ..., 5.)
15 Example 2 New vs. control in a non-randomized study. Primary endpoint: MACE incidence rate at 6 months after treatment. Non-inferiority margin: 7%, in this study. Sample size: new: 290, control: 560. 14 covariates were considered.
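The stratification steps above can be sketched as follows. This is a minimal illustration, not the analysis used in the presentation: the function name `stratified_effect`, the tiny data set, and the choice to weight each subclass by its size and to report (rather than silently drop) subclasses with only one treatment group are all assumptions.

```python
def stratified_effect(ps, treat, outcome, n_strata=5):
    """Weighted average of within-subclass differences in mean outcome (new minus control).

    Returns the estimate and the list of subclasses that had to be skipped
    because they contained patients from only one treatment group.
    """
    order = sorted(range(len(ps)), key=lambda i: ps[i])  # sort patients by PS
    size = len(order) / n_strata
    weighted_sum, n_used, skipped = 0.0, 0, []
    for s in range(n_strata):
        idx = order[round(s * size): round((s + 1) * size)]
        new = [outcome[i] for i in idx if treat[i] == 1]
        ctl = [outcome[i] for i in idx if treat[i] == 0]
        if not new or not ctl:
            skipped.append(s + 1)  # no within-subclass comparison possible
            continue
        diff = sum(new) / len(new) - sum(ctl) / len(ctl)
        weighted_sum += len(idx) * diff
        n_used += len(idx)
    if n_used == 0:
        raise ValueError("no subclass contains both treatment groups")
    return weighted_sum / n_used, skipped

# Hypothetical data: 10 patients, 1 = event occurred, 0 = no event.
ps      = [0.10, 0.15, 0.30, 0.35, 0.50, 0.55, 0.70, 0.75, 0.90, 0.95]
treat   = [0,    1,    0,    1,    0,    1,    0,    1,    0,    1]
outcome = [1,    0,    1,    1,    0,    0,    1,    0,    0,    0]

effect, skipped = stratified_effect(ps, treat, outcome)
print(f"estimated difference in event rate: {effect:+.2f}; skipped subclasses: {skipped}")
```

A non-empty `skipped` list is the small-trial problem the slide warns about: those patients contribute nothing to the comparison.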
16 Table. Distribution of patients across the five propensity score strata
Subclass   Control   New   Total
1          142       28    170
2          127       43    170
3          122       48    170
4          119       51    170
5          50        120   170
Total      560       290   850
18 Baseline covariate balance checking before and after PS stratification adjustment
            Mean                 p-value
Covariate   New      Control     Before    After
Mi          0.25     0.40        <.0001    0.4645
Diab        0.28     0.21        0.0421    0.8608
CCS         2.41     2.75        0.0003    0.3096
Lesleng     11.02    12.16       <.0001    0.5008
Preref      3.00     3.08        0.0202    0.2556
Presten     62.75    66.81       <.0001    0.4053
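As a quick check on the strata table, the short sketch below recomputes the totals and the share of "New" patients in each subclass directly from the counts on the slide. Only the variable names are invented here; the counts are those in the table.

```python
# Patient counts per propensity score subclass, taken from the table above.
control = [142, 127, 122, 119, 50]
new = [28, 43, 48, 51, 120]

share_new = []
for subclass, (c, n) in enumerate(zip(control, new), start=1):
    total = c + n
    share_new.append(n / total)
    print(f"subclass {subclass}: total = {total}, share of New = {n / total:.1%}")

print(f"overall: control = {sum(control)}, new = {sum(new)}")
```

The share of New patients rises from subclass 1 to subclass 5, which is what sorting by the estimated probability of receiving the new treatment should produce, and both groups are represented in every subclass.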
19 Diagnostic check for covariate balance: Percentage of patients with prior Mi
20 Example 3 Non-concurrent, two-arm, multi-center study. Control: medical treatment without device, N = 65, hospital record collection. Treatment: Device A, N = 130. Primary effectiveness endpoint: treatment success. Hypothesis testing: superiority in success rate. 20 imbalanced, clinically important baseline covariates, e.g., prior cardiac surgery. 22% of patients with missing baseline covariate values.
22 Two treatment groups are not comparable: –Imbalance in multiple baseline covariates –Imbalance in the time of enrollment So any direct treatment comparisons on the effectiveness endpoint are inappropriate, and p-values from direct treatment comparisons are uninterpretable. What about treatment comparisons adjusting for the imbalanced covariates? –Traditional covariate analysis –Propensity score analysis
23 Performed propensity score (PS) analysis. Handled missing values: –Multiple imputation (MI): generate multiple data sets for PS analysis –Generate one data set: generalized PS analysis –Others Included all statistically significant and/or clinically important baseline covariates in PS modeling. Checked comparability of the two treatment groups through the estimated propensity score distributions. Found that the two treatment groups did not overlap well.
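One simple way to quantify how well two groups overlap is to look at the range of propensity scores that both groups share. The sketch below is an assumption-laden illustration: the slides do not say how overlap was assessed (the missing slides presumably showed plots of the two PS distributions), and the function names and the scores used here are hypothetical.

```python
def common_support(ps_a, ps_b):
    """Range of propensity scores covered by both groups: (lower, upper)."""
    lower = max(min(ps_a), min(ps_b))
    upper = min(max(ps_a), max(ps_b))
    return lower, upper

def fraction_in_support(ps_a, ps_b):
    """Share of all patients whose PS falls inside the common range."""
    lower, upper = common_support(ps_a, ps_b)
    everyone = ps_a + ps_b
    inside = sum(1 for p in everyone if lower <= p <= upper)
    return inside / len(everyone)

# Hypothetical estimated propensity scores (probability of receiving the device).
ps_device = [0.55, 0.70, 0.80, 0.90, 0.95, 0.97]
ps_control = [0.05, 0.10, 0.20, 0.30, 0.60]

lower, upper = common_support(ps_device, ps_control)
frac = fraction_in_support(ps_device, ps_control)
print(f"common PS range: [{lower}, {upper}]; patients inside: {frac:.0%}")
```

When only a small fraction of patients falls in the shared range, as in this made-up example, adjusted comparisons rest on very few comparable patients and on extrapolation for the rest.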
28 Conclusion: –The two treatment groups did not overlap enough to allow a sensible treatment comparison. –So, any treatment comparisons adjusting for imbalanced covariates are problematic. Question: Given that the two treatment groups are not comparable, what can we do NOW?
29 Risks and Dangers of Non-randomized Studies A study with a historical control may turn out to be much riskier and potentially more burdensome than an RCT. It may be impossible to predict in advance whether the patient population with the new treatment is comparable to the population for the historical control. The sponsor must have legal access to the historical data at the patient level, and all the right baseline covariates need to have been measured in both groups.
30 One-arm study with OPC OPC: Objective Performance Criterion. Introduced to the FDA approx. 10 yrs ago, in the evaluation of prosthetic heart valves. Compares a new heart valve against a fixed number, e.g., a complication rate, obtained by outside experts from multiple approved heart valve trials. Data and guidance for the OPC are in the public domain. Now used for some coronary artery stents, e.g., $H_0$: 6-mo. MACE rate $\ge$ point estimate + delta vs. $H_a$: 6-mo. MACE rate $<$ point estimate + delta.
31 Delta: often a clinical call, no FDA guidance Point estimate: often estimated mean of outcome but currently no universally accepted way OPC: point estimate, by some people, point estimate + delta, by some others One-sample OPC equivalence, stated by some Question: equivalent to what? To a fixed number!? One-sample OPC equivalence -- inappropriate claim!
32 Problems with OPC: Limited good historical data available for the development of OPC Disregarded variability associated with the estimate in historical studies Bio-creep problem Time sensitive Patient population sensitive Who is responsible for developing the OPC for a particular device? Who is responsible for checking if the OPC developed is appropriate? Who is responsible for updating an existing OPC?
33 Example 4 Primary effectiveness endpoint: acute procedure success, evaluated for the entire SVT patient population; SVT = (AVNRT, AVRT, AF). OPC = 85%. Hypotheses: $H_0: p \le 85\%$ vs. $H_a: p > 85\%$, where $p$ is the acute procedure success rate. Study results: N = 200, # of successes = 164. Observed success rate = 82% (< 85%). C.I.: (76%, 87%). OPC was not met!
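The numbers in Example 4 can be checked with a short sketch. Two assumptions are made here and are not stated on the slide: that the interval is a two-sided 95% interval, and that a Wilson score interval is an acceptable stand-in for whatever method was actually used (it reproduces the rounded bounds). The decision rule "the OPC is met only if the lower confidence bound exceeds 85%" is likewise an assumed reading of a one-sided test of $H_0: p \le 85\%$.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, conf=0.95):
    """Two-sided Wilson score confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

opc = 0.85
successes, n = 164, 200
rate = successes / n
lo, hi = wilson_interval(successes, n)
opc_met = lo > opc  # assumed rule: lower bound must clear the OPC

print(f"observed rate = {rate:.0%}, C.I. = ({lo:.0%}, {hi:.0%}), OPC met: {opc_met}")
```

With a lower bound near 76%, the data cannot rule out a success rate well below 85%, which is why the slide concludes that the OPC was not met.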
35 One of the major problems with the post-hoc analysis and claim: –The OPC, 85%, was developed for the entire SVT population, not for a particular patient subpopulation –In fact, an OPC for AVNRT, if one exists, should be much higher than 85%
36 Example 5. Weighted OPC Primary endpoint: one-year adverse event rate. Patient population: co-morbid group and anatomic group. Expected event rate: co-morbid 14%, anatomic 11%. A common delta: 3%. Individual OPC: co-morbid 17%, anatomic 14%. Weighted OPC: $\mathrm{OPC}_W = w_1 \times 17\% + w_2 \times 14\%$, with $w_i = n_i / (n_1 + n_2)$, where $n_1$ and $n_2$ were the numbers of patients actually enrolled in the co-morbid and anatomic groups. Hypotheses: $H_0: p \ge \mathrm{OPC}_W$ vs. $H_a: p < \mathrm{OPC}_W$, where $p$ is the overall one-year adverse event rate.
37 Problems: $N = n_1 + n_2$ was fixed in the protocol, but $n_1$ and $n_2$, and hence $w_1$ and $w_2$, were not. So the weighted OPC is a random variable, e.g., –If $w_1 = w_2 = 50\%$, then $\mathrm{OPC}_W = 15.5\%$ –If $w_1 = 70\%$, $w_2 = 30\%$, then $\mathrm{OPC}_W = 16.1\%$ The setting of the hypotheses is inappropriate. The weighted OPC leaves the study open to questionable manipulation.
39 What if enrolled: co-morbid 12%, anatomic 88%? Then the post-hoc determined weighted OPC = 12% × 17% + 88% × 14% = 14.4%. Overall observed event rate: 12% × 8% + 88% × 16% = 15% (> 14.4%). Can't reject $H_0$! The weighted OPC leaves the study open to questionable manipulation.
40 Test statistic: If $n_1$ and $n_2$ are treated as fixed, then the calculated C.I. would be narrower than it should be. What if $w_1$ and $w_2$ are pre-specified in the protocol? Then the study should comply with the protocol: if $w_1$ and $w_2$ are not achieved in the actual enrollment, a protocol deviation has been committed.
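The arithmetic behind the weighted OPC can be reproduced in a few lines. The sketch uses the individual OPCs from Example 5 (17% co-morbid, 14% anatomic) and the enrollment splits mentioned on the slides; the function name `weighted_opc` and the variable names are illustrative only.

```python
def weighted_opc(w_comorbid, opc_comorbid=0.17, opc_anatomic=0.14):
    """Weighted OPC when a fraction w_comorbid of enrolled patients is co-morbid."""
    return w_comorbid * opc_comorbid + (1 - w_comorbid) * opc_anatomic

# The target moves with the enrollment mix, which is not fixed by the protocol.
for w in (0.50, 0.70, 0.12):
    print(f"co-morbid share {w:.0%}: weighted OPC = {weighted_opc(w):.1%}")

# Scenario from the slide: 12% co-morbid (8% events), 88% anatomic (16% events).
observed = 0.12 * 0.08 + 0.88 * 0.16
target = weighted_opc(0.12)
print(f"observed = {observed:.1%}, target = {target:.1%}, "
      f"observed exceeds target: {observed > target}")
```

The 70/30 split gives 16.1%, i.e. about 16%, and the 12/88 split gives 14.36%, which rounds to the 14.4% quoted above; since the observed 15% exceeds that target, the null hypothesis cannot be rejected.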
41 Conclusions Select comparable control prospectively! Bio-creep problem should be avoided. OPC should be determined by sufficient solid scientific evidence. Variability associated with the point estimate from historical studies should be incorporated in the determination of OPC. OPC should be appropriately adjusted for different patient populations, and different indications for use. OPC would need to be updated constantly. RCT is still the gold standard for clinical studies. RCT should be preserved for new technology!
42 References Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med 1997; 127:757-763. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. JASA 1984; 79:516-524. D'Agostino RB Jr. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statistics in Medicine 1998; 17:2265-2281.
43 References Blackstone EH. Comparing apples and oranges. J. Thoracic and Cardiovascular Surgery, January 2002; 1:8-15. Grunkemeier GL, et al. Propensity score analysis of stroke after off-pump coronary artery bypass grafting. Ann Thorac Surg 2002; 74:301-305. Winkelmayer WC, et al. Comparing mortality of elderly patients on hemodialysis versus peritoneal dialysis: A propensity score approach. J Am Soc Nephrol 2002; 13:2353-2362.