Power and Non-Inferiority Richard L. Amdur, Ph.D. Chief, Biostatistics & Data Management Core, DC VAMC Assistant Professor, Depts. of Psychiatry & Surgery.


Power and Non-Inferiority in Clinical Trials Richard L. Amdur, Ph.D. Chief, Biostatistics & Data Management Core, DC VAMC Assistant Professor, Depts. of Psychiatry & Surgery Georgetown University Medical Center

If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect. Why?

Frequency Distribution for One Variable

Scores for subjects 1-32, in subject order:
9, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 14, 14, 14, 14, 15, 15, 15, 16, 16, 16, 17, 17, 18, 18, 19, 20, 21

Frequency Table
Score   Subject count   % of total
9       1               0.50
10      2               1.00
11      3               1.50
12      4               2.00
13      5               2.50
14      4               2.00
15      3               1.50
16      3               1.50
17      2               1.00
18      2               1.00
19      1               0.50
20      1               0.50
21      1               0.50
total   32

Figure: frequency distribution plot, annotated with the mean and SD.
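The mean and SD marked on the plot can be recovered from the score/count pairs in the frequency table. The sketch below is a minimal illustration; the variable names are made up here, and the use of the sample SD (dividing by n - 1) is an assumption, since the slide does not say which formula it used.

```python
from math import sqrt

# Score -> number of subjects, copied from the frequency table (32 subjects).
counts = {9: 1, 10: 2, 11: 3, 12: 4, 13: 5, 14: 4, 15: 3,
          16: 3, 17: 2, 18: 2, 19: 1, 20: 1, 21: 1}

n = sum(counts.values())
mean = sum(score * k for score, k in counts.items()) / n

# Sample SD (divides by n - 1): an assumption, not stated on the slide.
sum_sq_dev = sum(k * (score - mean) ** 2 for score, k in counts.items())
sd = sqrt(sum_sq_dev / (n - 1))

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")
```

Under these assumptions the SD comes out very close to 3, the value used in the effect-size slides.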

Compare the outcomes of treatment vs. control groups
Effect Size (ES) = (mean_TX - mean_CON) / SD_CON

Mean difference = 1. If SD = 3, ES = 1/3 = 0.33.

Mean difference = 2. If SD = 3, ES = 2/3 = 0.67.

Mean difference = 4, SD = 3, ES = 1.33.

Mean difference = 4, SD = 1.94, ES = 2.1.

Mean difference = 4, SD = 1.1, ES = 3.6.
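The effect sizes on these slides can be checked with a few lines of code. This is a minimal sketch; the function name `effect_size` is hypothetical, and the control mean is set to 0 so that the treatment mean equals the mean difference.

```python
def effect_size(mean_tx, mean_con, sd_con):
    """Standardized mean difference, scaled by the control-group SD."""
    return (mean_tx - mean_con) / sd_con

# (mean difference, control SD) pairs taken from the slides.
examples = [(1, 3), (2, 3), (4, 3), (4, 1.94), (4, 1.1)]

for diff, sd in examples:
    # Control mean fixed at 0, so the TX mean is just the difference.
    print(f"difference = {diff}, SD = {sd}: ES = {effect_size(diff, 0, sd):.2f}")
```

The same mean difference of 4 gives a larger ES as the SD shrinks, which is the point of the last three slides.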

Type-I and Type-II Errors

                                  In fact, TX & Placebo are:
In your experiment, you observe
that TX & Placebo are:            Different        The Same
  Different                       1 - β (Power)    α
  The Same                        β                1 - α

α = the rate of false positives (Type I error rate)
β = the rate of false negatives (Type II error rate)
Power = 1 - β, the rate of true positives

Plot of score distribution under the null (H0) and alternative (H1) hypotheses. Using a 2-tailed independent-groups t-test with alpha = .05 and power = .80, the N needed per group is 64.

Plot of score distribution under the null (H0) and alternative (H1) hypotheses. Using a 2-tailed independent-groups t-test with alpha = .05 and power = .95, the N needed per group is 105.
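The per-group sample sizes on these two slides can be approximated without special software. The sketch below uses the normal approximation for a two-sample comparison plus a small-sample correction term (z²/4); it is not the exact noncentral-t calculation a dedicated power program performs. The slides do not state the effect size, so d = 0.5 is an assumption, chosen because it reproduces both numbers. The function name `n_per_group` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate N per group for a 2-tailed independent-groups t-test."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)   # critical value for a 2-tailed test
    z_power = z(power)           # quantile matching the desired power
    # Normal approximation, plus z_alpha^2 / 4 to correct for using t, not z.
    n = 2 * (z_alpha + z_power) ** 2 / d ** 2 + z_alpha ** 2 / 4
    return ceil(n)

# d = 0.5 is assumed; the slides give only alpha, power, and N.
print(n_per_group(0.5, alpha=0.05, power=0.80))
print(n_per_group(0.5, alpha=0.05, power=0.95))
```

Raising the target power from .80 to .95 at the same alpha and effect size increases the required sample by roughly two thirds.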

Power is reduced by:
- Measurement error: this tends to ‘muddy’ the outcome scores, making the tx effect harder to distinguish; i.e., it increases the SD of both the CON & TX groups, reducing the ES.
- Intent-to-treat analysis, if subjects drop out because they see no progress. (Subjects rarely drop out because they get cured early, but if they did, then completer analysis would reduce power.)
- Low disease severity: less room for improvement.
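The measurement-error point can be illustrated numerically. The sketch below assumes that measurement error is independent of the true scores, so the two variances add; that model, the function name `observed_es`, and the example numbers are illustrative assumptions, not taken from the slides.

```python
from math import sqrt

def observed_es(mean_diff, sd_true, sd_error):
    """ES after independent measurement error inflates the observed SD."""
    # Assumed model: observed variance = true variance + error variance.
    sd_observed = sqrt(sd_true ** 2 + sd_error ** 2)
    return mean_diff / sd_observed

# Same true mean difference (2) and true SD (3), with growing error SD.
for sd_error in (0, 1, 2, 3):
    print(f"error SD = {sd_error}: ES = {observed_es(2, 3, sd_error):.2f}")
```

The mean difference never changes, yet the ES shrinks as the error SD grows, so a larger sample is needed for the same power.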

If you cannot reject the null hypothesis of ‘no effect’, this does not ‘prove’ there is no effect. Why? Because your power to detect an effect might have been low.
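To make ‘low power’ concrete, the sketch below approximates the power of a 2-tailed independent-groups comparison for a given effect size and group size. It uses the normal approximation and ignores the negligible chance of rejecting in the wrong direction, so the values are rough. The function name `approx_power` and the example inputs (d = 0.5, 20 or 64 per group) are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a 2-tailed two-group test (normal approximation)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    # Expected test statistic under H1 is roughly d * sqrt(n / 2).
    return nd.cdf(d * sqrt(n_per_group / 2) - z_alpha)

for n in (20, 64):
    print(f"d = 0.5, n per group = {n}: power is about {approx_power(0.5, n):.2f}")
```

With 20 subjects per group, a true effect of d = 0.5 would be missed more often than it is detected, so a non-significant result says little about whether the effect exists.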

Faul, F., Erdfelder, E., Buchner, A., & Lang, A.-G. (2009). Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior Research Methods, 41, 1149-1160.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/download-and-register

Equivalence & Non-inferiority trials
How do you show that a new treatment is not inferior to a standard treatment?

Quality of the evidence base
- There should be several 2-arm trials of ‘old’ tx vs. placebo, in order to get a range of effect sizes and response rates (% who improve).
- Ideally, there is at least one 3-arm double-blind placebo-controlled comparison (‘old’ tx vs. new tx vs. placebo).

New Tx Beats Placebo
- Effect size vs. placebo is clinically significant.
- Mean difference on the primary outcome is statistically significant.
- Response rate (% responders) is higher than placebo [how much higher is determined by prior studies].

New Tx not substantially worse than established tx
- New tx mean on the primary outcome is closer to the established tx mean than to the placebo mean.
- New tx is not significantly different from the established tx.
- New tx responder rate is not much lower than that of the established tx [should be just within the range seen in prior studies].
- Lower bound of the 95% confidence interval for the primary outcome falls above ∆.

How to select ∆
- It is lower than the range of outcome differences seen in prior studies of established tx vs. placebo.
- The smallest value that could be considered a clinically meaningful effect (vs. placebo).
- The mean difference that corresponds to an x% difference in responder rates [x is determined by prior studies of the established tx vs. placebo].

Other Criteria
- Dosing & duration of each tx are within the range of known efficacy.
- No confounds (despite randomization).
- Sample size provides adequate power to detect a clinically significant difference.
- Subjects have moderate disease severity.
- ‘Per protocol’ set of subjects may be best (most conservative).

Other Criteria
- Tx compliance should be similar in both groups.
- Low measurement error. If the outcome is an interview or ratings, there is careful training & inter-rater reliability testing. If using a survey, the test is psychometrically sound.
These threats all create bias in favor of finding equivalence, unlike a superiority trial, where they bias the study against finding an effect.

Summary
- Evidence base adequate (for established tx).
- New Tx beats placebo.
- New Tx not substantially worse than established tx.
- Study design features do not bias the results toward equivalence.

Hypothetical Example

Previous studies
Study   Mean diff   Mean placebo sx score   Mean est. tx sx score   Placebo SD   ES     Placebo % responders   Est. tx % responders
1       10          20                      10                      6.0          1.7    25%                    40%
2       12          22                      10                      5.8          2.1    20%                    48%
3       14          30                      16                      6.0          2.3    15%                    49%
4       16          25                      9                       6.4          2.5    22%                    60%
avg     13          24.25                   11.25                   6.05         2.14   20.5%                  49.3%

Figure: effect size x % responders.

Possible values for ∆
Mean difference vs. placebo   Mean sx score   ES vs. placebo   ES vs. est. tx   Estimated new tx % responders
8                             16              1.33                              32.3%
9                             15              1.50             -0.80            35.8%
10                            14              1.67             -0.60            39.3%

If ∆ = 9, the lower bound of the 95% CI for the new tx primary outcome score must be < 15 in order to claim non-inferiority. This is equivalent to a 36% responder rate and ES of 1.5 vs. placebo and -0.8 vs. established tx.
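The slide does not say how the estimated new tx responder rates were derived. One reading that reproduces them is a least-squares line of established-tx responder rate on effect size across the four prior studies, evaluated at each candidate ∆. The sketch below follows that reading; the regression approach, the placebo SD of 6 used to convert ∆ to an ES, and all names are assumptions rather than the author's stated method.

```python
# Prior studies, from the slide:
# (mean difference, placebo SD, established-tx responder rate in %)
studies = [(10, 6.0, 40), (12, 5.8, 48), (14, 6.0, 49), (16, 6.4, 60)]

es = [diff / sd for diff, sd, _ in studies]
resp = [rate for _, _, rate in studies]

# Ordinary least-squares fit of responder rate on effect size.
n = len(studies)
mean_es = sum(es) / n
mean_resp = sum(resp) / n
slope = (sum((x - mean_es) * (y - mean_resp) for x, y in zip(es, resp))
         / sum((x - mean_es) ** 2 for x in es))
intercept = mean_resp - slope * mean_es

def predicted_responders(delta, sd=6.0):
    """Predicted % responders at a mean difference of delta (SD = 6 assumed)."""
    return intercept + slope * (delta / sd)

for delta in (8, 9, 10):
    print(f"delta = {delta}: ES = {delta / 6.0:.2f}, "
          f"estimated responders = {predicted_responders(delta):.1f}%")
```

Under these assumptions the fitted line gives responder rates that agree with the slide's three values to one decimal place.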

Hypothetical Results
A) New tx mean = 16: new tx is not inferior to est. tx.
B) New tx mean = 17: new tx is inferior to est. tx.
