
1 Biostatistics in Practice
Session 6: Data and Analyses: Too Little or Too Much
Youngju Pak, Biostatistician
http://research.LABioMed.org/Biostat

2 Too Little or Too Much: Data
Too little:
- Too few subjects: study not sufficiently powered (Session 4).
- A biasing characteristic not measured: attributability of effects questionable (Session 5).
- Subjects do not complete the study, or do not comply, e.g., take all doses (this session).
"Too much":
- All subjects, not a sample (this session).
- Irrelevant detectability (this session).

3 Too Little or Too Much: Analyses
Too few: miss an effect. Too many: spurious results.
Numerous analyses due to:
- Multiple possible outcomes.
- Ongoing analyses as more subjects accrue.
- Many potential subgroups.

4 Non-Completing or Non-Complying Subjects

5 All Study Subjects or "Appropriate" Subset
What is the most relevant group of studied subjects: all randomized, mostly compliant, those who completed the study, or …?

6 Possible Bias Using Only Completers
- Comparison: % cured, placebo vs. treated.
- Many more placebo subjects are not curing and go elsewhere; they do not complete the study.
- The cure rate is biased upward in placebo completers → under-estimates the treatment effect.
- If the cure rate is biased upward in treatment completers → over-estimates the treatment effect.
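The following is a minimal sketch in Python, with made-up cure and dropout rates (not figures from the slides), of how a completers-only analysis can understate the treatment effect when uncured placebo subjects drop out more often:

```python
# A sketch with assumed rates: uncured placebo subjects drop out far more often,
# so the placebo completers' cure rate is inflated and the completers-only
# treatment effect shrinks toward zero.
true_cure = {"placebo": 0.30, "treated": 0.50}
dropout_if_uncured = {"placebo": 0.60, "treated": 0.10}  # assumed differential dropout
n_per_arm = 1000

for arm in ("placebo", "treated"):
    cured = true_cure[arm] * n_per_arm            # assume cured subjects all complete
    uncured = n_per_arm - cured
    uncured_completing = uncured * (1 - dropout_if_uncured[arm])
    completers_rate = cured / (cured + uncured_completing)   # completers-only analysis
    itt_rate = cured / n_per_arm                              # ITT: dropouts imputed as not cured
    print(f"{arm:8s}  completers-only = {completers_rate:.2f}   ITT = {itt_rate:.2f}")
```

With these assumed numbers the completers-only cure rates are nearly equal in the two arms, while the ITT analysis recovers the true 20-percentage-point difference.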

7 Criteria for the Appropriate Subset
- Study goal: scientific effect? societal impact? (driven primarily by compliance)
- Potential biased conclusions: why did subjects not complete? are the study arms equivalent? (driven primarily by dropout)

8 Possible Study Populations
Per-protocol subjects:
- Had all measurements, visits, doses, etc.
- "Modified": relaxations, e.g., 85% of doses.
- Emphasis on the scientific effect.
Intention-to-treat subjects:
- Everyone who was randomized.
- "Modified": slight relaxations, e.g., ≥ 1 dose.
- Emphasis on a non-biased policy conclusion.

9 Intention-to-Treat (ITT)
- ITT specifies the population; it includes non-completers.
- We still need to define outcomes for non-completers, i.e., "impute" values.
- It is typical to define non-completers as not cured.

10 ITT: Two Ways to Impute Unknown Values
[Figure: change from baseline plotted at the baseline, intermediate, and final visits for individual subjects, as observations and as ranks.]
- LOCF (last observation carried forward): ignores presumed progression.
- LRCF (last rank carried forward): maintains the expected relative progression.
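Below is a minimal sketch in Python with hypothetical data contrasting the two ideas; the tie-handling and helper names are illustrative, not a standard implementation:

```python
# Hypothetical change-from-baseline data; None marks a missing final visit (dropout).
data = {
    "A": [0.0, 2.0, 4.0],
    "B": [0.0, 1.0, None],   # dropped out after the intermediate visit
    "C": [0.0, 3.0, 5.0],
}

# LOCF: last observation carried forward into the final visit.
locf_final = {s: (v[-1] if v[-1] is not None else v[-2]) for s, v in data.items()}
print("LOCF final values:", locf_final)   # B keeps 1.0, ignoring any further progression

# LRCF: last rank carried forward. Rank subjects at each visit; a dropout keeps
# the rank it had at its last observed visit for a rank-based analysis.
def ranks_at(visit_idx):
    observed = [s for s, v in data.items() if v[visit_idx] is not None]
    ordered = sorted(observed, key=lambda s: data[s][visit_idx])
    return {s: ordered.index(s) + 1 for s in ordered}

final_ranks = ranks_at(2)                 # ranks among completers at the final visit
final_ranks["B"] = ranks_at(1)["B"]       # B keeps its intermediate-visit rank (lowest);
print("LRCF final ranks:", final_ranks)   # a real analysis would resolve the tie with A
```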

11 "Too Much" Data

12 All Possible Data, No Sample
- "Too much" data to need probabilistic statements; you already have the whole truth.
- Not always as obvious as it sounds.
- Examples: Electronic Medical Records (EMR), some chart reviews; these are site-specific, not samples.
- Confidence intervals are usually irrelevant.
- Reference ranges and some non-generalizable comparisons may still be valid.

13 Irrelevant (?) Detectability with a Large Study
Significant differences (p<0.05) in percentages between placebo and treatment groups:

N/Group   Difference        #Treated* to Cure 1
100       50% vs. 63.7%     7
1000      50% vs. 54.4%     23
5000      50% vs. 52.0%     50
10000     50% vs. 51.4%     71
50000     50% vs. 50.6%     167

*NNT = Number Needed to Treat = 100/Δ, where Δ is the difference in % cured.
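The table's pattern can be reproduced with a simple two-proportion z-test; the sketch below (plain Python, using the slide's percentages) shows each difference sitting near the p = 0.05 boundary while the NNT grows:

```python
# Because the slide's percentages are rounded, the computed p-values hover around
# (rather than exactly at) the 0.05 boundary; the NNT = 100/Δ pattern is exact.
from math import sqrt, erf

def two_prop_p_value(p1: float, p2: float, n: int) -> float:
    """Two-sided z-test p-value for two independent proportions, n per group."""
    pooled = (p1 + p2) / 2
    se = sqrt(pooled * (1 - pooled) * (2 / n))
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))  # 2 * (1 - Phi(z))

for n, p_treat in [(100, 0.637), (1000, 0.544), (5000, 0.520),
                   (10000, 0.514), (50000, 0.506)]:
    delta_pct = (p_treat - 0.50) * 100
    nnt = 100 / delta_pct
    print(f"N/group={n:6d}  diff={delta_pct:4.1f}%  "
          f"p={two_prop_p_value(0.50, p_treat, n):.3f}  NNT≈{nnt:.0f}")
```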

14 Too Little or Too Much: Analyses

15 Too Little or Too Much: Analyses
Multiple:
- Outcomes
- Subgroups
- Ongoing effects
Exploring vs. proving.

16 Multiple Outcomes
Balance between missing an effect and spurious results.
Food additives and hyperactivity study:
- Uses a composite score.
- Many other indicators of hyperactivity.

17 GHA: Global Hyperactivity Aggregate
Component scales: Teacher ADHD, Parent ADHD, Class ADHD, Conners, … with 10, 10, 12, and 4 items.
Could perform 10 + 10 + 12 + 4 = 36 separate item analyses.

18 Multiple Subgroup Analyses: Example
Editorial: Lagakos, NEJM 354(16), pp. 1667-69.

19 Multiple Subgroup Analyses: Example
Comparing two treatments in 25 subgroups plus overall.

20

21 Multiple Subgroup Analyses
Lagakos, NEJM 354(16):1667-1669.
False positive conclusions: a 72% chance of claiming at least one false effect with 25 comparisons.

22 A Correction for Multiple Analyses
No correction:
- If using p<0.05, then P[true negative] = 0.95.
- If the 25 comparisons are independent, P[all true negative] = (1-0.05)^25 = (0.95)^25 ≈ 0.28.
- So, P[at least 1 false positive] = 1 - 0.28 = 0.72.
Bonferroni correction:
- To maintain P[true negative in k tests] = 0.95 = (1-p*)^k, we need p* = 1 - (0.95)^(1/k) ≈ 0.05/k.
- So, use p<0.05/k to maintain a <5% overall false positive (type I error) rate.
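A minimal sketch of this arithmetic in Python, using the slide's k = 25 tests at alpha = 0.05; the exact per-test threshold is the Sidak value, with 0.05/k as the Bonferroni approximation:

```python
# Family-wise error rate with k independent tests at alpha = 0.05, and the
# per-test thresholds that restore an overall 5% type I error rate.
k = 25
alpha = 0.05

p_all_true_negative = (1 - alpha) ** k               # (0.95)^25 ≈ 0.28
p_at_least_one_false_pos = 1 - p_all_true_negative   # ≈ 0.72

sidak_threshold = 1 - (1 - alpha) ** (1 / k)         # exact: ≈ 0.00205
bonferroni_threshold = alpha / k                     # approximation: 0.002

print(f"P[at least one false positive] = {p_at_least_one_false_pos:.2f}")
print(f"Per-test threshold (Sidak)      = {sidak_threshold:.5f}")
print(f"Per-test threshold (Bonferroni) = {bonferroni_threshold:.5f}")
```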

23 Accounting for Multiple Analyses
Some formal corrections are "built in" to p-values:
- Bonferroni: general purpose.
- Tukey: for pairs of group means, with >2 groups.
- Most statistical software will compute p-values adjusted for multiple tests using these methods.
Formal corrections may not be necessary:
- Transparency about what was done is most important.
- Be aware of the number of analyses yourself and report it with any conclusions.
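As one example of such software, the sketch below assumes the Python statsmodels package is available and uses hypothetical p-values; other adjustment methods (e.g., Holm, FDR) are selected the same way:

```python
from statsmodels.stats.multitest import multipletests

raw_pvalues = [0.003, 0.020, 0.045, 0.300, 0.800]   # hypothetical results of 5 tests
reject, adjusted, _, _ = multipletests(raw_pvalues, alpha=0.05, method="bonferroni")

for p_raw, p_adj, r in zip(raw_pvalues, adjusted, reject):
    print(f"raw p = {p_raw:.3f}  adjusted p = {p_adj:.3f}  significant: {r}")
```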

24 Reporting Multiple Analyses
Clopidogrel paper 4 slides back: no p-values or probabilistic conclusions for the 25 subgroups.
Another paper's transparency: Cohan, Crit Care Med 33(10):2358-2366.

25 Multiple Mid-Study Analyses
Should effects be monitored as more and more subjects complete?
Some mid-study analyses:
- Interim analyses
- Study size re-evaluation
- Feasibility analyses

26 Mid-Study Analyses
[Figure: effect estimate (centered at 0) plotted against the number of subjects enrolled over time.]
- Too many analyses → wrong early conclusion.
- Need to monitor, but also account for the many analyses.

27 Mid-Study Analyses
- Mid-study comparisons should not be made before study completion unless planned for (interim analyses). Early comparisons are unstable and can invalidate the final comparisons.
- Interim analyses are planned comparisons at specific times, usually by an unmasked advisory board. They allow stopping the study early for very dramatic effects, and the final comparisons, if the study continues, are adjusted to validly account for "peeking".
Continued …

28 Mid-Study Analyses
- Mid-study reassessment of study size is advised for long studies. Only the standard deviations to date, not the effects themselves, are used to assess the original design assumptions.
- Feasibility analyses may use the assessment noted above to decide whether to continue the study, or may measure effects, like interim analyses, by unmasked advisors, to project the likelihood of finding effects at the planned end of the study.
Continued …

29 Mid-Study Analyses
Examples: studies at Harbor that were randomized but not masked, with data available to the PI; treatment groups were compared repeatedly as more subjects were enrolled.
- Study 1: groups do not differ; plan to add more subjects. Consequence → the final p-value is not valid; its probability assumes no prior knowledge of the effect.
- Study 2: groups differ significantly; plan to stop the study. Consequence → use of this p-value is not valid; the probability must incorporate the later comparison.
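A minimal illustrative simulation (not from the slides; the batch size and number of looks are arbitrary) of why such repeated peeking inflates the false positive rate under a true null effect:

```python
# Under a true null, testing after every batch of subjects and stopping at the
# first p < 0.05 rejects far more than 5% of the time.
import random
from math import sqrt, erf

def z_p_value(x, y):
    """Two-sided p-value comparing means of two equal-size samples (known SD = 1)."""
    n = len(x)
    z = abs(sum(x)/n - sum(y)/n) / sqrt(2 / n)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

random.seed(1)
n_trials, looks, batch = 2000, 10, 20
false_pos = 0
for _ in range(n_trials):
    x, y = [], []
    for _ in range(looks):                       # peek after every batch of enrollments
        x += [random.gauss(0, 1) for _ in range(batch)]
        y += [random.gauss(0, 1) for _ in range(batch)]
        if z_p_value(x, y) < 0.05:               # stop at the first "significant" look
            false_pos += 1
            break
print(f"False positive rate with {looks} looks: {false_pos / n_trials:.2f}")  # well above 0.05
```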

30 Bad Science That Seems So Good
1. Re-examining data, or using many outcomes, which seems like due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Looking for effects in many subgroups.
Actually bad? It could be negligent NOT to do these, but you need to account for doing them.

31 How to Avoid Misleading Results
- Analyses should be planned before the data are collected (how many dependent and independent variables will be collected, which hypotheses will be tested).
- All planned analyses should be completed and reported.

32 We have learned:
1. Study designs
2. Descriptive vs. inferential statistics
3. Hypothesis testing and the p-value
4. Five elements to determine a sample size
5. Covariates and multivariate regression models
6. The Bonferroni correction

33 EPILOGUE
GIVE A BIG CLAP TO YOURSELF SINCE YOU'VE MADE IT THIS FAR! CONGRATULATIONS!!!

