# POPULATION RESEARCH SEMINAR SERIES Sponsored by the Statistics and Survey Methods Core of the U54 Partnership Power calculations: When and why are they.

POPULATION RESEARCH SEMINAR SERIES, sponsored by the Statistics and Survey Methods Core of the U54 Partnership

Power calculations: When and why are they necessary?

Wendy B. London, PhD, Associate Professor of Pediatrics, Harvard Medical School. February 3, 2014

Topics

- Designing a valid study: how to collaborate with the statistician to calculate power
- Motivating example: improvement in level of health literacy
- How to deal with pitfalls: lower than expected enrollment; smaller than anticipated effect size

Collaborating with a statistician

- Exchange of knowledge: teach the statistician your area of research; the statistician will teach you statistical methods
- http://www.youtube.com/watch?v=PbODigCZqL8

Focus on the primary objective

- The statistician will ask questions:
  - What is your primary objective? (wording is important)
    - The question you want to answer
  - What is your primary endpoint?
    - The thing you measure in order to answer the question
- The statistician will encourage you to be focused
- The study’s power is driven by the primary objective.

Power calculations

- When are they necessary?
  - For the primary objective of the study
  - For peer-reviewed grants
  - For situations of limited resources
  - For situations when the subject population is rare or difficult to enroll
  - For publication of convincing results
- When are they not necessary?
  - Purely exploratory, descriptive studies
  - Pilot studies

Objective versus Endpoint

- Objective → the question you ask
  - Driven by a hypothesis
  - Aggregated across all patients/subjects
- Endpoint → the thing you measure (per patient) to answer the question
  - One measure of this per patient/subject
  - Example: summary score from a standardized instrument

Good endpoints:

- Are unambiguously defined
  - Unclear: literacy
  - Better: subject has low health literacy (<60 on TOFHLA): Yes/No
- Are quantifiable
  - Absence/presence (binary)
  - cm (continuous)
  - Time-to-event (survival)
- Are measured on each patient
- Measure the “effect” of interest
- Are appropriate within the context of the disease/biology/community setting
- Have available pilot data

Pass the endpoint test

- “Can I measure this in each subject?”
- “Is my measurement sufficiently reproducible?”
  - Too variable within a subject?
  - Too variable across laboratories?
- “Do I have pilot data about the variability?” (standard deviation or standard error)
- “Will I be able to obtain the data about this endpoint?” (feasibility)
  - Compliance with obtaining completed surveys
  - How often is the test result unable to be determined?
- “Is it clear which endpoint is the primary endpoint?” (ideally only one)

Summary about endpoints

- Objective: question
- Endpoint: what you measure to answer the question
- Explicitly define one primary endpoint in the study

Example: Study to improve health literacy (hypothetical)

- Primary objective: “To provide education in health literacy for patient and caregiver benefit”
- Better wording: “To increase the health literacy of patients and caregivers through a 6-month educational intervention program”
- Endpoint: Change from baseline in the Test of Functional Health Literacy in Adults (TOFHLA)

Come prepared to answer the Statistician’s questions:

- What is the variability of your endpoint?
- What effect size in your endpoint do you want to be able to detect?
  - difference (between 2 groups)
  - change from baseline (from pre-intervention to post-intervention)
- What is the smallest effect size that would still be meaningful?
- What is the largest effect size that would still be believable?

Literacy Endpoint

- TOFHLA score: range of 0-100
  - ≤59: inadequate functional health literacy
  - 60-74: marginal functional health literacy
  - ≥75: adequate functional health literacy
- Change from baseline:
  - Investigator’s initial idea: +20
  - Smallest meaningful effect size: +6
  - Largest believable effect size: +25
  - Final proposal: look for change of +6 (or +8)
- Variability:
  - Pilot data on standard deviation (literature): SD=18

Come prepared to answer the Statistician’s questions:

- What are your practical constraints?
  - Rare population
  - Limited funding
  - Deadline (must finish within x months)
- What is the largest sample size you would be willing/able to enroll?
- How long will it take you to enroll the subjects (enrollment rate)?

Potential limitations on literacy study enrollment

- Population is not rare
- Enough funding for two classes of up to 100 subjects each. Maximum n=200
- Complete enrollment, education, and data collection within 1 year. Education and final data: 7 months. That leaves 5 months for enrollment.
- Is an enrollment rate of 40 subjects per month possible?

Come prepared to answer the Statistician’s questions:

- What are your eligibility criteria?
- What proportion of the population meet these criteria?
- What proportion of those eligible will actually give consent to participate?
- Overestimating the sample size you can actually enroll falsely inflates the power.

Literacy study enrollment rate

- One person to approach potential subjects in clinic on Tuesdays and Thursdays
- Estimates:
  - 30 pts per day come to the clinic
  - 15 pts per day meet the eligibility criteria
  - 2-3 pts per day consent to the study
  - → 20 pts/month enrollment rate
- Reality: 100 pts (5 mos × 20 pts/month)
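The enrollment arithmetic on this slide can be sketched in a few lines of Python. The figure of roughly 4 clinic weeks per month is an assumption (the slide does not state it), and the variable names are illustrative only.

```python
# Back-of-the-envelope enrollment estimate using the slide's figures.
clinic_days_per_week = 2      # Tuesdays and Thursdays (from the slide)
weeks_per_month = 4           # ASSUMPTION: roughly 4 clinic weeks per month
consents_per_day = (2, 3)     # low and high estimates (from the slide)
enrollment_months = 5         # time left for enrollment (from the talk)
funded_maximum = 200          # two classes of up to 100 subjects each

# Monthly enrollment range and its midpoint.
monthly_low = consents_per_day[0] * clinic_days_per_week * weeks_per_month
monthly_high = consents_per_day[1] * clinic_days_per_week * weeks_per_month
monthly_rate = (monthly_low + monthly_high) // 2

# What can realistically be enrolled versus what the funded maximum would need.
achievable_n = monthly_rate * enrollment_months
rate_needed_for_maximum = funded_maximum / enrollment_months

print(f"Estimated enrollment: {monthly_low}-{monthly_high} pts/month "
      f"(midpoint {monthly_rate})")
print(f"Achievable in {enrollment_months} months: {achievable_n} pts")
print(f"Rate needed to reach n={funded_maximum}: "
      f"{rate_needed_for_maximum:.0f} pts/month")
```

Under these assumptions the midpoint matches the slide's 20 pts/month, which is half of the 40 pts/month that the funded maximum of 200 would require.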

Hypothesis testing

- Restate the primary objective as a statistical hypothesis
- Let d = (final TOFHLA score) - (baseline TOFHLA score)
- Null hypothesis Ho: d = 0
- Alternative hypothesis Ha: d > 0
- Power = the probability of rejecting the null if it is false
- Alpha = the probability of rejecting the null if it is true
- “Underpowered”: the study is too small to detect a meaningful difference

Power

- Assume that the “truth” is that the educational intervention will produce a 6 point improvement in the TOFHLA score (i.e., the null is false).
- 80% power means: if you ran the same study 100 times, you would expect to correctly conclude in about 80 of them that there has been an improvement in the TOFHLA. (More than 80, if the true improvement is more than 6 points.)
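The "run the same study many times" reading of power can be illustrated with a small simulation. This is only a sketch under stated assumptions: changes from baseline are drawn from a normal distribution with mean 6 and SD 18 (the values used in this talk), n=61 is borrowed from the talk's sample-size table, and the one-sided critical value 1.645 is a normal approximation to the t critical value. The function name `simulate_power` is hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_power(n, true_change, sd, n_studies=1000, crit=1.645, seed=0):
    """Fraction of simulated studies that reject Ho: d = 0 in favor of d > 0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rejections = 0
    for _ in range(n_studies):
        # One simulated study: n subjects, each with a change from baseline.
        changes = [rng.gauss(true_change, sd) for _ in range(n)]
        # Paired t statistic: mean change divided by its standard error.
        t_stat = statistics.fmean(changes) / (statistics.stdev(changes) / sqrt(n))
        if t_stat > crit:  # ASSUMPTION: normal approximation to the t cutoff
            rejections += 1
    return rejections / n_studies

power = simulate_power(n=61, true_change=6, sd=18)
print(f"Fraction of simulated studies rejecting Ho: {power:.2f}")
```

The printed fraction is an estimate of power and carries simulation error. Raising `true_change` above 6 should push it higher, in line with the slide's remark.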

Recall the p-value

- p-value: the probability of observing a result as extreme as, or more extreme than, the one we saw in our study, if the null hypothesis is true
- Small p-value: evidence that the null is not true (“significant result”)
- Large p-value: not sufficient evidence to reject the null (“not significant”)
- Threshold for significance? Typically we use p<0.05.

P-value depends on the sample size

- Two separate studies observe the same TOFHLA score improvement. One study has a larger sample size than the other; that study will have a smaller p-value
- Important point: a large p-value does not always mean that “the null is true”. It may mean that the sample size was not large enough to reject the null (“underpowered”)
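A short numerical sketch of this point: the same observed improvement gives different p-values at different sample sizes. The sample sizes (20 and 83), the observed mean improvement of 6, and the SD of 18 are illustrative inputs, and the p-value uses a one-sided normal (z) approximation rather than an exact t-test.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_p_value(mean_change, sd, n):
    """Approximate one-sided p-value for Ho: d = 0 vs Ha: d > 0 (z approximation)."""
    z = mean_change / (sd / sqrt(n))   # observed mean divided by its standard error
    return 1 - NormalDist().cdf(z)

# Same observed improvement (6 points, SD 18), two hypothetical sample sizes.
p_small_study = one_sided_p_value(mean_change=6, sd=18, n=20)
p_large_study = one_sided_p_value(mean_change=6, sd=18, n=83)

print(f"n=20: p = {p_small_study:.4f}")
print(f"n=83: p = {p_large_study:.4f}")
```

With these inputs the smaller study falls above the 0.05 threshold and the larger one well below it, even though both observed the same improvement.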

Literacy study power calculations

- Example: Ho: d=0 vs. Ha: d>0
- In a paired t-test, how much larger than 0 does d need to be in order to be meaningful and significant? Use d=6 and d=8 for power calculations.
- SD=18
- alpha=0.05
- What sample size for 80% power? 90% power?

| d | Std deviation | Power (%) | alpha | n |
|---|---------------|-----------|-------|----|
| 6 | 18 | 90.2 | 0.05 | 83 |
| 6 | 18 | 80.6 | 0.05 | 61 |
| 8 | 18 | 90.2 | 0.05 | 48 |
| 8 | 18 | 80.3 | 0.05 | 35 |
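For orientation, here is a rough sketch of how such a sample size can be approximated by hand. It uses the textbook normal-approximation formula n = ((z_alpha + z_power) × SD / d)² for a one-sided test. This is not necessarily the method behind the slide's table: the transcript does not say which software or formula produced those values, and the function name `approx_paired_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def approx_paired_sample_size(d, sd, power, alpha=0.05):
    """Normal-approximation sample size for a one-sided paired test of Ho: d = 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff for the one-sided test
    z_power = NormalDist().inv_cdf(power)      # quantile for the target power
    return ceil(((z_alpha + z_power) * sd / d) ** 2)

for d in (6, 8):
    for power in (0.90, 0.80):
        n = approx_paired_sample_size(d=d, sd=18, power=power)
        print(f"d={d}, SD=18, power={power:.0%}, alpha=0.05 -> n ~ {n}")
```

This approximation gives 78, 56, 44, and 32, a little below the slide's 83, 61, 48, and 35. The pattern is the same (a smaller detectable change or higher power requires more subjects), but the slide's values should be treated as the study's actual design numbers.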

Study design for literacy study

- n=83
- Enrollment rate: 20 pts/month
- Enrollment duration: ~4 months
- Alpha=0.05, power=90%
- Paired t-test looks for average change from baseline of 6 or more points in TOFHLA score

Pitfalls

- Overestimated enrollment rate
  - Solution: investigator can extend enrollment duration
- Underestimated the standard deviation
  - Solution: investigator is willing to accept 80% power instead of 90%
- Observed TOFHLA improvement was smaller than 6
  - No solution: it is what it is. Not statistically significant; not meaningful.
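To see why an underestimated standard deviation costs power, the sketch below recomputes approximate power for the planned design (n=83, d=6) under the planned SD of 18 and under a larger SD. The value SD=21 is purely hypothetical, chosen for illustration, and the formula is a one-sided normal approximation, not the exact calculation behind the talk's table.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n, d, sd, alpha=0.05):
    """Normal-approximation power for a one-sided paired test of Ho: d = 0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(d * sqrt(n) / sd - z_alpha)

planned_power = approx_power(n=83, d=6, sd=18)    # SD assumed at design time
realized_power = approx_power(n=83, d=6, sd=21)   # HYPOTHETICAL larger SD

print(f"SD=18: power ~ {planned_power:.2f}")
print(f"SD=21: power ~ {realized_power:.2f}")
```

Under this approximation, a modest increase in SD moves the design from roughly 90% power to the low 80s, which is the trade-off the slide describes.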

Pick the right test for the right sample size

- Large sample size → tests that assume normally distributed data (or rely on a large-sample approximation):
  - paired t-test
  - ANOVA
  - chi-squared test
- Small sample size → methods that work even if the data don’t follow a normal distribution:
  - Wilcoxon signed-rank test
  - Fisher’s Exact test
  - Simon’s two-stage design

Post-hoc power calculations

- No consensus on this in the literature
- Yes, it’s OK to retrospectively calculate power
  - if assumptions made during study design turn out to be untrue
  - if study enrollment stops before the planned accrual goal
- No, there is no benefit to recalculating power
  - Power is inherently prospective
  - Once the study is completed, power calculations do not inform us in any way as to the conclusions of the present study (re-expression of p-value)

Precision

- Precision is another approach to sample size justification.
- Precision is the way we quantify how accurate the observed endpoint is:
  - Width of a confidence interval
  - Larger sample size → narrower confidence interval

Recall the 95% confidence interval

- An interval constructed so that, over repeated studies, it contains the true value of the parameter of interest 95% of the time.
- “We are 95% confident that the true proportion lies in this interval.”
- Example: when the observed proportion is 0.40, the width of the 95% confidence interval depends on the sample size
- Depending on the sample size, we have greater or less precision in our estimate
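As a rough numerical illustration of the 0.40 example, the sketch below computes 95% confidence intervals at a few sample sizes. The sample sizes (25, 100, 400) are illustrative choices, not values from the talk, and the simple Wald (normal-approximation) interval is an assumption, since the talk does not say which interval method was used.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(p_hat, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Same observed proportion, increasing (hypothetical) sample sizes.
intervals = {n: wald_interval(0.40, n) for n in (25, 100, 400)}
for n, (lower, upper) in intervals.items():
    print(f"n={n:>3}: 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"width = {upper - lower:.3f}")
```

With this interval the width shrinks in proportion to 1/√n, so quadrupling the sample size halves the width.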

Power Calculations

- Why are they necessary?
  - For the integrity of the study: publication of convincing results
  - As a part of thoughtful study design: power is the central element
  - Because your grant won’t get past the peer-review statistician without them

Questions? Comments? Type them in or ask over your webcam/microphone, or send an email to u54.ssmc@gmail.com
