
© Scott Evans, Ph.D., and Lynne Peoples, M.S.



Presentation on theme: "© Scott Evans, Ph.D., and Lynne Peoples, M.S."— Presentation transcript:

1 Sample Size and Power

2 Sample Size Considerations
A pharmaceutical company calls and says, “We believe we have found a cure for the common cold. How many patients do we need to study to get our product approved by the FDA?”
This is the most common statistical question: “How many patients do we need?”

3 Where to begin?
N = (Total Budget / Cost per Patient)? Hopefully not!

4 Does Size Matter?
Too few:
- Cannot definitively answer the research question
- Potentially unethical
Too many:
- Wasteful of resources
- Exposes more people than necessary to potentially harmful treatments
- May identify treatment effects that are irrelevant and possibly create confusion

5 Where to begin?
Understand the research question:
- Learn about the application and the problem
- Learn about the disease and the medicine
“What’s the question?” The following are NOT research questions:
- We want to “look at” CD4 count
- We want to analyze the data
- We want to see if our results are significant
Crystal ball: visualize the final analysis and the statistical methods to be used.

6 Where to begin?
Analysis determines sample size: sample size calculations are based upon the planned method of analysis.
If you do not know how the data will be analyzed (e.g., with a 2-sample t-test), then you cannot accurately estimate the sample size.

7 Sample Size Calculation
Formulate a PRIMARY research question. Identify:
- A hypothesis to test (write down H0 and HA), or
- A quantity to estimate (e.g., using confidence intervals)

8 Sample Size Calculation
Determine the endpoint or outcome measure associated with the hypothesis test or quantity to be estimated:
- How do we “measure” or “quantify” the responses?
- Is the measure continuous, binary, or a time-to-event?
- Is this a one-sample or two-sample problem?

9 Sample Size Calculation
Base the calculation upon the PRIMARY outcome. Other analyses (i.e., secondary outcomes) may be planned, but the study may not be powered to detect effects for these outcomes.

10 Sample Size Calculation
Two strategies:
Hypothesis testing, e.g.:
- H0: μ1 = μ2 vs. HA: μ1 ≠ μ2
- H0: μ = μ0 vs. HA: μ ≠ μ0
- H0: p1 = p2 vs. HA: p1 ≠ p2
- H0: p = p0 vs. HA: p ≠ p0
Estimation with precision, based on the width of a confidence interval

11 Sample Size Calculation Using Hypothesis Testing
The most common approach. The idea is to choose a sample size such that both of the following conditions hold simultaneously:
- If the null hypothesis is true, then the probability of incorrectly rejecting it is (no more than) α
- If the alternative hypothesis is true, then the probability of correctly rejecting it is (at least) 1-β = power

12
Test Result          Reality: H0 True      Reality: H0 False
Reject H0            Type I error (α)      Power (1-β)
Do not reject H0     1-α                   Type II error (β)

13 Determinants of Sample Size: Hypothesis Testing Approach
- α
- β
- An “effect size” to detect:
  - Minimum difference that is clinically relevant (for superiority), e.g., H0: p1 - p2 = 0 vs. HA: p1 - p2 = 0.20
  - Maximum difference that is clinically irrelevant (for noninferiority)
- Estimates of variability
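Under a normal approximation, these determinants combine into a closed-form per-group sample size for a two-sample comparison of means. This is a sketch of the standard textbook formula; software may differ slightly (e.g., by using the t distribution):

```latex
n_{\text{per group}} \;=\; \frac{2\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\Delta^{2}}
```

Here α enters through z_{1-α/2}, β (power) through z_{1-β}, the variability through σ², and the effect size through Δ.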

14 What is Needed to Determine the Sample Size?
α:
- Up to the investigator
- Regulated by the FDA for phase III pivotal trials (0.05)
- How much type I (false positive) error can you afford?

15 What is Needed to Determine the Sample Size?
1-β (power):
- Up to the investigator (often 80%-90%)
- Not regulated by the FDA
- How much type II (false negative) error can you afford?

16 Choosing α and β
Weigh the cost of a Type I error versus a Type II error.
In early phase clinical trials, we often do not want to “miss” a significant result, and thus often design the study for higher power (perhaps 90%) and may consider relaxing the α error (perhaps to 0.10).
To approve a new drug, the FDA requires significance in two Phase III trials strictly designed with α error no greater than 0.05 (power = 1-β is often set to 80%-90%).

17 Effect Size
The minimum difference (between groups) that is clinically relevant or meaningful:
- Defines HA, e.g., H0: p1 - p2 = 0 vs. HA: p1 - p2 = 0.20
- Not readily apparent
- Requires clinical input
- Often difficult to agree upon

18 Estimates of Variability
- Often obtained from prior studies (historical data): explore the literature and data from ongoing studies for the estimates needed in the calculations
- Consider conducting a pilot study to obtain the estimate
- May need to validate the estimate later

19 © Scott Evans, Ph.D., and Lynne Peoples, M.S.
Considerations Scale of endpoint Continuous vs. binary vs. time-to-event 1-sample vs. 2-sample Independent samples or paired 1-sided vs. 2-sided © Scott Evans, Ph.D., and Lynne Peoples, M.S.

20 Example: Cluster Headaches
An experimental drug is being compared with placebo for the treatment of cluster headaches.
- The design is to randomize an equal number of participants to the new drug and placebo
- Participants will be administered the drug or matching placebo
- One hour later, participants will score their pain using the visual analog scale (VAS) for pain, a continuous measure ranging from 0 (no pain) to 10 (severe pain)

21 Example: Cluster Headaches
Envision the analyses: the planned analysis is a 2-sample t-test (independent groups) comparing the mean VAS score between groups, one hour after drug (or placebo) administration.

22 Example: Cluster Headaches
It is desirable to detect differences as small as 2 units (on the VAS scale). Thus the hypothesis to be tested is H0: μ1-μ2 = 0 vs. HA: μ1-μ2 = 2.
Using α=0.05, 1-β=0.80, and an assumed standard deviation (SD) of responses of 4 (in both groups), 63 participants per group (126 total) are required.
STATA command: sampsi 0 2, sd(4) a(0.05) p(.80)
Note: the first two numbers only need to differ by 2.
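As a rough cross-check of the STATA result, the per-group sample size can be reproduced with the standard normal-approximation formula. This is a sketch, not the exact sampsi internals, which may round differently:

```python
import math
from statistics import NormalDist

def n_per_group_means(delta, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means
    (normal approximation): n = 2*sd^2*(z_{1-a/2} + z_{power})^2 / delta^2."""
    z = NormalDist().inv_cdf
    n = 2 * sd**2 * (z(1 - alpha / 2) + z(power))**2 / delta**2
    return math.ceil(n)

print(n_per_group_means(delta=2, sd=4))  # 63 per group (126 total)
```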

23 Example: Part 2
Suppose that instead of measuring pain on a continuous scale using the VAS, we simply measure “response” (i.e., the headache is gone) vs. non-response.

24 Example: Part 2
Envision the analyses: the planned analysis is a 2-sample test (independent groups) comparing the proportion of responders, one hour after drug (or placebo) administration.

25 Example: Part 2
It is desirable to detect a difference between response rates of 25% and 50%. Thus H0: p1-p2 = 0 vs. HA: p1-p2 = 0.25 when p1 = 0.50, using α=0.05 and 1-β=0.80.
STATA command: sampsi 0.25 0.50, a(0.05) p(.80)
- 66 per group (132 total) with continuity correction
- 58 per group (116 total) without continuity correction
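Both figures can be cross-checked with the usual normal-approximation formula for two proportions, with and without the Fleiss continuity correction. A sketch under those assumptions (sampsi's internal rounding may differ slightly):

```python
import math
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80, correct=False):
    """Per-group n for a two-sided test of two proportions (normal
    approximation), optionally with the Fleiss continuity correction."""
    z = NormalDist().inv_cdf
    za, zb = z(1 - alpha / 2), z(power)
    pbar = (p1 + p2) / 2
    delta = abs(p1 - p2)
    n = (za * math.sqrt(2 * pbar * (1 - pbar))
         + zb * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2)))**2 / delta**2
    if correct:  # continuity-corrected n, computed from the uncorrected n
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * delta)))**2
    return math.ceil(n)

print(n_two_proportions(0.50, 0.25))                # 58 per group
print(n_two_proportions(0.50, 0.25, correct=True))  # 66 per group
```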

26 Notes for Testing Proportions
One does not need to specify a variability, since it is determined by the proportion.
The required sample size for detecting a difference between 0.25 and 0.50 differs from that for detecting a difference between 0.70 and 0.95 (even though both are 0.25 differences), because the variability differs. This is not the case for means.

27 Caution for Testing Proportions
Some software computes the sample size for testing the equality of two proportions using a “continuity correction,” while other software does not. Use the correction if it is available. The answers will differ slightly, although either method is acceptable.
STATA uses a continuity correction; the website does not.

28 One-Sample Problems
- One-sample test of means
- One-sample test of proportions

29 Sample Size Calculation Using Estimation with Precision
Not nearly as common, but equally valid. The idea is to estimate a parameter with enough “precision” to be meaningful, e.g., the width of a confidence interval is narrow enough.

30 Determinants of Sample Size: Estimation Approach
- α
- Estimates of variability
- Precision, e.g., the (maximum) desired width of a confidence interval

31 Example: Evaluating a Diagnostic Examination
It is desirable to estimate the sensitivity of an examination by trained site nurses, relative to an oral medicine specialist, for the diagnosis of Oral Candidiasis (OC) in HIV-infected people.
Precision: estimate the sensitivity such that the width of a 95% confidence interval is 15%.

32 Example: Evaluating a Diagnostic Examination
Note: sensitivity is a proportion. The (large-sample) CI for a proportion is:
p̂ ± z_{1-α/2} √( p̂(1-p̂)/n )

33 Example: Evaluating a Diagnostic Examination
We wish the width of the CI to be < 0.15. Using an estimated proportion of 0.25 and α=0.05, we calculate n=129.
Since sensitivity is a conditional probability, we need 129 participants who are OC+ as diagnosed by the oral medicine specialist. If the prevalence of OC is ~20%, then we would need to enroll or screen ~129/(0.20) = 645.
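The n = 129 figure follows from setting the full CI width, 2·z_{1-α/2}·√(p(1-p)/n), equal to the desired width and solving for n. A sketch of that arithmetic:

```python
import math
from statistics import NormalDist

def n_for_ci_width(p, width, alpha=0.05):
    """Smallest n such that the large-sample CI for a proportion p
    has total width <= `width`: n = (2*z)^2 * p*(1-p) / width^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((2 * z)**2 * p * (1 - p) / width**2)

n = n_for_ci_width(p=0.25, width=0.15)
print(n)                    # 129 OC+ participants needed
print(math.ceil(n / 0.20))  # ~645 enrolled if the prevalence is 20%
```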

34 Sensitivity Analyses
Sample size calculations require assumptions and estimates (e.g., estimates of variability). It is prudent to investigate how sensitive the sample size estimates are to changes in these assumptions, as they may be inaccurate.
Thus, provide numbers for a range of scenarios and combinations of parameters (e.g., various combinations of α, β, estimates of variance, effect sizes, etc.).

35 Example: (Per Group) Sample Size Sensitivity Analyses for the Study of Cluster Headaches

Difference (μ1-μ2)   SD    Power=80%   Power=90%
2                    3.5   49          65
2                    4.0   63          85
2                    4.5   80          107
3                    3.5   22          29
3                    4.0   28          38
3                    4.5   36          48

36 Effects of Determinants
In general, each of the following increases the required sample size (all else being equal):
- Lower α
- Lower β (higher power)
- Higher variability
- Smaller effect size to detect
- More precision required (i.e., a narrower interval)

37 Caution
In general, a higher sample size implies higher power. Does this mean that a higher sample size is always better? Not necessarily:
- Studies can be very costly
- It is wasteful to power studies to detect between-group differences that are clinically irrelevant

38 Sample Size Adjustments
Complications during clinical trials (e.g., loss to follow-up, poor adherence) can reduce study power. (This may be less of a factor in lab experiments.)
Expect these complications and plan for them BEFORE the study begins; adjust the sample size estimates to account for them.

39 Complications that Decrease Power
- Missing data
- Poor adherence
- Multiple tests
- Unequal group sizes
- Use of nonparametric testing (vs. parametric)
- Noninferiority or equivalence trials (vs. superiority trials)
- Inadvertent enrollment of ineligible subjects, or of subjects who cannot respond

40 Adjustment for Loss to Follow-up
Loss to follow-up (LFU) refers to when a participant's endpoint status is not available (missing data). If one assumes that the LFU is non-informative or ignorable (i.e., random and not related to treatment), then a simple sample size adjustment can be made.
This assumption is:
- Very strong, as LFU is often associated with treatment
- Difficult to validate
Researchers need to consider the potential bias of examining only subjects with non-missing data.

41 Adjustment for Loss to Follow-up
Calculate the sample size N assuming no loss to follow-up, and let x = the proportion expected to be lost to follow-up. Then:
Nadj = N/(1-x)
Note: this adjustment applies to “per protocol,” “as treated,” or “observed data” analyses. No LFU adjustment is necessary if you plan to impute missing values (e.g., ITT analyses); however, if you use imputation, an adjustment for a “dilution effect” may be warranted.

42 Inflation Factor for LFU
Proportion LFU      0.05   0.10   0.20   0.30   0.50
Inflation Factor    1.05   1.11   1.25   1.43   2.00

43 Adjustment for Poor Adherence
Adjustment for the “dilution effect” due to poor adherence, or to the (perhaps inadvertent) inclusion of subjects who cannot respond:
Calculate the sample size N, and let x = the proportion expected to be non-adherent. Then:
Nadj = N/(1-x)²

44 Inflation Factor for Non-adherence
Proportion Non-adherent   0.05   0.10   0.20   0.30   0.50
Inflation Factor          1.11   1.23   1.56   2.04   4.00
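Both inflation tables follow directly from the adjustment formulas: 1/(1-x) for loss to follow-up and 1/(1-x)² for non-adherence. A sketch that reproduces the tabled factors:

```python
def lfu_inflation(x):
    """Inflation factor for a proportion x lost to follow-up: 1/(1-x)."""
    return 1 / (1 - x)

def adherence_inflation(x):
    """Inflation factor for a proportion x non-adherent
    (dilution effect): 1/(1-x)^2."""
    return 1 / (1 - x)**2

for x in (0.05, 0.10, 0.20, 0.30, 0.50):
    print(x, round(lfu_inflation(x), 2), round(adherence_inflation(x), 2))
```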

45 Adjustment for Unequal Allocation
When comparing groups, power is maximized when group sizes are equal (all else being equal). There may be reasons, however, to make some groups larger than others, e.g., putting more people on an experimental therapy (rather than placebo) to obtain more safety information about the product.

46 Adjustment for Unequal Allocation
Adjustment for unequal allocation between two groups:
- Let QE and QC be the sample fractions, such that QE + QC = 1 (power is optimized when QE = QC = 0.5)
- Calculate the sample size Nbal for equal sample sizes (i.e., QE = QC = 0.5)
- Nunbal = Nbal × (1/QE + 1/QC)/4
Some software can calculate this directly (STATA: the ratio option of the sampsi command).
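The unequal-allocation adjustment can be sketched as a small helper. The 2:1 allocation example below (QE = 2/3, applied to the 126-participant headache trial) is illustrative, not from the slides:

```python
import math

def n_unbalanced(n_bal, q_e):
    """Total n under allocation fractions q_e and q_c = 1 - q_e, given
    the total n_bal required under equal allocation:
    N_unbal = N_bal * (1/q_e + 1/q_c) / 4."""
    q_c = 1 - q_e
    return math.ceil(n_bal * (1 / q_e + 1 / q_c) / 4)

print(n_unbalanced(126, 0.5))    # equal allocation: unchanged, 126
print(n_unbalanced(126, 2 / 3))  # 2:1 allocation: 142
```

Note that the factor (1/QE + 1/QC)/4 equals 1 at QE = QC = 0.5 and grows as the allocation becomes more lopsided.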

47 Adjustment for Nonparametric Testing
Most sample size calculations are performed expecting the use of parametric methods (e.g., a t-test), often because formulas (and software) for these methods are readily available. However, parametric assumptions (e.g., normality) do not always hold, and thus nonparametric methods may be required. Whether parametric methods can be used is unknown at the design stage.

48 Adjustment for Nonparametric Testing
Pitman efficiency (applicable for 1- and 2-sample t-tests):
- Calculate the sample size Npar
- Nnonpar = Npar/0.864

49 Example: Cluster Headaches
Recall the cluster headache example, in which the required sample size was 126 (total) for detecting a 2-unit (VAS scale) difference in means.
- If we expect 10% of the participants to be non-adherent, then an appropriate inflation is needed: 126/(1-0.1)² = 156
- If we further expect to have to perform a nonparametric test (instead of a t-test) due to non-normality, then further inflation is required: 156/0.864 = 181
- Round to 182 to have an equal number (91) in each group
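The chain of adjustments in this example can be scripted. A sketch that applies the non-adherence inflation and, optionally, the Pitman-efficiency inflation, then rounds up to an even total for two equal groups:

```python
import math

def adjust(n_total, nonadherence=0.0, nonparametric=False):
    """Inflate a total sample size for non-adherence (1/(1-x)^2) and,
    optionally, for nonparametric testing (Pitman efficiency 0.864),
    then round up to an even total so the two groups are equal."""
    n = n_total / (1 - nonadherence)**2
    if nonparametric:
        n = n / 0.864
    n = math.ceil(n)
    return n + (n % 2)  # bump odd totals up by one

print(adjust(126, nonadherence=0.10))                      # 156
print(adjust(126, nonadherence=0.10, nonparametric=True))  # 182 (91 per group)
```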

50 Adjustment: Noninferiority/Equivalence Studies
Calculate the sample size for a standard superiority trial, but reverse the roles of α and β. This works for large-sample binary and continuous data, but does not work for time-to-event data.

51 More Adjustments?
Adjustments are needed if:
- You plan interim analyses where you will test hypotheses (group sequential designs)
- You have more than one primary test to be conducted, requiring multiple comparison adjustments, e.g., Bonferroni (if 2 tests or comparisons are to be made, then power each at α/2)

52 Sample Size Re-estimation
A hot topic in clinical trials: re-estimating the sample size based on interim data. This is complicated, and must be done carefully to maintain scientific integrity and blinding.

53 Sample Size Re-estimation
If re-estimation is based only on a new estimate of variation (not of effect size), it is generally okay; it is recommended that it be done blinded and based on pooled data. If it is based on effect size estimates or unblinded data, it can be problematic.

54 Sample Size Re-estimation
Re-estimation may be okay if done independently of the endpoint data.
Example: OHARA is developing a study to estimate the sensitivity (and the specificity and predictive values) of a new screening instrument.
- The analysis will construct a 95% CI for the sensitivity with a specified precision, which requires 300 patients with disease
- Enrollment will thus be 300/p, where p = the estimated prevalence of disease
- Since the prevalence estimate is just a guess, we can assess this assumption after, say, 200 patients have been accrued, and then re-estimate the sample size using the new estimate of the prevalence
This is not problematic because we are not looking at endpoint data (whether the screening result is correct) to re-estimate the sample size. We are performing the re-estimation based on a nuisance parameter that is not associated with the study result (i.e., the sensitivity of the instrument).
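The enrollment arithmetic in this example is simple to sketch. The prevalence figures below (an initial guess of 0.30, revised to 0.25 at the interim look) are hypothetical, for illustration only:

```python
import math

def enrollment_target(n_diseased_needed, prevalence):
    """Total enrollment needed to accrue a given number of diseased
    patients: N = n_diseased / p, rounded up."""
    return math.ceil(n_diseased_needed / prevalence)

print(enrollment_target(300, 0.30))  # initial guess p = 0.30 -> enroll 1000
# Suppose the interim data suggest a prevalence of 0.25 (hypothetical):
print(enrollment_target(300, 0.25))  # re-estimated target -> enroll 1200
```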

