Power and Sample Size Part II Elizabeth Garrett-Mayer, PhD Assistant Professor of Oncology & Biostatistics.

Slides:



Advertisements
Similar presentations
Study Size Planning for Observational Comparative Effectiveness Research Prepared for: Agency for Healthcare Research and Quality (AHRQ)
Advertisements

Comparing Two Proportions (p1 vs. p2)
Choosing Endpoints and Sample size considerations
Statistics.  Statistically significant– When the P-value falls below the alpha level, we say that the tests is “statistically significant” at the alpha.
Hypothesis Testing making decisions using sample data.
November 10, 2010DSTS meeting, Copenhagen1 Power and sample size calculations Michael Væth, University of Aarhus Introductory remarks Two-sample problem.
Estimation of Sample Size
Tim Wiemken, PhD MPH CIC Assistant Professor Division of Infectious Diseases University of Louisville, Kentucky Planning Your Study Statistical Issues.
Copyright ©2011 Brooks/Cole, Cengage Learning Testing Hypotheses about Means Chapter 13.
Comparing Two Population Means The Two-Sample T-Test and T-Interval.
© Scott Evans, Ph.D., and Lynne Peoples, M.S.
Sample size computations Petter Mostad
Research Curriculum Session III – Estimating Sample Size and Power Jim Quinn MD MS Research Director, Division of Emergency Medicine Stanford University.
PSY 1950 Confidence and Power December, Requisite Quote “The picturing of data allows us to be sensitive not only to the multiple hypotheses that.
Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D.1 Sample Size and Power.
BCOR 1020 Business Statistics
Chapter 9 Hypothesis Testing II. Chapter Outline  Introduction  Hypothesis Testing with Sample Means (Large Samples)  Hypothesis Testing with Sample.
Sample Size Determination
Sample size and study design
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
Sample Size Determination Ziad Taib March 7, 2014.
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
1. Statistics: Learning from Samples about Populations Inference 1: Confidence Intervals What does the 95% CI really mean? Inference 2: Hypothesis Tests.
AM Recitation 2/10/11.
Copyright © 2012 by Nelson Education Limited. Chapter 8 Hypothesis Testing II: The Two-Sample Case 8-1.
Chapter 8 Hypothesis testing 1. ▪Along with estimation, hypothesis testing is one of the major fields of statistical inference ▪In estimation, we: –don’t.
Introduction to Biostatistics and Bioinformatics
1/2555 สมศักดิ์ ศิวดำรงพงศ์
Testing Hypotheses Tuesday, October 28. Objectives: Understand the logic of hypothesis testing and following related concepts Sidedness of a test (left-,
The paired sample experiment The paired t test. Frequently one is interested in comparing the effects of two treatments (drugs, etc…) on a response variable.
More About Significance Tests
Confidence Intervals Elizabeth S. Garrett Oncology Biostatistics March 27, 2002 Clinical Trials in 20 Hours.
CI - 1 Cure Rate Models and Adjuvant Trial Design for ECOG Melanoma Studies in the Past, Present, and Future Joseph Ibrahim, PhD Harvard School of Public.
RMTD 404 Lecture 8. 2 Power Recall what you learned about statistical errors in Chapter 4: Type I Error: Finding a difference when there is no true difference.
Sample size determination Nick Barrowman, PhD Senior Statistician Clinical Research Unit, CHEO Research Institute March 29, 2010.
1 CSI5388: Functional Elements of Statistics for Machine Learning Part I.
Statistical Power and Sample Size Calculations Drug Development Statistics & Data Management July 2014 Cathryn Lewis Professor of Genetic Epidemiology.
Understanding the Variability of Your Data: Dependent Variable Two "Sources" of Variability in DV (Response Variable) –Independent (Predictor/Explanatory)
Chapter 9 Hypothesis Testing II: two samples Test of significance for sample means (large samples) The difference between “statistical significance” and.
Bootstrapping (And other statistical trickery). Reminder Of What We Do In Statistics Null Hypothesis Statistical Test Logic – Assume that the “no effect”
Inference for distributions: - Comparing two means IPS chapter 7.2 © 2006 W.H. Freeman and Company.
Hypothesis Testing Hypothesis Testing Topic 11. Hypothesis Testing Another way of looking at statistical inference in which we want to ask a question.
Biostatistics Class 6 Hypothesis Testing: One-Sample Inference 2/29/2000.
Biostatistics, statistical software VII. Non-parametric tests: Wilcoxon’s signed rank test, Mann-Whitney U-test, Kruskal- Wallis test, Spearman’ rank correlation.
1 Chapter 10: Introduction to Inference. 2 Inference Inference is the statistical process by which we use information collected from a sample to infer.
Lecture 16 Section 8.1 Objectives: Testing Statistical Hypotheses − Stating hypotheses statements − Type I and II errors − Conducting a hypothesis test.
통계적 추론 (Statistical Inference) 삼성생명과학연구소 통계지원팀 김선우 1.
Introduction to sample size and power calculations Afshin Ostovar Bushehr University of Medical Sciences.
RDPStatistical Methods in Scientific Research - Lecture 41 Lecture 4 Sample size determination 4.1 Criteria for sample size determination 4.2 Finding the.
Chapter 20 Testing Hypothesis about proportions
Medical Statistics as a science
Introduction to Inference: Confidence Intervals and Hypothesis Testing Presentation 4 First Part.
Issues concerning the interpretation of statistical significance tests.
Power & Sample Size Dr. Andrea Benedetti. Plan  Review of hypothesis testing  Power and sample size Basic concepts Formulae for common study designs.
Statistical Analysis II Lan Kong Associate Professor Division of Biostatistics and Bioinformatics Department of Public Health Sciences December 15, 2015.
Sample Size Determination
Chapter 13 Understanding research results: statistical inference.
Chapter 9: Introduction to the t statistic. The t Statistic The t statistic allows researchers to use sample data to test hypotheses about an unknown.
HYPOTHESIS TESTING FOR DIFFERENCES BETWEEN MEANS AND BETWEEN PROPORTIONS.
Hypothesis Tests u Structure of hypothesis tests 1. choose the appropriate test »based on: data characteristics, study objectives »parametric or nonparametric.
Introduction to Biostatistics, Harvard Extension School, Fall, 2005 © Scott Evans, Ph.D.1 Sample Size and Power Considerations.
Hypothesis Tests. An Hypothesis is a guess about a situation that can be tested, and the test outcome can be either true or false. –The Null Hypothesis.
Clinical Trials 2015 Practical Session 2. Exercise I Normally Distributed Response Data H 0 :  = 0 H a :  =  a > 0 n=? α=0.05 (two-sided) β=0.20 
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Sample Size Considerations
Sample Size Determination
How many study subjects are required ? (Estimation of Sample size) By Dr.Shaik Shaffi Ahamed Associate Professor Dept. of Family & Community Medicine.
How many study subjects are required ? (Estimation of Sample size) By Dr.Shaik Shaffi Ahamed Professor Dept. of Family & Community Medicine College.
Type I and Type II Errors
Presentation transcript:

Power and Sample Size Part II Elizabeth Garrett-Mayer, PhD Assistant Professor of Oncology & Biostatistics

Sample Size and Power The most common reason statisticians get contacted Sample size is contingent on design, analysis plan, and outcome With the wrong sample size, you will either –Not be able to make conclusions because the study is “underpowered” –Waste time and money because your study is larger than it needed to be to answer the question of interest And, with wrong sample size, you might have problems interpreting your result: –Did I not find a significant result because the treatment does not work, or because my sample size is too small? –Did the treatment REALLY work, or is the effect I saw too small to warrant further consideration of this treatment? –This is an issue of CLINICAL versus STATISTICAL significance

Sample Size and Power Sample size ALWAYS requires the investigator to make some assumptions –How much better do you expect the experimental therapy group to perform than the standard therapy groups? –How much variability do we expect in measurements? –What would be a clinically relevant improvement? The statistician CANNOT tell you what these numbers should be (unless you provide data) It is the responsibility of the clinical investigator to define these parameters

Sample Size and Power Review of power oPower = The probability of concluding that the new treatment is effective if it truly is effective oType I error = The probability of concluding that the new treatment is effective if it truly is NOT effective o(Type I error = alpha level of the test) o(Type II error = 1 – power) When your study is too small, it is hard to conclude that your treatment is effective

Three common settings Binary outcome: e.g., response vs. no response, disease vs. no disease Continuous outcome: e.g., number of units of blood transfused, CD4 cell counts Time-to-event outcome: e.g., time to progression, time to death.

Continuous outcomes Easiest to discuss Sample size depends on –Δ: difference under the null hypothesis –α: type 1 error –β type 2 error –σ: standard deviation –r: ratio of number of patients in the two groups (usually r = 1)

Continuous Outcomes We usually –find sample size OR –find power OR –find Δ But for Phase III cancer trials, most typical to solve for N.

Example: sample size in EACA study in spine surgery patients The primary goal of this study is to determine whether epsilon aminocaproic acid (EACA) is an effective strategy to reduce the morbidity and costs associated with allogeneic blood transfusion in adult patients undergoing spine surgery. (Berenholtz) Comparative study with EACA arm and placebo arm. Primary endpoint: number of allogeneic blood units transfused per patient through day 8 post-operation. Average number of units transfused without EACA is expected to be 7 Investigators would be interested in regularly using EACA if it could reduce the number of units transfused by 30% (to 5 units or less).

Example: sample size in EACA study in spine surgery patients H 0 : μ 1 – μ 2 = 0 H 1 : μ 1 – μ 2 ≠ 0 We want to know what sample size we need to have large power and small type I error. –If the treatment DOES work, then we want to have a high probability of concluding that H 1 is “true.” –If the treatment DOES NOT work, then we want a low probability of concluding that H 1 is “true.”

Two-sample t-test approach Assume that the standard deviation of units transfused is 4 Assume that difference we are interested in detecting is μ 1 – μ 2 = 2 Assume that N is large enough for CLT Choose two-sided alpha of 0.05

Two-sample t-test approach

For testing the difference in two means, with equal allocation to each arm: With UNequal allocation to each arm, where n 2 = rn 1

Sample size = 30,Power = 26% Sampling distn under H 1 : μ 1 - μ 2 = 2Sampling distn under H 1 : μ 1 - μ 2 = 0 μ 1 - μ 2 Vertical line defines rejection region

Sample size = 60,Power = 48% μ 1 - μ 2 Vertical line defines rejection region Sampling distn under H 1 : μ 1 - μ 2 = 0Sampling distn under H 1 : μ 1 - μ 2 = 2

Sample size = 120,Power = 78% μ 1 - μ 2 Vertical line defines rejection region Sampling distn under H 1 : μ 1 - μ 2 = 0Sampling distn under H 1 : μ 1 - μ 2 = 2

Sample size = 240, Power = 97% μ 1 - μ 2 Vertical line defines rejection region Sampling distn under H 1 : μ 1 - μ 2 = 0Sampling distn under H 1 : μ 1 - μ 2 = 2

Sample size = 400, Power > 99% μ 1 - μ 2 Vertical line defines rejection region Sampling distn under H 1 : μ 1 - μ 2 = 0Sampling distn under H 1 : μ 1 - μ 2 = 2

Likelihood Approach Not as common, but very logical Resulting sample size equation is the same, but the paradigm is different. Create likelihood ratio comparing likelihood assuming different means vs. common mean:

Other outcomes Binary: –use of exact tests often necessary when study will be small –more complex equations than continuous –Why? Because mean and variance both depend on p Exact tests are often appropriate If using continuity correction with χ 2 test, then no closed form solution Time-to-event –similar to continuous –parametric vs. non-parametric –assumptions can be harder to achieve for parametric

Other issues in comparative trials Unbalanced design –why? might help accrual; might have more interest in new treatment; one treatment may be very expensive –as ratio of allocation deviates from 1, the overall sample size increases (or power decreases) Accrual rate in time-to-event studies –Length of follow-up per person affects power –Need to account for accrual rate and length of study

Equivalence and Non-inferiority trials When using frequentist approach, usually switch H 0 and H a “Superiority” trial“Non-inferiority” trial

Equivalence and Non-inferiority trials Slightly more complex To calculate power, usually define: H 0 : δ > d H a : δ < d Usually one-sided Choosing β and α now a little trickier: need to think about what the consequences of Type I and II errors will be. Calculation for sample size is the same, but usually want small δ. Sample size is usually much bigger for equivalence trials than for standard comparative trials.

Equivalence and Non-inferiority trials Confidence intervals more natural to some –Want CI for difference to exclude tolerance level –E.g. 95% CI = (-0.2,1.3) and would be willing to declare equivalent if δ = 2 –Problems with CIs: Hard-fast cutoffs (same problem as HTs with fixed α) Ends of CI don’t mean the same as the middle of CI Likelihood approach probably best (still have hard-fast rule, though).

Phase IV studies Expanded Safety studies LARGE trials Looking for adverse events These are generally quite rare Rare events → use Poisson distribution Can use frequentist of likelihood approach

Other considerations: cluster randomization Example: Prayer-based intervention in women with breast cancer –To implement, identified churches in S.E. Baltimore –Women within churches are in same ‘group’ therapy sessions Consequence: women from same churches has correlated outcomes –Group dynamic will affect outcome –Likely that, in general, women within churches are more similar (spiritually and otherwise) than those from different churches Power and sample size? –Lack of independence → need larger sample size to detect same effect –Straightforward calculations with correction for correlation –Hardest part: getting good prior estimate of correlation!

Other Considerations: Non-adherence Example: side effects of treatment are unpleasant enough to ‘encourage’ drop-out or non-adherence Effect? Need to increase sample size to detect same difference Especially common in time-to-event studies when we need to follow individuals for a long time to see event. Adjusted sample size equations available (instead of just increasing N by some percentage)

Helpful Hint: Use computer! For common situations, software is available Good software available for purchase –Stata (binomial and continuous outcomes) –NQuery –PASS –Power and Precision –Etc….. FREE STUFF ALSO WORKS!!!! –Steve P’s software –Cancer Research and Biostatistics (non-profit, related to SWOG)

Practical Considerations We don’t always have the luxury of finding N Often, N fixed by feasibility We can then ‘vary’ power or δ But sometimes, even that is difficult. We don’t always have good guesses for all of the parameters we need to know…

Not always so easy More complex designs require more complex calculations Usually also require more assumptions Examples: –Longitudinal studies –Cross-over studies –Correlation of outcomes Often, “simulations” are required to get a sample size estimate.

(1) Odds ratio between cases and controls for a one standard deviation change in marker (2) SD(a)/SD(marker) (3) Matching (4) Power for X 1 as simulated (5) Power for X 1 replaced with median from respective quintile :1 1:2 1:1 1:2 1:1 1: :1 1:2 1:1 1:2 1:1 1: :1 1:2 1:1 1:2 1:1 1: :1 1:2 1:1 1:2 1:1 1:2 > > > > (1) Odds ratio between cases and controls for a one standard deviation change in marker (in units of standard deviations of the controls) (2) SD(a)/SD(marker in controls) (3) Matching (4) Power for X 1 as simulated (5) Power for X 1 replaced with median from respective quintile :1 1:2 1:1 1:2 1:1 1: :1 1:2 1:1 1:2 1:1 1: :1 1:2 1:1 1:2 1:1 1:

Helpful Hint: Use computer! For common situations, software is available Good software available for purchase –Stata (binomial and continuous outcomes) –NQuery –PASS –Power and Precision –Etc….. FREE STUFF ALSO WORKS!!!! –Steve P’s software –Cancer Research and Biostatistics (non-profit, related to SWOG)