Sample Size and Power Steven R. Cummings, MD Director, S.F. Coordinating Center.


The Secret of Long Life
- Resveratrol
- In the skin of red grapes
- Makes mice
  - Run faster
  - Live longer

What I want to show
- Consuming resveratrol prolongs healthy life

Sample Size Ingredients
- Testable hypothesis
- Type of study
- Statistical test
  - Type of variables
- Effect size (and its variance)
- Power and alpha

My research question
- I need to plan the study
- My question is: Does consuming resveratrol lead to a long and healthy life?

What’s wrong with the question?
- I need to plan the study
- My question is: Does consuming resveratrol lead to a long and healthy life?

What’s wrong with the question?
Does consuming resveratrol lead to a long and healthy life?
- Vague
- Must be measurable

“Consuming resveratrol”
- Most rigorous design: randomized placebo-controlled trial
- Comparing red wine to placebo would be difficult
- But resveratrol supplements are widely available

Measurable (specific) outcome
- “Consuming resveratrol” = taking resveratrol supplements vs. taking placebo
- “Prolong healthy life” =

Measurable (specific) outcome
- “Consuming resveratrol” = taking resveratrol supplements vs. taking placebo
- “Prolong healthy life” = reduces all-cause mortality
Do people randomized to get a resveratrol supplement have a lower mortality rate than those who get a placebo?

In whom?
- Elderly men and women (≥70 years)

The research hypothesis
Men and women > age 70 years randomized to get a resveratrol supplement have a lower mortality rate than those who get a placebo.

The research hypothesis: the ‘alternative’ hypothesis
Men and women > age 70 years randomized to get a resveratrol supplement have a lower mortality rate than those who get a placebo.
- Cannot be tested statistically
- Statistical tests can only reject the null hypothesis: that there is no effect

The Null Hypothesis
Men and women > age 70 years randomized to receive a resveratrol supplement do not have a lower mortality rate than those who receive placebo.
- Can be rejected by statistical tests

Ingredients for Sample Size
✓ Testable hypothesis
- Type of study
- Statistical test
  - Type of variables
- Effect size (and its variance)
- Power and alpha

Type of study
- Descriptive
  - Only one variable / measurement
- What proportion of centenarians take resveratrol supplements?
  - Confidence interval for proportions
- What is the mean red wine intake of centenarians?
  - Confidence interval for the mean

Sample size for a descriptive study
- For example: “What proportion of centenarians (≥100 years old) take resveratrol supplements?”
- How much precision do you want?
  - Sample size is based on the width of the confidence interval (Table 6D and 6E)
- I assume that 20% of centenarians take resveratrol
  - Conventional 95% C.I.
  - I want to be confident that the truth is within ±10%
  - Total width of the C.I. = 0.20
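As a rough check on the width-based reasoning above, the calculation can be sketched with the usual normal approximation for a proportion, n = 4 z² p(1-p) / W². The helper name and the choice of this approximation are assumptions for illustration only; Tables 6D and 6E may be built or rounded slightly differently.

```python
import math
from statistics import NormalDist

def n_for_proportion_ci(p, total_width, confidence=0.95):
    """Approximate n so the CI for a proportion has the given total width.

    Hypothetical helper (not from the slides); assumes the normal
    approximation to the binomial.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return 4 * z**2 * p * (1 - p) / total_width**2

# Expected proportion 20%, total CI width 0.20 (i.e. +/- 10 points)
n = n_for_proportion_ci(p=0.20, total_width=0.20)
print(math.ceil(n))  # roughly 60 centenarians
```

Halving the total width to 0.10 would roughly quadruple the required number, because the width enters the formula squared.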

Analytical studies
- Analytical means a comparison
  - Cross-sectional
    - Mean red wine intake in centenarians vs year olds
  - Randomized trial
    - Elders who get resveratrol have lower mortality than those who get placebo

Ingredients for Sample Size
✓ Testable hypothesis
✓ Type of study: analytical (RCT)
- Statistical test
  - Type of variables
- Effect size (and its variance)
- Power and alpha

Type of statistical tests
Depends on the types of variables
This works for most study planning

The types of variables?
Men and women > age 70 years randomized to receive a resveratrol supplement do not have a lower mortality rate than those who receive placebo
- Dichotomous: resveratrol or placebo
- Continuous: mortality rate
What’s wrong?

The types of variables?
Men and women > age 70 years randomized to receive a resveratrol supplement do not have a lower mortality rate than those who receive placebo
- Dichotomous: resveratrol or placebo
- Continuous: mortality rate
  - It is a proportion at certain times
  - For example, 3% at 1 year

The appropriate test for this randomized trial for mortality

Ingredients for Sample Size
✓ Testable hypothesis
✓ Type of study: analytical (RCT)
✓ Statistical test
  - Type of variables
- Effect size (and its variance)
- Power and alpha

Estimating the effect size
For randomized trials,
- Start with the expected rate in the placebo group
- Usually available from population or cohort studies
- In this case, we know the mortality rates by age:
  - 3-4% per year*; for a 3 year study: 10%
* ~ mean annual 78 yrs
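The step from an annual rate of 3-4% to roughly 10% over three years can be checked with a short sketch. It assumes a constant annual mortality rate over the study period, which is a simplification not stated on the slide; the function name is hypothetical.

```python
def cumulative_risk(annual_rate, years):
    """Cumulative probability of death, assuming a constant annual rate."""
    return 1 - (1 - annual_rate) ** years

# Annual rates spanning the 3-4% range quoted above
for rate in (0.03, 0.035, 0.04):
    print(f"{rate:.1%} per year -> {cumulative_risk(rate, 3):.1%} over 3 years")
```

Under this assumption the three-year figure falls between about 9% and 12%, so 10% is a reasonable planning value for the placebo group.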

Effect size: the hardest part
What should I assume for the effect of resveratrol on mortality?

Effect size: the hardest part
Ways to choose an effect size:
- What is likely, based on other data?
- Do a pilot study
- Estimate based on effect on biomarkers
- What difference is important to detect?
  - “We don’t want to miss a __%_ difference”
- What can we afford?

The effect of resveratrol on mortality rate?
- What is likely, based on other data?
- Do a pilot study
- Estimate based on effect on biomarkers
- What difference is important to detect?
  - “We don’t want to miss a __%_ difference”
- What can we afford?

Resveratrol prolonged survival of mice fed a high-calorie diet (Baur, Nature 2006): ~25%

The effect of resveratrol on mortality rate?
- What is likely, based on other data?
- Pilot study? What endpoint?
- No reliable markers for the effect on death
- What difference is important to detect?
  - “We don’t want to miss a ____ difference”
- What can we afford to find?

The effect of resveratrol on mortality rate?
- What is likely, based on other data?
- Do a pilot study
- Estimate based on biomarkers
- What difference is important to detect?
  - “We don’t want to miss a _1%_ difference”
- What can we afford?
  - 1%: too big & expensive
  - 5%: small and cheap

The effect of resveratrol on mortality rate?
- Finding a smaller effect is important to health
- Allowing a larger effect is important for your budget

Effect size
Men and women > age 70 years randomized to receive a resveratrol supplement do not have a lower mortality rate than those who receive placebo
- It would be important to find (I don’t want to miss) a 20% decrease
- Placebo rate: 10%
- Resveratrol rate: 8%

Ingredients for Sample Size
✓ Testable hypothesis
✓ Type of study: analytical (RCT)
✓ Statistical test
  - Type of variables
✓ Effect size (and its variance)
- Power and alpha

α (alpha)
The probability of finding a ‘significant’ result if nothing is going on

I will need to convince people
- Customarily, a result is ‘statistically significant’ if P<0.05
- In other words,
  - Probability of a type I error = 5%
  - α (alpha) = 0.05

I will need to convince skeptics
- Very small chance that a positive result is an error
  - α (alpha) = 0.01; P<0.01
- A smaller α means a larger sample size

Two-sided vs. one-sided α
- A 2-sided α assumes that the result could go either way
  - Recognizes that you have two chances of finding something that isn’t really there
    - Resveratrol decreases mortality
    - Resveratrol increases mortality
- A 1-sided hypothesis reduces sample size (somewhat)
  - A one-sided α of 0.05 corresponds to a two-sided α of 0.10
- It assumes that the result could, plausibly, go only one way
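The correspondence between a one-sided α of 0.05 and a two-sided α of 0.10 comes down to the critical value of the standard normal distribution. The sketch below assumes a normal-approximation test and a power of 0.80 (both chosen here for illustration) and uses the fact that sample size scales roughly with (z_alpha + z_beta)².

```python
from statistics import NormalDist

norm = NormalDist()
z_one_sided_05 = norm.inv_cdf(1 - 0.05)      # one-sided alpha = 0.05
z_two_sided_10 = norm.inv_cdf(1 - 0.10 / 2)  # two-sided alpha = 0.10
z_two_sided_05 = norm.inv_cdf(1 - 0.05 / 2)  # two-sided alpha = 0.05
z_beta = norm.inv_cdf(0.80)                  # power = 0.80 (assumed here)

# Relative sample size: one-sided 0.05 versus two-sided 0.05
ratio = (z_one_sided_05 + z_beta) ** 2 / (z_two_sided_05 + z_beta) ** 2

print(round(z_one_sided_05, 3), round(z_two_sided_10, 3))  # same critical value
print(round(ratio, 2))  # fraction of the two-sided sample size
```

Under these assumptions the one-sided design needs only about a fifth fewer participants, which is the "somewhat" in the slide.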

Two-sided vs. one-sided α
- You may believe that your effect could only go one way!
  - Resveratrol is ‘natural.’ It could not increase mortality!
- Be humble.
  - The history of research is filled with results that contradicted expectations
  - Vitamin D trial (JAMA 2010):
    - To everyone’s surprise, ~1500 IU of vitamin D/d increased the risk of falls and fractures in elderly women and men
- A 1-sided test is almost never the best choice

β (beta)
The probability of missing this effect size in this sample, if it is really true in the population

Power (1-β)
The probability of finding this effect size in this sample, if it is really true in the population

If it’s true, I don’t want to miss it
- The chance of missing the effect (β) is “customarily” 20%
- In other words
  - Probability of a type II error = 0.20
  - β (beta) = 0.20
  - Power = 1 - β = 0.80

I really don’t want to miss it
- β = 0.10
- Power (1-β) = 0.90
- Greater power requires a larger sample size
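How much larger? In the normal approximation, sample size is proportional to (z_alpha/2 + z_beta)², so the cost of a stricter α or a higher power can be sketched as a ratio against the customary α = 0.05, power = 0.80. The function name is hypothetical and the two-sided α is an assumption.

```python
from statistics import NormalDist

def size_factor(alpha, power):
    """(z_{alpha/2} + z_beta)^2: the term that scales sample size
    in the normal approximation (two-sided alpha assumed)."""
    norm = NormalDist()
    return (norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)) ** 2

base = size_factor(alpha=0.05, power=0.80)
for alpha, power in [(0.05, 0.80), (0.05, 0.90), (0.01, 0.80)]:
    relative = size_factor(alpha, power) / base
    print(f"alpha={alpha}, power={power}: {relative:.2f} x baseline")
```

Moving from 80% to 90% power adds roughly a third to the sample size, and tightening α from 0.05 to 0.01 adds roughly half.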

We have all of the ingredients
✓ Testable hypothesis
✓ Type of study: analytical (RCT)
✓ Statistical test: Chi-squared
✓ Effect size: 10% vs. 8%
✓ Power: 0.90; alpha: 0.05 (two-sided)

From Table 6B.2 Comparing two proportions

From Table 6B.2
- Sample size: 4,401 per group
- Total: 8,802
- Does not include drop-outs
  - 20% drop-out: 11,002 total sample size!
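For readers without the table at hand, the per-group figure can be approximated in code. This sketch assumes a two-sided α of 0.05, power of 0.90, equal group sizes, and the normal-approximation formula for two proportions with a continuity correction; that the table was built this way is an assumption, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.90):
    """Per-group n for comparing two proportions (equal groups).

    Normal approximation plus a continuity correction; an
    illustrative sketch, not necessarily how Table 6B.2 was built.
    """
    norm = NormalDist()
    z_a = norm.inv_cdf(1 - alpha / 2)
    z_b = norm.inv_cdf(power)
    p_bar = (p1 + p2) / 2
    diff = abs(p1 - p2)
    # Uncorrected sample size per group
    n0 = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
          + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / diff ** 2
    # Continuity correction for equal group sizes
    return n0 / 4 * (1 + math.sqrt(1 + 4 / (n0 * diff))) ** 2

n = math.ceil(n_per_group_two_proportions(0.10, 0.08))
print(n, 2 * n)  # per group, total (before allowing for drop-outs)
```

Other parameter choices may differ from the tabled values by a few participants, depending on how the z-values are rounded.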

Alternatives
- Tweak α: one-sided
  - Almost never appropriate
- Tweak the power: 0.80

From Table 6B.2 Comparing two proportions

Alternatives
- Tweak α: one-sided
  - Almost never appropriate
- Tweak the power: 0.80
- Modest effect: 3,308 per group (6,616 total)

Alternatives
- Increase the effect size
  - 10% vs. 6%

From Table 6B.2 Comparing two proportions

Increasing the effect size
- 10% vs. 6%
- Makes a big difference!
- 769 / group; 1,538 total (no dropouts)
- However, still large (and not affordable)
- Not believable
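The reason the difference is so large is that sample size varies with the inverse square of the difference between the two proportions. A simplified sketch follows, assuming a two-sided α of 0.05 and power of 0.80 and omitting the continuity correction, so its values run a little below the tabled ones; the function name is hypothetical.

```python
from statistics import NormalDist

def approx_n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Simplified, uncorrected normal approximation for two proportions."""
    norm = NormalDist()
    z = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    return z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2

small_effect = approx_n_per_group(0.10, 0.08)  # 2-point absolute difference
large_effect = approx_n_per_group(0.10, 0.06)  # 4-point absolute difference
print(round(small_effect), round(large_effect))
print(round(small_effect / large_effect, 1))   # more than a fourfold reduction
```

Doubling the absolute difference cuts the requirement by more than a factor of four, which is why an optimistic effect size is tempting and why it has to be defensible.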

Alternatives: a new hypothesis
- Change the outcome measure
  - Continuous measurement
  - A precise measurement
- A ‘surrogate’ for mortality rate
  - Strongly associated with mortality rate
  - Likely to be influenced by resveratrol
- Walking speed

Mice on resveratrol
- Mice fed resveratrol
  - Live 25% longer
  - Are significantly faster
  - Have greater endurance

Increased gait speed (0.1 m/s) in 1 year and survival over 8 years
[Figure: faster by ≥0.1 m/s vs. slower; ~20% decreased mortality rate]

The new ingredients
✓ New testable hypothesis
✓ Type of study: RCT
- Statistical test: t-test
  - Continuous variable (walking speed)
  - Difference between means (pbo vs. tx)
- Effect size and variance: E/S
- Power and alpha

E/S
- For t-tests (dichotomous predictor, continuous outcome)
- Sample size depends on the ratio E/S
  - E: Effect size (difference in means)
  - S: Standard deviation of the measurement
- You need a smaller sample size with
  - Greater effect (E)
  - More precise measurement (lower SD)

What we need to determine E/S for our RCT
- Effect size (E) for change in walking speed
  - Mean baseline value = 1.0 m/s
  - Change in the placebo group = 0
  - Change with resveratrol = +0.1 m/s
- Standard deviation (S)
  - Standard deviation of the change

S
- Standard deviation for the measurement
  - Cross-sectional data: 0.25 m/sec
- However, we are interested in change
- Standard deviation of change in speed?
  - Often more difficult to find because cross-sectional surveys are more common than longitudinal studies

What if you don’t know the SD?
- Standard deviation of change in speed?
- If you cannot find data from other studies
- Alternatives
  - Pilot study: measure change in 3 or 6 months

What if you don’t know the SD?
- Standard deviation of change in speed?
- If you cannot find data from other studies
- Alternatives
  - Pilot study: measure change in 3 or 6 months
  - Or, a well-educated guess

Estimating an S.D.: the 1/4 rule
~ 4 S.D.s across a ‘usual’ range of values
So, 1 S.D. will = 1/4 of the range

Estimating S.D. for change in walking speed: the 1/4 rule
- Range of changes over 1 year*
- +0.2 m/sec to -0.6 m/sec
- Range = 0.8 m/sec
- 1/4 of the range = 0.2 m/sec
* Single short, 6 meter walk

E/S
- Effect size: 0.1 m/sec difference in change
- Standard deviation: 0.2 m/sec
- E/S = 0.5
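The arithmetic of the 1/4 rule and the resulting standardized effect size can be written out as a brief sketch. The helper name is hypothetical; the numbers are the ones given in the slides.

```python
def sd_from_range(low, high):
    """1/4 rule: a 'usual' range spans about 4 standard deviations."""
    return (high - low) / 4

# Usual range of 1-year changes in walking speed (m/sec)
sd_change = sd_from_range(low=-0.6, high=0.2)
effect = 0.1  # difference in change between groups (m/sec)

print(round(sd_change, 2))           # estimated S
print(round(effect / sd_change, 2))  # standardized effect size E/S
```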

The new ingredients
✓ New testable hypothesis
✓ Type of study: analytical (RCT)
✓ Statistical test: t-test
  - Continuous variable
  - Difference between means
✓ Effect size: 1.0 vs. 1.1 m/sec; E/S = 0.5
✓ Power: 0.80; alpha: 0.05 (two-sided)

The new ingredients
✓ New testable hypothesis
✓ Type of study: analytical (RCT)
✓ T-test
✓ Effect size: 1.0 vs. 1.1 m/sec; E/S = 0.5
✓ Power: 0.80; alpha: 0.05 (two-sided)
Sample size: 64 per group; 128 total
With 20% drop-out: 160 total
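The 64-per-group figure can be approximated from E/S alone. The sketch below assumes a two-sided α of 0.05, power of 0.80, equal groups, and the normal approximation n = 2 (z_alpha/2 + z_beta)² / (E/S)², plus a small additive term (z_alpha/2² / 4) often used to account for the t-distribution. Both the formula choice and the function name are assumptions, not the source of the tabled value.

```python
import math
from statistics import NormalDist

def n_per_group_t_test(e_over_s, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample t-test (equal groups)."""
    norm = NormalDist()
    z_a = norm.inv_cdf(1 - alpha / 2)
    z_b = norm.inv_cdf(power)
    n_normal = 2 * (z_a + z_b) ** 2 / e_over_s ** 2
    # Small additive correction for using t rather than z (an assumption)
    return n_normal + z_a ** 2 / 4

n = math.ceil(n_per_group_t_test(0.5))
print(n, 2 * n)  # per group, total (before drop-outs)
```

Compared with the thousands per group needed for the mortality outcome, the continuous outcome brings the trial within reach.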

Improving precision of the outcome measurement
- Increased precision (decreased SD) will decrease the sample size
- For example
  - Mean of repeated walks
  - Longer, 400 m walk
  - Standard deviation may improve from 0.2 m/sec to 0.15 m/sec
- E/S improves from 0.5 to 0.1 ÷ 0.15 ≈ 0.67 (about 0.7)

A modest improvement in precision reduced sample size by 1/2
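Because sample size is proportional to (S/E)², the gain from a smaller standard deviation can be estimated without redoing the full calculation. A minimal sketch, with a hypothetical function name and the effect size held fixed:

```python
def relative_sample_size(sd_new, sd_old):
    """Ratio of required sample sizes when only the SD changes
    (same effect size, alpha, and power)."""
    return (sd_new / sd_old) ** 2

print(round(relative_sample_size(0.15, 0.20), 2))  # fraction of the original n
```

With the unrounded E/S of 0.67 this gives a little over half the original sample size; rounding E/S up to 0.7 gives (0.5 / 0.7)² ≈ 0.51, which matches the "by 1/2" on the slide.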

Summary
- Estimate sample size early
- Systematically collect the ingredients
- Effect size is the most difficult - and important - judgement
- Alternatives that reduce sample size
  - Compromise power
  - Increase effect size
  - Precise continuous outcomes