# Sample Size Calculation

## Presentation transcript

Sample Size Calculation
PD Dr. Rolf Lefering IFOM - Institut für Forschung in der Operativen Medizin Universität Witten/Herdecke Campus Köln-Merheim

Sample Size Calculation
Sample size calculation is a trade-off between uncertainty on one side and costs, effort and time on the other.

Sample Size Calculation
- Single study group
  - continuous measurement
  - count of events
- Comparative trial (2 or more groups)

Confidence Interval
Which true value is compatible with the observation? The confidence interval is the range in which the true value lies with high probability (usually 95%).

Confidence Interval
Example: 56 patients with open fractures, 9 of whom developed an infection (16%). The sample (n = 56, infection rate 16%) comes from all patients with open fractures, whose true infection rate is unknown (???).

Confidence Interval: formula for event rates

$$CI_{95} = p \pm 1.96 \cdot \sqrt{\frac{p\,(100 - p)}{n}}$$

with n = sample size and p = percentage.

Example: n = 56, p = 16%

$$CI_{95} = 16 \pm 1.96 \cdot \sqrt{\frac{16 \cdot 84}{56}} = 16 \pm 9.6 \quad\Rightarrow\quad [6.4;\ 25.6]$$
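The event-rate formula can be checked numerically. This is a minimal Python sketch; the function name `wald_ci_percent` is our own, and it implements only the simple normal-approximation interval used on the slide, which becomes unreliable for very small samples or rates close to 0% or 100%.

```python
import math


def wald_ci_percent(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for an event rate given in percent.

    Implements CI = p +/- z * sqrt(p * (100 - p) / n).
    """
    half_width = z * math.sqrt(p * (100 - p) / n)
    return p - half_width, p + half_width


# Open-fracture example: n = 56, p = 16%
low, high = wald_ci_percent(16, 56)
print(f"[{low:.1f} - {high:.1f}]")  # [6.4 - 25.6]
```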

Confidence Interval
[Figure: 95% confidence interval around a 20% incidence rate]

Confidence Interval: formula for continuous variables (mean)

$$CI_{95} = M \pm 1.96 \cdot SE$$

with M = mean, SE = standard error, SD = standard deviation, n = sample size. Multiplier: 1.65 for 90%, 1.96 for 95%, 2.58 for 99%.

Remember: $SE = SD / \sqrt{n}$
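The multipliers and the mean-based interval can be reproduced with Python's standard library. In this sketch the function names are hypothetical and the example data (mean 10, SD 3, n = 36) are made up purely for illustration; like the slide, it uses the normal distribution rather than the t distribution, which matters for small samples.

```python
import math
from statistics import NormalDist


def z_multiplier(confidence: float) -> float:
    """Two-sided standard-normal quantile, e.g. 0.95 -> 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def mean_ci(mean: float, sd: float, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """CI = M +/- z * SE, with SE = SD / sqrt(n) (normal approximation)."""
    se = sd / math.sqrt(n)
    half_width = z_multiplier(confidence) * se
    return mean - half_width, mean + half_width


for level in (0.90, 0.95, 0.99):
    # 1.645, 1.960, 2.576 (rounded on the slide to 1.65, 1.96, 2.58)
    print(f"{level:.0%}: {z_multiplier(level):.3f}")

# Hypothetical data: mean 10, SD 3, n = 36, so SE = 0.5
print([round(x, 2) for x in mean_ci(10, 3, 36)])  # [9.02, 10.98]
```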

Sample Size Calculation
Comparative trials

“What is the sample size to show that early weight-bearing therapy is better?”
“Which key should I press here now?”
“What is the sample size to show that early weight-bearing therapy, as compared to standard therapy, is able to reduce the time until return to work from 10 weeks to 8 weeks, where the time to return to work has an SD of 3?”
36 cases per group!
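The well-posed question can be turned into numbers with the usual normal-approximation formula for comparing two means. This sketch assumes a two-sided α = 0.05 and 80% power (the values used later in the presentation; the slide does not state them for this example), and the function name is hypothetical. Software based on the exact t distribution may return one or two more cases per group.

```python
import math
from statistics import NormalDist


def n_per_group_means(delta: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n for a two-sided comparison of two means (normal approximation).

    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2, rounded up.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2)


# Return to work: 10 vs. 8 weeks (delta = 2), SD = 3
print(n_per_group_means(delta=2, sd=3))  # 36
```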

Outcome Measures
Survival, organ failure, hospital stay, recurrence rate, complications, sepsis, lab values, blood pressure, wound infection, mobility, well-being, independence/autonomy, pain, fear, depression, social status, anxiety, fatigue

Select Outcome Measure
- Relevance: Does this endpoint convince the patient / the scientific community?
- Reliability, measurability: Can the outcome be measured easily and with little variation, also by different observers?
- Sensitivity: Does the intervention lead to a significant change in the outcome measure?
- Robustness: How much is the endpoint influenced by other factors?

Select Outcome Measure
- Primary endpoint: main hypothesis or core question; aim of the study. Statistics: confirmatory
- Secondary endpoints: other interesting questions, additional endpoints. Statistics: exploratory (could be confirmatory in case of a large difference). Advantage: prospective selection in the study protocol
- Retrospectively selected endpoints: selected when the trial is done, based on subgroup differences. Statistics: ONLY exploratory!

Sample Size Calculation
- Difference to be detected
- Certainty (α error)
- Power

Statistical Testing
A statistical test is a method (or tool) to decide whether an observed difference* is really present or merely due to random variation.
* This applies to a test for difference, the type most frequently applied in medicine.

Statistical Testing
- Test for difference: “Intervention A is better than B”
- Test for equivalence: “Interventions A and B have the same effect”
- Test for non-inferiority: “Intervention A is not worse than B”

Statistical Testing: How a test procedure works
1. Want to show: there is a difference.
2. Assume: there is NO difference between the groups (“equal effects”, null hypothesis).
3. Try to disprove this assumption: perform the study / experiment and measure the difference.
4. Calculate the probability that such a difference could occur although the assumption (“no difference”) was true = p-value.

Statistical Testing: statistical test for difference
The p-value is the probability that the observed difference occurred just by chance.

Statistical Testing: statistical test for difference
p is the probability for “no difference”

Statistical Testing: 6:0 for Germany
Null hypothesis: “Germany and Spain are equally strong soccer teams!”
Game tonight: 6:0 for Germany (a trial with n = 6 goals).
The p-value answers: how big is the chance that, of two equally strong teams, one scores all 6 goals and the other none?
Statistical test: p = 0.031. Spain could still be as strong as Germany, but the chance is small (3.1%).
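The soccer p-value follows from an exact two-sided binomial (sign) test. This sketch models each goal as an independent 50:50 event under the null hypothesis, which is our simplifying assumption; the function name is hypothetical.

```python
from math import comb


def two_sided_sign_test(wins: int, n: int) -> float:
    """Exact two-sided binomial test of H0: each goal is equally likely for either team."""
    k = max(wins, n - wins)
    # probability of a split at least this lopsided in favour of one particular team
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # double it for "either team", capped at 1 when the two tails overlap
    return min(1.0, 2 * tail)


print(round(two_sided_sign_test(6, 6), 3))  # 0.031
```

The result is simply 2 × (1/2)^6 = 2/64: either team could have scored all six goals.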

Statistical Testing

| | small sample | large sample |
|---|---|---|
| small difference | p = 0.68 | p = 0.05 |
| large difference | p = 0.05 | p < 0.001 |

Statistical Testing
The more cases are included, the better “equality” can be disproved.
Example: drug A has a success rate of 80%, while drug B is better, with a healing rate of 90%.

| Sample size | Drug A (80%) | Drug B (90%) | p-value |
|---|---|---|---|
| 20 | 8/10 | 9/10 | 0.53 |
| 40 | 16/20 | 18/20 | 0.38 |
| 100 | 40/50 | 45/50 | 0.16 |
| 200 | 80/100 | 90/100 | 0.048 |
| 400 | 160/200 | 180/200 | 0.005 |
| 1000 | 400/500 | 450/500 | <0.001 |
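The slide does not say which test produced these p-values; the pooled two-proportion z-test (equivalent to the chi-square test without continuity correction) reproduces them after rounding. A sketch, with a hypothetical function name:

```python
import math
from statistics import NormalDist


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of the pooled two-proportion z-test."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))


# (successes A, successes B, patients per group): always 80% vs. 90%
rows = [(8, 9, 10), (16, 18, 20), (40, 45, 50), (80, 90, 100), (160, 180, 200), (400, 450, 500)]
for a, b, n in rows:
    print(2 * n, f"{two_proportion_p(a, n, b, n):.3f}")
```

The difference stays at 10 percentage points in every row; only the sample size changes, and with it the p-value.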

Statistical Testing
A “significant” p-value ... does NOT prove the size of the difference, but only excludes equality!

Statistical Testing: p-value
- p-value large (> 0.05): the observed difference is probably caused by chance only, or the sample size is not sufficient to exclude chance. The null hypothesis is maintained: “no difference”.
- p-value small (≤ 0.05): chance alone is not sufficient to explain this difference, so there is a systematic difference. The null hypothesis is rejected: “significant difference”.

Statistical Testing: Errors
The decision
- for a difference (significant, p ≤ 0.05)
- or against it (“equality”, not significant, p > 0.05)

is not certain but only a probability (p-value). Therefore, errors are possible:
- Type 1 error: decision for a difference although there is none => wrong finding
- Type 2 error: decision for “equality” although there is a difference => missed finding

Statistical Testing: Errors

| Test says ... | Truth: no difference | Truth: difference |
|---|---|---|
| significant | type 1 error (α): wrong finding | correct |
| not significant | correct | type 2 error (β): missed finding |

Statistical Testing

| | type 1 error (α): “wrong finding” | type 2 error (β): “missed finding” |
|---|---|---|
| Fire detector | false alarm | no alarm in case of fire |
| Court | conviction of an innocent | setting a criminal free |
| Clinical study | difference was “significant” by chance | difference was missed |

Power
- Type 2 error (β): probability of missing a difference
- Power = 1 − β: probability of detecting a difference

Power depends on:
- the magnitude of the difference
- the sample size
- the variation of the outcome measure
- the significance level (α)
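The four dependencies can be made concrete with the normal-approximation power formula for comparing two means. This sketch borrows the difference (2 weeks) and SD (3) from the return-to-work example; the sample sizes in the loop are hypothetical, and the function name is our own. The negligible chance of rejecting in the wrong direction is ignored.

```python
import math
from statistics import NormalDist


def power_two_means(delta: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided two-sample comparison of means (normal approximation)."""
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # standard error of the difference of means
    return z.cdf(abs(delta) / se - z.inv_cdf(1 - alpha / 2))


# Power rises with the difference and the sample size,
# and falls with a larger SD or a stricter significance level.
for n in (10, 20, 36, 50):
    print(n, round(power_two_means(delta=2, sd=3, n_per_group=n), 2))
```

With 36 patients per group the power comes out at about 81%, consistent with the 80% target behind the “36 cases per group” answer.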

Power
“Does the study have enough power to detect a difference of size X?” “What is the power of the study?”
POWER is the probability of detecting a certain difference X as significant (at level α) with the given sample size n.

Power: When to perform power calculations?
1. Planning phase (sample size calculation): if the assumed difference really exists, what risk would I take of missing it?
2. Final analysis (in case of a non-significant result): what size of difference could be rejected with the present data?

Example
- Clinical trial: laparoscopic versus open appendectomy
- Endpoint: maximum post-operative pain intensity (VAS points)
- Patients: 30 cases per group
- Results: lap.: (SD 18); open: 32 (SD 17); p = not significant!

What is the power of the study???
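For a non-significant result, one way to express power is the smallest true difference the trial could have detected. This sketch uses only the SDs and group sizes given above, and assumes a two-sided α = 0.05, 80% power and the normal approximation; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def detectable_difference(sd1: float, sd2: float, n1: int, n2: int,
                          alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest true difference of means detectable with the given power (normal approximation)."""
    z = NormalDist()
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # standard error of the difference
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


# Appendectomy trial: SD 18 (lap.) and 17 (open), 30 patients per group
print(round(detectable_difference(18, 17, 30, 30), 1))  # 12.7
```

Under these assumptions the trial had 80% power only for differences of roughly 13 VAS points; smaller true differences could easily have been missed.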


Sample Size Calculation
The sample size depends on:
- the difference to be detected
- α = 0.05 (α error: risk of finding a difference by chance)
- β = 0.20 (β error: risk of missing a real difference)

Sample Size Calculation
Besides α = 0.05 and β = 0.20, the calculation needs either PT and PC or the difference and SD:
- Event rates: the percentages in the treatment group (PT) and the control group (PC)
- Continuous measures: the difference of means and the standard deviation (SD)

Sample Size Calculation
Continuous endpoints, SD unknown
If the variation (standard deviation) is not known, the expected advantage can be expressed as an “effect size”, i.e. the difference in units of the (unknown) SD. Examples:
- pain values are at least 1 SD below those of the control group (effect size = 1.0)
- the difference will be at least half an SD (effect size = 0.5)

Sample Size Calculation
Continuous endpoints tested with non-parametric rank statistics (non-normal distribution, or non-metric values): Mann-Whitney U test; Wilcoxon test.
Use the t-test for the sample size calculation and add 10% of cases.
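Both rules of thumb (effect size in SD units, plus 10% for rank tests) fit into one small function. This is a sketch under stated assumptions: two-sided α = 0.05, 80% power, and the normal approximation to the t-test; the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group_effect_size(effect_size: float, alpha: float = 0.05, power: float = 0.80,
                            rank_test: bool = False) -> int:
    """Per-group n for a two-sided comparison of two means, difference given in SD units.

    n = 2 * (z_{1-alpha/2} + z_{power})^2 / effect_size^2 (normal approximation).
    With rank_test=True, 10% more cases are added (rule of thumb for rank tests).
    """
    z = NormalDist()
    n = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / effect_size ** 2
    if rank_test:
        n *= 1.10
    return math.ceil(n)


for es in (1.0, 0.5):
    print(es, n_per_group_effect_size(es), n_per_group_effect_size(es, rank_test=True))
```

Halving the effect size roughly quadruples the required sample size, because n grows with 1 / effect_size².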

Sample Size Calculation: Guess …
How many patients are needed to show that a new intervention is able to reduce the complication rate from 20% to 14%? (α = 0.05; β = 0.20, i.e. 80% power)
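One common normal-approximation formula for comparing two event rates can be used to check a guess. This sketch uses a two-sided test without continuity correction; the function name is hypothetical, and other programs or continuity-corrected formulas may give somewhat different (usually larger) numbers.

```python
import math
from statistics import NormalDist


def n_per_group_rates(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Per-group n to compare two event rates with a two-sided test
    (normal approximation, no continuity correction)."""
    z = NormalDist()
    z_a, z_b = z.inv_cdf(1 - alpha / 2), z.inv_cdf(power)
    p_bar = (p1 + p2) / 2  # average rate under the null hypothesis
    numerator = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


print(n_per_group_rates(0.20, 0.14))  # 615 per group
```

A seemingly modest reduction from 20% to 14% therefore needs more than 600 patients per group under these assumptions.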

Dupont WD, Plummer WD. Power and Sample Size Calculations: A Review and Computer Program. Contr. Clin. Trials (1990) 11:

Sample Size Calculation

Multiple Testing
- More than one experimental/treatment group
- Multiple endpoints
- Multiple follow-up time points
- Interim analyses
- Subgroup analyses

Multiple testing increases the risk of arbitrary significant results. Overall statistical error in 8 tests at the 0.05 level:

$$\alpha = 1 - 0.95^8 = 1 - 0.66 = 0.34$$

Multiple Testing
- 1 test (with 5% error): 95% correct, 5% at least 1 error
- 2 tests (with 5% error each): 90.25% both correct, 4.75% + 4.75% exactly one error, 0.25% both wrong, i.e. 9.75% at least 1 error
- 3 tests, 4 tests, 5 tests …

Multiple Testing

| Tests (5% error each) | all correct | at least 1 error |
|---|---|---|
| 1 | 95% | 5% |
| 2 | 90.2% | 9.8% |
| 3 | 85.7% | 14.3% |
| 4 | 81.5% | 18.5% |
| 5 | 77.4% | 22.6% |
| … | … | … |
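The table and the 8-test figure both follow from 1 − (1 − α)^k. This sketch assumes the tests are independent, as the calculation on the slides does; the function name is hypothetical.

```python
def prob_at_least_one_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false-positive among k independent tests, each at level alpha."""
    return 1 - (1 - alpha) ** k


for k in (1, 2, 3, 4, 5, 8):
    print(k, f"{prob_at_least_one_error(k):.1%}")
```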

Multiple Testing: What could you do?
- Select ONE primary and multiple secondary questions
- Combine endpoints:
  - multiple complications → “negative event”
  - multiple time points → AUC, maximum value, time to normal
  - multiple endpoints → sum score according to O'Brien
- Adjust p-values, i.e. test each endpoint at a “stronger” α level, e.g. Bonferroni: k tests at level α / k (5 tests at the 1% level instead of 1 test at the 5% level)
- A priori ordered hypotheses: predefine the order of the tests (each at the 5% level)
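A quick check of the Bonferroni example: testing 5 endpoints at the 1% level keeps the chance of at least one false-positive result just below 5%. The familywise calculation here assumes independent tests; the Bonferroni bound itself also holds for dependent tests. Function names are hypothetical.

```python
def bonferroni_level(alpha: float, k: int) -> float:
    """Per-test significance level for k tests under the Bonferroni correction."""
    return alpha / k


def familywise_error(per_test_alpha: float, k: int) -> float:
    """Chance of at least one false-positive among k independent tests."""
    return 1 - (1 - per_test_alpha) ** k


level = bonferroni_level(0.05, 5)  # the 1% level from the example
print(round(level, 4), round(familywise_error(level, 5), 4))
print(round(familywise_error(0.05, 5), 4))  # unadjusted, for comparison
```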

Interim Analysis
- Fixed sample size: analysis at the end of the trial
- Sequential design: analysis after each case
- Group sequential design: analysis after each step
- Adaptive design: analysis after each step
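Why interim analyses need special designs can be seen in a small simulation. This sketch is not a group sequential method itself. It assumes equally sized stages, normally distributed data with known variance and no true difference, and tests naively at every look with the usual two-sided 5% critical value. The function name and parameters are hypothetical.

```python
import math
import random


def repeated_looks_alpha(looks: int = 5, n_sim: int = 20_000, z_crit: float = 1.96,
                         seed: int = 1) -> float:
    """Monte Carlo estimate of the overall type 1 error when a trial without a true
    difference is tested at each of `looks` equally spaced analyses and stopped at the
    first 'significant' result."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(n_sim):
        total = 0.0
        for k in range(1, looks + 1):
            total += rng.gauss(0.0, 1.0)            # standardized data of one more stage under H0
            if abs(total / math.sqrt(k)) > z_crit:  # z statistic for all data collected so far
                false_positives += 1
                break
    return false_positives / n_sim


print(repeated_looks_alpha())  # roughly 0.14 instead of 0.05
```

Group sequential designs, such as those in the reference below, avoid this inflation by using stricter critical values at the interim looks.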

Interim Analysis
From: TR Fleming, DP Harrington, PC O'Brien. Designs for group sequential tests. Contr. Clin. Trials (1984) 5:
