3 Sample Size Calculation
- Single study group: continuous measurement, count of events
- Comparative trial (2 or more groups)
4 Confidence Interval
Which true value is compatible with the observation?
Confidence interval: the range in which the true value lies with a high probability (usually 95%).
5 Confidence Interval
Example: 56 patients with open fractures, 9 developed an infection (16%).
[Figure: a sample of n = 56 with an infection rate of 16%, drawn from the population of all patients with open fractures; the true value is unknown.]
6 Confidence Interval
Formula for event rates:
CI95 = p ± 1.96 · √( p · (100 − p) / n )
(n = sample size, p = percentage)
Example: n = 56, p = 16% → CI95 = 16 ± 1.96 · √(16 · 84 / 56) = 16 ± 9.6 → [6.4, 25.6]
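The slide's normal-approximation formula can be sketched as a small helper; the function name is illustrative, the arithmetic is the slide's own example:

```python
import math

def ci95_event_rate(p, n):
    """95% confidence interval for a percentage p observed in n cases,
    using the slide's normal approximation: p +/- 1.96 * sqrt(p*(100-p)/n)."""
    half_width = 1.96 * math.sqrt(p * (100 - p) / n)
    return p - half_width, p + half_width

# Slide example: 9 infections in 56 patients -> p = 16%
low, high = ci95_event_rate(16, 56)
print(f"[{low:.1f}, {high:.1f}]")  # -> [6.4, 25.6]
```

Note that this approximation becomes unreliable for very small samples or rates near 0% or 100%, where exact (binomial) intervals are preferred.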
7 Confidence Interval
[Figure: 95% confidence interval around a 20% incidence rate.]
8 Confidence Interval
Formula for continuous variables (mean):
CI95 = M ± 1.96 · SE, with SE = SD / √n
(M = mean, SE = standard error, SD = standard deviation, n = sample size)
z-value: 1.65 for 90%, 1.96 for 95%, 2.58 for 99%
Remember: SE = SD / √n
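A minimal sketch of the mean formula above; the example values (mean 32, SD 17, n = 30) are assumed for illustration only:

```python
import math

def ci_mean(mean, sd, n, z=1.96):
    """Confidence interval for a mean: M +/- z * SE, with SE = SD / sqrt(n).
    z = 1.65 for 90%, 1.96 for 95%, 2.58 for 99% (values from the slide)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical example: mean 32, SD 17, n = 30 cases
low, high = ci_mean(32, 17, 30)
print(f"[{low:.1f}, {high:.1f}]")
```

Quadrupling the sample size halves SE, and therefore halves the width of the interval.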
10 „Which key should I press here now?“
„What is the sample size to show that early weight-bearing therapy, as compared to standard therapy, is able to reduce the time until return to work from 10 weeks to 8 weeks, where time to return to work has an SD of 3?“
36 cases per group!
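The slide's answer of 36 cases per group can be reproduced with the standard normal-approximation formula for comparing two means; this is a sketch assuming α = 0.05 (two-sided) and 80% power, which are the defaults used later in the deck:

```python
import math
from statistics import NormalDist

def n_per_group(diff, sd, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means:
    n = 2 * (z_alpha/2 + z_beta)^2 * (SD / diff)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)   # 1.96 for alpha = 0.05
    z_b = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    return math.ceil(2 * (z_a + z_b) ** 2 * (sd / diff) ** 2)

# Slide example: reduce return to work from 10 to 8 weeks (diff = 2), SD = 3
print(n_per_group(diff=2, sd=3))  # -> 36
```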
12 Select Outcome Measure
- Relevance: does this endpoint convince the patient / the scientific community?
- Reliability, measurability: can the outcome be measured easily, without much variation, also by different people?
- Sensitivity: does the intervention lead to a significant change in the outcome measure?
- Robustness: how much is the endpoint influenced by other factors?
13 Select Outcome Measure
- Primary endpoint: main hypothesis or core question; aim of the study. Statistics: confirmatory.
- Secondary endpoints: other interesting questions, additional endpoints. Statistics: exploratory (could be confirmatory in case of a large difference). Advantage: prospective selection in the study protocol.
- Retrospectively selected endpoints: selected when the trial is done, based on subgroup differences. Statistics: ONLY exploratory!
14 Sample Size Calculation
- Difference to be detected
- Certainty: α error
- Power: β error
15 Statistical Testing
A statistical test is a method (or tool) to decide whether an observed difference* is really present or just based on variation by chance.
* This holds for a test for difference, which is the most frequently applied type in medicine.
16 Statistical Testing
- Test for difference: „Intervention A is better than B“
- Test for equivalence: „Interventions A and B have the same effect“
- Test for non-inferiority: „Intervention A is not worse than B“
17 Statistical Testing
How a test procedure works:
1. Want to show: there is a difference.
2. Assume: there is NO difference between the groups („equal effects“, null hypothesis).
3. Try to disprove this assumption: perform the study / experiment and measure the difference.
4. Calculate the probability that such a difference could occur although the assumption („no difference“) was true = p-value.
18 Statistical Testing
Statistical test for difference: the p-value is the probability that the observed difference occurred just by chance.
19 Statistical Testing
Statistical test for difference, in short: p is the probability for „no difference“.
20 Statistical Testing
Null hypothesis: „Germany and Spain are equally strong soccer teams!“
Game tonight: 6 : 0 for Germany (the trial, n = 6)
Statistical test: p = 0.031
The p-value says: how big is the chance that one of two equally strong teams scores 6 goals, and the other one none?
Spain could still be as strong as Germany, but the chance is small (3.1%).
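Where the slide's p = 0.031 comes from: under the null hypothesis of equally strong teams, each goal is a fair coin flip, and a 6 : 0 result in either direction has probability 2 · 0.5⁶. A one-line sketch:

```python
# Under the null hypothesis ("equally strong teams"), each of the 6 goals
# goes to either team with probability 0.5. A 6:0 result for either team:
p_value = 2 * 0.5 ** 6
print(round(p_value, 3))  # -> 0.031
```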
21 Statistical Testing
                    small sample    large sample
small difference    p = 0.68        p = 0.05
large difference    p = 0.05        p < 0.001
22 Statistical Testing
The more cases are included, the better „equality“ can be disproved.
Example: drug A has a success rate of 80%, while drug B is better with a healing rate of 90%.

sample size    drug A     drug B     p-value
20             8/10       9/10       0.53
40             16/20      18/20      0.38
100            40/50      45/50      0.16
200            80/100     90/100     0.048
400            160/200    180/200    0.005
1000           400/500    450/500    <0.001
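The pattern in the table can be reproduced with a two-proportion test. This is a sketch using the normal approximation (a chi-square or Fisher's exact test would be the usual choice in practice, and gives slightly different p-values):

```python
import math
from statistics import NormalDist

def two_proportion_p(success_a, n_a, success_b, n_b):
    """Approximate two-sided p-value for comparing two proportions
    (pooled normal approximation; illustrative sketch only)."""
    p1, p2 = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# Same 80% vs 90% success rates, growing sample size: p shrinks
for n in (10, 50, 100):
    print(2 * n, round(two_proportion_p(int(0.8 * n), n, int(0.9 * n), n), 3))
```

The observed difference stays constant at 10 percentage points; only the sample size changes, and with it the p-value.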
23 Statistical Testing
A „significant“ p-value does NOT prove the size of the difference, but only excludes equality!
24 Statistical Testing: p-value
p-value large (> 0.05): the observed difference is probably caused by chance only, or the sample size is not sufficient to exclude chance. The null hypothesis is maintained: „no difference“.
p-value small (≤ 0.05): chance alone is not sufficient to explain this difference; there is a systematic difference. The null hypothesis is rejected: „significant difference“.
25 Statistical Testing: Errors
The decision for a difference (significance, p ≤ 0.05) or against it („equality“, not significant, p > 0.05) is not certain, but only a probability (p-value). Therefore, errors are possible:
- Type 1 error: decision for a difference although there is none => wrong finding
- Type 2 error: decision for „equality“ although there is one => missed finding
26 Statistical Testing: Errors
Test says ...       Truth: no difference              Truth: difference
significant         type 1 error (wrong finding)      correct
not significant     correct                           type 2 error (missed finding)
27 Statistical Testing
                  type 1 error („wrong finding“)           type 2 error („missed finding“)
Fire detector     wrong alarm                              no alarm in case of fire
Court             conviction of an innocent                a criminal set free
Clinical study    difference was „significant“ by chance   difference was missed
28 Power
Type 2 error (β): probability to miss a difference.
Power = 1 − β: probability to detect a difference.
Power depends on:
- the magnitude of the difference
- the sample size
- the variation of the outcome measure
- the significance level (α)
29 Power
“What is the power of the study?”
“Does the study have enough power to detect a difference of size X?”
POWER is the probability to detect a certain difference X with the given sample size n as significant (at level α).
30 Power
When to perform power calculations?
1. Planning phase (sample size calculation): if the assumed difference really exists, what risk would I take to miss this difference?
2. Final analysis (in case of a non-significant result): what size of difference could be rejected with the present data?
31 Power: Example
Clinical trial: laparoscopic versus open appendectomy
Endpoint: maximum post-operative pain intensity (VAS points)
Patients: 30 cases per group
Results: lap.: … (SD 18); open: 32 (SD 17); p = not significant!
What is the power of the study???
32 Sample Size Calculation
- Difference to be detected
- Certainty: α error
- Power: β error
33 Sample Size Calculation
Sample size depends on:
- Difference to be detected
- α = 0.05 (α error: risk to find a difference by chance)
- β = 0.20 (β error: risk to miss a real difference)
34 Sample Size Calculation
α = 0.05, β = 0.20
- Event rates: percentages in the treatment and the control group (P_T & P_C)
- Continuous measures: difference of means and standard deviation (difference & SD)
35 Sample Size Calculation: Continuous Endpoints
SD unknown: if the variation (standard deviation) is not known, the expected advantage can be expressed as an „effect size“, which is the difference in units of the (unknown) SD.
Examples:
- pain values are at least 1 SD below the control group (effect size = 1.0)
- the difference will be at least half an SD (effect size = 0.5)
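Expressed in effect-size units, the sample-size formula no longer needs the SD at all. A sketch, again assuming α = 0.05 (two-sided) and 80% power:

```python
import math
from statistics import NormalDist

def n_per_group_from_effect_size(effect_size, alpha=0.05, power=0.80):
    """Sample size per group via effect size (difference in SD units):
    n = 2 * (z_alpha/2 + z_beta)^2 / effect_size^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_a + z_b) ** 2 / effect_size ** 2)

print(n_per_group_from_effect_size(1.0))  # -> 16 per group
print(n_per_group_from_effect_size(0.5))  # -> 63 per group
```

Halving the effect size quadruples the required sample size.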
36 Sample Size Calculation: Continuous Endpoints
Tests with non-parametric rank statistics (non-normal distribution, or non-metric values): Mann-Whitney U-test, Wilcoxon test.
Rule of thumb: use the t-test for the sample size calculation and add 10% of cases.
37 Sample Size Calculation
Guess: how many patients are needed to show that a new intervention is able to reduce the complication rate from 20% to 14%?
(α = 0.05; β = 0.20, i.e. 80% power)
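The slide leaves the answer as a guess. One common normal-approximation formula for two event rates gives roughly 600 patients per group; exact methods (or software such as the Dupont & Plummer program cited below) will differ somewhat:

```python
import math
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two event rates:
    n = (z_alpha/2 + z_beta)^2 * (p1*(1-p1) + p2*(1-p2)) / (p1 - p2)^2."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_a + z_b) ** 2 * variance / (p1 - p2) ** 2)

# The slide's question: 20% -> 14% complication rate, alpha 0.05, 80% power
print(n_per_group_two_proportions(0.20, 0.14))
```

A 6-percentage-point reduction in a rare-ish event needs far more patients than most people guess.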
38 Dupont WD, Plummer WD: Power and Sample Size Calculations: A Review and Computer Program. Contr. Clin. Trials (1990) 11:
40 Multiple Testing
- more than one experimental/treatment group
- multiple endpoints
- multiple follow-up time points
- interim analyses
- subgroup analyses
Multiple testing increases the risk of arbitrary significant results.
Overall statistical error in 8 tests at the 0.05 level: α = 1 − 0.95⁸ = 1 − 0.66 = 0.34
41/42 Multiple Testing
tests (each with 5% error)    correct     at least 1 error
1                             95%         5%
2                             90.2%       9.8%
3                             85.7%       14.3%
4                             81.5%       18.5%
5                             77.4%       22.6%
.....
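The table above follows directly from 1 − 0.95^k for k independent tests; a minimal sketch:

```python
def familywise_error(k, alpha=0.05):
    """Probability of at least one false-positive finding among
    k independent tests, each performed at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 2, 3, 4, 5, 8):
    print(k, f"{familywise_error(k):.1%}")
```

With 8 tests the overall error is already about 34%, matching the earlier slide.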
43 Multiple Testing: What can you do?
- Select ONE primary and multiple secondary questions.
- Combination of endpoints:
  - multiple complications -> „negative event“
  - multiple time points -> AUC, maximum value, time to normal
  - multiple endpoints -> sum score acc. to O'Brien
- Adjustment of p-values, i.e. each endpoint is tested at a „stronger“ α level, e.g. Bonferroni: k tests at level α / k (5 tests at the 1% level, instead of one test at the 5% level).
- A priori ordered hypotheses: predefine the order of tests (each at the 5% level).
44 Interim Analysis
- Fixed sample size: analysis at the end of the trial
- Sequential design: analysis after each case
- Group sequential design: analysis after each step
- Adaptive design: analysis after each step
45 Interim Analysis
from: Fleming TR, Harrington DP, O'Brien PC: Designs for group sequential tests. Contr. Clin. Trials (1984) 5: