Critical review of significance testing. F. D'Ancona, from a lecture by Alain Moren, 2006.
Botulism outbreak in Italy: The relative risk of illness was higher among diners who ate home-preserved green olives (RR = 2.9). Is it statistically significant?
Tests of statistical significance: Many of them concern differences between means or proportions. These tests help to establish whether the observed difference is real (that is, whether it is not due to chance alone).
The two hypotheses! Alternative hypothesis (H1): there is a difference between people who ate olives and people who did not eat them. Null hypothesis (H0): there is NO difference between the two groups (example: RR = 1, OR = 1). When you perform a test of statistical significance, you either reject or do not reject the null hypothesis (H0).
Hypothesis testing and the null hypothesis: If the data provide evidence against the null hypothesis, then this hypothesis can be rejected in favour of some alternative hypothesis H1 (the objective of our study). If you do not reject the null hypothesis, you can never say that the null hypothesis is true. You can only reject it or not reject it.
Significance testing: H0 is rejected using the reported p value. p = probability that a result (for example, a difference between proportions, or an RR) or a more extreme value would be observed by chance alone if H0 were true. Small p values = low degree of compatibility between H0 and the observed data: you reject H0 and the test is significant. Large p values = high degree of compatibility between H0 and the observed data: you do not reject H0 and the test is not significant. We can never reduce to zero the probability that our result was observed by chance alone.
Levels of significance: We need a cut-off! Common choices: 0.01, 0.05, 0.10. p value > 0.05 = H0 not rejected (not significant). p value ≤ 0.05 = H0 rejected (significant). "Avoid submitting for publication if p > 0.05": referees commonly relied on tests of significance.
p = 0.05 and its errors: The level of significance is usually p = 0.05. The p value was used for decision making, but two errors are still possible. H0 should not have been rejected, but it was rejected (Type I or alpha error, or false positive). H0 should have been rejected, but it was not rejected (Type II or beta error, or false negative).
Types of errors (test result versus truth)
Test result            Truth: H0 true        Truth: H0 false
H0 rejected            Type I or α error     correct
H0 not rejected        correct               Type II or β error
The significance level is the level of Type I error that we are willing to accept (usually 5%).
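As a rough illustration of what "accepting a 5% error" means, the sketch below simulates many studies in which H0 is true by construction (both groups have the same risk) and counts how often a chi-square test rejects it at p < 0.05. The risk of 0.3, the group size of 200 and the number of simulated studies are arbitrary assumptions made for the illustration; they are not taken from the lecture.

```python
import math
import random


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty margin carries no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def p_value_1df(chi2):
    """Two-sided p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def type_one_error_rate(risk=0.3, n_per_group=200, n_trials=2000,
                        alpha=0.05, seed=1):
    """Share of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_trials):
        # Both groups are drawn with the SAME risk, so H0 holds.
        cases_1 = sum(rng.random() < risk for _ in range(n_per_group))
        cases_2 = sum(rng.random() < risk for _ in range(n_per_group))
        chi2 = chi2_2x2(cases_1, n_per_group - cases_1,
                        cases_2, n_per_group - cases_2)
        if p_value_1df(chi2) < alpha:
            rejections += 1
    return rejections / n_trials


rate = type_one_error_rate()
print(f"H0 was true in every study, yet rejected in {rate:.1%} of them")
```

The printed share should sit close to 5%: those rejections are the Type I errors that the cut-off tolerates.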
Hypothetical data from a clinical trial of a new treatment
Treatment   Successful   Unsuccessful   Total
B           14           8              22
A           7            13             20
Treatment B, success = 64%. Treatment A, success = 35%. χ² = 3.44.
p = NS, p > 0.05, p = 0.06: different ways to write the same concept, but with more information.
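The figures on this slide can be checked with a few lines of code. This is a minimal sketch that assumes the slide used the uncorrected Pearson chi-square (no continuity correction) and a two-sided p value; with 1 degree of freedom, that p value can be obtained from the complementary error function in the standard library.

```python
import math


def chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_1df(chi2):
    """Two-sided p value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Treatment B: 14 successes, 8 failures. Treatment A: 7 successes, 13 failures.
chi2 = chi2_2x2(14, 8, 7, 13)
p = p_value_1df(chi2)
print(f"chi2 = {chi2:.2f}, p = {p:.2f}")  # chi2 = 3.44, p = 0.06
```

Reporting p = 0.06 rather than "NS" lets the reader see how close the result is to the cut-off.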
The epidemiologist needs measurements rather than probabilities: χ² is a test of association. OR and RR are measures of association on a continuous scale (infinite number of possible values). The best estimate = point estimate. Range of values allowing for random variability = confidence interval (precision of the point estimate).
Width of the confidence interval depends on: the amount of variability in the data; the size of the sample; the arbitrary level of confidence (usually 90%, 95% or 99%). One way to use a confidence interval: if 1 is included in the CI, then NOT SIGNIFICANT; if 1 is not included in the CI, then SIGNIFICANT.
Confidence intervals provide more information than the p value: magnitude of the effect (strength of association); direction of the effect (RR > 1 or RR < 1); precision around the point estimate of the effect (variability). The p value cannot provide them!
Level of confidence interval at 95%: If the data collection and analysis could be replicated many times, the CI should include the TRUE value of the measure 95% of the time. The only thing that should bring variability is chance!
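The "replicated many times" idea can be tried out by simulation. The sketch below is only an illustration under stated assumptions: the true risks (0.30 and 0.20), the group size and the number of replications are invented for the example, and the interval is the usual log-based (Wald) 95% CI for a relative risk, which the slides do not name explicitly.

```python
import math
import random


def rr_ci(a, n1, c, n0, z=1.96):
    """Log-based confidence interval for RR = (a/n1) / (c/n0)."""
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    return (math.exp(math.log(rr) - z * se_log_rr),
            math.exp(math.log(rr) + z * se_log_rr))


def coverage(risk_exposed=0.30, risk_unexposed=0.20, n_per_group=200,
             n_replications=2000, seed=7):
    """Share of replicated studies whose 95% CI contains the true RR."""
    rng = random.Random(seed)
    true_rr = risk_exposed / risk_unexposed
    covered = usable = 0
    for _ in range(n_replications):
        a = sum(rng.random() < risk_exposed for _ in range(n_per_group))
        c = sum(rng.random() < risk_unexposed for _ in range(n_per_group))
        if a == 0 or c == 0:
            continue  # RR undefined; practically never happens at this size
        usable += 1
        lower, upper = rr_ci(a, n_per_group, c, n_per_group)
        if lower <= true_rr <= upper:
            covered += 1
    return covered / usable


cov = coverage()
print(f"The 95% CI contained the true RR in {cov:.1%} of replications")
```

Only chance varies between replications here, so the share should be close to 95%. A bias in data collection would shift every replication the same way and is not captured by the interval.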
Hypothetical data from a clinical trial of a new treatment
Treatment   Successful   Unsuccessful   Total
B           14           8              22
A           7            13             20
Treatment B, success = 64%. Treatment A, success = 35%.
p = NS, p > 0.05, p = 0.06, RR = 1.82 with 95% CI (0.93 - 3.57): different ways to write the same concept, but with more information.
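The RR and its interval on this slide can be reproduced as follows. This is a sketch that assumes the common log-based (Wald) method, with standard error of ln(RR) equal to sqrt(1/a - 1/n1 + 1/c - 1/n0); the slide does not say which method was used, but this one gives the same figures.

```python
import math


def relative_risk_ci(a, n1, c, n0, z=1.96):
    """Return (RR, lower, upper) for a events out of n1 versus c out of n0."""
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Treatment B: 14 successes out of 22. Treatment A: 7 successes out of 20.
rr, lower, upper = relative_risk_ci(14, 22, 7, 20)
significant = not (lower <= 1.0 <= upper)  # rule: is 1 outside the CI?
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f} - {upper:.2f}), "
      f"significant: {significant}")
# RR = 1.82, 95% CI (0.93 - 3.57), significant: False
```

The interval includes 1, which agrees with p = 0.06, but it also shows that the data are compatible with success being more than three times as frequent under treatment B.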
More studies: better or worse? Decisions based on results from a collection of studies are not facilitated when each study is classified as a YES or NO decision. You have to look at the CI and the point estimate, but also consider clinical or biological significance. [Figure: 20 studies with different results, plotted on an RR axis marked at 1.]
Study A: large sample, precise results, narrow CI, SIGNIFICANT. Study B: small size, wide CI, NOT SIGNIFICANT. Looking at the CI: Study A, effect close to NO EFFECT; Study B, no information about the absence of a large effect. [Figure: CIs of studies A and B on an RR axis running from RR = 1 to large RR.]
What we have to evaluate the study
χ² = a test of association. It depends on sample size.
p value = probability that equal (or more extreme) results can be observed by chance alone.
OR, RR = direction and strength of association: if > 1, risk factor; if < 1, protective factor (independent of sample size).
CI = magnitude and precision of the effect.
Remember that these values do not provide any information on the possibility that the observed association is due to bias or confounding. This possibility should be investigated.
χ² and relative risk (E = exposed, NE = not exposed)
        Cases   Non cases   Total
E       9       51          60
NE      5       55          60
Total   14      106         120
χ² = 1.3, p = 0.13, RR = 1.8, 95% CI [0.6 - 4.9]
        Cases   Non cases   Total
E       90      510         600
NE      50      550         600
Total   140     1060        1200
χ² = 12, p = 0.0002, RR = 1.8, 95% CI [1.3 - 2.5]
        Cases   Non cases   Total
E       600     1400        2000
NE      500     1500        2000
Total   1100    2900        4000
χ² = 12, p = 0.0002, RR = 1.2, 95% CI [1.1 - 1.3]
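The contrast between the three tables can be recomputed with a short script. This is a sketch assuming the uncorrected Pearson chi-square and the log-based (Wald) 95% CI; the slide does not state its methods, so the last digit may differ slightly from the printed figures (most visibly for the interval of the smallest table). The dictionary labels are made up for the example.

```python
import math


def summarize(a, n1, c, n0, z=1.96):
    """Return (chi2, RR, lower, upper) for a cases of n1 exposed, c of n0 unexposed."""
    b, d = n1 - a, n0 - c  # non-cases in each group
    n = n1 + n0
    chi2 = n * (a * d - b * c) ** 2 / (n1 * n0 * (a + c) * (b + d))
    rr = (a / n1) / (c / n0)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return chi2, rr, lower, upper


# (cases exposed, total exposed, cases unexposed, total unexposed)
tables = {
    "n = 120": (9, 60, 5, 60),
    "n = 1200": (90, 600, 50, 600),
    "n = 4000": (600, 2000, 500, 2000),
}

results = {label: summarize(*counts) for label, counts in tables.items()}
for label, (chi2, rr, lower, upper) in results.items():
    print(f"{label:>9}: chi2 = {chi2:5.2f}, RR = {rr:.1f}, "
          f"95% CI [{lower:.2f} - {upper:.2f}]")
```

The first two tables share the same RR but have very different χ² values because the second is ten times larger; the last two share nearly the same χ² although the RR drops from 1.8 to 1.2. The test statistic alone therefore says little about the strength of the association.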
Common source outbreak suspected
Exposure   Cases   Non cases   AR%
Yes        15      20          42.8%
No         50      200         20.0%
Total      65      220
χ² = 9.1, p = 0.002, RR = 2.1, 95% CI = 1.4 - 3.4
Remember that these values do not provide any information on the possibility that the observed association is due to bias or confounding. HOW COULD YOU EXPLAIN THAT ONLY 23% OF CASES WERE EXPOSED?
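The arithmetic behind the question on this slide is short enough to check directly. The sketch below uses only the counts in the table; the variable names are chosen for the illustration.

```python
# Counts from the table: (cases, non-cases) by exposure status.
exposed_cases, exposed_non_cases = 15, 20
unexposed_cases, unexposed_non_cases = 50, 200

# Attack rate = cases / people in that exposure group.
ar_exposed = exposed_cases / (exposed_cases + exposed_non_cases)
ar_unexposed = unexposed_cases / (unexposed_cases + unexposed_non_cases)
rr = ar_exposed / ar_unexposed

# Share of all cases who were exposed.
share_cases_exposed = exposed_cases / (exposed_cases + unexposed_cases)

print(f"AR exposed = {ar_exposed:.1%}, AR unexposed = {ar_unexposed:.1%}")
print(f"RR = {rr:.1f}, cases exposed = {share_cases_exposed:.0%}")
```

The association is statistically significant, yet 50 of the 65 cases were not exposed, so this exposure cannot by itself account for the outbreak. No significance test reveals that; it comes from looking at the table.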
Recommendations: Hypothesis testing and CIs evaluate only the role of chance as an alternative explanation of the association. Interpret with caution every association that achieves statistical significance. Use double caution if this statistical significance was not expected.
"P < 0.05" is not a good description of the information in the data (Rothman).