# Significance testing

Ioannis Karagiannis (based on previous EPIET material), 18th EPIET/EUPHEM Introductory course, 28.09.2012



The idea of statistical inference: we draw conclusions based on a sample and, through hypotheses, generalise them to the population the sample represents.

Inferential statistics use patterns in the sample data to draw inferences about the population represented, accounting for randomness. There are two basic approaches:

- Hypothesis testing
- Estimation

Common goal: to draw conclusions about the effect of an independent variable on a dependent variable.

The aim of a statistical test: to reach a deterministic decision (yes or no) about observed data on a probabilistic basis.

Why significance testing? In a norovirus outbreak on a Greek island, the risk of illness was higher among people who ate raw seafood (RR = 21.5). Is the association due to chance?

The two hypotheses. When you perform a test of statistical significance, you either reject or do not reject the null hypothesis (H₀).

- Null hypothesis (H₀): there is NO difference between the two groups (= no effect), e.g. RR = 1.
- Alternative hypothesis (H₁): there is a difference between the two groups (= there is an effect), e.g. RR = 21.5.

Norovirus on a Greek island:

- Null hypothesis (H₀): there is no association between consumption of raw seafood and illness.
- Alternative hypothesis (H₁): there is an association between consumption of raw seafood and illness.

Hypothesis testing with tests of statistical significance:

- Data not consistent with H₀: H₀ can be rejected in favour of an alternative hypothesis H₁ (the objective of our study).
- Data consistent with H₀: H₀ cannot be rejected.

You cannot say that H₀ is true. You can only decide whether to reject it or not.

p value: the probability of observing our result (e.g. a difference between proportions or a RR), or a more extreme one, if the null hypothesis H₀ were true. The decision whether to reject H₀ is based on the reported p value.
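To make the definition concrete, the sketch below approximates a p value by simulation: it reshuffles the group labels many times (which is what "no difference between groups" allows) and counts how often the shuffled difference in proportions is at least as large as the observed one. This is a permutation test, a swapped-in technique rather than one used in the slides; the counts are those of the glasses example in this presentation (11 of 38 fellows and 6 of 14 facilitators wearing glasses), and the function name, seed and number of permutations are illustrative assumptions. Because it is a different test, its p value need not coincide with the χ² p value.

```python
import random

def permutation_p_value(group_a, group_b, n_perm=10_000, seed=1):
    """Two-sided permutation p value for a difference in proportions.

    group_a, group_b: lists of 0/1 outcomes (1 = wears glasses).
    Under H0 the group labels are exchangeable, so we reshuffle them and
    count how often the shuffled difference is at least as extreme as
    the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = group_a + group_b  # new list, so the inputs are not modified
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a))
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / n_perm

fellows = [1] * 11 + [0] * 27       # 11 of 38 fellows wear glasses
facilitators = [1] * 6 + [0] * 8    # 6 of 14 facilitators wear glasses
p = permutation_p_value(fellows, facilitators)
print(f"permutation p value: {p:.3f}")
```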

p values: practicalities.

- Low p value = low degree of compatibility between H₀ and the observed data: chance alone is an unlikely explanation for the association, so you reject H₀ and the test is significant.
- High p value = high degree of compatibility between H₀ and the observed data: chance alone could plausibly explain the association, so you do not reject H₀ and the test is not significant.

Levels of significance: practicalities. We need a cut-off, commonly 1%, 5% or 10%. With a 5% level:

- p value > 0.05: H₀ not rejected (non-significant)
- p value ≤ 0.05: H₀ rejected (significant)

But always give the exact p value rather than just "significant" vs. "non-significant".
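The decision rule and the advice to report the exact p value can be combined in a small helper. This is a hypothetical function written for illustration, assuming a two-way decision at a pre-set α and three decimal places for reporting; the example p values are the ones quoted in this presentation.

```python
def report_test(p_value, alpha=0.05):
    """Return the exact p value together with the decision at level alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p value must lie between 0 and 1")
    decision = "H0 rejected" if p_value <= alpha else "H0 not rejected"
    shown = "p < 0.001" if p_value < 0.001 else f"p = {p_value:.3f}"
    return f"{shown} ({decision} at alpha = {alpha})"

print(report_test(0.0361))  # p = 0.036 (H0 rejected at alpha = 0.05)
print(report_test(0.343))   # p = 0.343 (H0 not rejected at alpha = 0.05)
```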

Examples from the literature:

- "The limit for statistical significance was set at p = 0.05."
- "There was a strong relationship (p < 0.001)."
- "…, but it did not reach statistical significance (ns)."
- "The relationship was statistically significant (p = 0.0361)."

p = 0.05 is an agreed convention, not an absolute truth: "Surely, God loves the 0.06 nearly as much as the 0.05" (Rosnow and Rosenthal, 1991).

p = 0.05 and its errors. The level of significance (usually α = 0.05) is used together with the p value for decision making, but two errors are still possible:

- H₀ should not have been rejected, but it was rejected: Type I or alpha (α) error.
- H₀ should have been rejected, but it was not rejected: Type II or beta (β) error.

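Types of errors (decision based on the p value, compared with the truth):

| Decision | Truth: no difference (H₀ true) | Truth: difference (H₀ false) |
|---|---|---|
| H₀ rejected | Type I (α) error | Correct decision |
| H₀ not rejected | Correct decision | Type II (β) error |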
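A Type I error rate can be seen directly by simulation: generate many studies in which H₀ is true (both groups share the same risk of illness), test each one, and count how often H₀ is wrongly rejected. The sketch below assumes a true risk of 0.3, 100 people per group, 2,000 simulated studies and an uncorrected χ² test; all of these settings and the function names are hypothetical choices for illustration. The share of rejections should come out close to α.

```python
import math
import random

def chi2_p_value_2x2(a, b, c, d):
    """Uncorrected chi-square test (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        return 1.0  # a zero margin gives no evidence against H0
    chi2 = n * (a * d - b * c) ** 2 / margins
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

def simulated_type_i_error(p_true=0.3, n_per_group=100, reps=2000, alpha=0.05, seed=42):
    """Share of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # both groups are drawn with the same risk, so H0 holds by construction
        ill_1 = sum(rng.random() < p_true for _ in range(n_per_group))
        ill_2 = sum(rng.random() < p_true for _ in range(n_per_group))
        p = chi2_p_value_2x2(ill_1, n_per_group - ill_1, ill_2, n_per_group - ill_2)
        if p <= alpha:
            rejections += 1
    return rejections / reps

print(f"Observed Type I error rate: {simulated_type_i_error():.3f}")
```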

More on errors:

- Probability of a Type I error: the value of α is determined in advance of the test. The significance level is the α error that we are willing to accept (usually 0.05).
- Probability of a Type II error: the value of β depends on the size of the effect (e.g. RR, OR) and on the sample size. 1 − β is the statistical power of a study to detect an effect of a specified size (e.g. 0.80). Fix β in advance by choosing an appropriate sample size.
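Fixing α and β in advance translates into a sample size. The sketch below uses the common normal-approximation formula for comparing two proportions with a two-sided test; the function name and the example proportions (30% vs 15% ill) are hypothetical, and real study planning may call for a different formula (e.g. with a continuity correction or unequal group sizes).

```python
import math
from statistics import NormalDist

def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect p1 vs p2 (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Hypothetical planning values: 30% vs 15% ill, alpha = 0.05, power = 0.80
n = sample_size_two_proportions(0.30, 0.15)
print(f"about {n} subjects per group")
```

Raising the requested power (lowering β) or looking for a smaller difference both increase the required sample size.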

Quantifying the association: a test of association between exposure and outcome, e.g. the χ² test or Fisher's exact test, compares proportions. The χ² value measures how far the observed counts deviate from those expected under independence (no effect): the larger the χ² value, the smaller the p value.
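Fisher's exact test is usually preferred when expected counts are small. A minimal sketch, assuming the standard two-sided definition (sum the probabilities of all tables with the same margins that are no more likely than the observed one) and using the glasses table from this presentation (fellows: 11 with glasses, 27 without; facilitators: 6 with, 8 without). The function name is illustrative; `scipy.stats.fisher_exact` offers a library implementation.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, cell a follows a hypergeometric distribution.
    The two-sided p value sums the probabilities of every table that is
    no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # probability that cell a equals x, given the fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # small relative tolerance so tables tied with the observed one are included
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-7)))

# Rows: fellows, facilitators; columns: glasses, no glasses
p = fisher_exact_two_sided(11, 27, 6, 8)
print(f"Fisher's exact p = {p:.3f}")
```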

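Chi-square value:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed and E the expected count in each cell, summed over all cells of the table.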

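Norovirus on a Greek island: 2x2 table

| | Ill | Non ill | Total |
|---|---|---|---|
| Raw seafood | 29 | 5 | 34 |
| No raw seafood | 9 | 136 | 145 |
| Total | 38 | 141 | 179 |

Expected proportions under H₀: 38/179 = 21% ill and 141/179 = 79% non ill. Applying them to each row gives the expected number in each cell (E = row total × column total / N):

| | Ill | Non ill |
|---|---|---|
| Raw seafood | 7.22 | 26.78 |
| No raw seafood | 30.78 | 114.22 |

Chi-square calculation:

$$\chi^2 = \frac{(29-7.22)^2}{7.22} + \frac{(5-26.78)^2}{26.78} + \frac{(9-30.78)^2}{30.78} + \frac{(136-114.22)^2}{114.22} \approx 103, \quad p < 0.001$$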
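The cell-by-cell calculation can be checked with a short sketch that computes each expected count as row total × column total / N without intermediate rounding, and converts χ² to a p value with 1 degree of freedom using the identity P(χ²₁ > x) = erfc(√(x/2)). The function name is illustrative; the counts are those of the norovirus table. No continuity correction is applied.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value

# Rows: raw seafood, no raw seafood; columns: ill, non ill
chi2, p = chi_square_2x2([[29, 5], [9, 136]])
print(f"chi2 = {chi2:.1f}, p = {p:.2g}")
```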

Norovirus on a Greek island: the attack rate of illness among consumers of raw seafood was 21.5 times higher than among non-consumers of these food items (p < 0.001). The p value is smaller than the chosen significance level of α = 5%, so the null hypothesis is rejected. If there were no true association between eating imported raw seafood and illness, the probability of observing an association at least as strong as this one by chance would be < 0.001 (< 1/1000).

C2012 vs facilitators: the ultimate (eye) test.

- H₀: the proportion of facilitators wearing glasses during the Tuesday morning sessions was equal to the proportion of fellows wearing glasses.
- H₁: the two proportions were different.

C2012 vs facilitators: 2x2 table

| | Glasses | No glasses | Total |
|---|---|---|---|
| Fellow | 11 | 27 | 38 |
| Facilitator | 6 | 8 | 14 |
| Total | 17 | 35 | 52 |

Expected proportions under H₀: 17/52 = 33% with glasses and 35/52 = 67% without. Expected number in each cell:

| | Glasses | No glasses |
|---|---|---|
| Fellow | 12.42 | 25.58 |
| Facilitator | 4.58 | 9.42 |

Chi-square calculation:

$$\chi^2 = \frac{(11-12.42)^2}{12.42} + \frac{(27-25.58)^2}{25.58} + \frac{(6-4.58)^2}{4.58} + \frac{(8-9.42)^2}{9.42} \approx 0.90, \quad p = 0.343$$
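For a 2x2 table the same χ² value can be obtained without listing expected counts, using the shortcut χ² = N(ad − bc)² / [(a+b)(c+d)(a+c)(b+d)], which is algebraically equal to the cell-by-cell sum when nothing is rounded. The sketch applies it to the glasses table; the function name is illustrative and the p value again uses the 1-degree-of-freedom identity P(χ²₁ > x) = erfc(√(x/2)).

```python
import math

def chi_square_shortcut(a, b, c, d):
    """Shortcut chi-square for [[a, b], [c, d]]: N(ad - bc)^2 / (product of the four margins)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # p value for 1 df

# Rows: fellows, facilitators; columns: glasses, no glasses
chi2, p = chi_square_shortcut(11, 27, 6, 8)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")  # chi2 = 0.90, p = 0.343
```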

t-test: used to compare the means of a continuous variable in two different groups. It assumes that the variable is normally distributed in each group.

t-test:

- H₀: fellows with glasses do not tend to sit further towards the back of the room than fellows without glasses.
- H₁: fellows with glasses tend to sit further towards the back of the room than fellows without glasses.
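A minimal sketch of the classical (Student's) two-sample t statistic with pooled variance, which assumes equal variances in the two groups as well as normality. The seating data below are entirely hypothetical row numbers invented for illustration (1 = front row); only the group sizes, 11 fellows with glasses and 27 without, follow the glasses table. Because H₁ is directional, the matching p value is one-sided, read from a t distribution with the returned degrees of freedom (for example with `scipy.stats.ttest_ind(x, y, alternative="greater")`).

```python
import math
from statistics import mean, variance

def pooled_t_statistic(x, y):
    """Student's two-sample t statistic (equal variances assumed) and its degrees of freedom."""
    nx, ny = len(x), len(y)
    # pooled variance: weighted average of the two sample variances (n - 1 denominators)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Hypothetical row numbers (1 = front row); not data from the course
rows_glasses = [5, 7, 4, 6, 8, 5, 7, 6, 3, 6, 7]
rows_no_glasses = [3, 4, 2, 5, 6, 3, 4, 5, 2, 4, 3, 6, 5, 4,
                   3, 2, 5, 4, 3, 6, 4, 3, 5, 4, 2, 3, 4]
t, df = pooled_t_statistic(rows_glasses, rows_no_glasses)
print(f"t = {t:.2f} on {df} degrees of freedom")
```

A positive t means that, in these hypothetical data, fellows with glasses sit further back on average.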


Epidemiology and statistics

Criticism of significance testing: epidemiological applications need more than a decision as to whether chance alone could have produced an association (Rothman et al. 2008). Estimating an effect measure (e.g. RR, OR) is preferable to significance testing alone.
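Estimation reports the size of the effect together with its precision. The sketch below computes a ratio of two proportions with a 95% confidence interval on the log scale (the Katz log method, one common choice among several) for the glasses table, i.e. the proportion wearing glasses among fellows (11/38) relative to facilitators (6/14). The function name is illustrative. An interval that includes 1 conveys the same "not significant" message as p = 0.343, but also shows the range of effect sizes compatible with the data.

```python
import math
from statistics import NormalDist

def risk_ratio_ci(a, n1, c, n2, level=0.95):
    """Ratio of two proportions a/n1 and c/n2 with a log-scale (Katz) confidence interval."""
    rr = (a / n1) / (c / n2)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)  # standard error of ln(RR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)         # about 1.96 for a 95% interval
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

# Proportion wearing glasses: 11 of 38 fellows vs 6 of 14 facilitators
rr, lower, upper = risk_ratio_ci(11, 38, 6, 14)
print(f"ratio = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```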

Suggested reading:

- KJ Rothman, S Greenland, TL Lash. Modern Epidemiology. Lippincott Williams & Wilkins, Philadelphia, PA, 2008.
- SN Goodman, R Royall. Evidence and scientific research. AJPH 78, 1568, 1988.
- SN Goodman. Toward evidence-based medical statistics. 1: The P value fallacy. Ann Intern Med 130, 995, 1999.
- C Poole. Low P-values or narrow confidence intervals: which are more durable? Epidemiology 12, 291, 2001.

Previous lecturers: Alain Moren, Paolo D'Ancona, Lisa King, Ágnes Hajdu, Preben Aavitsland, Doris Radun, Manuel Dehnert.
