A short introduction to epidemiology Chapter 9: Data analysis Neil Pearce Centre for Public Health Research Massey University Wellington, New Zealand.


2 Chapter 9 Data analysis Basic principles Basic analyses Control of confounding

3 Basic principles Effect estimation Confidence intervals P-values

4 Testing and estimation: The effect estimate provides an estimate of the effect (e.g. relative risk, risk difference) of exposure on the occurrence of disease. The confidence interval provides a range of values in which it is plausible that the true effect may lie. The p-value is the probability that differences as large as, or larger than, those observed could have arisen by chance if the null hypothesis (of no association between exposure and disease) is correct. The principal aim of an individual study should be to estimate the size of the effect (using the effect estimate and confidence interval) rather than just to decide whether or not an effect is present (using the p-value).

5 Problems of significance testing: The p-value depends on two factors: the size of the effect and the size of the study. A very small difference may be statistically significant if the study is very large, whereas a very large difference may not be significant if the study is very small. The purpose of significance testing is to reach a decision based on a single study. However, decisions should be based on information from all available studies, as well as non-statistical considerations such as the plausibility and coherence of the effect in the light of current theoretical and empirical knowledge (see chapter 10).
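A minimal sketch of the point about study size, using hypothetical counts that are not from the slides: the same odds ratio is tested in a small study and in one ten times larger. The function name `chi2_and_p` and the choice of a one-degree-of-freedom chi-square with fixed margins are assumptions made for illustration.

```python
import math

def chi2_and_p(a, b, c, d):
    """One-df chi-square (margins treated as fixed) and its p-value for a 2x2 table."""
    m1, m0 = a + b, c + d            # totals of cases and controls
    n1, n0 = a + c, b + d            # totals of exposed and non-exposed
    t = m1 + m0
    expected = n1 * m1 / t           # expected exposed cases under the null
    variance = m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))
    chi2 = (a - expected) ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))   # upper tail of a chi-square with 1 df
    return chi2, p

small = (12, 8, 8, 12)                   # hypothetical study, OR = 2.25
large = tuple(10 * x for x in small)     # same OR, ten times the size

chi2_small, p_small = chi2_and_p(*small)
chi2_large, p_large = chi2_and_p(*large)
print(f"small study: chi2 = {chi2_small:.2f}, p = {p_small:.3f}")
print(f"large study: chi2 = {chi2_large:.2f}, p = {p_large:.6f}")
```

The effect estimate is identical in both tables; only the p-value changes, which is why the slides recommend reporting the estimate and its confidence interval.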

6 Chapter 9 Data analysis Basic principles Basic analyses Control of confounding

7 Basic analyses: Measures of occurrence: incidence proportion (risk), incidence rate, incidence odds. Measures of effect: risk ratio, rate ratio, odds ratio.
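As a rough illustration of how these measures relate, the sketch below computes each of them for a hypothetical cohort. All counts, and the function names, are invented for the example.

```python
def incidence_proportion(cases, persons):
    """Risk: cases divided by persons at risk at the start of follow-up."""
    return cases / persons

def incidence_rate(cases, person_years):
    """Rate: cases divided by person-time at risk."""
    return cases / person_years

def incidence_odds(cases, persons):
    """Odds: cases divided by non-cases."""
    return cases / (persons - cases)

# Hypothetical cohort: (cases, persons at start, person-years of follow-up)
exposed = (100, 1000, 4500)
unexposed = (50, 1000, 4750)

risk_ratio = incidence_proportion(exposed[0], exposed[1]) / incidence_proportion(unexposed[0], unexposed[1])
rate_ratio = incidence_rate(exposed[0], exposed[2]) / incidence_rate(unexposed[0], unexposed[2])
odds_ratio = incidence_odds(exposed[0], exposed[1]) / incidence_odds(unexposed[0], unexposed[1])

print(f"risk ratio = {risk_ratio:.2f}")
print(f"rate ratio = {rate_ratio:.2f}")
print(f"odds ratio = {odds_ratio:.2f}")
```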

8 Example: (C = cases, C̄ = controls, E = exposed, Ē = non-exposed)

            E     Ē     Total
  C         a     b     M1
  C̄         c     d     M0
  Total     N1    N0    T

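9 Example: Smoking and Ovarian Cancer (2×2 table of C and C̄ by E and Ē). Cell counts: 36, 40, 58, 24. Marginal totals: 76 (= 36 + 40), 82 (= 58 + 24), 98 (= 40 + 58), 60 (= 36 + 24). Total T: 158.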


12 This $\chi^2$ is based on the assumptions that the marginal totals of the table ($N_1$, $N_0$, $M_1$, $M_0$) are fixed and that the proportion of cases who are exposed is the same as the proportion of controls who are exposed (i.e. that the overall proportion exposed, $N_1/T$, applies to both cases and controls).
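The chi-square formula itself is not in the transcript, so the sketch below assumes the usual one-degree-of-freedom statistic with fixed margins: observed exposed cases minus their expectation, squared, over the hypergeometric variance. The assignment of the smoking example's counts to the cells a, b, c, d is also an assumption, chosen so that the crude odds ratio reproduces the 0.46 quoted later in the slides.

```python
# Assumed cell assignment for the smoking and ovarian cancer example
a, b = 40, 36    # cases: exposed, non-exposed (assumption)
c, d = 58, 24    # controls: exposed, non-exposed (assumption)

m1, m0 = a + b, c + d      # total cases, total controls
n1, n0 = a + c, b + d      # total exposed, total non-exposed
t = m1 + m0

expected_a = n1 * m1 / t                              # expected exposed cases under the null
var_a = m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))        # variance with all margins fixed
chi2 = (a - expected_a) ** 2 / var_a

print(f"expected a = {expected_a:.2f}, variance = {var_a:.2f}, chi-square = {chi2:.2f}")
```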

13 The natural logarithm of the odds ratio has (under a binomial model) an approximate standard error of:
$$SE[\ln(OR)] = (1/a + 1/b + 1/c + 1/d)^{0.5}$$
An approximate 95% confidence interval for the odds ratio is then given by:
$$OR \, e^{\pm 1.96\, SE}$$
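A short sketch applying these two formulas to the smoking example. The cell assignment is an assumption, chosen so that the crude odds ratio matches the 0.46 quoted later in the slides.

```python
import math

# Assumed cell assignment: a, b = exposed and non-exposed cases;
# c, d = exposed and non-exposed controls
a, b, c, d = 40, 36, 58, 24

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lower = odds_ratio * math.exp(-1.96 * se_log_or)
upper = odds_ratio * math.exp(+1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, SE[ln(OR)] = {se_log_or:.3f}")
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f}")
```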

14 Chapter 9 Data analysis Basic principles Basic analyses Control of confounding

15 There are two methods of calculating a summary effect estimate to control confounding: pooling and standardisation.

16 Example of pooling: The unadjusted (crude) findings indicate that there is a strong association between smoking and ovarian cancer. Suppose, however, that we are concerned about the possibility that the effect of smoking is confounded by use of oral contraceptives (this would occur if oral contraceptive use affected the risk of ovarian cancer and was also associated with smoking). We then need to stratify the data into those who have used oral contraceptives and those who have not.

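17 Cases and controls by smoking (yes/no), stratified by OC use. OC use, yes: cell counts 50, 15, 12, 4; marginal totals 65, 16, 62, 19; total 81. OC use, no: cell counts 8, 9, 28, 32; marginal totals 17, 60, 36, 41; total 77.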

18 In those who have used oral contraceptives, the odds ratio for smoking is: In those who have not used oral contraceptives, the odds ratio for smoking is:
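The stratum-specific values can be sketched as follows. The assignment of counts to cells is an assumption (chosen so that the collapsed table gives the crude odds ratio of 0.46), and stratum 1 is taken to be OC users only because the slide lists that stratum first.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls."""
    return (a * d) / (b * c)

# Assumed cell assignment (a, b, c, d) within each stratum of OC use
strata = {
    "stratum 1": (12, 4, 50, 15),
    "stratum 2": (28, 32, 8, 9),
}

stratum_or = {name: odds_ratio(*cells) for name, cells in strata.items()}

# Collapsing over strata recovers the crude table
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude)

for name, value in stratum_or.items():
    print(f"{name}: OR = {value:.2f}")
print(f"crude: OR = {crude_or:.2f}")
```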

19 Thus, the crude OR for smoking (0.46) was biased away from the null value of 1.0 by confounding by OC use. When we remove this problem (by stratifying on oral contraceptive use) the odds ratios increase and are close to 1.0.

20 In this example, the odds ratios are not exactly the same in each stratum. If they are very different (e.g. 1.0 in one stratum and 4.0 in the other stratum) then we would usually report the findings separately for each stratum. However, if the odds ratio estimates are reasonably similar then we usually wish to summarize our findings into a single summary odds ratio by taking a weighted average of the OR estimates in each stratum.

21 $$OR = \frac{\sum_i W_i \, OR_i}{\sum_i W_i}$$ where $OR_i$ = OR in stratum $i$ and $W_i$ = weight given to stratum $i$.

22 One obvious choice of weights would be to weight each stratum by the inverse of its variance (precision-based estimates). However, this method of obtaining a summary odds ratio yields estimates which are unstable and highly affected by small numbers in particular strata.
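One common form of precision-based pooling (often attributed to Woolf) averages the log odds ratios with inverse-variance weights. The sketch below assumes that form and the same assumed cell assignment for the stratified smoking data used elsewhere; it is an illustration, not necessarily the exact estimator the slide has in mind.

```python
import math

# Assumed cell assignment (a, b, c, d) per stratum
strata = [(12, 4, 50, 15), (28, 32, 8, 9)]

log_ors, weights = [], []
for a, b, c, d in strata:
    log_ors.append(math.log((a * d) / (b * c)))
    # Inverse of the approximate variance of ln(OR); undefined if any cell is zero
    weights.append(1 / (1 / a + 1 / b + 1 / c + 1 / d))

pooled_log_or = sum(w * l for w, l in zip(weights, log_ors)) / sum(weights)
pooled_or = math.exp(pooled_log_or)
print(f"precision-weighted summary OR = {pooled_or:.2f}")
```

The weight contains 1/a, 1/b, 1/c and 1/d, so a single small cell dominates a stratum's variance and a zero cell makes the weight undefined, which is the instability the slide describes.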

23 A better set of weights was developed by Mantel and Haenszel. These involve using the weights $b_i c_i / T_i$:
$$OR_{MH} = \frac{\sum a_i d_i / T_i}{\sum b_i c_i / T_i}$$

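24 2×2 tables of C and C̄ by E and Ē. Stratum 1: cell counts 50, 15, 12, 4; marginal totals 65, 16, 62, 19; T = 81. Stratum 2: cell counts 8, 9, 28, 32; marginal totals 17, 60, 36, 41; T = 77.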

25 This set of weights yields summary odds ratio estimates which are very close to being statistically optimal (they are very close to the maximum likelihood estimates) and are very robust in that they are not unduly affected by small numbers in particular strata (provided that the strata do not have any zero marginal totals).
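A minimal sketch of the Mantel-Haenszel summary odds ratio for the two strata of the smoking example. The cell assignment is an assumption, chosen to be consistent with the crude odds ratio of 0.46.

```python
# Assumed cell assignment (a, b, c, d) per stratum:
# a, b = exposed and non-exposed cases; c, d = exposed and non-exposed controls
strata = [(12, 4, 50, 15), (28, 32, 8, 9)]

numerator = 0.0      # sum of a_i * d_i / T_i
denominator = 0.0    # sum of b_i * c_i / T_i (the Mantel-Haenszel weights)
for a, b, c, d in strata:
    t = a + b + c + d
    numerator += a * d / t
    denominator += b * c / t

or_mh = numerator / denominator
print(f"Mantel-Haenszel OR = {or_mh:.2f}")
```

Because the weights are products of cell counts rather than reciprocals, a stratum with a zero cell simply contributes zero to one of the sums instead of breaking the calculation.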

26 We can calculate a corresponding chi-square:

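27 2×2 tables of C and C̄ by E and Ē. Stratum 1: cell counts 50, 15, 12, 4; marginal totals 65, 16, 62, 19; T = 81. Stratum 2: cell counts 8, 9, 28, 32; marginal totals 17, 60, 36, 41; T = 77.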
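The chi-square formula for the stratified data is not in the transcript. The sketch below assumes the standard Mantel-Haenszel form, which sums observed exposed cases, their expectations and their fixed-margin variances over strata, and uses the same assumed cell assignment as the other examples.

```python
# Assumed cell assignment (a, b, c, d) per stratum
strata = [(12, 4, 50, 15), (28, 32, 8, 9)]

observed = expected = variance = 0.0
for a, b, c, d in strata:
    m1, m0 = a + b, c + d        # cases, controls
    n1, n0 = a + c, b + d        # exposed, non-exposed
    t = m1 + m0
    observed += a
    expected += n1 * m1 / t
    variance += m1 * m0 * n1 * n0 / (t ** 2 * (t - 1))

chi2_mh = (observed - expected) ** 2 / variance
print(f"observed = {observed:.0f}, expected = {expected:.2f}, chi-square = {chi2_mh:.3f}")
```

With these counts the statistic is far below the crude chi-square, in line with stratum-specific odds ratios that are close to 1.0.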

28 The natural logarithm of the Mantel-Haenszel odds ratio has (under a binomial model) an approximate standard error of:
$$SE = \left[ \frac{\sum P_i R_i}{2R_+^2} + \frac{\sum (P_i S_i + Q_i R_i)}{2R_+ S_+} + \frac{\sum Q_i S_i}{2S_+^2} \right]^{0.5}$$
where: $P_i = (a_i + d_i)/T_i$, $Q_i = (b_i + c_i)/T_i$, $R_i = a_i d_i/T_i$, $S_i = b_i c_i/T_i$, $R_+ = \sum R_i$, $S_+ = \sum S_i$.

29 An approximate 95% confidence interval for the odds ratio is then given by:
$$OR_{MH} \, e^{\pm 1.96\, SE}$$
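A sketch of the standard error and confidence interval for the Mantel-Haenszel odds ratio, using the P, Q, R and S terms defined on the previous slide. It takes the standard error as the square root of the summed terms and uses the same assumed cell assignment for the smoking example.

```python
import math

# Assumed cell assignment (a, b, c, d) per stratum
strata = [(12, 4, 50, 15), (28, 32, 8, 9)]

sum_pr = sum_ps_qr = sum_qs = r_plus = s_plus = 0.0
for a, b, c, d in strata:
    t = a + b + c + d
    p, q = (a + d) / t, (b + c) / t
    r, s = a * d / t, b * c / t
    sum_pr += p * r
    sum_ps_qr += p * s + q * r
    sum_qs += q * s
    r_plus += r
    s_plus += s

or_mh = r_plus / s_plus
variance = (sum_pr / (2 * r_plus ** 2)
            + sum_ps_qr / (2 * r_plus * s_plus)
            + sum_qs / (2 * s_plus ** 2))
se = math.sqrt(variance)
lower = or_mh * math.exp(-1.96 * se)
upper = or_mh * math.exp(+1.96 * se)
print(f"OR_MH = {or_mh:.2f}, SE = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```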

30 Rate ratios:

                  E     Ē     Total
  Cases           a     b     M1
  Person-years    Y1    Y0    T

31 E 350 0.001250.00350 E 125 10,000 Case PY Rate


35 The summary Mantel-Haenszel rate ratio involves taking the weights $b_i Y_{1i} / T_i$ to yield:
$$RR_{MH} = \frac{\sum a_i Y_{0i} / T_i}{\sum b_i Y_{1i} / T_i}$$


37 The equivalent Mantel-Haenszel chi-square is:

38 This is very similar to the $\chi^2_{MH}$ for case-control studies, but it has some minor modifications to take account of the fact that we are using person-time data rather than binomial data.
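The person-time version of the formula is not in the transcript. The sketch below assumes the usual form, in which the expected number of exposed cases in each stratum is $M_{1i} Y_{1i}/T_i$ and its variance is $M_{1i} Y_{1i} Y_{0i}/T_i^2$. The two strata are hypothetical data invented for the example.

```python
# Hypothetical strata: (a, Y1, b, Y0) =
# exposed cases, exposed person-years, non-exposed cases, non-exposed person-years
strata = [(20, 1000, 20, 2000), (30, 3000, 5, 1000)]

observed = expected = variance = 0.0
for a, y1, b, y0 in strata:
    m1 = a + b           # total cases in the stratum
    t = y1 + y0          # total person-years in the stratum
    observed += a
    expected += m1 * y1 / t
    variance += m1 * y1 * y0 / t ** 2

chi2_mh = (observed - expected) ** 2 / variance
print(f"observed = {observed:.0f}, expected = {expected:.2f}, chi-square = {chi2_mh:.2f}")
```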


40 An approximate standard error for the natural log of the rate ratio is:
$$SE = \frac{\left[ \sum M_{1i} Y_{1i} Y_{0i} / T_i^2 \right]^{0.5}}{\left[ \left( \sum a_i Y_{0i} / T_i \right) \left( \sum b_i Y_{1i} / T_i \right) \right]^{0.5}}$$

41 An approximate 95% confidence interval for the rate ratio is then given by:
$$RR_{MH} \, e^{\pm 1.96\, SE}$$
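A sketch of the Mantel-Haenszel rate ratio with its standard error and confidence interval. The two strata are hypothetical person-time data invented for the example, each with a stratum-specific rate ratio of 2.0.

```python
import math

# Hypothetical strata: (a, Y1, b, Y0) =
# exposed cases, exposed person-years, non-exposed cases, non-exposed person-years
strata = [(20, 1000, 20, 2000), (30, 3000, 5, 1000)]

num = den = var_num = 0.0
for a, y1, b, y0 in strata:
    t = y1 + y0
    m1 = a + b
    num += a * y0 / t                 # sum of a_i * Y0i / T_i
    den += b * y1 / t                 # sum of b_i * Y1i / T_i (the weights)
    var_num += m1 * y1 * y0 / t ** 2

rr_mh = num / den
se = math.sqrt(var_num / (num * den))
lower = rr_mh * math.exp(-1.96 * se)
upper = rr_mh * math.exp(+1.96 * se)
print(f"RR_MH = {rr_mh:.2f}, SE = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```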

42 Risk ratios:

               E     Ē     Total
  Cases        a     b     M1
  Non-cases    c     d     M0
  Total        N1    N0    T


45 An approximate standard error for the natural log of the risk ratio is:
$$SE = \frac{\left[ \sum \left( M_{1i} N_{1i} N_{0i} / T_i^2 - a_i b_i / T_i \right) \right]^{0.5}}{\left[ \left( \sum a_i N_{0i} / T_i \right) \left( \sum b_i N_{1i} / T_i \right) \right]^{0.5}}$$

46 An approximate 95% confidence interval for the risk ratio is then given by:
$$RR_{MH} \, e^{\pm 1.96\, SE}$$
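A sketch of the corresponding calculation for risk ratios. It assumes the summary estimate takes the form $\sum a_i N_{0i}/T_i$ divided by $\sum b_i N_{1i}/T_i$, which matches the denominator of the standard error above. The strata are hypothetical count data invented for the example.

```python
import math

# Hypothetical strata: (a, N1, b, N0) =
# exposed cases, exposed persons, non-exposed cases, non-exposed persons
strata = [(20, 100, 10, 100), (30, 200, 15, 200)]

num = den = var_num = 0.0
for a, n1, b, n0 in strata:
    t = n1 + n0
    m1 = a + b
    num += a * n0 / t
    den += b * n1 / t
    var_num += m1 * n1 * n0 / t ** 2 - a * b / t

rr_mh = num / den
se = math.sqrt(var_num / (num * den))
lower = rr_mh * math.exp(-1.96 * se)
upper = rr_mh * math.exp(+1.96 * se)
print(f"RR_MH = {rr_mh:.2f}, SE = {se:.3f}, 95% CI {lower:.2f} to {upper:.2f}")
```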

47 Standardization, in contrast to pooling, involves taking a weighted average of the rates in each stratum (e.g. age-group) before taking the ratio of the two standardized rates. Standardization has many advantages in descriptive epidemiology involving comparisons between countries, regions, ethnic groups or gender groups. However, pooling (when done appropriately) has some superior statistical properties when comparing exposed and non-exposed groups in a specific study.
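A minimal sketch of direct standardization: each group's stratum-specific rates are averaged with the same set of standard weights, and only then is the ratio taken. The rates, weights and function name are hypothetical.

```python
def standardized_rate(rates, weights):
    """Weighted average of stratum-specific rates using standard population weights."""
    return sum(r * w for r, w in zip(rates, weights)) / sum(weights)

# Hypothetical age-specific rates (per person-year) in two age-groups
rates_exposed = [0.001, 0.004]
rates_unexposed = [0.0005, 0.002]

# Hypothetical standard population: share of person-time in each age-group
standard_weights = [0.6, 0.4]

std_exposed = standardized_rate(rates_exposed, standard_weights)
std_unexposed = standardized_rate(rates_unexposed, standard_weights)
standardized_rate_ratio = std_exposed / std_unexposed

print(f"standardized rates: {std_exposed:.4f} vs {std_unexposed:.4f}")
print(f"standardized rate ratio = {standardized_rate_ratio:.2f}")
```

Because both groups are weighted to the same standard population, differences in their age structure cannot drive the comparison.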

48 Summary of Stratified Analysis: If we are concerned about confounding by a factor such as age, gender or smoking, then we need to stratify on this factor (or on all factors simultaneously if there is more than one potential confounder) and calculate the exposure effect separately in each stratum. If the effect is very different in different strata then we would report the findings separately for each stratum.

49 If the effect is similar in each stratum then we can obtain a summary estimate by taking a weighted average of the effect in each stratum. If the adjusted effect is different from the crude effect this means that the crude effect was biased due to confounding.

50 Usually we need to adjust the findings for (i.e. stratify on) age, gender, and some other factors. If we have five age-groups and two gender-groups then we need to divide the data into ten age-gender-groups. If we have too many strata then we begin to get strata with zero marginal totals (e.g. with no cases or no controls). The analysis then begins to ‘break down’ and we have to consider using mathematical modelling.
