Published by Sky Skilton. Modified about 1 year ago.

1
Introduction to Biostatistics, Harvard Extension School © Scott Evans, Ph.D. and Lynne Peeples, M.S.
Contingency Tables & Logistic Regression

3
Two Weeks Ago…
Counts and Proportions
Binary = Dichotomous: mutually exclusive endpoints
Disease vs. No disease
Success vs. Failure
Hit vs. No Hit
Heads vs. Tails
Covered one- and two-sample tests of proportions for ONE variable, using both exact and Normal-approximation methods.

4
Tonight…
Relationships between Proportions
1. Is there an association between categorical variables? Chi-square & Fisher's Exact Tests
2. What is the magnitude and direction of this association? Odds Ratios
3. Are there any intervening variables? Confounders & Effect Modifiers

5
Contingency Tables
We are often interested in determining whether there is an association between two categorical variables. Note that association does not necessarily imply causality. In these cases, data may be represented in a two-dimensional table.

6
Contingency Tables

                     Smoking
                     Smoker   Non-smoker
Lung Cancer: Yes       a          c
Lung Cancer: No        b          d

7
Contingency Tables
The categorical variables can have more than two levels. The variables may also be ordinal; however, this requires more advanced methods. For now, we consider the case in which both variables are nominal.

8
Contingency Table Example: Bike Helmets
Consider the following data: If we want to test whether the proportion of unprotected cyclists who have serious head injuries is higher than that of protected cyclists, we can carry out a hypothesis test involving the two proportions p1 = 17/147 = 0.116 and p2 = 218/646 = 0.337.
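The two-sample test of proportions mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the usual pooled-variance Normal approximation; the function name `two_proportion_z` is hypothetical, and the p-value shown is two-sided (halve it for the one-sided question on the slide).

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-sample z test for proportions (Normal approximation)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)          # common proportion under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return p1, p2, z, p_two_sided

# Helmet data from the slide: 17 of 147 protected, 218 of 646 unprotected
p1, p2, z, p_value = two_proportion_z(17, 147, 218, 646)
print(f"p1={p1:.3f} p2={p2:.3f} z={z:.2f} p={p_value:.2g}")
```

The large negative z reflects the much lower injury proportion among helmet wearers.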

9
Contingency Table Example: Bike Helmets
The p-value is below 0.05, and thus we reject the null hypothesis at the 5% significance level.

10
Chi-Square Test
Alternative technique to the test of two independent proportions…
Hypothesis Test:
H0: No association
HA: Association
Strategy: Compare what is observed to what is expected if H0 is true (i.e., no association).
If the difference is large, then there is evidence of association.
If the difference is not large, then there is insufficient evidence to conclude an association.

11
Chi-Square Test
Some limitations:
Does not describe the magnitude or the direction of the association.
Relies on "large sample theory" (an assumption), which means that the test may be invalid if expected cell counts are too small (<5). Avoid using the test under these conditions.

12
Contingency Table Example: Bike Helmets
Suppose that you wanted to determine whether there is any association between wearing helmets and frequency of brain injuries. Then we could perform the chi-square test (based on the χ² distribution). This test is set up as follows:
1. H0: Suffering a head injury is not associated with wearing a helmet
2. Ha: There is an association between wearing a helmet and suffering a head injury

13
Contingency Table Example: Bike Helmets
Now consider the implication of the null hypothesis… If the distinction between the two groups (helmet wearers and non-helmet wearers) is an artificial one, then the head-injury rate is better estimated by pooling the groups: $\hat{p} = \frac{17 + 218}{147 + 646} = \frac{235}{793} = 0.2963$

14
Contingency Table Example: Bike Helmets
The expected number of injured protected cyclists is (0.2963)(147) = 43.6 injuries on average (versus the observed 17). Similarly, the expected number of injured unprotected cyclists is (0.2963)(646) = 191.4 (versus the observed 218). The expected number of uninjured helmeted cyclists is (0.7037)(147) = 103.4 (versus the observed 130), and the expected number of unprotected uninjured cyclists is (0.7037)(646) = 454.6 (versus the observed 428).
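The expected counts above follow the rule (row total × column total) / grand total. A small sketch, assuming the table is stored as a list of rows (the helper name `expected_counts` is illustrative only):

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: injured, not injured. Columns: helmet, no helmet (counts from the slides).
observed = [[17, 218],
            [130, 428]]
expected = expected_counts(observed)
for row in expected:
    print([round(value, 1) for value in row])
```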

15
Chi-Square Test
The chi-square test is based on quantifying whether deviations from these expected numbers are serious enough to warrant rejection of the null hypothesis. In general, the chi-square test looks like this:
$X^2 = \sum_{i=1}^{rc} \frac{(O_i - E_i)^2}{E_i}$
E_i is the expected number, O_i is the observed number, r is the number of rows, and c is the number of columns. Then $X^2$ is approximately distributed according to the chi-square distribution with df = (r-1)(c-1) degrees of freedom. Critical percentiles of the chi-square distribution can be found in the appendix of your textbook (Table A.8).

16
Chi-Square Test

OBSERVED
              Exposed   Not Exposed
Event           O11        O12
No Event        O21        O22

vs.

EXPECTED
              Exposed   Not Exposed
Event           E11        E12
No Event        E21        E22

Remember, all expected cell counts must be ≥5.

17
Contingency Table Example: Bike Helmets
Returning to our example… The chi-square test (with continuity correction) is:
$X^2 = \sum_{i} \frac{(|O_i - E_i| - 0.5)^2}{E_i} = 27.27$

18
Contingency Table Example: Bike Helmets
We compare this value to 3.84, the upper 5% critical value of the chi-square distribution with (2-1)(2-1) = 1 degree of freedom. Since 27.27 > 3.84, the null hypothesis is rejected. (Note: this uses a continuity correction factor.)
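As a check on the hand calculation, here is a minimal sketch of the 2×2 chi-square statistic with and without the 0.5 continuity correction. The function name is hypothetical and the p-value formula is valid for 1 degree of freedom only. With unrounded expected counts the corrected statistic comes out near 27.2; the 27.27 above reflects expected counts rounded to one decimal.

```python
import math

def chi_square_2x2(table, correction=True):
    """Chi-square statistic for a 2x2 table, optionally continuity-corrected."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / total
            diff = abs(obs - exp)
            if correction:
                diff = max(diff - 0.5, 0.0)   # shrink each deviation by 0.5
            stat += diff ** 2 / exp
    # Upper-tail probability for 1 df: P(chi2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

observed = [[17, 218], [130, 428]]  # helmet data from the slides
corrected, p_corrected = chi_square_2x2(observed, correction=True)
uncorrected, p_uncorrected = chi_square_2x2(observed, correction=False)
print(round(corrected, 2), round(uncorrected, 2))
```

Either version is far above 3.84, so the conclusion does not depend on the correction here.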

19
Contingency Table Example: Bike Helmets
(Figure: chi-square distribution with 1 degree of freedom. Note the 5% right tail, to the right of 3.84.)
We rejected the null hypothesis because values as extreme as, or more extreme than, the observed statistic would have much less than 5% probability of being observed if the null hypothesis were correct.
When df = 1, χ² = Z².

20
Contingency Table Example: Bike Helmets
STATA output: The p-value of the test is < 0.05, so we reject the null hypothesis and conclude that there is an association between wearing a helmet and head injury. STATA calculates the χ² a bit differently, so its value differs slightly from our hand calculation. Same conclusion as the two-sample test of proportions.

21
Exact Tests
Exact tests do not rely on the assumption of large samples (i.e., they are valid with expected cell counts <5).
Always use with small expected cell sizes.
Hypothesis Test:
H0: no association
HA: association
Computes the "exact" probability of observing the data in the given study if no association were present.
Does not describe the magnitude or the direction of the association.
Often called "Fisher's exact test" for 2x2 tables. For more general dimensions, it is simply called an "exact test".

22
Exact Tests
Computationally intensive (particularly for large datasets). For this reason, it has historically been used as a back-up for the chi-square test when samples were small. However, given the power of today's computers, it is a recommended primary analysis strategy (instead of a chi-square test) whenever possible.
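To make the idea of an "exact" probability concrete, here is a rough sketch of a two-sided Fisher's exact test for a 2×2 table. It assumes the common convention of summing the hypergeometric probabilities of all tables no more likely than the observed one; the function name and the small example table are hypothetical, not taken from the lecture.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k
        return comb(row1, k) * comb(row2, col1 - k) / denom

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables at least as unlikely as the observed one
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical small table with expected counts below 5
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

With only eight observations the chi-square approximation would be unreliable, which is exactly the situation the exact test is meant for.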

23
Exact Test Example
Where are different cars advertised?

24
Exact Test Example
p < 0.05: significant difference in where different cars are advertised.

25
Odds Ratio (OR)
The chi-square (and exact) tests of association answer only the question of whether an association exists. They do not comment on the magnitude or direction of the association.
The OR is a measure of association indicating magnitude and direction.
Commonly used in epidemiology.
Ranges from 0 to ∞.
Approximates how much more likely (or unlikely) it is for the outcome to be present among those with "exposure" than among those without exposure.

26
Odds Ratio (OR)
Odds of having the disease if exposed: P(disease|exposed)/[1-P(disease|exposed)]
Odds of having the disease if unexposed: P(disease|unexposed)/[1-P(disease|unexposed)]
The Odds Ratio (OR) is defined as:
$OR = \frac{P(\text{disease}|\text{exposed}) / [1 - P(\text{disease}|\text{exposed})]}{P(\text{disease}|\text{unexposed}) / [1 - P(\text{disease}|\text{unexposed})]}$

27
Odds Ratio (OR)
Consider the following 2×2 table: An estimate of the odds ratio is:
$\widehat{OR} = \frac{ad}{bc}$

28
Odds Ratio (OR)
Useful regardless of how data were collected.
OR ≈ RR when disease is rare.
RR: Relative Risk or Risk Ratio. Ratio of the risk of developing a disease if exposed relative to the risk of developing a disease if unexposed.
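A short sketch comparing the two measures. The cell layout (a = exposed with event, b = exposed without, c = unexposed with event, d = unexposed without) and the function names are assumptions made for illustration. The first table reuses the bike-helmet counts, treating helmet use as the "exposure"; the second is a made-up rare-outcome table.

```python
def odds_ratio(a, b, c, d):
    """Odds of the event among exposed divided by odds among unexposed."""
    return (a / b) / (c / d)

def relative_risk(a, b, c, d):
    """Risk of the event among exposed divided by risk among unexposed."""
    return (a / (a + b)) / (c / (c + d))

# Helmet data: the outcome is common, so OR and RR differ noticeably
helmet_or = odds_ratio(17, 130, 218, 428)
helmet_rr = relative_risk(17, 130, 218, 428)

# Hypothetical rare outcome: OR and RR nearly coincide
rare_or = odds_ratio(2, 998, 1, 999)
rare_rr = relative_risk(2, 998, 1, 999)

print(round(helmet_or, 3), round(helmet_rr, 3))
print(round(rare_or, 3), round(rare_rr, 3))
```

Head injuries are not rare in this sample (about 30% overall), which is why the OR overstates how far the RR is from 1.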

29
OR: Interpretation Example #1
Let y denote the presence (1) or absence (0) of lung cancer and x denote whether the person is a smoker (1 = smoker, 0 = non-smoker).
An estimated odds ratio of 2 implies that the odds of lung cancer are twice as high among smokers as among nonsmokers in the study population (so lung cancer is roughly twice as likely when the disease is rare).

30
OR: Interpretation Example #2
Let y denote the presence (1) or absence (0) of heart disease and x denote whether the person engages in regular strenuous physical exercise (1 = exercise, 0 = no exercise).
An estimated odds ratio of 0.5 implies that the odds of heart disease among those who exercise are half the odds among those who do not exercise.

31
Odds Ratio (OR)
If the odds of having the disease in the exposed and unexposed groups are equal, then the odds ratio should be close to 1. A test of this is constructed as follows:
H0: There is no association between exposure and disease.
Ha: There is an association between exposure and disease.
If the null hypothesis is true, the odds ratio should be close to 1. The test will answer the question: "How far from 1 is too far to warrant rejection of the null hypothesis?"

32
Odds Ratio (OR)
The OR is not distributed normally! It is skewed to the right…
If the denominator is larger, the OR lies in [0, 1).
If the numerator is larger, the OR lies in (1, ∞).
(Figure: distribution of the estimated OR on an axis marked at 0 and 1; the two regions are labeled "equally likely".)

33
ln(Odds Ratio)
Fortunately, the natural logarithm (ln) of the OR is approximately normally distributed. In fact, the statistic
$Z = \frac{\ln(\widehat{OR})}{\widehat{se}[\ln(\widehat{OR})]}$
is approximately distributed according to the standard normal distribution.
ln(OR) ranges from -∞ to ∞.
This allows us to derive tests and confidence intervals as usual. We then convert back to the original scale using the exponential function.

34
Hypothesis Testing w/ OR

35
OR Confidence Intervals
A 100(1-α)% confidence interval for the log odds ratio is given by
$\ln(\widehat{OR}) \pm z_{1-\alpha/2} \, \widehat{se}[\ln(\widehat{OR})]$
Thus, the 100(1-α)% confidence interval for the true odds ratio is obtained by exponentiating both limits:
$\left( e^{\ln(\widehat{OR}) - z_{1-\alpha/2} \widehat{se}}, \; e^{\ln(\widehat{OR}) + z_{1-\alpha/2} \widehat{se}} \right)$
Note: This confidence interval can also be used to perform a hypothesis test by inspecting whether it covers 1 (the hypothesized OR value under the null hypothesis).
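The test and interval on the log scale can be sketched as follows. Two assumptions to flag: the standard error uses the common large-sample formula sqrt(1/a + 1/b + 1/c + 1/d), which the extracted slides do not show explicitly, and the data are the bike-helmet counts from earlier in the deck rather than the EFM table (whose counts are not reproduced in the text). The function name is hypothetical.

```python
import math

def log_or_inference(a, b, c, d, z_crit=1.96):
    """Odds ratio, Z statistic, and confidence interval via the log scale."""
    or_hat = (a * d) / (b * c)
    log_or = math.log(or_hat)
    # Assumed large-sample standard error of ln(OR)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    lower = math.exp(log_or - z_crit * se)   # back-transform the limits
    upper = math.exp(log_or + z_crit * se)
    return or_hat, z, (lower, upper)

# a = helmet & injured, b = helmet & not injured,
# c = no helmet & injured, d = no helmet & not injured
or_hat, z, (lower, upper) = log_or_inference(17, 130, 218, 428)
print(f"OR={or_hat:.3f} Z={z:.2f} 95% CI=({lower:.3f}, {upper:.3f})")
```

The interval lies entirely below 1, matching the earlier rejection of "no association" for these data.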

36
OR Example: Electronic Fetal Monitoring
Consider data on use of EFM (Electronic Fetal Monitoring) and frequency of Caesarean deliveries. The data are as follows:

37
OR Example: Electronic Fetal Monitoring
Our test of the null hypothesis of no association between EFM and Caesarean births is based on the Z statistic for the log odds ratio. Since Z = 6.107 exceeds the critical value of 1.96, we reject the null hypothesis. These data are consistent with a strong (positive) association between EFM and Caesarean births.

38
OR Example: Electronic Fetal Monitoring
The 95% confidence interval does not include 1, which is consistent with the result of the hypothesis test above. The odds of a Caesarean birth among women who were monitored via EFM are estimated to be from 44% higher to over double those of women who were not monitored by EFM.

39
OR Example: Electronic Fetal Monitoring
In STATA: The odds ratio is 1.72 with a 95% confidence interval of (1.447, 2.050). Thus, the null hypothesis of no association is rejected, as both limits of the confidence interval are above 1.0.

40
OR Example: Coronary Heart Disease
A study of Age and Coronary Heart Disease (CHD)
OR = 8.1 & 95% CI = (2.9, 22.9)
The study suggests that the odds of CHD are 2.9 to 22.9 times higher among those 55 or over than among those less than 55.

41
Sets of Contingency Tables: Intervening Variable
We may have a scenario in which we have a contingency table for each level (stratum) of a third (potentially confounding) factor.
Example: we develop a contingency table to examine the association between coffee consumption and myocardial infarction (MI). We gather these data for both smokers and non-smokers, as smoking status may confound our results.

42
Interaction and Confounding
Interaction (effect modification): there is an interaction between x and y when the effect of y on z depends upon the level of x.
Example: If the effect of smoking on the risk of developing lung cancer differs between males and females, then there is an interaction between smoking and gender.

43
Interaction and Confounding
Confounding occurs when the effect of variable x on z is distorted when we fail to control for variable y. We say that y is a confounder for the effect of x on z. This is different from interaction.

44
Interaction and Confounding
Note: It can happen that, when groups are combined, the overall OR is significantly different from the individual ORs across groups, even if these ORs are deemed "homogeneous" (Simpson's Paradox).
Examples:
Baseball Batting Averages
Electoral College
Medical Studies

45
Sets of Contingency Tables: Coffee & Smoking Example
(Two tables: one for Smokers, one for Non-Smokers.)

46
Sets of Contingency Tables: Coffee & Smoking Example
The question that naturally arises is whether we should combine the information in those two tables and use all the available data in order to ascertain the effect of coffee on the risk of Myocardial Infarction (MI). However, if the association between coffee and MI were different in the group of smokers compared to the group of non-smokers (effect modification), then such an analysis would be inappropriate.

47
Homogeneous ORs?
We must first determine whether the OR is homogeneous across strata. This can be done with a hypothesis test:
H0: OR is homogeneous across strata
HA: OR is heterogeneous across strata
This is equivalent to testing for a statistical "interaction".

48
Homogeneous ORs?
If the ORs are heterogeneous across strata, then:
There is an interaction between the third variable and the association.
We need to perform separate analyses by subgroup.

49
Homogeneous ORs?
If the ORs are not heterogeneous across strata, then we may use the Mantel-Haenszel (MH) odds ratio.
MH OR:
Measure of association.
Controls for the potentially confounding effect of a third variable.
Weighted average of the individual ORs (i.e., adjusted).
We can obtain a CI, as well as perform hypothesis tests, for the MH OR.

50
Mantel-Haenszel (MH) Methods
We utilize Mantel-Haenszel methods… Generalizing, we have g tables (i = 1, ..., g) that are constructed as follows (g = 2 in the previous example):

51
Mantel-Haenszel (MH) Methods
We employ the following strategy:
1. Analyze the two tables separately, based on the individual estimates of the odds ratios.
2. Test the hypothesis that the odds ratios in the two subgroups are sufficiently close to each other (they are homogeneous).
3a. If the assumption of homogeneity is not rejected, then perform an overall (combined) "stratified" analysis.
3b. If the homogeneity assumption is rejected, then perform separate "subgroup" analyses (the association of the two factors is different in each subgroup).

52
MH Test of Homogeneity
The test of homogeneity is set up as follows:
1. H0: OR1 = OR2 (the two odds ratios do not have to be 1, just equal)
2. Ha: OR1 ≠ OR2 (only two-sided alternatives are possible with the chi-square test)
3. The test statistic
$X^2_H = \sum_{i=1}^{g} w_i (y_i - \bar{Y})^2$
has an approximate chi-square distribution with g-1 degrees of freedom, where $y_i = \ln(\widehat{OR}_i)$, $\bar{Y}$ is the weighted average of the $y_i$, and the $w_i$ are the weights…

53
MH Test of Homogeneity
We use the individual (log) odds ratios to produce a weighted average, weighting each of them inversely proportional to the square of its standard error (one over its variance) to down-weight odds ratios with high variability. High variability means low information.

54
MH Test of Homogeneity
Rejection rule: Reject the null hypothesis (conclude that the two subgroups are not homogeneous) if the test statistic exceeds the upper-α critical value of the chi-square distribution with g-1 degrees of freedom.

55
MH Test of Homogeneity: Coffee & Smoking Example
Back to our coffee and smoking example:

56
MH Test of Homogeneity: Coffee & Smoking Example
By the rejection rule, the test statistic is not larger than any usual critical value (as seen in the Appendix). Thus, we do not reject the null hypothesis. There is no evidence of heterogeneity. It is appropriate to proceed with a combined, stratified analysis.

57
Combined OR: Coffee & Smoking Example
The Summary Odds Ratio is a weighted average of the odds ratios for the g separate strata:
So, after adjusting for smoking status, those who drink coffee have 2.18 times greater odds of experiencing nonfatal myocardial infarction compared to those who don't drink coffee.

58
Combined OR: Confidence Interval
The confidence interval for the overall odds ratio is constructed similarly to the one-sample case. The only differences are the estimate of the overall odds ratio and its associated standard error. In general, a 100(1-α)% confidence interval based on the standard normal distribution is constructed as follows:
$\bar{Y} \pm z_{1-\alpha/2} \, se(\bar{Y})$
where $\bar{Y} = \frac{\sum_{i=1}^{g} w_i y_i}{\sum_{i=1}^{g} w_i}$ and $se(\bar{Y}) = \frac{1}{\sqrt{\sum_{i=1}^{g} w_i}}$, and the $w_i$ are defined as before. Since Y = ln(OR), the 100(1-α)% confidence interval for the common odds ratio is:
$\left( e^{\bar{Y} - z_{1-\alpha/2} \, se(\bar{Y})}, \; e^{\bar{Y} + z_{1-\alpha/2} \, se(\bar{Y})} \right)$

59
Combined OR: Coffee & Smoking Example CI
In the previous example, the 95% confidence interval shows that, with 95% confidence, coffee drinkers have from 73% higher odds of developing MI to almost triple the odds, compared to non-coffee drinkers. Since this interval does not contain 1, it implies that we should reject the null hypothesis of no (overall) association.

60
Mantel-Haenszel (MH) Test
Finally, we test whether this summary odds ratio is equal to 1. The Mantel-Haenszel test is based on the chi-square distribution and the simple idea that if there is no association between "exposure" and "disease", then the number of exposed individuals a_i contracting the disease should not be too different from its expected value under independence, m_i (the product of the corresponding row and column totals divided by the table total).

61
Mantel-Haenszel (MH) Test

62
Mantel-Haenszel (MH) Test
To see this, one must recall that under independence, the probability $P(A \cap B) = P(A)P(B)$. If A = "Subject has the disease" and B = "Subject is exposed", then the probability that a subject is both diseased and exposed is the product of the two marginal probabilities.

63
Mantel-Haenszel (MH) Test
Thus, under the assumption of independence (no association), the expected value of a_i is m_i. A less obvious estimate of the variance of a_i, denoted σ_i², is:

64
Mantel-Haenszel (MH) Test
The Mantel-Haenszel test is constructed as follows:
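Since the formulas on these slides did not survive extraction, here is a hedged sketch of one standard form of the Mantel-Haenszel statistic: the squared difference between the summed observed a_i and summed expected m_i, divided by the summed hypergeometric variances. It assumes no continuity correction (the lecture's version may differ), the function name is hypothetical, and the counts are borrowed from the low birth weight example that appears later in the deck because the coffee tables are not reproduced in the text.

```python
import math

def mantel_haenszel_chi2(strata):
    """MH chi-square (1 df, no continuity correction) for 2x2 tables (a, b, c, d)."""
    sum_a = sum_m = sum_var = 0.0
    for a, b, c, d in strata:
        t = a + b + c + d
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_m += row1 * col1 / t                                   # expected a_i under H0
        sum_var += row1 * row2 * col1 * col2 / (t * t * (t - 1))   # variance of a_i
    stat = (sum_a - sum_m) ** 2 / sum_var
    p_value = math.erfc(math.sqrt(stat / 2))                       # upper tail, 1 df
    return stat, p_value

# Low birth weight by smoking status, stratified by race (White, Black, Other)
strata = [(19, 4, 33, 40), (6, 5, 4, 11), (5, 20, 7, 35)]
mh_stat, mh_p = mantel_haenszel_chi2(strata)
print(round(mh_stat, 2), round(mh_p, 4))
```

The statistic is compared with 3.84, the 5% critical value of the chi-square distribution with 1 degree of freedom.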

65
MH Methods: Coffee & Smoking Example
In the above example, a_1 = 1,011, m_1 = 981.3, σ_1² = 29.81, a_2 = 383, and m_2 = 358.4 (with corresponding variance σ_2²). Since the resulting test statistic is much larger than 3.84, the upper 5% tail of the chi-square distribution with 1 degree of freedom, we reject the null hypothesis. It seems that coffee consumption is significantly associated with the risk of MI across smokers and non-smokers.

66
MH Methods: Coffee & Smoking Example
STATA Output:
Test of Homogeneity: p = 0.334 (Note: STATA chi-sq = 0.933, slightly higher than our hand calculation of 0.896)
M-H OR Test: p < 0.001

67
MH Methods Summary: Coffee & Smoking Example
1. Analyzed the two tables separately. The odds ratio among smokers is 2.46, with a separate estimate among non-smokers. Then, based on the individual estimates of the odds ratios…
2. Tested the hypothesis that the odds ratios in the two subgroups are sufficiently close to each other (i.e., they are homogeneous). The test of homogeneity ("test for heterogeneity" in STATA) has a p-value > 0.05. We do not reject the hypothesis of homogeneity in the two groups.

68
MH Methods Summary: Coffee & Smoking Example
3a. Since the assumption of homogeneity was not rejected (p = 0.334), we performed an overall (combined) analysis. From this analysis, the hypothesis of no association between coffee consumption and myocardial infarction is rejected (M-H p-value < 0.001). Since this is the case, by inspection of the combined Mantel-Haenszel estimate of the odds ratio (2.18), we see that the odds of MI for coffee drinkers (adjusting for smoking status) are over twice as high as those for non-coffee drinkers.

69
One More OR Example: Low Birth Weight
Low Birth Weight by Smoking Status, stratified by Race:
WHITE: OR = (19 × 40)/(4 × 33) = 760/132 = 5.76
BLACK: OR = (6 × 11)/(5 × 4) = 66/20 = 3.30
OTHER: OR = (5 × 35)/(20 × 7) = 175/140 = 1.25

70
One More OR Example: Low Birth Weight
We now have three groups, so we perform the test of homogeneity using a chi-squared distribution with g-1 = 2 degrees of freedom. The test gives p = 0.221.
Despite apparent differences in odds ratios between strata, they are within sampling variability of one another. Thus we can perform combined analyses.
M-H Odds Ratio Estimate = 3.09
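These numbers can be reproduced from the three stratum tables. The sketch below assumes the inverse-variance weights w_i = 1 / (1/a + 1/b + 1/c + 1/d) for the homogeneity statistic and the usual Mantel-Haenszel estimator sum(a·d/T) / sum(b·c/T) for the summary OR; neither formula is visible in the extracted slides, so treat both as assumptions, and the function names as hypothetical. The p-value shortcut exp(-x/2) holds only for 2 degrees of freedom.

```python
import math

# (a, b, c, d) per stratum, matching OR = (a * d) / (b * c) on the slide
strata = [(19, 4, 33, 40), (6, 5, 4, 11), (5, 20, 7, 35)]

def homogeneity_statistic(tables):
    """Weighted sum of squared deviations of ln(OR_i) from their weighted mean."""
    logs = [math.log((a * d) / (b * c)) for a, b, c, d in tables]
    weights = [1 / (1 / a + 1 / b + 1 / c + 1 / d) for a, b, c, d in tables]
    y_bar = sum(w * y for w, y in zip(weights, logs)) / sum(weights)
    return sum(w * (y - y_bar) ** 2 for w, y in zip(weights, logs))

def mh_odds_ratio(tables):
    """Mantel-Haenszel summary odds ratio across strata."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

x2_h = homogeneity_statistic(strata)
p_homogeneity = math.exp(-x2_h / 2)   # chi-square upper tail, valid for df = 2 only
summary_or = mh_odds_ratio(strata)
print(round(x2_h, 2), round(p_homogeneity, 3), round(summary_or, 2))
```

Under these assumptions the output agrees with the p = 0.221 and M-H estimate of 3.09 quoted above.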

71
Logistic Regression
Extends MH methods to include multiple variables, including continuous confounders and exposures.
Allows us to predict dichotomous outcomes.
Why can't we simply use linear regression…?

72
Logistic Regression
(Figure: y plotted against x; all outcomes are y = 1 or y = 0.)

73
Logistic Regression
(Figure: estimated values of P(Y=1) plotted against x, with a vertical axis from 0 to 1.)
Linear model not appropriate! Predicted probabilities must stay between 0 and 1.

74
Logistic Regression
(Figure: ln(odds) plotted against x.)
Transformed to a linear model!

75
Logistic Regression Assumptions
1. Responses are Bernoulli
2. Parameters are linear on the logit scale:
$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x$
where p = P(Y=1)

76
Logistic Regression
We can solve for p, the proportion of times that the response variable, Y, takes on the value 1:
$p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}$
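A small sketch of moving between the logit scale and the probability scale. The coefficient values used in the example are made up purely for illustration, and the function names are hypothetical.

```python
import math

def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Probability corresponding to a log-odds value; always between 0 and 1."""
    return 1 / (1 + math.exp(-eta))

def predicted_probability(beta0, beta1, x):
    """P(Y = 1) for a simple logistic model with linear predictor beta0 + beta1 * x."""
    return inv_logit(beta0 + beta1 * x)

# Hypothetical coefficients, chosen only to show the shape of the curve
for x in (-10, 0, 4, 10):
    print(x, round(predicted_probability(-2.0, 0.5, x), 4))
```

However extreme x becomes, the predicted probability stays strictly inside (0, 1), which is what the straight-line model could not guarantee.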

77
Logistic Regression
Apply simple linear regression techniques, just interpret differently…
β0 = log(odds when x=0)
e^β0 = odds (when x=0)
β1 = log(odds ratio) = log(odds in group 1) – log(odds in group 0)
e^β1 = Odds Ratio (between group 1 and group 0)
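To illustrate this interpretation, the sketch below fits a one-predictor logistic model by Newton-Raphson on grouped data and checks that e^β1 matches the 2×2 odds ratio. It reuses the bike-helmet counts from earlier in the deck (x = 1 for helmet wearers) rather than the hypertension data, which are not reproduced in the text; the function name and the fixed iteration count are illustrative choices, not the lecture's method.

```python
import math

def fit_logistic_binary(groups, iterations=25):
    """Newton-Raphson fit of logit(p) = b0 + b1 * x.

    groups: list of (x, events, n) tuples, one per covariate value.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, events, n in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            resid = events - n * p            # observed minus fitted events
            w = n * p * (1 - p)               # binomial variance weight
            g0 += resid
            g1 += x * resid
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (X'WX) * delta = gradient
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# x = 0: no helmet (218 injured of 646); x = 1: helmet (17 injured of 147)
b0, b1 = fit_logistic_binary([(0, 218, 646), (1, 17, 147)])
print(round(math.exp(b0), 4), round(math.exp(b1), 4))
```

Here exp(b0) recovers the odds of injury in the x = 0 group (218/428) and exp(b1) recovers the odds ratio (17 × 428)/(130 × 218).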

78
Logistic Regression Hypertension Example
Study of the relationship between blood pressure and blood lead levels.
Hypert = 1 for hypertensive and 0 otherwise.
Sex = 1 for males and 0 for females.
Lead = 1 for high blood lead levels and 0 for low blood lead levels.

79
Logistic Regression Hypertension Example
Test of high vs. low blood lead levels
. logistic hypert lead

80
Multiple Logistic Regression
Extend simple logistic regression to include more than two variables, with both categorical and continuous predictors.
Parallels methods for multiple linear regression.
Can estimate the effect of each variable while controlling for the effects of other (potentially confounding) variables in the model.
Indicator variables
Interaction terms

81
Multiple Logistic Regression Hypertension Example cont.
Now include both lead and sex in the model:
. logistic hypert sex lead
