Log-linear Models HRP 261 03/03/04: Log-Linear Models for Multi-way Contingency Tables


1

2 Log-linear Models HRP 261 03/03/04

3 Log-Linear Models for Multi-way Contingency Tables
1. GLM for Poisson-distributed data with a log link (see Agresti, chapter 4).
2. Recall: $\log \mu = \alpha + \beta x$, so $\mu = e^{\alpha}\,(e^{\beta})^{x}$. A one-unit increase in $x$ has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of "effects": main effects and interactions.
4. (Take logs to linearize.)
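As a quick numeric check of point 2, the sketch below evaluates the log-link mean at two adjacent values of x. The values of alpha and beta are hypothetical, chosen only for illustration.

```python
import math

def expected_count(alpha, beta, x):
    """Mean of a Poisson GLM with log link: mu = exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

# Hypothetical coefficients, for illustration only.
alpha, beta = 1.0, 0.5

# Moving x from 2 to 3 multiplies mu by e^beta, whatever the starting x.
ratio = expected_count(alpha, beta, 3) / expected_count(alpha, beta, 2)
print(round(ratio, 4), round(math.exp(beta), 4))  # 1.6487 1.6487
```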

4 Log-linear vs. logistic
1. The cell counts are modeled as Poisson, not binomial.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.

5 Log-linear vs. logistic
The variables investigated by log-linear models are all treated as "response variables." Therefore, log-linear models only demonstrate association between variables (like a chi-square test or a correlation coefficient). If clear explanatory and response variables exist, logistic regression should be used instead. Also, if the variables are continuous and cannot be broken down into discrete categories, logistic regression is preferable.

6 Example: 3-way contingency table
                                  Heart Disease
Body Weight        Sex          Yes     No     Total
Not overweight     Male          15      5       20
                   Female        40     60      100
                   Total         55     65      120
Overweight         Male          20     10       30
                   Female        10     40       50
                   Total         30     50       80
Source: Angela Jeansonne

7 In-class exercise: Analyze these data using methods we have already learned. Is gender related to heart disease, and is this effect modified or confounded by weight? What is the relationship between overweight and gender (controlled for CHD), and between overweight and heart disease (controlled for gender)?

8 Crude OR CHD-Male (ignore overweight)
                       Heart Disease
Sex (all weights)      Yes     No     Total
Male                    35     15       50
Female                  50    100      150
Total                   85    115      200
OR male-CHD = 35*100/(15*50) = 4.67

9 Crude OR Overweight-Male (ignore heart disease)
                         Overweight
Sex (all CHD status)     Yes     No     Total
Male                      30     20       50
Female                    50    100      150
Total                     80    120      200
OR Overweight-Male = 30*100/(20*50) = 3.0

10 Crude OR CHD-Overweight (ignore gender)
                               Heart Disease
Weight (men and women)         Yes     No     Total
Heavy                           30     50       80
Light                           55     65      120
Total                           85    115      200
OR CHD-Overweight = 30*65/(50*55) = 0.71
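The three crude odds ratios on slides 8 to 10 can be reproduced by collapsing the 2x2x2 table over the third variable. A minimal Python sketch, assuming each variable is coded 1 = yes (overweight, male, CHD) and 0 = no; the dictionary layout and helper names are illustrative, not part of the original analysis.

```python
# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}

def collapse(counts, i, j):
    """Sum over the third variable to get the 2x2 table of variables i and j."""
    table = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for cell, n in counts.items():
        table[(cell[i], cell[j])] += n
    return table

def odds_ratio(t):
    """Cross-product ratio n11*n00 / (n10*n01)."""
    return t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)])

crude = {
    "chd-male": odds_ratio(collapse(COUNTS, 1, 2)),
    "overweight-male": odds_ratio(collapse(COUNTS, 0, 1)),
    "chd-overweight": odds_ratio(collapse(COUNTS, 0, 2)),
}
for pair, value in crude.items():
    print(pair, round(value, 2))
# chd-male 4.67
# overweight-male 3.0
# chd-overweight 0.71
```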

11 OR MH (CHD-Male) – stratified by Overweight

12 Stratified by Heart Disease
                                Overweight
Stratum          Sex          Yes     No     Total
Heart Disease    Male          20     15       35
                 Female        10     40       50
                 Total         30     55       85
No CHD           Male          10      5       15
                 Female        40     60      100
                 Total         50     65      115

13 OR MH (Overweight-Male) – stratified by Heart Disease

14 Stratified by gender
                              Heart Disease
Gender     Weight           Yes     No     Total
Male       Heavy             20     10       30
           Light             15      5       20
           Total             35     15       50
Female     Heavy             10     40       50
           Light             40     60      100
           Total             50    100      150

15 OR MH (CHD-Overweight) – stratified by Gender
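Slides 11, 13 and 15 call for Mantel-Haenszel summary ORs. The standard M-H estimator is sum(a*d/n) / sum(b*c/n) over the strata; the sketch below applies it to each pairing, assuming each variable is coded 1 = yes and 0 = no (helper names are hypothetical).

```python
# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}

def stratum(counts, i, j, k, level):
    """2x2 table of variables i and j among cells where variable k == level."""
    return {(cell[i], cell[j]): n for cell, n in counts.items() if cell[k] == level}

def mantel_haenszel(counts, i, j, k):
    """M-H summary OR: sum(a*d/n) / sum(b*c/n) over the two strata of variable k."""
    num = den = 0.0
    for level in (0, 1):
        t = stratum(counts, i, j, k, level)
        n = sum(t.values())
        num += t[(1, 1)] * t[(0, 0)] / n
        den += t[(1, 0)] * t[(0, 1)] / n
    return num / den

mh = {
    "chd-male | overweight": mantel_haenszel(COUNTS, 1, 2, 0),
    "overweight-male | chd": mantel_haenszel(COUNTS, 0, 1, 2),
    "chd-overweight | male": mantel_haenszel(COUNTS, 0, 2, 1),
}
for label, value in mh.items():
    print(label, round(value, 2))
# chd-male | overweight 6.0
# overweight-male | chd 4.18
# chd-overweight | male 0.44
```

These values are close to the exponentiated Model 4 interaction terms reported later in the deck (e^1.8213 ≈ 6.18, e^1.4456 ≈ 4.24, e^-0.8239 ≈ 0.44). The M-H and model-based numbers are different estimators of a common OR, so small differences are expected.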

16 Model with log-linear models

17 Model 1: Independence
SAS code for a generalized linear model with a Poisson distribution and log link function:
    proc genmod data=loglinear;
      model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
    run;
Model 1 (main effects only): $\log(\text{counts}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease}$
This implies that the cell counts depend only on the MARGINAL probabilities (odds).

18 Independence model: parameters
                             Standard   Wald 95%               Chi-
Parameter    DF  Estimate    Error      Confidence Limits      Square     Pr > ChiSq
Intercept     1   3.9464     0.1170     3.7171    4.1758      1137.17     <.0001
Overweight    1  -0.4055     0.1443    -0.6884   -0.1226         7.89     0.0050
IsMale        1  -1.0986     0.1633    -1.4187   -0.7786        45.26     <.0001
HeartDis      1  -0.3023     0.1430    -0.5826   -0.0219         4.47     0.0346
Model 1: log(counts) = 3.95 - .41 (weight) - 1.1 (male) - .30 (heart disease)
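Because Model 1 is the independence model, its maximum-likelihood fitted counts have a closed form, n x P(weight) x P(sex) x P(CHD), so the estimates in the table can be checked without SAS. A minimal Python sketch, assuming 0/1 coding (1 = overweight, male, CHD) and light/female/no CHD as the reference cell, which is what the all-zero SAS design implies; the helper names are hypothetical.

```python
import math

# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}
N = sum(COUNTS.values())  # 200

# Marginal proportions of the "1" level: P(overweight), P(male), P(chd).
p = [sum(n for cell, n in COUNTS.items() if cell[i] == 1) / N for i in range(3)]

def fitted_independence(cell):
    """Model 1 expected count: N times the product of the marginal probabilities."""
    mu = N
    for i, level in enumerate(cell):
        mu *= p[i] if level == 1 else 1 - p[i]
    return mu

# Intercept = log expected count in the reference cell (light, female, no CHD);
# each main effect = log of that variable's marginal odds.
intercept = math.log(fitted_independence((0, 0, 0)))
effects = [math.log(pi / (1 - pi)) for pi in p]
print(round(intercept, 4), [round(b, 4) for b in effects])
# 3.9464 [-0.4055, -1.0986, -0.3023]
```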

19 Interpretation of Parameters: Marginal Odds
Model 1: log(counts) = 3.95 - .41 (weight) - 1.1 (male) - .30 (heart disease)
$e^{-0.4055} = .667$ = the (marginal) odds of being overweight = 80/120
$e^{-1.0986} = .333$ = the odds of being male = 50/150
$e^{-0.3023} = .739$ = the odds of having heart disease = 85/115

20 Marginal probabilities
P(overweight) = .667/(.667+1) = .40 (80/200)
P(male) = .333/(.333+1) = .25 (50/200)
P(heart disease) = .739/1.739 = .425 (85/200)
Predicted counts, as examples:
The expected number of light men with heart disease under independence = 200*(1-.40)(.25)(.425) = 12.75
The expected number of light men without heart disease under independence = 200*(1-.40)(.25)(1-.425) = 17.25

21 Independence model: goodness-of-fit
Cell                          Observed   Predicted
light/male/disease                15       12.75
light/male/no disease              5       17.25
light/female/disease              40       38.25
light/female/no disease           60       51.75
heavy/male/disease                20        8.5
heavy/male/no disease             10       11.5
heavy/female/disease              10       25.5
heavy/female/no disease           40       34.5
df = cells - parameters in model = 8 - 4 = 4
Several cells are far from their predictions (e.g., heavy/male/disease: 20 observed vs. 8.5 predicted), which suggests the independence model is a poor fit!
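The "poor fit" verdict can be quantified with the usual Pearson X^2 and likelihood-ratio G^2 (deviance) statistics. A sketch that rebuilds the Model 1 fitted counts from the marginal proportions, assuming 0/1 coding with 1 = overweight, male, CHD (helper names hypothetical):

```python
import math

# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}
N = sum(COUNTS.values())
p = [sum(n for cell, n in COUNTS.items() if cell[i] == 1) / N for i in range(3)]

def fitted_independence(cell):
    """Model 1 expected count: N times the product of the marginal probabilities."""
    mu = N
    for i, level in enumerate(cell):
        mu *= p[i] if level == 1 else 1 - p[i]
    return mu

fitted = {cell: fitted_independence(cell) for cell in COUNTS}

# Pearson X^2 and likelihood-ratio G^2 comparing observed and fitted counts.
pearson = sum((COUNTS[c] - fitted[c]) ** 2 / fitted[c] for c in COUNTS)
deviance = 2 * sum(COUNTS[c] * math.log(COUNTS[c] / fitted[c]) for c in COUNTS)
df = len(COUNTS) - 4  # 8 cells minus 4 parameters (intercept + 3 main effects)
print(round(pearson, 1), round(deviance, 1), df)  # 36.5 38.4 4
```

Both statistics are far above 9.49, the 5% critical value of a chi-square distribution with 4 df.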

22 Predicted Table (note: the one-way marginal totals match the observed ones: 80 overweight, 50 male, 85 with heart disease)
                                  Heart Disease
Body Weight        Sex          Yes      No      Total
Not overweight     Male        12.75    17.25      30
                   Female      38.25    51.75      90
                   Total       51       69        120
Overweight         Male         8.5     11.5       20
                   Female      25.5     34.5       60
                   Total       34       46         80

23 Predicted OR CHD-Male
                       Heart Disease
Sex (all weights)      Yes      No      Total
Male                  21.25    28.75      50
Female                63.75    86.25     150
Total                 85      115        200
OR CHD-Male = 21.25*86.25/(28.75*63.75) = 1.0

24 The model coefficients have an odds ratio interpretation…

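25 The coefficients combine to give the predicted (log) count in each cell.
The coefficients also have a direct odds ratio interpretation: calculate the OR CHD-Male in each weight stratum. (In Model 1, which has no interaction terms, this OR is 1.0 in every stratum.)
This interpretation becomes more interesting and useful when interaction terms are included!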
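To make the odds ratio interpretation concrete: the log of a stratum-specific OR is a contrast of four linear predictors, so the main effects cancel and only interaction terms survive. A sketch using the Model 1 estimates from slide 18 and, for contrast, the Model 2 estimates reported later in the deck; the dictionary keys are hypothetical labels for the SAS parameters.

```python
import math

# Model 1 (independence) estimates from the parameter table.
MODEL1 = {"intercept": 3.9464, "overweight": -0.4055, "male": -1.0986, "chd": -0.3023}
# Model 2 estimates (adds male*chd and overweight*male interactions).
MODEL2 = {"intercept": 4.1997, "overweight": -0.6931, "male": -2.4079, "chd": -0.6931,
          "male*chd": 1.5404, "overweight*male": 1.0986}

def log_count(coef, ow, male, chd):
    """Linear predictor log(mu) for one cell; absent interaction terms count as 0."""
    return (coef["intercept"] + coef["overweight"] * ow + coef["male"] * male
            + coef["chd"] * chd
            + coef.get("male*chd", 0.0) * male * chd
            + coef.get("overweight*male", 0.0) * ow * male
            + coef.get("overweight*chd", 0.0) * ow * chd)

def or_chd_male(coef, ow):
    """Predicted CHD-male odds ratio within one weight stratum (ow = 0 or 1)."""
    log_or = (log_count(coef, ow, 1, 1) + log_count(coef, ow, 0, 0)
              - log_count(coef, ow, 1, 0) - log_count(coef, ow, 0, 1))
    return math.exp(log_or)

for ow in (0, 1):
    print(ow, round(or_chd_male(MODEL1, ow), 2), round(or_chd_male(MODEL2, ow), 2))
# 0 1.0 4.67
# 1 1.0 4.67
```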

26 Expected OR CHD-Overweight
                          Heart Disease
Weight (all genders)      Yes     No     Total
Heavy                      34     46       80
Light                      51     69      120
Total                      85    115      200
OR CHD-Overweight = 34*69/(46*51) = 1.0

27 Expected OR Overweight-Male
                          Overweight
Sex (all CHD status)      Yes     No     Total
Male                       20     30       50
Female                     60     90      150
Total                      80    120      200
OR Overweight-Male = 20*90/(60*30) = 1.0

28 Model with Interaction
Model 2 (main effects + interactions with gender): This model corresponds to the case where heart disease and overweight are conditionally independent (conditioned on gender).
$\log(\text{counts}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,\text{isMale}\cdot\text{HeartDisease} + \beta_5\,\text{isMale}\cdot\text{overweight}$
    proc genmod data=loglinear;
      model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight / dist=poisson link=log pred;
    run;
This implies that gender is associated with heart disease and with overweight, but overweight and heart disease are independent within each gender: OR CHD-Male ≠ 1 and OR Overweight-Male ≠ 1, but OR CHD-Overweight = 1.

29 Model 2: log(counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)
Analysis Of Parameter Estimates
                                    Standard   Wald 95%               Chi-
Parameter           DF  Estimate    Error      Confidence Limits      Square     Pr > ChiSq
Intercept            1   4.1997     0.1155     3.9734    4.4260      1322.81     <.0001
Overweight           1  -0.6931     0.1732    -1.0326   -0.3537        16.02     <.0001
IsMale               1  -2.4079     0.3317    -3.0580   -1.7579        52.71     <.0001
HeartDis             1  -0.6931     0.1732    -1.0326   -0.3537        16.02     <.0001
IsMale*HeartDis      1   1.5404     0.3539     0.8468    2.2341        18.95     <.0001
Overweight*IsMale    1   1.0986     0.3367     0.4388    1.7584        10.65     0.0011
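Model 2 also has closed-form fitted counts: when overweight and CHD are independent given gender, the ML fit is n(weight, sex) x n(sex, CHD) / n(sex). The sketch below rebuilds those counts and converts them into the GENMOD parameterization, assuming 0/1 coding (1 = overweight, male, CHD) with light/female/no CHD as the reference cell; the helper names are hypothetical.

```python
import math

# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}
VARS = {"overweight": 0, "male": 1, "chd": 2}

def total(**levels):
    """Observed count summed over every cell matching the given levels."""
    return sum(n for cell, n in COUNTS.items()
               if all(cell[VARS[name]] == v for name, v in levels.items()))

def mu(ow, male, chd):
    """Model 2 fitted count: n(ow, male) * n(male, chd) / n(male)."""
    return total(overweight=ow, male=male) * total(male=male, chd=chd) / total(male=male)

def log_or(a11, a00, a10, a01):
    """Log cross-product ratio of four fitted counts."""
    return math.log(a11 * a00 / (a10 * a01))

coef = {
    "Intercept": math.log(mu(0, 0, 0)),
    "Overweight": math.log(mu(1, 0, 0) / mu(0, 0, 0)),
    "IsMale": math.log(mu(0, 1, 0) / mu(0, 0, 0)),
    "HeartDis": math.log(mu(0, 0, 1) / mu(0, 0, 0)),
    "IsMale*HeartDis": log_or(mu(0, 1, 1), mu(0, 0, 0), mu(0, 1, 0), mu(0, 0, 1)),
    "Overweight*IsMale": log_or(mu(1, 1, 0), mu(0, 0, 0), mu(1, 0, 0), mu(0, 1, 0)),
}
print([round(v, 4) for v in coef.values()])
# [4.1997, -0.6931, -2.4079, -0.6931, 1.5404, 1.0986]

# Conditional OR CHD-overweight among women (male = 0) is exactly 1 under Model 2.
or_women = mu(1, 0, 1) * mu(0, 0, 0) / (mu(1, 0, 0) * mu(0, 0, 1))
print(round(or_women, 6))  # 1.0
```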

30 Interpretation of Parameters, Model 2
Model 2: log(counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

31 OR estimate from predicted counts
Cell                          Observed   Predicted
light/male/disease                15       14
light/male/no disease              5        6
light/female/disease              40       33.3
light/female/no disease           60       66.7
heavy/male/disease                20       21
heavy/male/no disease             10        9
heavy/female/disease              10       16.7
heavy/female/no disease           40       33.3
OR CHD-Male is not confounded by weight: the stratum-specific OR, e^1.54 = 4.67, equals the crude OR.

32 OR Overweight-Male
Model 2: log(counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

33 OR estimate from predicted counts
Cell                          Observed   Predicted
light/male/disease                15       14
light/male/no disease              5        6
light/female/disease              40       33.3
light/female/no disease           60       66.7
heavy/male/disease                20       21
heavy/male/no disease             10        9
heavy/female/disease              10       16.7
heavy/female/no disease           40       33.3
OR male-overweight is not confounded by CHD: the stratum-specific OR, e^1.10 = 3.0, equals the crude OR.

34 OR CHD-Overweight
Model 2: log(counts) = 4.20 - .69 (weight) - 2.4 (male) - .69 (heart disease) + 1.54 (if male and heartdis) + 1.1 (if overweight and male)

35 Interpretation: Model 2
Overweight and heart disease are independent when you condition on gender.
                  Heart Disease
Men               Yes      No
Overweight        21        9
Normal            14        6
OR = (21*6)/(14*9) = 1.0

Women             Yes      No
Overweight        16.7     33.3
Normal            33.3     66.7
OR = (16.7*66.7)/(33.3*33.3) = 1.0

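36 Model 3: only male and CHD are related
Output Model 3: log(counts) = 4.09 - .41 (weight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)
Model 3 (main effects + single interaction): This model corresponds to the case where heart disease and overweight are conditionally independent given gender, and gender and overweight are conditionally independent given heart disease (equivalently, overweight is jointly independent of gender and heart disease).
$\log(\text{counts}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,\text{isMale}\cdot\text{HeartDisease}$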

37 OR: Male and CHD
Model 3: log(counts) = 4.09 - .41 (weight) - 1.9 (male) - .69 (heart disease) + 1.54 (if male and heartdis)

38 Model 3: only male and CHD are related
Cell                          Observed   Predicted
light/male/disease                15       21
light/male/no disease              5        9
light/female/disease              40       30
light/female/no disease           60       60
heavy/male/disease                20       14
heavy/male/no disease             10        6
heavy/female/disease              10       20
heavy/female/no disease           40       40

39 Collapses to…
              CHD     No CHD
Male           35       15
Female         50      100

40 And… heart disease and overweight are independent, regardless of gender
              CHD     No CHD
Overweight     34       46
Light          51       69

41 And… overweight and gender are independent, regardless of disease
              Male    Female
Overweight     20       60
Light          30       90
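Under Model 3 the fitted counts are n(sex, CHD) x n(weight) / n, which is why the collapsed tables on slides 39 to 41 come out as they do. A minimal sketch, assuming 0/1 coding (1 = overweight, male, CHD); helper names are hypothetical.

```python
# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}
N = sum(COUNTS.values())

def total(counts, idx, levels):
    """Sum counts over cells whose variables at positions idx equal levels."""
    return sum(n for cell, n in counts.items()
               if tuple(cell[i] for i in idx) == levels)

# Model 3 fitted counts: overweight is independent of the (male, chd) combination.
fitted = {cell: total(COUNTS, (1, 2), cell[1:]) * total(COUNTS, (0,), cell[:1]) / N
          for cell in COUNTS}

def collapsed_or(counts, i, j):
    """Odds ratio of variables i and j after summing over the third variable."""
    t = {(a, b): total(counts, (i, j), (a, b)) for a in (0, 1) for b in (0, 1)}
    return t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)])

print(fitted[(0, 1, 1)], fitted[(1, 0, 1)])  # 21.0 20.0
# CHD-male, CHD-overweight, overweight-male
print(round(collapsed_or(fitted, 1, 2), 2),
      round(collapsed_or(fitted, 0, 2), 2),
      round(collapsed_or(fitted, 0, 1), 2))  # 4.67 1.0 1.0
```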

42 M4: All pair-wise interactions
    proc genmod data=loglinear;
      model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight Overweight*HeartDis / dist=poisson link=log pred;
    run;
Model 4 (main effects + all pairwise interactions): No pair of variables is conditionally independent.
$\log(\text{counts}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,\text{isMale}\cdot\text{HeartDisease} + \beta_5\,\text{isMale}\cdot\text{overweight} + \beta_6\,\text{HeartDisease}\cdot\text{overweight}$

43 Model 4: log(counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if overweight and heartdis)
Analysis Of Parameter Estimates
                                      Standard   Wald 95%               Chi-
Parameter             DF  Estimate    Error      Confidence Limits      Square     Pr > ChiSq
Intercept              1   4.1103     0.1263     3.8627    4.3579      1058.30     <.0001
Overweight             1  -0.4458     0.1978    -0.8336   -0.0581         5.08     0.0242
IsMale                 1  -2.7153     0.3877    -3.4753   -1.9554        49.04     <.0001
HeartDis               1  -0.4458     0.1978    -0.8336   -0.0581         5.08     0.0242
IsMale*HeartDis        1   1.8213     0.3871     1.0627    2.5799        22.14     <.0001
Overweight*IsMale      1   1.4456     0.3797     0.7013    2.1899        14.49     0.0001
Overweight*HeartDis    1  -0.8239     0.3431    -1.4963   -0.1515         5.77     0.0163
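Model 4 has no closed-form fit, which is why software iterates. GENMOD fits it by iterative maximum likelihood; as a swapped-in alternative, iterative proportional fitting (IPF) reaches the same ML fitted counts for hierarchical log-linear models like this one by repeatedly rescaling the fit to match each observed two-way margin. A minimal pure-Python sketch, assuming 0/1 coding (1 = overweight, male, CHD); the fixed iteration count is an arbitrary choice, not a tuned convergence rule.

```python
import math

# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}

def ipf(counts, margins, iterations=200):
    """Iterative proportional fitting: rescale the fit to match each observed margin in turn."""
    fit = {cell: 1.0 for cell in counts}
    for _ in range(iterations):
        for idx in margins:
            observed, current = {}, {}
            for cell in counts:
                key = tuple(cell[i] for i in idx)
                observed[key] = observed.get(key, 0.0) + counts[cell]
                current[key] = current.get(key, 0.0) + fit[cell]
            for cell in counts:
                key = tuple(cell[i] for i in idx)
                fit[cell] *= observed[key] / current[key]
    return fit

# Model 4 = all two-way margins: (overweight, male), (overweight, chd), (male, chd).
fit = ipf(COUNTS, [(0, 1), (0, 2), (1, 2)])

def log_or(i, j):
    """Log OR of variables i and j at the reference level (0) of the third variable."""
    def mu(a, b):
        cell = [0, 0, 0]
        cell[i], cell[j] = a, b
        return fit[tuple(cell)]
    return math.log(mu(1, 1) * mu(0, 0) / (mu(1, 0) * mu(0, 1)))

base = fit[(0, 0, 0)]
coef = {
    "Intercept": math.log(base),
    "Overweight": math.log(fit[(1, 0, 0)] / base),
    "IsMale": math.log(fit[(0, 1, 0)] / base),
    "HeartDis": math.log(fit[(0, 0, 1)] / base),
    "IsMale*HeartDis": log_or(1, 2),
    "Overweight*IsMale": log_or(0, 1),
    "Overweight*HeartDis": log_or(0, 2),
}
for name, value in coef.items():
    print(f"{name:20s} {value:8.4f}")
```

The printed values should agree with the estimates in the table above to within rounding, since both are ML estimates of the same model.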

44 OR: Male and CHD
Model 4: log(counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if overweight and heartdis)
OR CHD-Male = e^1.82 ≈ 6.2 in each weight stratum; this is close to (though not identical to) the M-H summary OR, stratified by overweight.

45 OR: CHD and overweight
Model 4: log(counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if overweight and heartdis)
OR CHD-Overweight = e^-0.82 ≈ 0.44 in each gender; this is close to the M-H summary OR, stratified by gender.

46 OR: male and overweight
Model 4: log(counts) = 4.11 - .45 (weight) - 2.7 (male) - .45 (heart disease) + 1.8 (if male and heartdis) + 1.4 (if overweight and male) - .82 (if overweight and heartdis)
OR Overweight-Male = e^1.45 ≈ 4.2 in each CHD stratum; this is close to the M-H summary OR, stratified by CHD.

47 OR estimate from predicted counts
Cell                          Observed   Predicted
light/male/disease                15       16
light/male/no disease              5        4
light/female/disease              40       39
light/female/no disease           60       61
heavy/male/disease                20       19
heavy/male/no disease             10       11
heavy/female/disease              10       11
heavy/female/no disease           40       39
GOOD FIT!

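48 The saturated model
Model 5 (saturated): $\log(\text{counts}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,\text{isMale}\cdot\text{HeartDisease} + \beta_5\,\text{isMale}\cdot\text{overweight} + \beta_6\,\text{HeartDisease}\cdot\text{overweight} + \beta_7\,\text{isMale}\cdot\text{HeartDisease}\cdot\text{overweight}$
Perfect fit, but no degrees of freedom (8 cells, 8 parameters).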
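The models in this deck are nested, so one way to compare them is the deviance G^2 with df = 8 cells minus the number of parameters. The sketch below fits each model by iterative proportional fitting (a swapped-in fitting method that gives the same ML fitted counts as the Poisson GLM for these hierarchical models), assuming 0/1 coding of (overweight, male, CHD); the margin lists are an illustrative encoding of each model's highest-order terms.

```python
import math

# Cell counts keyed by (overweight, male, chd); 1 = yes, 0 = no.
COUNTS = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}

def ipf(counts, margins, iterations=200):
    """Iterative proportional fitting: rescale the fit to match each observed margin in turn."""
    fit = {cell: 1.0 for cell in counts}
    for _ in range(iterations):
        for idx in margins:
            observed, current = {}, {}
            for cell in counts:
                key = tuple(cell[i] for i in idx)
                observed[key] = observed.get(key, 0.0) + counts[cell]
                current[key] = current.get(key, 0.0) + fit[cell]
            for cell in counts:
                key = tuple(cell[i] for i in idx)
                fit[cell] *= observed[key] / current[key]
    return fit

def deviance(counts, fit):
    """Likelihood-ratio statistic G^2 = 2 * sum(obs * log(obs / fitted))."""
    return 2 * sum(n * math.log(n / fit[cell]) for cell, n in counts.items())

# (margins matched by the fit, number of parameters) for each model.
MODELS = {
    "Model 1": ([(0,), (1,), (2,)], 4),          # independence
    "Model 2": ([(0, 1), (1, 2)], 6),            # overweight-male and male-CHD
    "Model 3": ([(0,), (1, 2)], 5),              # male-CHD only
    "Model 4": ([(0, 1), (0, 2), (1, 2)], 7),    # all pairwise associations
    "Model 5": ([(0, 1, 2)], 8),                 # saturated
}

results = {}
for name, (margins, n_params) in MODELS.items():
    g2 = deviance(COUNTS, ipf(COUNTS, margins))
    df = len(COUNTS) - n_params
    results[name] = (g2, df)
    print(f"{name}: G2 = {g2:.2f}, df = {df}")
```

With these data only Model 4 should land below its 5% chi-square critical value (3.84 for 1 df), consistent with the "GOOD FIT!" slide, while the saturated model's G^2 is 0 with 0 df.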

