Log-linear Models HRP 261 03/03/04

Log-Linear Models for Multi-way Contingency Tables
1. GLM for Poisson-distributed data with log link (see Agresti chapter 4).
2. Recall: $\log \mu = \alpha + \beta x$, so $\mu = e^{\alpha} (e^{\beta})^{x}$. A one-unit increase in X has a multiplicative impact of $e^{\beta}$ on $\mu$.
3. General idea: predict the expected frequency (count) in each cell by a product of "effects": main effects and interactions.
4. Take logs to linearize.
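The multiplicative reading in point 2 can be checked by comparing the model at x and at x + 1. The short derivation below is only a sketch; the symbols $\mu$, $\alpha$ and $\beta$ are assumed names for the Poisson mean, the intercept and the slope.

```latex
\begin{align*}
\log \mu(x+1) - \log \mu(x) &= \bigl(\alpha + \beta (x+1)\bigr) - \bigl(\alpha + \beta x\bigr) = \beta \\
\frac{\mu(x+1)}{\mu(x)} &= e^{\beta}
\end{align*}
```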

Log-linear vs. logistic
1. The cell counts are assumed to follow a Poisson distribution, not a binomial.
2. The link function is the log, not the logit.
3. Predictions are estimates of the cell counts in a contingency table, not the logit of y.

Log-linear vs. logistic
The variables investigated by log-linear models are all treated as "response variables." Therefore, log-linear models only demonstrate association between variables (like the chi-square test or a correlation coefficient). If clear explanatory and response variables exist, logistic regression should be used instead. Also, if the variables are continuous and cannot be broken down into discrete categories, logistic regression is preferable.

Example: 3-way contingency table

                            Heart Disease
Body Weight      Sex        Yes     No    Total
Not overweight   Male        15      5      20
                 Female      40     60     100
                 Total       55     65     120
Overweight       Male        20     10      30
                 Female      10     40      50
                 Total       30     50      80

Source: Angela Jeansonne

In-class exercise: Analyze these data using methods we have already learned. Is gender related to heart disease, and is this effect modified or confounded by weight? What is the relationship between overweight and gender (controlled for CHD), and between overweight and heart disease (controlled for gender)?

Crude OR CHD-Male (ignore overweight)

                      Heart Disease
              Sex       Yes     No    Total
All weights   Male       35     15      50
              Female     50    100     150
              Total      85    115     200

OR CHD-Male = 35*100/(15*50) = 4.67

Crude OR Overweight-Male (ignore heart disease)

                         Overweight
                 Sex       Yes     No    Total
All CHD status   Male       30     20      50
                 Female     50    100     150
                 Total      80    120     200

OR Overweight-Male = 30*100/(20*50) = 3.0

Crude OR CHD-Overweight (ignore gender)

                                    Heart Disease
                         Weight     Yes     No    Total
Men and women combined   Heavy       30     50      80
                         Light       55     65     120
                         Total       85    115     200

OR CHD-Overweight = 30*65/(50*55) = 0.71
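The three crude odds ratios can be reproduced by collapsing the 3-way table over the variable being ignored. This is a minimal sketch: the cell counts are the ones implied by the margins and cells quoted in this example, and the `counts` layout and the helper name `crude_or` are illustrative assumptions, not part of the original slides.

```python
# Cell counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
# Values are those implied by the margins and cells quoted in the example.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}


def crude_or(counts, i, j):
    """Odds ratio between the variables at key positions i and j,
    after summing over the remaining variable."""
    t = [[0, 0], [0, 0]]
    for key, n in counts.items():
        t[key[i]][key[j]] += n
    return t[1][1] * t[0][0] / (t[1][0] * t[0][1])


or_chd_male = crude_or(counts, 2, 1)    # 35*100/(15*50)
or_over_male = crude_or(counts, 0, 1)   # 30*100/(20*50)
or_chd_over = crude_or(counts, 2, 0)    # 30*65/(50*55)
print(round(or_chd_male, 2), round(or_over_male, 2), round(or_chd_over, 2))
```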

OR MH (CHD-Male) – stratified by Overweight

Stratified by Heart Disease

                            Overweight
                 Sex        Yes     No    Total
Heart Disease    Male        20     15      35
                 Female      10     40      50
                 Total       30     55      85
No CHD           Male        10      5      15
                 Female      40     60     100
                 Total       50     65     115

OR MH (Overweight-Male) – stratified by Heart Disease

Stratified by gender

                        Heart Disease
Gender    Weight        Yes     No    Total
Male      Heavy          20     10      30
          Light          15      5      20
          Total          35     15      50
Female    Heavy          10     40      50
          Light          40     60     100
          Total          50    100     150

OR MH (CHD-Overweight) – stratified by Gender
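The Mantel-Haenszel summary odds ratios named on these slides can be sketched with the usual estimator, sum(a*d/n) / sum(b*c/n) over strata. The cell counts are the ones implied by the example tables; the function name `mantel_haenszel_or` and the key layout are assumptions made for illustration.

```python
# Cell counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}


def mantel_haenszel_or(counts, i, j, s):
    """MH summary OR between key positions i and j, stratified by position s."""
    num = den = 0.0
    for level in (0, 1):
        t = [[0, 0], [0, 0]]
        for key, n in counts.items():
            if key[s] == level:
                t[key[i]][key[j]] += n
        total = sum(map(sum, t))
        num += t[1][1] * t[0][0] / total   # a*d / n for this stratum
        den += t[1][0] * t[0][1] / total   # b*c / n for this stratum
    return num / den


mh_chd_male = mantel_haenszel_or(counts, 2, 1, 0)   # stratified by overweight
mh_over_male = mantel_haenszel_or(counts, 0, 1, 2)  # stratified by heart disease
mh_chd_over = mantel_haenszel_or(counts, 2, 0, 1)   # stratified by gender
print(round(mh_chd_male, 2), round(mh_over_male, 2), round(mh_chd_over, 2))
```

Under these counts the stratified CHD-Male odds ratio is larger than the crude 4.67, and the stratified CHD-Overweight odds ratio is further from 1 than the crude 0.71.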

Model with log-linear models

Model 1: Independence
SAS code for a generalized linear model with Poisson distribution and log link function:

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis / dist=poisson link=log pred;
  run;

Model 1 (main effects only):
$\log(\text{count}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease}$
Implies that the cell counts depend only on the MARGINAL probabilities (odds).

Independence model: parameters

Analysis Of Parameter Estimates
Parameter     Pr > ChiSq
Intercept     <.0001
Overweight    …
IsMale        <.0001
HeartDis      …
[DF, Estimate, Standard Error, Wald 95% Confidence Limits and Chi-Square columns: values not recoverable]

Model 1: Log(counts) = … - 0.41 (weight) - 1.1 (male) - 0.30 (heart disease)

Interpretation of Parameters: Marginal Odds
Model 1: Log(counts) = … - 0.41 (weight) - 1.1 (male) - 0.30 (heart disease)
$e^{-0.41}$ = the (marginal) odds of being overweight = 0.66 ≈ 80/120
$e^{-1.1}$ = the odds of being male = 0.33 = 50/150
$e^{-0.30}$ = the odds of having disease = 0.74 = 85/115

Marginal probabilities
P(overweight) = .66/(.66+1) = .40 (80/200)
P(male) = .33/(.33+1) = .25 (50/200)
P(heart disease) = .74/1.74 = .425 (85/200)

Predicted counts
As examples:
The expected number of light men with heart disease = 200*(1-.40)(.25)(.425) = 12.75 under independence.
The expected number of light men without disease = 200*(1-.40)(.25)(1-.425) = 17.25 under independence.
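The same arithmetic extends to every cell: under independence, each expected count is the grand total times the product of three marginal probabilities. A minimal sketch, using the margins quoted above (80, 50 and 85 out of 200); the function name `expected` is an illustrative assumption.

```python
n = 200
p_over, p_male, p_chd = 80 / n, 50 / n, 85 / n   # marginal probabilities


def expected(over, male, chd):
    """Expected count for one cell when the three variables are independent."""
    po = p_over if over else 1 - p_over
    pm = p_male if male else 1 - p_male
    pc = p_chd if chd else 1 - p_chd
    return n * po * pm * pc


light_men_chd = expected(0, 1, 1)      # 200 * .60 * .25 * .425
light_men_no_chd = expected(0, 1, 0)   # 200 * .60 * .25 * .575
print(round(light_men_chd, 2), round(light_men_no_chd, 2))
```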

Independence model: goodness-of-fit

Cells                      Observed    Pred
light/male/disease            15      12.75
light/male/no disease          5      17.25
light/female/disease          40      38.25
light/female/no disease       60      51.75
heavy/male/disease            20       8.50
heavy/male/no disease         10      11.50
heavy/female/disease          10      25.50
heavy/female/no disease       40      34.50

df = cells - parameters in model = 8 - 4 = 4
Suggests the independence model is a poor fit!
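One way to quantify the poor fit is to compare observed and predicted counts with the Pearson chi-square and the deviance (likelihood-ratio) statistics on 4 degrees of freedom. The sketch below is not the original SAS output: the observed counts are those implied by the example tables, the predictions come from the independence formula, and 9.488 is the standard 5% critical value of a chi-square distribution with 4 df.

```python
import math

# Observed counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
observed = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}
n = sum(observed.values())
margins = (80 / n, 50 / n, 85 / n)   # P(overweight), P(male), P(heart disease)

# Predicted counts under independence: n times the product of marginal probabilities.
predicted = {}
for key in observed:
    prob = 1.0
    for level, p in zip(key, margins):
        prob *= p if level else 1 - p
    predicted[key] = n * prob

pearson_x2 = sum((observed[k] - predicted[k]) ** 2 / predicted[k] for k in observed)
deviance_g2 = 2 * sum(observed[k] * math.log(observed[k] / predicted[k]) for k in observed)
df = len(observed) - 4               # 8 cells minus 4 parameters
CRITICAL_05_DF4 = 9.488              # 5% critical value, chi-square with 4 df
print(round(pearson_x2, 1), round(deviance_g2, 1), df)
```

Both statistics are far above the critical value, which agrees with the conclusion on the slide.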

Predicted Table (note: marginal proportions don't change)

                            Heart Disease
Body Weight      Sex        Yes       No      Total
Not overweight   Male      12.75    17.25      30
                 Female    38.25    51.75      90
                 Total     51.00    69.00     120
Overweight       Male       8.50    11.50      20
                 Female    25.50    34.50      60
                 Total     34.00    46.00      80

Predicted OR CHD-Male

                      Heart Disease
              Sex       Yes       No      Total
All weights   Male     21.25    28.75      50
              Female   63.75    86.25     150
              Total    85.00   115.00     200

OR CHD-Male = 21.25*86.25/(28.75*63.75) = 1.0

The model coefficients have an odds ratio interpretation…

Sums of coefficients give the predicted log count in each cell.
Coefficients have a direct odds ratio interpretation.
Calculate OR CHD-Male in each weight stratum.
This interpretation becomes more interesting/useful when interaction terms occur!

Expected OR CHD-Overweight

                         Heart Disease
              Weight     Yes     No    Total
All genders   Heavy       34     46      80
              Light       51     69     120
              Total       85    115     200

OR CHD-Overweight = 34*69/(46*51) = 1.0

Expected OR Overweight-Male

                         Overweight
                 Sex       Yes     No    Total
All CHD status   Male       20     30      50
                 Female     60     90     150
                 Total      80    120     200

OR Overweight-Male = 20*90/(60*30) = 1.0

Model with Interaction
Model 2 (main effects + interactions with gender): This model corresponds to the case where heart disease and overweight are conditionally independent (conditioned on gender).
$\log(\text{count}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,(\text{isMale} \times \text{HeartDisease}) + \beta_5\,(\text{isMale} \times \text{overweight})$

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight / dist=poisson link=log pred;
  run;

Implies that gender is associated with heart disease and with overweight, but overweight and heart disease are independent within each gender.
OR CHD-Male ≠ 1 and OR Overweight-Male ≠ 1, but OR CHD-Overweight = 1.
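For this conditional-independence model the fitted counts have a closed form, so they can be checked without running the GLM: within each gender, fitted count = (gender-by-disease total) * (gender-by-weight total) / (gender total). The sketch below assumes the cell counts implied by the example tables; `fit_model2` is an illustrative name, not a routine from the slides.

```python
# Observed counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}


def fit_model2(counts):
    """Fitted counts when overweight and heart disease are independent given gender."""
    fitted = {}
    for (over, male, chd) in counts:
        n_male = sum(n for k, n in counts.items() if k[1] == male)
        n_male_chd = sum(n for k, n in counts.items() if k[1] == male and k[2] == chd)
        n_male_over = sum(n for k, n in counts.items() if k[1] == male and k[0] == over)
        fitted[(over, male, chd)] = n_male_chd * n_male_over / n_male
    return fitted


fitted2 = fit_model2(counts)
for key in sorted(fitted2):
    print(key, counts[key], round(fitted2[key], 1))
```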

Model 2: Log(counts) = … (weight) - 2.4 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis) + … (if overweight and male)

Analysis Of Parameter Estimates
Parameter            Pr > ChiSq
Intercept            <.0001
Overweight           <.0001
IsMale               <.0001
HeartDis             <.0001
IsMale*HeartDis      <.0001
Overweight*IsMale    …
[DF, Estimate, Standard Error, Wald 95% Confidence Limits and Chi-Square columns: values not recoverable]

Interpretation of Parameters, Model 2
Model 2: Log(counts) = … (weight) - 2.4 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis) + … (if overweight and male)

OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15       14
light/male/no disease          5        6
light/female/disease          40       33.3
light/female/no disease       60       66.7
heavy/male/disease            20       21
heavy/male/no disease         10        9
heavy/female/disease          10       16.7
heavy/female/no disease       40       33.3

OR CHD-Male is not confounded by weight.

OR Overweight-Male
Model 2: Log(counts) = … (weight) - 2.4 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis) + … (if overweight and male)

OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15       14
light/male/no disease          5        6
light/female/disease          40       33.3
light/female/no disease       60       66.7
heavy/male/disease            20       21
heavy/male/no disease         10        9
heavy/female/disease          10       16.7
heavy/female/no disease       40       33.3

OR Overweight-Male is not confounded by CHD.

OR CHD-Overweight
Model 2: Log(counts) = … (weight) - 2.4 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis) + … (if overweight and male)

Interpretation: Model 2
Overweight and heart disease are independent when you condition on gender.

                       Heart Disease
                       Yes      No
Men     Overweight      21       9
        Normal          14       6
OR = 21*6/(14*9) = 1.0

Women   Overweight     16.7    33.3
        Normal         33.3    66.7
OR = 16.7*66.7/(33.3*33.3) = 1.0
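The stratum-specific odds ratios implied by Model 2 can be read off its predicted counts and compared with the exponentiated interaction coefficient. This is a sketch under stated assumptions: the predicted counts are entered as exact fractions (for example 50/3 for 16.7), and `conditional_or` is an illustrative helper name.

```python
import math

# Model 2 predicted counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
pred = {
    (0, 1, 1): 14.0, (0, 1, 0): 6.0,
    (0, 0, 1): 100 / 3, (0, 0, 0): 200 / 3,
    (1, 1, 1): 21.0, (1, 1, 0): 9.0,
    (1, 0, 1): 50 / 3, (1, 0, 0): 100 / 3,
}


def conditional_or(pred, i, j, fixed):
    """OR between key positions i and j with the third variable held at `fixed`."""
    s = 3 - i - j                      # position of the stratifying variable
    cell = {}
    for key, m in pred.items():
        if key[s] == fixed:
            cell[(key[i], key[j])] = m
    return cell[(1, 1)] * cell[(0, 0)] / (cell[(1, 0)] * cell[(0, 1)])


or_chd_male = [conditional_or(pred, 2, 1, w) for w in (0, 1)]    # by weight stratum
or_over_male = [conditional_or(pred, 0, 1, d) for d in (0, 1)]   # by disease stratum
or_chd_over = [conditional_or(pred, 2, 0, g) for g in (0, 1)]    # by gender stratum
print(or_chd_male, or_over_male, or_chd_over, round(math.exp(1.54), 2))
```

The CHD-Male odds ratio is the same in both weight strata and matches exp(1.54) up to rounding of the coefficient, which is what "not confounded by weight" means here.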

Model 3: only male and CHD are related
Model 3 (main effects + single interaction): This model corresponds to the case where heart disease and overweight are independent, and gender and overweight are independent.
$\log(\text{count}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,(\text{isMale} \times \text{HeartDisease})$
Output, Model 3: Log(counts) = … (weight) - 1.9 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis)

OR: Male and CHD
Model 3: Log(counts) = … (weight) - 1.9 (male) - 0.69 (heart disease) + 1.54 (if male and heartdis)

Model 3: only male and CHD are related

Cells                      Observed    Pred
light/male/disease            15       21
light/male/no disease          5        9
light/female/disease          40       30
light/female/no disease       60       60
heavy/male/disease            20       14
heavy/male/no disease         10        6
heavy/female/disease          10       20
heavy/female/no disease       40       40

Collapses to… [2x2 table: rows Male, Female; columns CHD, No CHD]

And… heart disease and overweight are independent, regardless of gender [2x2 table: rows Overweight, Light; columns CHD, No CHD]

And… overweight and gender are independent, regardless of disease [2x2 table: rows Overweight, Light; columns Male, Female]
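The three collapsed tables can be recomputed from the Model 3 fitted counts, which have the closed form (gender-by-disease total) * (weight total) / (grand total). The sketch below assumes the cell counts implied by the example tables; `fit_model3` and `collapse` are illustrative names, and the numbers it prints are computed here rather than copied from the slides.

```python
# Observed counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}


def fit_model3(counts):
    """Fitted counts when overweight is independent of (gender, heart disease)."""
    n = sum(counts.values())
    fitted = {}
    for (over, male, chd) in counts:
        n_male_chd = sum(v for k, v in counts.items() if k[1] == male and k[2] == chd)
        n_over = sum(v for k, v in counts.items() if k[0] == over)
        fitted[(over, male, chd)] = n_male_chd * n_over / n
    return fitted


def collapse(table, i, j):
    """Sum a 3-way table down to the 2-way table of key positions i and j."""
    out = {}
    for key, m in table.items():
        out[(key[i], key[j])] = out.get((key[i], key[j]), 0.0) + m
    return out


def odds_ratio(t):
    return t[(1, 1)] * t[(0, 0)] / (t[(1, 0)] * t[(0, 1)])


fitted3 = fit_model3(counts)
male_by_chd = collapse(fitted3, 1, 2)    # keeps the observed gender-by-CHD margin
over_by_chd = collapse(fitted3, 0, 2)    # odds ratio 1
over_by_male = collapse(fitted3, 0, 1)   # odds ratio 1
print(odds_ratio(male_by_chd), odds_ratio(over_by_chd), odds_ratio(over_by_male))
```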

M4: All pairwise interactions

  proc genmod data=loglinear;
    model total = Overweight IsMale HeartDis isMale*HeartDis isMale*Overweight Overweight*HeartDis / dist=poisson link=log pred;
  run;

Model 4 (main effects + all pairwise interactions): No pair of variables is conditionally independent.
$\log(\text{count}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,(\text{isMale} \times \text{HeartDisease}) + \beta_5\,(\text{isMale} \times \text{overweight}) + \beta_6\,(\text{HeartDis} \times \text{overweight})$

Model 4: Log(counts) = … (weight) - 2.7 (male) - 0.45 (heart disease) + 1.8 (if male and heartdis) + … (if overweight and male) - 0.82 (if over and heartdis)

Analysis Of Parameter Estimates
Parameter              Pr > ChiSq
Intercept              <.0001
Overweight             …
IsMale                 <.0001
HeartDis               …
IsMale*HeartDis        <.0001
Overweight*IsMale      …
Overweight*HeartDis    …
[DF, Estimate, Standard Error, Wald 95% Confidence Limits and Chi-Square columns: values not recoverable]

OR: Male and CHD
Model 4: Log(counts) = … (weight) - 2.7 (male) - 0.45 (heart disease) + 1.8 (if male and heartdis) + … (if overweight and male) - 0.82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by overweight.

OR: CHD and overweight
Model 4: Log(counts) = … (weight) - 2.7 (male) - 0.45 (heart disease) + 1.8 (if male and heartdis) + … (if overweight and male) - 0.82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by gender.

OR: male and overweight
Model 4: Log(counts) = … (weight) - 2.7 (male) - 0.45 (heart disease) + 1.8 (if male and heartdis) + … (if overweight and male) - 0.82 (if over and heartdis)
Corresponds to the M-H summary OR, stratified by CHD.

OR estimate from predicted counts

Cells                      Observed    Pred
light/male/disease            15        …
light/male/no disease          5        4
light/female/disease          40        …
light/female/no disease       60        …
heavy/male/disease            20        …
heavy/male/no disease         10        …
heavy/female/disease          10        …
heavy/female/no disease       40        …

GOOD FIT!
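Model 4 has no closed-form fitted counts, but they can be approximated by iterative proportional fitting (IPF), which repeatedly rescales the table to match each observed two-way margin. IPF is swapped in here for the Poisson GLM fit; it is not what the SAS code runs, although both target the same maximum-likelihood fitted counts. The cell counts are those implied by the example tables, and `ipf_pairwise` is an illustrative name.

```python
# Observed counts keyed by (overweight, male, chd), with 1 = yes and 0 = no.
counts = {
    (0, 1, 1): 15, (0, 1, 0): 5,
    (0, 0, 1): 40, (0, 0, 0): 60,
    (1, 1, 1): 20, (1, 1, 0): 10,
    (1, 0, 1): 10, (1, 0, 0): 40,
}


def ipf_pairwise(counts, sweeps=200):
    """Fit the model with all two-way interactions and no three-way term."""
    fitted = {key: 1.0 for key in counts}
    for _ in range(sweeps):
        for i, j in [(0, 1), (0, 2), (1, 2)]:
            obs, cur = {}, {}
            for key in counts:
                m = (key[i], key[j])
                obs[m] = obs.get(m, 0.0) + counts[key]
                cur[m] = cur.get(m, 0.0) + fitted[key]
            for key in fitted:
                m = (key[i], key[j])
                fitted[key] *= obs[m] / cur[m]   # match this two-way margin
    return fitted


fitted4 = ipf_pairwise(counts)
# CHD-Male odds ratio within each weight stratum (identical by construction).
or_light = fitted4[(0, 1, 1)] * fitted4[(0, 0, 0)] / (fitted4[(0, 0, 1)] * fitted4[(0, 1, 0)])
or_heavy = fitted4[(1, 1, 1)] * fitted4[(1, 0, 0)] / (fitted4[(1, 0, 1)] * fitted4[(1, 1, 0)])
print(round(fitted4[(0, 1, 0)], 2), round(or_light, 2), round(or_heavy, 2))
```

The common CHD-Male odds ratio from this fit is close to exp(1.8) and to the Mantel-Haenszel summary, though not identical to either.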

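The saturated model
Model 5 (saturated):
$\log(\text{count}) = \alpha + \beta_1\,\text{overweight} + \beta_2\,\text{isMale} + \beta_3\,\text{HeartDisease} + \beta_4\,(\text{isMale} \times \text{HeartDisease}) + \beta_5\,(\text{isMale} \times \text{overweight}) + \beta_6\,(\text{HeartDis} \times \text{overweight}) + \beta_7\,(\text{isMale} \times \text{HeartDisease} \times \text{overweight})$
Perfect fit, but no degrees of freedom.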
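The rule used earlier, df = cells minus parameters in the model, can be applied to all five models. The sketch below counts one parameter for the intercept and one for each term in the model equations; the dictionary labels are illustrative, not names from the slides.

```python
cells = 2 * 2 * 2   # overweight x gender x heart disease

# Number of parameters: intercept + main effects + interaction terms.
parameters = {
    "Model 1 (independence)": 1 + 3,
    "Model 2 (two interactions with gender)": 1 + 3 + 2,
    "Model 3 (one interaction)": 1 + 3 + 1,
    "Model 4 (all pairwise interactions)": 1 + 3 + 3,
    "Model 5 (saturated)": 1 + 3 + 3 + 1,
}

residual_df = {name: cells - p for name, p in parameters.items()}
for name, df in residual_df.items():
    print(f"{name}: df = {df}")
```

With 0 residual degrees of freedom, the saturated model reproduces the observed counts exactly and leaves nothing with which to test fit.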