A priori violations
In the following cases, your data violate the normality and homoskedasticity assumptions on a priori grounds:
(1) count data -> Poisson regression
(2) binary data -> logistic regression
Output example

> summary(xglm)

Call:
glm(formula = error ~ alc, family = "binomial")

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.1073  -0.5495  -0.3257   0.7173   1.8070

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.643      1.123  -3.244 0.001179 **
alc           16.118      4.856   3.319 0.000903 ***
The linear model
Y ~ b0 + b1*X1 + b2*X2
To expand this on the blackboard: make a list with X1 = gender and X2 = focus or no focus, then fill in 0 times, 1 times, etc.
The logistic model
p(Y) ~ logit^-1(b0 + b1*X1 + b2*X2)
The expression inside the parentheses, b0 + b1*X1 + b2*X2, is the linear predictor. It expands the same way as in the linear model: list X1 = gender and X2 = focus or no focus, then 0 times, 1 times, etc.
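The expansion over predictor combinations can be sketched in R. This is a minimal illustration, not the model from the lecture: the coefficient values and the 0/1 coding of gender and focus are assumptions made up for the example.

```r
# Hypothetical coefficients, invented for illustration only
b0 <- -1.0
b1 <- 0.5    # effect of X1 (e.g. gender, assumed coded 0/1)
b2 <- 1.5    # effect of X2 (e.g. focus vs. no focus, assumed coded 0/1)

# All combinations of the two binary predictors
grid <- expand.grid(X1 = c(0, 1), X2 = c(0, 1))

# Linear predictor, on the log odds scale
grid$lp <- b0 + b1 * grid$X1 + b2 * grid$X2

# The inverse logit turns the linear predictor into a probability
grid$p <- plogis(grid$lp)

print(grid)
```

Each row multiplies a coefficient by 0 or 1, which is the "0 times, 1 times" bookkeeping from the blackboard exercise.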
Representative values

Probability   Odds    Log odds (= “logits”)
0.1           0.111   -2.197
0.2           0.25    -1.386
0.3           0.429   -0.847
0.4           0.667   -0.405
0.5           1        0
0.6           1.5      0.405
0.7           2.33     0.847
0.8           4        1.386
0.9           9        2.197

- A probability of 80% of an event occurring means that the odds are “4 to 1” for it occurring
- What happens if the odds are 50 to 50? -> the ratio is 1 (log odds of 0)
- If the probability of non-occurrence is higher than that of occurrence: odds are fractions below 1, log odds are negative
- If the probability of occurrence is higher: odds are above 1, log odds are positive
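The representative values can be recomputed in R. A minimal sketch using only base R; the variable names are chosen here for illustration.

```r
# Probabilities from the table of representative values
p <- seq(0.1, 0.9, by = 0.1)

odds <- p / (1 - p)      # odds = p / (1 - p)
log_odds <- log(odds)    # log odds ("logits"); equivalent to qlogis(p)

tab <- data.frame(
  probability = p,
  odds = round(odds, 3),
  log_odds = round(log_odds, 3)
)
print(tab)

# plogis() is the inverse: it maps log odds back to probabilities
back <- plogis(log_odds)
```

Working on the log odds scale makes the values symmetric around 0, which is why 0.2 and 0.8 differ only in sign.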
Snijders & Bosker (1999: 212)
logit^-1(x) = exp(x) / (1 + exp(x))
= inverse logit function
In R: plogis()
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -3.643      1.123  -3.244 0.001179 **
alc           16.118      4.856   3.319 0.000903 ***

for probabilities: transform the entire LP (linear predictor) with the logistic function, plogis()
for odds: transform individual coefficients with exp(x)
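Both transformations can be sketched with the estimates from the output. The coefficients are copied from the table; the alc values below are hypothetical and chosen only to show the mechanics, since the slides do not state the range of the predictor.

```r
# Coefficients copied from the summary output
b0 <- -3.643   # intercept: log odds when alc = 0
b1 <- 16.118   # slope: change in log odds per unit of alc

# Hypothetical predictor values, for illustration only
alc <- c(0, 0.1, 0.2, 0.3)

lp <- b0 + b1 * alc      # linear predictor, on the log odds scale
prob <- plogis(lp)       # probabilities: transform the whole linear predictor
odds_ratio <- exp(b1)    # odds: exponentiate an individual coefficient

print(data.frame(alc = alc, log_odds = lp, probability = round(prob, 3)))
```

Note that exp(b1) is the factor by which the odds change for a one-unit increase in alc, whereas plogis() has to be applied to the sum b0 + b1*alc, never to a single coefficient.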
General Linear Model -> Generalized Linear Model
= “Generalizing” the General Linear Model to cases where the response variable is not continuous (in particular categorical ones)
= Consists of two things: (1) an error distribution, (2) a link function
Logistic regression: Binomial distribution, logit link function
Poisson regression: Poisson distribution, log link function

lm(response ~ predictor)
glm(response ~ predictor, family = "binomial")
glm(response ~ predictor, family = "poisson")
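The glm() calls above can be tried on simulated data. This is a sketch under stated assumptions: the data, sample size, and true coefficients are invented here, and the variable names binary and counts are hypothetical stand-ins for response.

```r
set.seed(42)                       # simulated data, for illustration only
predictor <- rnorm(200)

# Binary response: success probability follows an inverse-logit curve
binary <- rbinom(200, size = 1, prob = plogis(-0.5 + 1.2 * predictor))

# Count response: expected count follows an exponential (inverse-log) curve
counts <- rpois(200, lambda = exp(0.3 + 0.6 * predictor))

fit_logistic <- glm(binary ~ predictor, family = "binomial")
fit_poisson  <- glm(counts ~ predictor, family = "poisson")

# Each family carries its default link function
print(family(fit_logistic)$link)   # "logit"
print(family(fit_poisson)$link)    # "log"
```

Specifying the family is enough here because "binomial" defaults to the logit link and "poisson" to the log link.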
Simple linear regression & multiple regression = generalized linear model with normal error structure and identity link function
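This equivalence can be checked directly in R. A minimal sketch on simulated data (the data and true coefficients are made up for the example): lm() and glm() with a Gaussian family and identity link should return the same coefficient estimates.

```r
set.seed(1)                        # simulated data, for illustration only
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50)

fit_lm  <- lm(y ~ x)
fit_glm <- glm(y ~ x, family = gaussian(link = "identity"))

# Both fits should give the same intercept and slope
print(coef(fit_lm))
print(coef(fit_glm))
```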