October 20, 2009 Session 8 Slide 1: PSC 5940: Estimating the Fit of Multi-Level Models, Session 8, Fall 2009
Slide 2: Log-likelihood - GLMs
- Given a linear model with possibly correlated errors, the log-likelihood is defined in terms of the Generalized Sum of Squares.
- Differentiating the Generalized Sum of Squares with respect to β, setting the partial derivative to zero, and solving for β produces the estimator of β.
- This estimator allows the errors to be correlated.
- Look familiar?
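The equations on this slide are not preserved in the text. As a sketch only, the standard textbook forms are given below, assuming normally distributed errors with covariance matrix \Sigma. The symbols y, X, \varepsilon, \Sigma and n are assumptions of this sketch and may not match the notation used on the original slide.

```latex
% Linear model with possibly correlated errors (assumed notation)
y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma)

% Log-likelihood; the last term contains the Generalized Sum of Squares
\ell(\beta, \Sigma) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\log|\Sigma|
                      - \frac{1}{2}\,(y - X\beta)'\,\Sigma^{-1}\,(y - X\beta)

% Setting the derivative of the Generalized Sum of Squares wrt \beta to zero
\hat{\beta} = (X'\Sigma^{-1}X)^{-1}\,X'\Sigma^{-1}y
```

With uncorrelated, equal-variance errors (\Sigma = \sigma^2 I) the last expression reduces to the OLS formula (X'X)^{-1}X'y, which is presumably why it should look familiar.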
Slide 3: Understanding MLE
- For OLS, the MLE of β is the value that minimizes the SSE: β = (X’X)⁻¹(X’Y).
- For the generalized model, the MLE is the value of β that maximizes the log-likelihood, which is built from the Generalized Sum of Squares.
- The latter differs from OLS as various assumptions are relaxed (nonlinearity, correlated errors, etc.).
- The results of the likelihood function can be viewed topographically, as a “hill” showing the effect on the likelihood as you vary the estimated coefficients. The maximum of the likelihood is the peak of the hill.
- The “fit” measures (adj R², AIC, BIC) tell you how high (or low) the peak is for a given model.
- Comparisons of the fit measures across models can assist in model selection.
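The “hill” can be traced numerically. The following is a minimal sketch in base R on simulated data (the data, the variable names and the helper loglik_slope are all hypothetical, not from the course materials). It evaluates the normal log-likelihood over a grid of slope values, holding the intercept and error standard deviation at their ML estimates, and checks where the peak falls.

```r
set.seed(1)
n <- 200
x <- runif(n, 0, 10)
y <- 2 + 1.5 * x + rnorm(n, sd = 2)   # simulated data, for illustration only

fit      <- lm(y ~ x)
b0_hat   <- coef(fit)[["(Intercept)"]]
b1_hat   <- coef(fit)[["x"]]
sigma_ml <- sqrt(mean(residuals(fit)^2))  # ML estimate of sigma (divides by n)

# Log-likelihood as a function of the slope alone
loglik_slope <- function(b1) {
  sum(dnorm(y, mean = b0_hat + b1 * x, sd = sigma_ml, log = TRUE))
}

# Walk across the hill: a grid of candidate slopes centered on the OLS estimate
grid <- seq(b1_hat - 0.5, b1_hat + 0.5, length.out = 201)
ll   <- sapply(grid, loglik_slope)
peak <- grid[which.max(ll)]

# Uncomment to draw the hill with the OLS estimate marked
# plot(grid, ll, type = "l", xlab = "slope", ylab = "log-likelihood")
# abline(v = b1_hat, lty = 2)
```

In this sketch the grid value with the highest log-likelihood coincides with the OLS slope, and the height of the peak is the value that logLik(fit) reports.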
Slide 4: Measures of Model Fit
- R² and adj R²
- AIC: based on the maximized log-likelihood for the model; penalizes for the complexity of the model
- BIC: also depends on n, the number of observations; penalizes for increased model complexity and sample size (results in a preference for simpler models)
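As a rough illustration of how these measures are built, the sketch below computes them by hand for an ordinary linear model and compares them with R's built-in functions. It uses the built-in mtcars data as a stand-in (not the course data), and it uses the usual definitions AIC = -2 logLik + 2k and BIC = -2 logLik + k log(n), where k is the number of estimated parameters. The names k, aic_hand, bic_hand, r2 and adj_r2 are introduced here for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model on built-in data

ll <- logLik(fit)        # maximized log-likelihood
k  <- attr(ll, "df")     # estimated parameters: coefficients plus error variance
n  <- nobs(fit)          # number of observations

aic_hand <- -2 * as.numeric(ll) + 2 * k        # penalty grows with k
bic_hand <- -2 * as.numeric(ll) + log(n) * k   # penalty grows with k and log(n)

# R-squared and adjusted R-squared from the sums of squares
sse    <- sum(residuals(fit)^2)
sst    <- sum((mtcars$mpg - mean(mtcars$mpg))^2)
p      <- length(coef(fit))                    # coefficients including intercept
r2     <- 1 - sse / sst
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p)

print(c(AIC = aic_hand, BIC = bic_hand, R2 = r2, adjR2 = adj_r2))
```

The hand-computed values should agree with AIC(fit), BIC(fit) and the R-squared values in summary(fit). Because log(n) exceeds 2 once n is 8 or more, BIC charges more per parameter than AIC in most samples, which is the source of its preference for simpler models.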
Slide 5: BIC Test for “Improvement in Model Fit”

BIC Difference (M2 - M1)    Evidence for “More Complex” Model
0-2                         “Weak”
2-6                         “Positive”
6-10                        “Strong”
>10                         “Conclusive”
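The table can be turned into a small lookup. This is a sketch only: the function name bic_evidence is hypothetical, the input is assumed to be a non-negative BIC difference, and boundary values (2, 6, 10) are assigned to the lower category, which the table itself does not specify.

```r
# Map a non-negative BIC difference onto the evidence labels in the table
bic_evidence <- function(delta) {
  cut(delta,
      breaks = c(0, 2, 6, 10, Inf),
      labels = c("Weak", "Positive", "Strong", "Conclusive"),
      right = TRUE,            # intervals are (lower, upper]
      include.lowest = TRUE)   # so that 0 itself counts as "Weak"
}

print(bic_evidence(c(1, 4, 8, 309)))
```

Negative differences return NA here, since the table has no row for them.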
Slide 6: Example in R: LM1
Predicting votes in referendum on alternative energy tax (erdf100<-e63_erdf) by price and region.
Explanatory variables:
- Price, randomly assigned values ($6 to $2400 per year): price<-random_p
- Region (already so named)
Model:
    ML1<-lmer(erdf100~1+(1|price)+(1|region))
Slide 7: Example in R: LM2
Predicting votes in referendum on alternative energy tax (erdf100<-e63_erdf) by price, region and perceived risks posed by GCC.
Explanatory variables:
- Price, randomly assigned values ($6 to $2400 per year): price<-random_p
- Region (already so named)
- risk_gcc (0-10 scale)
Model:
    ML2<-lmer(erdf100~1+risk_gcc+(1|price)+(1|region))
Slide 8: LM1 Result:
    > summary(ML1)
    Linear mixed model fit by REML
    Formula: erdf100 ~ 1 + (1 | price) + (1 | region)
       AIC   BIC   logLik   deviance   REMLdev
    Random effects:
     Groups   Name        Variance Std.Dev.
     price    (Intercept)
     region   (Intercept)
     Residual
    Number of obs: 1546, groups: price, 15; region, 5
    Fixed effects:
                Estimate Std. Error t value
    (Intercept)
Slide 9: LM2 Result:
    Linear mixed model fit by REML
    Formula: erdf100 ~ 1 + risk_gcc + (1 | price) + (1 | region)
       AIC   BIC   logLik   deviance   REMLdev
    Random effects:
     Groups   Name        Variance Std.Dev.
     price    (Intercept)
     region   (Intercept)
     Residual
    Number of obs: 1532, groups: price, 15; region, 5
    Fixed effects:
                Estimate Std. Error t value
    (Intercept)
    risk_gcc
Slide 10: BIC Test Results
BIC for ML1 - BIC for ML2 = 309: “Conclusive”
Note: You can have R calculate the difference:
    BIC(logLik(ML1))-BIC(logLik(ML2))
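Two details in the reported fits are worth handling before taking a BIC difference: the models were fit by REML while differing in their fixed effects, and they report different numbers of observations (1546 vs 1532). A common recommendation is to refit by maximum likelihood on the same rows. The sketch below shows one way to do that, assuming the lme4 package is installed. The data are simulated stand-ins (every value, and the objects d, d_cc, m1, m2 and bic_diff, are hypothetical), so the result will not reproduce the 309 above.

```r
library(lme4)  # assumed to be installed; provides lmer()

set.seed(42)
n <- 600
# Simulated stand-in for the survey data; not the course data
d <- data.frame(
  price    = factor(sample(1:15, n, replace = TRUE)),
  region   = factor(sample(1:5, n, replace = TRUE)),
  risk_gcc = sample(0:10, n, replace = TRUE)
)
price_eff  <- rnorm(15, sd = 5)
region_eff <- rnorm(5, sd = 5)
d$erdf100 <- 20 + 4 * d$risk_gcc +
  price_eff[as.integer(d$price)] + region_eff[as.integer(d$region)] +
  rnorm(n, sd = 15)
d$risk_gcc[sample(n, 10)] <- NA   # mimic missing answers on the risk item

# Keep only complete rows so both models are fit to the same observations
d_cc <- d[complete.cases(d), ]

# Fit by ML (REML = FALSE) because the fixed effects differ between the models
m1 <- lmer(erdf100 ~ 1 + (1 | price) + (1 | region),
           data = d_cc, REML = FALSE)
m2 <- lmer(erdf100 ~ 1 + risk_gcc + (1 | price) + (1 | region),
           data = d_cc, REML = FALSE)

bic_diff <- BIC(m1) - BIC(m2)   # positive values favor the more complex model
print(bic_diff)
```

Taking the difference as simpler model minus more complex model keeps the sign convention of the result above, where a large positive value counts as evidence for the more complex model.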
Slide 11: Model fit: BIC and Thinking
- BIC is often (mis)used like a statistical “idiot light”
- Depends on the sample employed
- Maximizes predictive capacity of the model rather than model explanation
- When you face a decision of whether to add an explanatory variable, use a 2-step process:
  1. Does the variable make theoretical sense?
  2. Does BIC show improved model fit?
- If the answers to both are “yes”, then add the variable
Slide 12: BREAK
Slide 13: R Coding
When modeling with only part of your data, use “subset”:
    lmer(y~x…, subset=state=="NY")
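The same subset argument is available in base R's lm(), which makes it easy to try without any extra packages. The sketch below uses the built-in mtcars data and lm() as stand-ins for the course data and lmer(). The condition cyl == 4 plays the role of state=="NY", and the object names are hypothetical.

```r
# Fit on part of the data by passing a logical condition to `subset`
fit_sub <- lm(mpg ~ wt, data = mtcars, subset = cyl == 4)

# Equivalent approach: filter the data frame first, then fit
fit_pre <- lm(mpg ~ wt, data = mtcars[mtcars$cyl == 4, ])

print(nobs(fit_sub))   # number of rows actually used in the fit
print(coef(fit_sub))
```

The condition is evaluated inside the data frame, so column names can be used directly, and string values need plain straight quotes.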
Slide 14: Workshop
- New developments on models?
- Progress on papers
- Research question motivated by literature reviews
Slide 15: Next Week
- Focus is on paper progress
- Build timelines for completion
- Focus on challenges, and what we need to do to surmount them
- Hone your research question: motivated by literature reviews
- Need 1-page progress reports, including task assignments