Lecture 9 Model Comparison using MCMC and further models.

1 Lecture 9 Model Comparison using MCMC and further models

2 Lecture Contents: Model comparison; DIC diagnostic; Random slopes regression model; Priors for variance matrices; MLwiN RSR demonstration; Other predictor variables; DIC in WinBUGS.

3 Bayesian Model Comparison In Bayesian statistics model comparison is a thorny issue! In MLwiN we used to suggest running IGLS for model selection and then MCMC on your chosen model. Why is it a thorny issue? The posterior f(θ|Y) does not allow criticism of the model in light of the observed data, nor comparison amongst models. It is f(Y) that can be used to assess model performance. Regardless of the model, f(Y) is a density over the space of observables which can be compared with what was actually observed.

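4 Bayes Factors If we observe Y_OBS and have 2 models M_1 and M_2 then the Bayes Factor is $BF = f(Y_{OBS} \mid M_1) / f(Y_{OBS} \mid M_2)$. This provides the relative weight of evidence for model M_1 compared to model M_2. A rough calibration of the Bayes factor has been proposed, with evidence categories running from -ve for the smallest values of BF, through weak, +ve and Strong, to V. Strong for BF > 150.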
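As an illustration of the ratio of marginal likelihoods, here is a minimal sketch for a toy problem that is not part of the lecture: k successes in n Binomial trials, with M_1 placing a Uniform Beta(1, 1) prior on the success probability and M_2 fixing it at 0.5. The function names are hypothetical, and the cut-offs in `evidence_label` assume the commonly cited Kass and Raftery (1995) scale (1, 3, 20, 150), which may differ from the calibration shown on the slide.

```python
import math


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_beta_prior(k, n, a, b):
    """log f(Y | M) when theta ~ Beta(a, b): theta is integrated out analytically."""
    return math.log(math.comb(n, k)) + log_beta(k + a, n - k + b) - log_beta(a, b)


def log_marginal_point(k, n, theta):
    """log f(Y | M) when theta is fixed, so no integration is needed."""
    return math.log(math.comb(n, k)) + k * math.log(theta) + (n - k) * math.log(1 - theta)


def bayes_factor(k, n):
    """BF of M_1 (theta ~ Beta(1, 1)) against M_2 (theta = 0.5)."""
    return math.exp(log_marginal_beta_prior(k, n, 1.0, 1.0) - log_marginal_point(k, n, 0.5))


def evidence_label(bf):
    """Assumed Kass and Raftery (1995) cut-offs; not taken from the slide."""
    if bf < 1:
        return "negative"
    if bf < 3:
        return "weak"
    if bf < 20:
        return "positive"
    if bf < 150:
        return "strong"
    return "very strong"


bf = bayes_factor(15, 20)
print(round(bf, 3), evidence_label(bf))
```

Both marginal likelihoods are available in closed form here only because the toy model is conjugate; for multilevel models the integral is high-dimensional.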

5 Problems with Bayes Factor 1. When the prior is vague, f(θ) is improper. This implies that even though f(θ|Y) may be proper, f(Y) is improper, so Bayes Factors cannot be used! 2. Computation of the Bayes factor itself requires high-dimensional integration. 3. Lindley's paradox: the data point to rejection but the prior is diffuse, so the denominator of the Bayes factor is much smaller than the numerator and too much weight is given to parsimonious models.
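Lindley's paradox can be seen numerically. The following is a minimal sketch under assumptions that are not in the lecture: a Normal mean with known variance, a parsimonious model M_0 with θ = 0, and an alternative M_1 with θ ~ N(0, τ²). The function name and the numbers are illustrative only.

```python
import math


def bf_null_vs_diffuse(ybar, n, sigma2, tau2):
    """Bayes factor for M_0 (theta = 0) against M_1 (theta ~ N(0, tau2)).

    Under M_0 the sample mean is N(0, sigma2 / n); under M_1 it is
    N(0, tau2 + sigma2 / n). The Bayes factor is the ratio of these two
    densities evaluated at the observed sample mean.
    """
    s2 = sigma2 / n
    v1 = tau2 + s2
    log_f0 = -0.5 * math.log(2 * math.pi * s2) - 0.5 * ybar ** 2 / s2
    log_f1 = -0.5 * math.log(2 * math.pi * v1) - 0.5 * ybar ** 2 / v1
    return math.exp(log_f0 - log_f1)


# The sample mean is 2.5 standard errors from zero, so a classical test rejects theta = 0.
ybar, n, sigma2 = 0.25, 100, 1.0
for tau2 in (1.0, 100.0, 10000.0):
    print(tau2, bf_null_vs_diffuse(ybar, n, sigma2, tau2))
```

With the data held fixed, the Bayes factor in favour of the simpler model moves from below 1 to well above 1 as the prior variance τ² grows, purely because the prior under M_1 becomes more diffuse.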

6 Other related ideas Prior predictive distributions f(Y). Cross-validation predictive distributions f(y_r | Y_(r)). Posterior predictive distributions f(Y | Y_obs). Model uncertainty, where the model is itself a parameter to be estimated. Bayesian model averaging. Reversible jump MCMC.

7 Model Comparison for random effect models As we will typically use diffuse priors, Bayes factors are not an option here. The methods listed previously are possibilities but not built into software packages. The Deviance Information Criterion (DIC) is one possibility but is it a saviour for Bayesian model choice or a white elephant?

8 DIC (Spiegelhalter et al., 2002) Plus points: 1. The discussion paper proposing it was written by leading figures in Bayesian modelling. 2. It is available in both MLwiN and WinBUGS for standard models. Minus points: The paper was given a very mixed reception at the RSS when it was discussed!

9 DIC A natural way to compare models is to use a criterion based on a trade-off between the fit of the data to the model and the corresponding complexity of the model. DIC does this in a Bayesian way. DIC = goodness of fit + complexity. Fit is measured by the deviance $D(\theta) = -2 \log L(\text{data} \mid \theta)$. Complexity is measured by an estimate of the effective number of parameters, defined as $p_D = \bar{D} - D(\bar{\theta})$, i.e. the posterior mean deviance minus the deviance evaluated at the posterior mean of the parameters.

10 DIC (continued) The DIC is then defined analogously to AIC as $DIC = D(\bar{\theta}) + 2 p_D = \bar{D} + p_D$. Models with smaller DIC are better supported by the data. DIC can be monitored in WinBUGS from the Inference/DIC menu. DIC is available in MLwiN under the Model/MCMC menu.
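To make the quantities concrete, the sketch below computes Dbar, D(thetabar), pD and DIC from posterior draws, with pD taken as Dbar minus D(thetabar) and DIC as Dbar plus pD. It assumes a deliberately simple model that is not the lecture's: Normal data with known variance and a flat prior on the mean, so the posterior can be sampled directly. The names `deviance` and `dic_summary` and the simulated data are illustrative assumptions.

```python
import math
import random


def deviance(y, mu, sigma2):
    """-2 * log-likelihood of Normal(mu, sigma2) data."""
    n = len(y)
    return sum((yi - mu) ** 2 for yi in y) / sigma2 + n * math.log(2 * math.pi * sigma2)


def dic_summary(y, mu_draws, sigma2):
    """Dbar, D(thetabar), pD and DIC from posterior draws of the mean."""
    d_bar = sum(deviance(y, mu, sigma2) for mu in mu_draws) / len(mu_draws)
    theta_bar = sum(mu_draws) / len(mu_draws)
    d_theta_bar = deviance(y, theta_bar, sigma2)
    p_d = d_bar - d_theta_bar          # effective number of parameters
    return {"Dbar": d_bar, "Dthetabar": d_theta_bar, "pD": p_d, "DIC": d_bar + p_d}


rng = random.Random(1)
sigma2 = 1.0
y = [rng.gauss(0.3, 1.0) for _ in range(50)]      # simulated data, for illustration only
ybar = sum(y) / len(y)

# Flat prior on mu with known sigma2: the posterior is Normal(ybar, sigma2 / n).
mu_draws = [rng.gauss(ybar, math.sqrt(sigma2 / len(y))) for _ in range(20000)]

out = dic_summary(y, mu_draws, sigma2)
print(out)
```

This toy model has one unknown parameter, so pD should come out close to 1, in the same way that pD is close to 3 for the three-parameter regression.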

11 Education dataset We can fit a simple (Bayesian) linear regression in MLwiN. The DIC output reports Dbar, D(thetabar), pD and DIC; here pD = 3.02. Note pD ~ 3 = the actual number of parameters.

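12 Variance components model Here we consider the random intercepts model from earlier practicals: $y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + e_{ij}$, with $u_{0j} \sim N(0, \sigma^2_{u0})$ and $e_{ij} \sim N(0, \sigma^2_e)$. This is the parallel lines model.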

13 Change in DIC [Table: Dbar, D(thetabar), pD and DIC for the Regression and VC models.] Here we see the clear improvement from fitting random effects for school. Note that the effective number of parameters is ~60 compared with 68 actual parameters in the model, because the school effects are random rather than fixed.

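14 Random slopes model (crossing lines) [Figure: regression lines for school 1 and school 2 plotted against x_1ij, with intercept residuals u_0,1 and u_0,2 and slope residuals u_1,1 and u_1,2.]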

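15 Fitting an RSR in a Bayesian Framework The basic random slopes regression model is as follows: $y_{ij} = \beta_0 + \beta_1 x_{1ij} + u_{0j} + u_{1j} x_{1ij} + e_{ij}$, with $u_j = (u_{0j}, u_{1j})^T \sim \text{MVN}(0, \Omega_u)$ and $e_{ij} \sim N(0, \sigma^2_e)$. To this model we need to add priors for β, Ω_u and σ²_e.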

16 Wishart priors For a (k × k) variance matrix parameter in a Normal likelihood the conjugate prior is the inverse Wishart distribution with parameters ν and S. This distribution looks complex but is simply a multivariate generalisation of the inverse Gamma distribution.
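For reference, the inverse Wishart density can be written as below. This is the standard textbook form with degrees of freedom ν and scale matrix S; the parameterisation used by MLwiN or WinBUGS may differ, so treat the exact constants as an assumption.

```latex
% Inverse Wishart density for a (k x k) variance matrix \Omega
p(\Omega \mid \nu, S)
  = \frac{|S|^{\nu/2}}{2^{\nu k/2}\,\Gamma_k(\nu/2)}\;
    |\Omega|^{-(\nu + k + 1)/2}
    \exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left(S\,\Omega^{-1}\right)\right\}

% Special case k = 1, writing \Omega = \omega and S = s:
p(\omega \mid \nu, s) \;\propto\; \omega^{-(\nu/2 + 1)} \exp\!\left\{-\frac{s}{2\omega}\right\},
\qquad \text{i.e. } \omega \sim \text{Inverse-Gamma}\!\left(\tfrac{\nu}{2}, \tfrac{s}{2}\right)
```

Setting k = 1 recovers the inverse Gamma distribution, which is the sense in which the inverse Wishart is its multivariate generalisation.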

17 Wishart prior for Ω_u^{-1} In MLwiN we use a Wishart prior for the precision matrix Ω_u^{-1} (equivalently, an inverse Wishart prior for the variance matrix Ω_u). Note this is a weakly informative prior, as the first parameter represents the prior sample size and is set to the smallest feasible value. Browne and Draper have looked at alternative Wishart priors as well as a Uniform prior and performed simulations.

18 Gibbs Sampling algorithm for RSR model Repeat the following four steps: 1. Generate β from its (Multivariate) Normal conditional distribution. 2. Generate each u_j from its (Multivariate) Normal conditional distribution. 3. Generate Ω_u^{-1} from its Wishart conditional distribution. 4. Generate 1/σ_e² from its Gamma conditional distribution.
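The four steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not MLwiN's implementation: a single predictor x, a flat prior on β, a Gamma(eps, eps) prior on 1/σ_e², and a Wishart prior on Ω_u^{-1} with `nu_prior` degrees of freedom and prior guess `S_prior` for Ω_u (parameterised so that the conditional in step 3 has scale matrix (Σ u_j u_j' + ν S)^{-1}). The function name, default values and simulated data are all hypothetical.

```python
import numpy as np


def gibbs_rsr(y, x, school, n_iter=1500, burn_in=500, seed=0,
              nu_prior=2, S_prior=None, eps=1e-3):
    """Gibbs sampler for y_ij = b0 + b1*x_ij + u_0j + u_1j*x_ij + e_ij.

    `school` must be an integer array with labels 0..J-1.
    Returns posterior means of beta, Omega_u and sigma_e^2.
    """
    rng = np.random.default_rng(seed)
    N, J = len(y), int(school.max()) + 1
    X = np.column_stack([np.ones(N), x])      # design for fixed and random parts
    XtX_inv = np.linalg.inv(X.T @ X)
    groups = [np.flatnonzero(school == j) for j in range(J)]
    if S_prior is None:
        S_prior = 0.1 * np.eye(2)             # hypothetical prior guess at Omega_u

    # Starting values
    beta, u = np.zeros(2), np.zeros((J, 2))
    prec_u, sigma2 = 10.0 * np.eye(2), 1.0
    draws = {"beta": [], "omega_u": [], "sigma2": []}

    for it in range(n_iter):
        # Step 1: beta | rest ~ MVN (flat prior on beta)
        r = y - np.sum(X * u[school], axis=1)
        beta = rng.multivariate_normal(XtX_inv @ (X.T @ r), sigma2 * XtX_inv)

        # Step 2: u_j | rest ~ MVN, one school at a time
        r = y - X @ beta
        for j, idx in enumerate(groups):
            Zj = X[idx]
            cov_j = np.linalg.inv(Zj.T @ Zj / sigma2 + prec_u)
            u[j] = rng.multivariate_normal(cov_j @ (Zj.T @ r[idx]) / sigma2, cov_j)

        # Step 3: Omega_u^{-1} | rest ~ Wishart(J + nu, (sum u_j u_j' + nu*S)^{-1}).
        # For integer degrees of freedom a Wishart draw is a sum of outer
        # products of MVN(0, scale) vectors.
        scale = np.linalg.inv(u.T @ u + nu_prior * S_prior)
        z = rng.multivariate_normal(np.zeros(2), scale, size=J + nu_prior)
        prec_u = z.T @ z

        # Step 4: 1/sigma_e^2 | rest ~ Gamma(eps + N/2, rate = eps + SSE/2)
        e = r - np.sum(X * u[school], axis=1)
        sigma2 = 1.0 / rng.gamma(eps + N / 2, 1.0 / (eps + 0.5 * (e @ e)))

        if it >= burn_in:
            draws["beta"].append(beta)
            draws["omega_u"].append(np.linalg.inv(prec_u))
            draws["sigma2"].append(sigma2)

    return {k: np.mean(np.array(v), axis=0) for k, v in draws.items()}


# Simulated data (illustrative values only): 30 schools with 20 pupils each.
sim = np.random.default_rng(42)
J, n = 30, 20
school = np.repeat(np.arange(J), n)
x = sim.normal(size=J * n)
u_true = sim.multivariate_normal([0.0, 0.0], [[0.10, 0.02], [0.02, 0.02]], size=J)
y = (0.0 + 0.5 * x + u_true[school, 0] + u_true[school, 1] * x
     + sim.normal(scale=np.sqrt(0.5), size=J * n))

post = gibbs_rsr(y, x, school)
print(post["beta"], post["sigma2"])
print(post["omega_u"])
```

Every step is a draw from a standard distribution because the priors are conjugate, which is why no Metropolis-Hastings step is needed for this model.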

19 Bayesian RSR Model for education dataset Note IGLS estimates used in prior. Variance (posterior mean) estimates bigger than IGLS estimates.

20 DIC for RSR model [Table: Dbar, D(thetabar), pD and DIC for the RSR and VC models.] As with the frequentist approach, the random slopes model is an improvement over the random intercepts model. The additional 65 random parameters only add 32 effective parameters.

21 Trajectories for the RSR model

22 MCMC Diagnostics for Ω_u00

23 Predictions for the RSR model with highlighted data Here the top and bottom school are highlighted:

24 Residuals for the RSR Individually: and pairwise:

25 Uniform Priors Here the level 2 variance estimates increase, as in Browne and Draper (2000). Browne and Draper found that the Wishart priors were preferable, although the use of the IGLS estimate is not strictly Bayesian as we are using the data twice!

26 Other predictors in the education dataset This dataset has other predictors, such as gender and school gender, that can be considered in the practical. In the next slide we see the equations window for a model with these added, whose DIC is over 25 lower than that of the earlier RSR model.

27 RSR + gender effects

28 WinBUGS RSR & gender
model {
  # Level 1 definition
  for (i in 1:N) {
    normexam[i] ~ dnorm(mu[i], tau)
    mu[i] <- beta[1] * cons[i]
           + beta[2] * standlrt[i]
           + beta[3] * girl[i]
           + beta[4] * boysch[i]
           + beta[5] * girlsch[i]
           + u2[school[i],1] * cons[i]
           + u2[school[i],2] * standlrt[i]
  }
  # Higher level definitions
  for (j in 1:n2) {
    u2[j,1:2] ~ dmnorm(zero2[1:2], tau.u2[1:2,1:2])
  }
  # Priors for fixed effects
  for (k in 1:5) { beta[k] ~ dflat() }
  # Priors for random terms
  tau ~ dgamma( , )
  sigma2 <- 1/tau
  for (i in 1:2) { zero2[i] <- 0 }
  tau.u2[1:2,1:2] ~ dwish(R2[1:2, 1:2], 2)
  sigma2.u2[1:2,1:2] <- inverse(tau.u2[,])
}
Here we see the WinBUGS code for our last model. Notice how MVN and Wishart distributions are specified in WinBUGS: both dmnorm and dwish take a precision matrix rather than a variance matrix.

29 DIC in WinBUGS In WinBUGS DIC is available from the Inference menu. The DIC is set after the burn-in, and the DIC button is then pressed after running, giving a table of Dbar, Dhat, pD and DIC for the node normexam and in total, where Dbar = post. mean of -2logL and Dhat = -2logL at post. mean of stochastic nodes.

30 Parameter estimates in WinBUGS Note that WinBUGS gives similar estimates to MLwiN for the model. Note also that for the fixed effects β WinBUGS indexes from 1 while MLwiN indexes from 0.
node             mean   sd   (2.5%, 97.5%)
beta[1]                      ( , )
beta[2]                      (0.5145, )
beta[3]                      (0.1019, )
beta[4]                      ( , )
beta[5]                      ( , )
sigma                        (0.5272, 0.576)
sigma2.u2[1,1]               ( , )
sigma2.u2[1,2]               ( , 0.0372)
sigma2.u2[2,2]               ( , )

31 Next Practical The next practical is free ranging: You can follow the MLwiN chapter on RSR models that is given. You can try out RSR models in WinBUGS. You can try out fitting random effect models to the orthodont dataset using MCMC. You can try out DIC on other models.
