2 MCMC estimation in MLwiN
MCMC estimation is a big topic and is given a pragmatic and cursory treatment here. Interested students are referred to the manual “MCMC estimation in MLwiN”.
In the workshop so far you have been using the IGLS (Iterative Generalised Least Squares) algorithm to estimate models, with MQL and PQL approximations to handle discrete responses.
3 IGLS versus MCMC
IGLS | MCMC
Fast to compute | Slower to compute
Deterministic convergence, easy to judge | Stochastic convergence, harder to judge
Uses MQL/PQL approximations to fit discrete response models, which can produce biased estimates in some cases | Does not use approximations when estimating discrete response models; estimates are less biased
In samples with small numbers of level 2 units, confidence intervals for level 2 variance parameters assume Normality, which is inaccurate | In samples with small numbers of level 2 units, Normality is not assumed when making inferences for level 2 variance parameters
Cannot incorporate prior information | Can incorporate prior information
Hard to get uncertainty intervals around arbitrary functions of parameters | Easy to get uncertainty intervals around arbitrary functions of parameters
Difficult to extend to new models | Easy to extend to new models
4 Bayesian framework
MCMC estimation operates in a Bayesian framework. A Bayesian framework requires us to think about the prior information we have on the parameters we are estimating and to formally include that information in the model. We may decide that we are in a state of complete ignorance about the parameters, in which case we must specify a so-called “uninformative prior”. The “posterior” distribution for a parameter $\theta$, given that we have observed $y$, is subject to the following rule:
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$
where
$p(\theta \mid y)$ is the posterior distribution for $\theta$ given that we have observed $y$;
$p(y \mid \theta)$ is the likelihood of observing $y$ given $\theta$;
$p(\theta)$ is the probability distribution arising from some statement of prior belief, such as “we believe $\theta \sim N(1, 0.01)$”. Note that “we believe $\theta \sim N(1, 1)$” is a much weaker and therefore less influential statement of prior belief.
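To see how the strength of the prior statement affects the posterior, here is a minimal sketch for the simplest conjugate case: a Normal mean with known data variance. The function name, the sample size, the sample mean and the data variance are illustrative assumptions, not values from the workshop.

```python
def normal_posterior(prior_mean, prior_var, ybar, n, data_var):
    """Posterior mean and variance of theta for y_i ~ N(theta, data_var),
    with prior theta ~ N(prior_mean, prior_var) and data_var assumed known."""
    prior_precision = 1.0 / prior_var
    data_precision = n / data_var
    post_var = 1.0 / (prior_precision + data_precision)
    # Posterior mean is a precision-weighted average of prior mean and sample mean.
    post_mean = post_var * (prior_precision * prior_mean + data_precision * ybar)
    return post_mean, post_var


# Hypothetical data: 10 observations with sample mean 2 and known variance 1.
strong = normal_posterior(1.0, 0.01, ybar=2.0, n=10, data_var=1.0)
weak = normal_posterior(1.0, 1.0, ybar=2.0, n=10, data_var=1.0)
print("strong prior N(1, 0.01):", strong)
print("weak prior   N(1, 1):   ", weak)
```

Under the strong prior the posterior mean stays close to the prior mean of 1 (about 1.09); under the weak prior it moves most of the way to the sample mean of 2 (about 1.91).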
5 Applying MCMC to multilevel models
Let's start with a multilevel Normal response model, its unknown parameters and their joint posterior. The joint posterior has three annotated parts:
Posterior: the final answers, a combination of likelihood and priors.
Likelihood: “what the data say”, estimated from the data.
Prior belief: supplied by the researcher.
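As a hedged illustration, the block below writes out one common form such a model can take: a two-level random-intercept model with a single predictor. The specific model, the symbols and the assumption of independent priors are ours and are not recovered from the slide.

```latex
% Assumed two-level Normal response model (students i within schools j)
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + e_{ij}, \qquad
u_j \sim N(0, \sigma^2_u), \qquad e_{ij} \sim N(0, \sigma^2_e)

% Unknowns under this assumed model
\theta = \{\beta_0, \beta_1, u_1, \dots, u_J, \sigma^2_u, \sigma^2_e\}

% Joint posterior: model terms for y and u, times the priors
p(\beta, u, \sigma^2_u, \sigma^2_e \mid y) \;\propto\;
p(y \mid \beta, u, \sigma^2_e)\, p(u \mid \sigma^2_u)\,
p(\beta)\, p(\sigma^2_u)\, p(\sigma^2_e)
```

In this sketch the first two factors on the right come from the model and the data, and the last three are the prior beliefs supplied by the researcher.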
6 Gibbs sampling
Evaluating the expression for the joint posterior with all the parameters unknown is, for most models, virtually impossible. However, if we take each unknown parameter in turn and temporarily assume we know the values of the other parameters, then we can simulate directly from the so-called “conditional posterior” distribution. The Gibbs sampling algorithm cycles through the following simulation steps. First we assume some starting values for our unknown parameters:
7 Gibbs sampling cont'd
We have now updated all the unknowns in the model. This process is repeated many times until eventually we converge on the distribution of each of the unknown parameters.
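The cycle of conditional updates can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MLwiN's implementation: the model is a random-intercept model with no predictors, y_ij = beta0 + u_j + e_ij, the prior on beta0 is flat, and both variances get inverse-gamma(a, b) priors. The function name and all default values are hypothetical.

```python
import random
import statistics


def gibbs_random_intercept(y_by_group, n_iter=2000, burn_in=500,
                           a=0.001, b=0.001, seed=1):
    """Gibbs sampler for y_ij = beta0 + u_j + e_ij (illustrative sketch).

    y_by_group is a list of lists, one inner list of responses per level 2 unit.
    Returns the post-burn-in draws as (beta0, var_u, var_e) tuples.
    """
    rng = random.Random(seed)
    J = len(y_by_group)
    N = sum(len(g) for g in y_by_group)

    # Starting values for all unknowns.
    beta0 = statistics.fmean(v for g in y_by_group for v in g)
    u = [0.0] * J
    var_u, var_e = 1.0, 1.0

    chain = []
    for it in range(n_iter):
        # Step 1: beta0 given u and var_e (Normal, flat prior assumed).
        mean_b = sum(v - u[j] for j, g in enumerate(y_by_group) for v in g) / N
        beta0 = rng.gauss(mean_b, (var_e / N) ** 0.5)

        # Step 2: each u_j given beta0, var_u and var_e (Normal).
        for j, g in enumerate(y_by_group):
            precision = len(g) / var_e + 1.0 / var_u
            mean_u = (sum(v - beta0 for v in g) / var_e) / precision
            u[j] = rng.gauss(mean_u, precision ** -0.5)

        # Step 3: var_u given u (inverse gamma, drawn as 1 / gamma).
        rate_u = b + sum(x * x for x in u) / 2.0
        var_u = 1.0 / rng.gammavariate(a + J / 2.0, 1.0 / rate_u)

        # Step 4: var_e given beta0 and u (inverse gamma).
        sse = sum((v - beta0 - u[j]) ** 2
                  for j, g in enumerate(y_by_group) for v in g)
        var_e = 1.0 / rng.gammavariate(a + N / 2.0, 1.0 / (b + sse / 2.0))

        if it >= burn_in:
            chain.append((beta0, var_u, var_e))
    return chain
```

Each pass through the loop updates every unknown once, conditioning on the current values of the others; the stored tuples form the simulation chains from which estimates are later computed.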
8 IGLS vs MCMC convergence
The IGLS algorithm converges deterministically to a set of point estimates.
The MCMC algorithm converges stochastically on a distribution. Parameter estimates and intervals are then calculated from the simulation chains.
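A minimal sketch of how estimates and intervals might be calculated from a chain follows. The function name is hypothetical, the interval is a simple equal-tailed one taken from the sorted draws, and the two chains are simulated stand-ins rather than real MCMC output. The last lines show an interval for a function of parameters, here the proportion of variance at level 2, obtained by evaluating the function at every iteration.

```python
import random
import statistics


def summarise_chain(draws, burn_in=0, level=0.95):
    """Posterior mean, SD and an equal-tailed interval from one chain."""
    kept = sorted(draws[burn_in:])
    n = len(kept)
    tail = (1.0 - level) / 2.0
    lower = kept[int(tail * n)]
    upper = kept[min(n - 1, int((1.0 - tail) * n))]
    return {
        "mean": statistics.fmean(kept),
        "sd": statistics.stdev(kept),
        "lower": lower,
        "upper": upper,
    }


# Stand-in chains for two variance parameters (simulated, for illustration only).
rng = random.Random(42)
var_u = [rng.gammavariate(20.0, 0.025) for _ in range(5000)]
var_e = [rng.gammavariate(100.0, 0.01) for _ in range(5000)]

# A function of parameters is summarised by applying it draw by draw.
vpc = [u / (u + e) for u, e in zip(var_u, var_e)]

print(summarise_chain(var_u, burn_in=500))
print(summarise_chain(vpc, burn_in=500))
```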
9 MCMC for discrete response models
Gibbs sampling relies on being able to sample from the conditional posterior directly. In some models, for some parameters, the conditional posterior cannot be arranged into a form that corresponds to a known distribution we can sample from directly. This is the case for some parameters of discrete response models.
In such cases we need to use another type of MCMC sampling, known as Metropolis-Hastings sampling.
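A minimal sketch of the idea, assuming the simplest possible discrete response model: a single logit-scale intercept for binomial data with a flat prior, updated by random-walk Metropolis (a special case of Metropolis-Hastings with a symmetric proposal). The function names, the proposal step size and the data are illustrative assumptions and do not describe MLwiN's own sampler.

```python
import math
import random


def log_posterior(beta, successes, trials):
    """Log posterior (up to a constant) of a logit intercept, flat prior assumed."""
    return successes * beta - trials * math.log1p(math.exp(beta))


def random_walk_metropolis(successes, trials, n_iter=5000, step=0.5, seed=1):
    """Random-walk Metropolis for one parameter; returns draws and acceptance rate."""
    rng = random.Random(seed)
    beta = 0.0  # starting value
    current = log_posterior(beta, successes, trials)
    draws, accepted = [], 0
    for _ in range(n_iter):
        # Propose a move from a Normal centred on the current value.
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, trials)
        # Accept with probability min(1, posterior ratio); otherwise stay put.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


# Hypothetical data: 30 successes in 100 trials.
draws, acceptance_rate = random_walk_metropolis(30, 100)
kept = draws[1000:]  # discard a burn-in period
print("posterior mean (logit scale):", sum(kept) / len(kept))
print("acceptance rate:", acceptance_rate)
```

Unlike a Gibbs step, each proposal may be rejected, so the chain can repeat values; the step size controls the trade-off between acceptance rate and how far each accepted move travels.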
11 DIC and model comparison
Deviance Information Criterion (DIC)
DIC is the sum of two terms: ‘fit’ + complexity, or deviance + effective number of parameters.
We want to maximise fit and minimise model complexity.
This corresponds to lower deviance and a lower effective number of parameters.
So smaller DIC values correspond to “better” models.
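As a hedged sketch, the usual definition takes the fit term to be the posterior mean deviance and estimates the effective number of parameters as that mean deviance minus the deviance evaluated at the posterior means of the parameters. The code below applies this to a single-level Normal model; the function names and the tiny hand-made chain are illustrative assumptions.

```python
import math
import statistics


def normal_deviance(y, mu, var):
    """Deviance (-2 * log-likelihood) of y under N(mu, var)."""
    n = len(y)
    return n * math.log(2.0 * math.pi * var) + sum((v - mu) ** 2 for v in y) / var


def dic_from_chain(y, chain):
    """Return (mean deviance, effective number of parameters, DIC).

    chain is a list of (mu, var) draws from the posterior.
    """
    d_bar = statistics.fmean(normal_deviance(y, mu, var) for mu, var in chain)
    mu_bar = statistics.fmean(mu for mu, _ in chain)
    var_bar = statistics.fmean(var for _, var in chain)
    d_at_means = normal_deviance(y, mu_bar, var_bar)
    p_d = d_bar - d_at_means  # estimated effective number of parameters
    return d_bar, p_d, d_bar + p_d


# Hand-made two-draw "chain", for illustration only.
y = [0.0, 2.0]
d_bar, p_d, dic = dic_from_chain(y, [(0.0, 1.0), (2.0, 1.0)])
print("mean deviance:", d_bar, "pD:", p_d, "DIC:", dic)
```

Because the complexity term is estimated from the chain, it need not be a whole number.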
12 To illustrate, let's take a simple model
Deviance = , Effective number of params = 1.97, DIC = =
Actually the effective number of parameters is really 2, but our estimate of the effective number of parameters used in the DIC is very close.
Why estimate the effective number of parameters?
13 Comparison of single-level (SL) and multilevel (ML) models with DIC
Students are nested within 65 schools. If we fit a multilevel model, what is the effective number of parameters now? Is it 66 = (J-1) + intercept + slope?
No. Because the u_j are assumed to come from a distribution, which places constraints on the values they can take, the effective number of parameters (the number of independent parameters) will be less than 66.
ML: Deviance = , Effective number of params = 53.96, DIC =
SL: Deviance = , Effective number of params = 1.97, DIC =
14 Fitting schools with fixed effects
The “true” effective number of parameters is now 66 and the estimated number is very close.
ML (fixed effects): Deviance = , Effective number of params = 65.5, DIC =
ML (random effects): Deviance = , Effective number of params = 53.96, DIC =
SL: Deviance = , Effective number of params = 1.97, DIC =
In terms of DIC, ML (random effects) is the “best” model.
15 Other MCMC issues
By default MLwiN uses flat, uninformative priors; see page 5 of MCMC estimation in MLwiN (MEM).
For specifying informative priors, see chapter 6 of MEM.
For model comparison in MCMC using the DIC statistic, see chapters 3 and 4 of MEM.
For a description of the MCMC algorithms used in MLwiN, see chapter 2 of MEM.
16 When to consider using MCMC in MLwiN
If you have discrete response data: binary, binomial, multinomial or Poisson (chapters 11, 12, 20 and 21). Often PQL gives quick and accurate estimates for these models. However, it is a good idea to check against MCMC to test for bias in the PQL estimates.
If you have few level 2 units and you want to make accurate inferences about the distribution of higher level variances.
Some of the more advanced models in MLwiN are only available in MCMC, for example factor analysis (chapter 19), measurement error in predictor variables (chapter 14) and CAR spatial models (chapter 16).
Other models can be fitted in IGLS but are handled more easily in MCMC, such as multiple imputation (chapter 17), cross-classified (chapter 14) and multiple membership models (chapter 15).
All chapter references are to MCMC estimation in MLwiN.