Published by Sebastian Edwards. Modified over 3 years ago.

Model checking in mixture models via mixed predictive p-values

Alex Lewin and Sylvia Richardson, Centre for Biostatistics, Imperial College, London

Mixed predictive distribution

The hierarchical model has parameters for each individual g (at the 2nd and 3rd levels) and global parameters (at the 3rd and 4th levels).

Mixed predictive data are generated in two steps:
(1) predict new 2nd level parameters conditional on the 3rd level parameters in the model,
(2) predict new data conditional on the new 2nd level parameters.

Mixed predicted data for each individual have reduced dependence on the observed data for that individual, as the new data are sampled conditional on the global hyperparameters (posterior predictive data are sampled conditional on individual parameters). Therefore the mixed predictive p-values are less conservative than posterior predictive p-values.

Calculation of p-values is simple: the model is run with Markov chain Monte Carlo (MCMC). Sample predictive parameters and data from the distributions specified in the model, and count how many times the predicted test statistic is larger than the observed test statistic. The p-value is the proportion of samples for which this happens.

Mixed predictive checks have been used to check other aspects of 2nd level distributions (Lewin et al. 2006).

Choice of parameters to predict

- Main parameter (corresponds to the test statistic): results are similar whether or not this is also predicted.
- Mixture component allocation: important not to predict this (we want to look at each mixture component separately).

Introduction

We are concerned with model checking for complex Bayesian hierarchical models, using predictive distributions. A common choice is the posterior predictive. Model checks using this are conservative, as predicted data are highly dependent on observed data. We use the mixed predictive (Gelman et al. 1996), which is less conservative (Marshall & Spiegelhalter 2003).

We focus our checks on 2nd level parameters, specifically parameters whose distribution is defined as a mixture. It is at this level that sensitivity to model assumptions is most expected and hardest to check directly.

Mixed predictive p-values for mis-specified model

Investigate the behaviour of predictive p-values under a mis-specified model: simulate data from a mixture of Uniforms (all other parameters as before).

Reduced conservativeness

Investigate the behaviour of predictive p-values under the null: simulate data from the model we fit, with 1000 individuals (g=1,…,1000) and 8 repeats (i=1,…,8). Mixed predictive p-values are much closer to Uniform than posterior predictive p-values.

References

Gelman, A., Meng, X.-L. and Stern, H. (1996). Posterior Predictive Assessment of Model Fitness via Realized Discrepancies. Statistica Sinica 6, 733-807.
Marshall, E. C. and Spiegelhalter, D. J. (2003). Approximate cross-validatory predictive checks in disease mapping models. Statistics in Medicine 22, 1649-1660.
Lewin, A., Richardson, S., Marshall, C., Glazier, A. and Aitman, T. (2006). Bayesian Modelling of Differential Gene Expression. Biometrics 62, 1-9.

Our approach to model-checking

Aspects of Model
- 1000s of individuals modelled in parallel, exchangeably
- assumptions made on model structure (see below for mixture model)
- no strong prior information on model parameters

Model Checks
- aim to check each mixture component separately
- obtain measure of fit for each individual
- compare predicted distributions with observed data using Bayesian p-values
- assess Uniformity of p-values using histograms and q-q plots
- use mixed predictive distribution (see below)

Mixed predictive checks

Red shows the model fitted. Green shows the posterior predictive quantities. Blue shows the mixed predictive quantities (new parameters are predicted within the model).

[Figure: model graph with panels "Mixed Prediction" and "Posterior Prediction". Nodes: δ_g, δ_g^pred, z_g, obs. x_gi, post. pred. x_gi, mixed pred. x_gi; hyperparameters α, β, η, π; plate over g.]

Mixture model

[Figure: q-q plots of p-values for the 3 mixture components. Note the small numbers of individuals in the 2 outer components.]

[Figure: p-values for genes with strong inference on mixture component. Results are much more Uniform.]

Mixed predictive p-values for separate mixture components

Define p-values conditional on membership of a mixture component. These p-values are a mixture of Uniform (individuals assigned to the correct mixture component) and Non-Uniform (individuals assigned to the wrong component).

Discussion

For real data, a true model does not exist. A criterion is needed to judge acceptable departures from Uniformity. Model checks for mixtures should consider both marginal and conditional predictions. Mixed predictive checking is a sensitive tool for highlighting mis-specification.
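The counting recipe described under "Mixed predictive distribution" can be illustrated with a minimal sketch. This is not the poster's mixture model: it assumes a simpler normal hierarchy, x_gi ~ N(δ_g, σ²) with δ_g ~ N(μ, τ²), and treats the global parameters μ, τ and σ as known, so that draws of δ_g come from a conjugate posterior instead of an MCMC run. The function name `predictive_p_values`, the choice of the individual mean as test statistic, and all numerical values are assumptions made for illustration only.

```python
import numpy as np


def predictive_p_values(x, mu, tau, sigma, n_draws=2000, seed=0):
    """Posterior and mixed predictive p-values for each individual.

    Assumed toy model (not the poster's mixture model):
        x[g, i] ~ N(delta[g], sigma^2),  delta[g] ~ N(mu, tau^2),
    with mu, tau and sigma treated as known.
    """
    rng = np.random.default_rng(seed)
    n_ind, n_rep = x.shape
    t_obs = x.mean(axis=1)          # observed test statistic per individual
    se2 = sigma**2 / n_rep          # sampling variance of an individual mean

    # Conjugate posterior of delta[g] given the observed mean.
    post_var = 1.0 / (1.0 / tau**2 + 1.0 / se2)
    post_mean = post_var * (mu / tau**2 + t_obs / se2)

    # Posterior predictive: new data conditional on individual parameters.
    delta_post = rng.normal(post_mean, np.sqrt(post_var), size=(n_draws, n_ind))
    t_post = rng.normal(delta_post, np.sqrt(se2))

    # Mixed predictive: (1) new 2nd level parameters conditional on the
    # global parameters, (2) new data conditional on those new parameters.
    delta_new = rng.normal(mu, tau, size=(n_draws, n_ind))
    t_mixed = rng.normal(delta_new, np.sqrt(se2))

    # p-value: proportion of draws whose predicted statistic exceeds the observed one.
    p_post = (t_post > t_obs).mean(axis=0)
    p_mixed = (t_mixed > t_obs).mean(axis=0)
    return p_post, p_mixed


# Simulate under the null: 1000 individuals, 8 repeats, data from the fitted model.
sim_rng = np.random.default_rng(1)
delta_true = sim_rng.normal(0.0, 1.0, size=1000)
x = sim_rng.normal(delta_true[:, None], 1.0, size=(1000, 8))
p_post, p_mixed = predictive_p_values(x, mu=0.0, tau=1.0, sigma=1.0)
```

In this toy setting the posterior predictive p-values cluster around 0.5, while the mixed predictive p-values spread over the whole unit interval. Histograms or q-q plots of `p_post` and `p_mixed` against the Uniform distribution would show the difference. In a full analysis the global parameters would themselves be sampled by MCMC and not held fixed.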
