Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models
Mike West
Computing Science and Statistics, Vol. 24, 1993
Abstract
The development of discrete mixture distributions as approximations to priors and posteriors in Bayesian analysis.
– Adaptive density estimation
Adaptive mixture modeling
p(θ): the continuous posterior density function for a continuous parameter vector θ.
g(θ): approximating density used as the importance sampling function.
– T-distribution
Θ = {θ_j, j = 1, …, n}: random sample from g(θ).
Ω = {w_j, j = 1, …, n}: weights
– w_j = p(θ_j) / (k g(θ_j))
– k = Σ_{j=1}^n p(θ_j)/g(θ_j), so that the weights sum to one (see the sketch below).
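As a concrete illustration, a minimal Python sketch of the weight computation; the function names `p` and `g` are hypothetical stand-ins for vectorized (possibly unnormalized) density evaluators:

```python
import numpy as np

def importance_weights(p, g, theta):
    """Normalized importance weights w_j = p(theta_j) / (k g(theta_j)).

    theta holds n draws from g; p and g return the density at each draw.
    Choosing k = sum_j p(theta_j)/g(theta_j) makes the weights sum to one,
    so p need only be known up to a normalizing constant."""
    raw = p(theta) / g(theta)   # unnormalized ratios p(theta_j)/g(theta_j)
    return raw / raw.sum()      # divide by k
```

Posterior expectations are then approximated by E[f(θ)] ≈ Σ_j w_j f(θ_j).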
Importance sampling and mixtures
Univariate random sampling
– Direct Bayesian interpretations (based on mixtures of Dirichlet processes)
Multivariate kernel estimation
– Weighted kernel estimator
</gr-replace>
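A sketch of the weighted kernel estimator, assuming it takes the standard form g(θ) = Σ_j w_j d(θ | θ_j, h²V) with multivariate T kernels d, scale matrix V, and window width h (presumably the "(1)" referenced in the AIS algorithm below); the degrees of freedom df=9 is an arbitrary choice:

```python
import numpy as np
from scipy.stats import multivariate_t

def weighted_kernel_density(x, theta, w, V, h, df=9):
    """Evaluate g(x) = sum_j w_j T_df(x | theta_j, h^2 V): a mixture of
    multivariate T kernels centered at the weighted sample points."""
    return sum(w_j * multivariate_t(loc=t_j, shape=h**2 * V, df=df).pdf(x)
               for t_j, w_j in zip(theta, w))
```

The window width h shrinks as n grows, following a conventional multivariate kernel rule, so the estimate concentrates around the sampled points.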
Adaptive methods of posterior approximation
Possible patterns of local dependence exhibited by p(θ):
– Easy case: a roughly common pattern of dependence across the parameter space, so a single scale matrix V suffices.
– Harder case: different regions of parameter space are associated with rather different patterns of dependence.
– Then V should vary locally, with V_j depending on θ_j.
Adaptive importance sampling
The importance sampling distribution is sequentially revised based on information derived from successive Monte Carlo samples.
AIS algorithm
1. Choose an initial importance sampling distribution with density g_0(θ), draw a small sample of size n_0 and compute weights, deducing the summary S_0 = {g_0, n_0, Θ_0, Ω_0}. Compute the Monte Carlo estimates θ̄_0 and V_0 of the mean and variance of p(θ).
2. Construct a revised importance function g_1(θ) using (1) with sample size n_0, points θ_{0,j}, weights w_{0,j}, and variance matrix V_0.
3. Draw a larger sample of size n_1 from g_1(θ), and replace S_0 with S_1.
4. Either stop, and base inferences on S_1, or proceed, if desired, to a further revised version g_2(θ), constructed similarly. (A sketch of the loop follows.)
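Putting the pieces together, a compact sketch of the AIS loop under stated assumptions: `log_p` is a vectorized (possibly unnormalized) log target, the kernels are T with df=9, the initial g_0 is a single diffuse T kernel, and the window width follows a conventional multivariate kernel rule; the paper leaves these tuning choices open.

```python
import numpy as np
from scipy.stats import multivariate_t

def ais(log_p, d, sizes=(500, 2000, 10000), df=9, seed=1):
    """Adaptive importance sampling: each pass draws from the current
    kernel mixture, reweights against the target, and rebuilds the
    mixture around the weighted draws (steps 1-4 above)."""
    rng = np.random.default_rng(seed)
    # Step 1: initial importance function g0 = one diffuse T kernel.
    locs, probs, scale = np.zeros((1, d)), np.ones(1), 25.0 * np.eye(d)
    for n in sizes:
        # Draw theta_1..theta_n from the current mixture g.
        comp = rng.choice(len(probs), size=n, p=probs)
        theta = np.array([multivariate_t(loc=locs[c], shape=scale, df=df)
                          .rvs(random_state=rng) for c in comp])
        # Importance weights w_j proportional to p(theta_j)/g(theta_j).
        log_g = np.log([sum(pk * multivariate_t(loc=mk, shape=scale,
                                                df=df).pdf(t)
                            for mk, pk in zip(locs, probs)) for t in theta])
        lw = log_p(theta) - log_g
        w = np.exp(lw - lw.max())   # stabilize before normalizing
        w /= w.sum()
        # Monte Carlo mean and variance of the target.
        mean = w @ theta
        V = (w[:, None] * (theta - mean)).T @ (theta - mean)
        # Steps 2-3: revised importance function = weighted kernel mixture
        # centered at the draws, with a conventional window width h.
        h = (4.0 / ((d + 2) * n)) ** (1.0 / (d + 4))
        locs, probs, scale = theta, w, h**2 * V
    return locs, probs          # final weighted sample approximates p
```

For instance, `ais(lambda th: -0.5 * np.sum((th - 3.0)**2, axis=1), d=2)` should return a weighted sample concentrated around (3, 3), the mode of that toy Gaussian target.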
Approximating mixtures by mixtures
The computational burden increases as refinement proceeds with larger sample sizes, since the importance function becomes a mixture of several thousand T components.
– Solution: reduce the number of components by replacing 'nearest neighboring' components with some form of average.
Clustering routine
1. Set r = n, starting with the r = n component mixture; choose k < n as the number of components for the final, reduced mixture.
2. Sort the r values θ_j in Θ in order of increasing weights w_j in Ω.
3. Find the index i such that θ_i is the nearest neighbor of θ_1, and reduce the sets Θ and Ω to sets of size r − 1 by removing components 1 and i and inserting 'average' values (the weight-weighted average of θ_1 and θ_i, with combined weight w_1 + w_i).
4. Proceed to (2), stopping only when r = k.
5. The resulting mixture has locations based on the final k averaged values, with associated combined weights, the same scale matrix V, but a new, larger window width h based on the current, reduced 'sample size' r rather than n. (A sketch of the merging loop follows.)
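A minimal sketch of the merging loop, assuming plain Euclidean distance for 'nearest neighbor' (the notes do not pin the metric down):

```python
import numpy as np

def reduce_mixture(theta, w, k):
    """Collapse an n-component mixture to k components by repeatedly
    merging the smallest-weight component with its nearest neighbor."""
    theta, w = np.asarray(theta, float), np.asarray(w, float)
    while len(w) > k:
        order = np.argsort(w)                 # step 2: increasing weight
        theta, w = theta[order], w[order]
        dist = np.linalg.norm(theta[1:] - theta[0], axis=1)
        i = 1 + int(np.argmin(dist))          # step 3: nearest neighbor
        w_new = w[0] + w[i]                   # combined weight
        t_new = (w[0] * theta[0] + w[i] * theta[i]) / w_new  # average
        keep = np.ones(len(w), bool)
        keep[[0, i]] = False
        theta = np.vstack([theta[keep], t_new])
        w = np.append(w[keep], w_new)
    return theta, w   # reuse V, but recompute h from r = k rather than n
```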
Sequential updating and dynamic models
Updating a prior to a posterior distribution for a random quantity or parameter vector, based on received data summarized through a likelihood function for the parameter.
Dynamic models
– Observation model: Y_t ~ p(Y_t | θ_t)
– Evolution model: θ_t ~ p(θ_t | θ_{t-1})
Computations
Evolution step
– Compute the current prior p(θ_t | D_{t-1}) for θ_t.
Updating step
– Observing Y_t, compute the current posterior p(θ_t | D_t).
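In symbols, the two steps are the standard filtering recursion, with D_t = {D_{t-1}, Y_t}:

```latex
% Evolution step: one-step-ahead prior via the evolution model
p(\theta_t \mid D_{t-1}) = \int p(\theta_t \mid \theta_{t-1})\,
    p(\theta_{t-1} \mid D_{t-1})\, d\theta_{t-1}
% Updating step: Bayes' theorem with the new observation Y_t
p(\theta_t \mid D_t) \propto p(Y_t \mid \theta_t)\, p(\theta_t \mid D_{t-1})
```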
Computations: evolution step
1. Various features of the prior p(θ_t | D_{t-1}) of interest can be computed directly using the Monte Carlo structure.
2. The prior density function can be evaluated by Monte Carlo integration at any point.
3. The initial Monte Carlo samples Θ_t* (drawn by sampling θ_t from p(θ_t | θ_{t-1,i})) provide starting values for the evaluation of the prior.
4. Θ_t* may be used with the weights Ω_{t-1} to construct a generalized kernel density estimate of the prior.
5. Monte Carlo computations can be performed to approximate forecast moments and probabilities. (A sketch of the propagation step follows.)
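A sketch of the propagation step; `evolution_rvs` is a hypothetical helper that draws θ_t from the evolution model p(θ_t | θ_{t-1}):

```python
import numpy as np

def evolve(theta_prev, w_prev, evolution_rvs):
    """Evolution step: push each weighted posterior draw theta_{t-1,i}
    through the evolution model, giving the starting sample Theta_t*
    for the prior p(theta_t | D_{t-1}); weights carry over unchanged."""
    theta_star = np.array([evolution_rvs(t) for t in theta_prev])
    return theta_star, w_prev
```

The pair (Θ_t*, Ω_{t-1}) then feeds the generalized kernel estimate of the prior, e.g. via `weighted_kernel_density` above; a random-walk evolution would be `evolution_rvs = lambda t: t + rng.normal(scale=0.1, size=t.shape)`.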
Computations: updating step
Apply adaptive Monte Carlo density estimation (the AIS scheme above) to the posterior p(θ_t | D_t) ∝ p(Y_t | θ_t) p(θ_t | D_{t-1}).
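In its simplest form (a plain reweighting, rather than the full adaptive scheme) the update multiplies the carried-over weights by the likelihood of Y_t; here `log_lik` is assumed to evaluate log p(Y_t | θ_t) at each propagated point:

```python
import numpy as np

def update(theta_star, w_prev, log_lik):
    """Updating step: reweight the prior sample by the likelihood, so the
    weighted points (theta_star, w) target p(theta_t | D_t).  The adaptive
    Monte Carlo scheme would instead run AIS (see `ais` above) on this
    posterior to refresh the sample itself."""
    lw = np.log(w_prev) + log_lik(theta_star)
    w = np.exp(lw - lw.max())      # stabilize before normalizing
    return theta_star, w / w.sum()
```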
Examples
Example 1 – A normal, linear, first-order polynomial model
Example 2 – A non-normal model, using T distributions
Example 3 – A bifurcating model
Example 4 – Television advertising