Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models
Mike West
Computing Science and Statistics, Vol. 24, pp. 325-333, 1993.


Abstract
The development of discrete mixture distributions as approximations to priors and posteriors in Bayesian analysis
– Adaptive density estimation

Adaptive mixture modeling
p(θ): the continuous posterior density function for a continuous parameter vector θ
g(θ): approximating density used as the importance sampling function
– e.g., a multivariate T distribution
Θ = { θ_j, j = 1, …, n }: random sample drawn from g(θ)
Ω = { w_j, j = 1, …, n }: importance weights
– w_j = p(θ_j) / ( k g(θ_j) )
– k = Σ_{i=1..n} p(θ_i) / g(θ_i), so that the weights sum to one
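A minimal sketch of this weight computation, assuming the target is available as a vectorized unnormalized log density `log_p` and the importance function is a multivariate T; the helper name and all parameters are illustrative, not from the paper.

```python
import numpy as np
from scipy import stats

def importance_sample(log_p, loc, scale_cov, df, n, rng):
    """Draw n points from a multivariate T importance density g(theta) and
    return them with normalized weights w_j = p(theta_j) / (k g(theta_j))."""
    g = stats.multivariate_t(loc=loc, shape=scale_cov, df=df)
    theta = g.rvs(size=n, random_state=rng)      # Theta = {theta_j, j = 1..n}
    log_w = log_p(theta) - g.logpdf(theta)       # log p(theta_j) - log g(theta_j)
    log_w -= log_w.max()                         # stabilize before exponentiating
    w = np.exp(log_w)
    return theta, w / w.sum()                    # normalization absorbs k
```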

Importance sampling and mixture
Univariate random sampling
– Direct Bayesian interpretations (based on mixtures of Dirichlet processes)
Multivariate kernel estimation
– Weighted kernel estimator
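The weighted kernel estimator can be sketched as follows; this version uses Gaussian kernels with a common window width h and the weighted Monte Carlo variance, whereas the paper works with multivariate T kernels. Function and variable names are assumptions.

```python
import numpy as np
from scipy import stats

def weighted_kernel_density(theta, w, h):
    """Weighted kernel estimate ghat(x) = sum_j w_j N(x | theta_j, h^2 V),
    with V the weighted Monte Carlo variance matrix of the sample."""
    V = np.cov(theta.T, aweights=w)
    kernels = [stats.multivariate_normal(mean=t, cov=(h ** 2) * V) for t in theta]

    def ghat(x):
        return sum(wj * k.pdf(x) for wj, k in zip(w, kernels))

    return ghat
```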

Adaptive methods of posterior approximation
Possible patterns of local dependence exhibited by p(θ):
– Different regions of the parameter space may be associated with rather different patterns of dependence
– V then varies with the local index j, depending more heavily on θ_j

Adaptive importance sampling
The importance sampling distribution is sequentially revised based on information derived from successive Monte Carlo samples.

AIS algorithm
1. Choose an initial importance sampling distribution with density g_0(θ); draw a small sample of size n_0 and compute weights, giving the summary { g_0, n_0, Θ_0, Ω_0 }. Compute the Monte Carlo estimates of the mean and of the variance V_0 of p(θ).
2. Construct a revised importance function g_1(θ), of the weighted kernel form above, from the sample size n_0, points θ_{0,j}, weights w_{0,j}, and variance matrix V_0.
3. Draw a larger sample of size n_1 from g_1(θ), and replace { g_0, n_0, Θ_0, Ω_0 } with { g_1, n_1, Θ_1, Ω_1 }.
4. Either stop, and base inferences on { g_1, n_1, Θ_1, Ω_1 }, or proceed, if desired, to a further revised version g_2(θ), constructed similarly (a code sketch follows below).
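A compact sketch of this loop under stated assumptions: the revised importance function is a Gaussian kernel mixture (the paper uses T kernels), `log_p` is a vectorized unnormalized log-target, and the initial weighted sample could come from the hypothetical `importance_sample` helper above; `sizes` and the window width `h` are left to the user.

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def kernel_mixture_logpdf(x, centers, weights, cov):
    """log g(x) for g(x) = sum_j w_j N(x | theta_j, cov); x has shape (m, d)."""
    comp = np.stack([stats.multivariate_normal(mean=c, cov=cov).logpdf(x)
                     for c in centers])              # shape (n_components, m)
    return logsumexp(comp, b=weights[:, None], axis=0)

def ais(log_p, theta, w, sizes, h, rng):
    """Adaptive importance sampling: repeatedly rebuild the importance
    function as a weighted kernel mixture centered at the previous weighted
    sample, then draw a larger sample from it and reweight."""
    for n in sizes:
        V = (h ** 2) * np.cov(theta.T, aweights=w)   # shrunk Monte Carlo variance
        idx = rng.choice(len(w), size=n, p=w)        # pick mixture components
        new = theta[idx] + rng.multivariate_normal(
            np.zeros(theta.shape[1]), V, size=n)     # sample from chosen kernels
        log_w = log_p(new) - kernel_mixture_logpdf(new, theta, w, V)
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
        theta = new
    return theta, w
```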

Approximating mixtures by mixtures
The computational burden increases with further refinement at larger sample sizes, when the importance function becomes a mixture of several thousand T components.
– Solution: reduce the number of components by replacing 'nearest neighboring' components with some form of average

Clustering routine
1. Set r = n, starting with the r = n component mixture, and choose k < n as the number of components for the final, reduced mixture.
2. Sort the r values θ_j in Θ in order of increasing values of the weights w_j in Ω.
3. Find the index i such that θ_i is the nearest neighbor of θ_1, and reduce the sets Θ and Ω to sets of size r − 1 by removing components 1 and i and inserting 'average' values.

4. Return to step (2), stopping only when r = k.
5. The resulting mixture has locations based on the final k averaged values, with associated combined weights, the same scale matrix V, but a new, larger window width h based on the current, reduced 'sample size' r rather than n (a sketch of the reduction follows below).
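A rough sketch of this reduction, assuming the 'average' is a weight-weighted mean of the two merged locations with their weights summed; the paper's exact averaging rule and the window-width adjustment are not reproduced here, and the function name is illustrative.

```python
import numpy as np

def reduce_mixture(theta, w, k):
    """Collapse an n-component mixture (locations theta, weights w) to k
    components by repeatedly merging the lowest-weight component with its
    nearest neighbor, replacing the pair by a weight-averaged location."""
    theta = [np.asarray(t, dtype=float) for t in theta]
    w = [float(x) for x in w]
    while len(theta) > k:
        j = int(np.argmin(w))                          # lowest-weight component
        others = [m for m in range(len(theta)) if m != j]
        i = min(others, key=lambda m: float(np.sum((theta[m] - theta[j]) ** 2)))
        wc = w[i] + w[j]                               # combined weight
        tc = (w[i] * theta[i] + w[j] * theta[j]) / wc  # weighted-average location
        theta = [t for m, t in enumerate(theta) if m not in (i, j)] + [tc]
        w = [x for m, x in enumerate(w) if m not in (i, j)] + [wc]
    return np.array(theta), np.array(w)
```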

Sequential updating and dynamic models
Updating a prior to a posterior distribution for a random quantity or parameter vector, based on received data summarized through a likelihood function for the parameter.

Dynamic models
– Observation model: Y_t ~ p(Y_t | θ_t)
– Evolution model: θ_t ~ p(θ_t | θ_{t-1})
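For concreteness, a minimal simulation sketch of one such model, the normal, linear, first-order polynomial model of Example 1 below; the variance parameters and function name are illustrative assumptions.

```python
import numpy as np

def simulate_first_order_polynomial(T, v_obs, v_evo, theta0, rng):
    """Normal first-order polynomial dynamic model:
       evolution    theta_t = theta_{t-1} + omega_t,  omega_t ~ N(0, v_evo)
       observation  Y_t     = theta_t + nu_t,         nu_t    ~ N(0, v_obs)."""
    theta = np.empty(T)
    y = np.empty(T)
    prev = theta0
    for t in range(T):
        prev = prev + rng.normal(0.0, np.sqrt(v_evo))   # evolution step
        theta[t] = prev
        y[t] = prev + rng.normal(0.0, np.sqrt(v_obs))   # observation step
    return theta, y
```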

Computations
Evolution step
– Compute the current prior p(θ_t | D_{t-1}).
Updating step
– Observing Y_t, compute the current posterior p(θ_t | D_t).

Computations: evolution step
1. Various features of interest of the prior p(θ_t | D_{t-1}) can be computed directly using the Monte Carlo structure.
2. The prior density function can be evaluated by Monte Carlo integration at any point.

3. The initial Monte Carlo samples Θ_t* (drawing θ_{t,i} from p(θ_t | θ_{t-1,i})) provide starting values for the evaluation of the prior.
4. Θ_t* may be used with the weights Ω_{t-1} to construct a generalized kernel density estimate of the prior.
5. Monte Carlo computations can be performed to approximate forecast moments and probabilities.

Computations: updating step
– Adaptive Monte Carlo density estimation applied to the current posterior (one evolution/updating cycle is sketched below)
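A schematic sketch of one evolution/updating cycle on a weighted sample, assuming user-supplied hypothetical functions `evolve_sample` (draws from p(θ_t | θ_{t-1,i})) and `log_lik_t` (vectorized log-likelihood of Y_t); it uses simple reweighting by the likelihood rather than the paper's full adaptive refinement.

```python
import numpy as np

def evolve_and_update(theta_prev, w_prev, evolve_sample, log_lik_t, rng):
    """One cycle of the sequential scheme on a weighted sample:
    evolution: draw theta_{t,i} ~ p(theta_t | theta_{t-1,i})   (prior sample)
    updating:  reweight by the likelihood p(Y_t | theta_t) and renormalize."""
    theta_t = np.array([evolve_sample(th, rng) for th in theta_prev])  # Theta_t*
    log_w = np.log(w_prev) + log_lik_t(theta_t)    # prior weights x likelihood
    log_w -= log_w.max()
    w_t = np.exp(log_w)
    return theta_t, w_t / w_t.sum()
```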

Examples
Example 1
– A normal, linear, first-order polynomial model
Example 2
– Non-normal; using T distributions
Example 3
– Bifurcating

Example 4
– Television advertising