Markov Chain Sampling Methods for Dirichlet Process Mixture Models (R. M. Neal). Summarized by Joon Shik Kim, 12.03.15 (Thu), Computational Models of Intelligence.

Abstract This article reviews Markov chain methods for sampling from the posterior distribution of a Dirichlet process mixture model and presents two new classes of methods. One new approach is to make Metropolis-Hastings updates of the indicators specifying which mixture component is associated with each observation, perhaps supplemented with a partial form of Gibbs sampling. The other new approach extends Gibbs sampling for these indicators by using a set of auxiliary parameters.

Chinese Restaurant Process (1/2) The CRP is a distribution over partitions that captures the clustering effect of the DP.

Chinese Restaurant Process (2/2)
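To make the clustering effect concrete, the following is a minimal sketch (not taken from the paper or the slides) of drawing a single partition from the CRP: each customer joins an existing table with probability proportional to its size, or opens a new table with probability proportional to the concentration parameter alpha. The values n=100 and alpha=1.0 in the example are illustrative assumptions.

```python
import random

def sample_crp_partition(n, alpha, seed=0):
    """Draw one partition of n customers from a CRP with concentration alpha."""
    rng = random.Random(seed)
    table_sizes = []   # current number of customers at each table
    assignments = []   # table index chosen by each customer
    for _ in range(n):
        # Unnormalized probabilities: existing tables (by size), then a new table.
        weights = table_sizes + [alpha]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(table_sizes):
            table_sizes.append(1)        # open a new table
        else:
            table_sizes[choice] += 1     # join an existing table
        assignments.append(choice)
    return assignments, table_sizes

# Example: a few tables end up with most of the customers, illustrating the
# "rich get richer" clustering behaviour induced by the DP.
labels, sizes = sample_crp_partition(n=100, alpha=1.0)
print(sorted(sizes, reverse=True))
```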

Introduction (1/2) Modeling a distribution as a mixture of simpler distributions is useful both as a nonparametric density estimation method and as a way of identifying latent classes that can explain the dependencies observed between variables. Use of Dirichlet process mixture models has become computationally feasible with the development of Markov chain methods for sampling from the posterior distribution of the parameters of the component distributions and/or of the associations of mixture components with observations.

Introduction (2/2) In this article, I present two new approaches to Markov chain sampling. A very simple method for handling non-conjugate priors is to use Metropolis-Hastings updates with the conditional prior as the proposal distribution. A variation of this method may sometimes sample more efficiently, particularly when combined with a partial form of Gibbs sampling.
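Reading this in a bit more detail: when the proposal for an observation's indicator (and the associated component parameter) is drawn from the conditional prior given the other observations, the prior terms cancel out of the Metropolis-Hastings ratio, so the acceptance probability reduces to a likelihood ratio. With F(y_i, θ) denoting the density of y_i under component parameter θ, this takes roughly the following form (a restatement, with c_i* the proposed indicator):

```latex
a(c_i^{*}, c_i) \;=\; \min\!\left[\, 1,\; \frac{F(y_i,\ \theta_{c_i^{*}})}{F(y_i,\ \theta_{c_i})} \,\right]
```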

Dirichlet Process Mixture Models (1/5) The basic model applies to data y1,…,yn which we regard as part of an indefinite exchangeable sequence, or equivalently, as being independently drawn from some unknown distribution.

Dirichlet Process Mixture Models (2/5) We model the distribution from which the yi are drawn as a mixture of distributions of the form F(θ), with the mixing distribution over θ being G. We let the prior for this mixing distribution be a Dirichlet process, with concentration parameter α and base distribution G0.
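The model equations referenced on these slides are not reproduced in the transcript; restating the hierarchical specification as it appears in the paper:

```latex
y_i \mid \theta_i \;\sim\; F(\theta_i), \qquad
\theta_i \mid G \;\sim\; G, \qquad
G \;\sim\; \mathrm{DP}(G_0,\ \alpha)
```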

Dirichlet Process Mixture Models (3/5)

Dirichlet Process Mixture Models (4/5)
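Slides (3/5) and (4/5) appear to have carried equations that did not survive the transcript. The limit on the next slide refers to the equivalent finite mixture formulation; restating it from the paper (with c_i the component indicator for observation i, φ_c the parameters of component c, and n_{-i,c} the number of observations other than i assigned to component c):

```latex
y_i \mid c_i, \boldsymbol{\phi} \;\sim\; F(\phi_{c_i}), \qquad
c_i \mid \mathbf{p} \;\sim\; \mathrm{Discrete}(p_1, \ldots, p_K), \qquad
\phi_c \;\sim\; G_0, \qquad
\mathbf{p} \;\sim\; \mathrm{Dirichlet}(\alpha/K, \ldots, \alpha/K)

% Integrating out the mixing proportions p gives the conditional prior for one indicator:
P(c_i = c \mid c_{-i}) \;=\; \frac{n_{-i,c} + \alpha/K}{\,n - 1 + \alpha\,}
```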

Dirichlet Process Mixture Models (5/5) If we let K go to infinity, the conditional probabilities reach the following limits:
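The limiting conditional probabilities themselves are missing from the transcript; from the paper, for a component c that some other observation already occupies, and for the event that observation i is assigned a new component:

```latex
P(c_i = c \mid c_{-i}) \;\longrightarrow\; \frac{n_{-i,c}}{\,n - 1 + \alpha\,},
\qquad\qquad
P(c_i \neq c_j \ \text{for all } j \neq i \mid c_{-i}) \;\longrightarrow\; \frac{\alpha}{\,n - 1 + \alpha\,}
```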

Gibbs Sampling when Conjugate Priors are used (3/4)
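The body of this slide is not in the transcript. As a rough illustration of the kind of update it refers to, here is a minimal collapsed-Gibbs sketch for a conjugate case: a DP mixture of Gaussians with known variance sigma_sq and a Normal(0, tau0_sq) prior on each component mean, so the means can be integrated out and each indicator c_i is resampled from its conditional given the data and the other indicators. The Gaussian model and the hyperparameter values (alpha, tau0_sq, sigma_sq) are illustrative assumptions, not the paper's example.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def gibbs_sweep(y, c, alpha=1.0, tau0_sq=10.0, sigma_sq=1.0, rng=random):
    """One Gibbs sweep over the indicators c for a DP mixture of Gaussians
    with known variance sigma_sq and a N(0, tau0_sq) prior on each mean.

    The means are integrated out, so c_i is drawn with probability proportional to
        n_{-i,c} * predictive(y_i | other members of c)   for an existing component c,
        alpha    * predictive(y_i | empty component)       for a new component.
    """
    n = len(y)
    for i in range(n):
        c[i] = None  # remove observation i before counting n_{-i,c}
        members = {}
        for j in range(n):
            if c[j] is not None:
                members.setdefault(c[j], []).append(y[j])

        labels, weights = [], []
        for label, pts in members.items():
            # Posterior over this component's mean given its other members.
            post_var = 1.0 / (1.0 / tau0_sq + len(pts) / sigma_sq)
            post_mean = post_var * sum(pts) / sigma_sq
            labels.append(label)
            weights.append(len(pts) * normal_pdf(y[i], post_mean, sigma_sq + post_var))

        # Weight for opening a brand-new component (prior predictive of y_i).
        labels.append(max(members.keys(), default=-1) + 1)
        weights.append(alpha * normal_pdf(y[i], 0.0, sigma_sq + tau0_sq))

        c[i] = rng.choices(labels, weights=weights)[0]
    return c

# Example usage on made-up data: two well-separated groups are typically
# recovered as two components after a few sweeps.
data = [0.1, -0.2, 0.05, 5.1, 4.9, 5.3]
assignments = [0] * len(data)
for _ in range(50):
    assignments = gibbs_sweep(data, assignments)
print(assignments)
```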

Nested CRP Used for modeling topic hierarchies by Blei et al., 2004. [Figure: nested CRP paths drawn over Day 1, Day 2, Day 3]