Collapsed Variational Dirichlet Process Mixture Models


Collapsed Variational Dirichlet Process Mixture Models Kenichi Kurihara, Max Welling and Yee W. Teh. Published at IJCAI 2007. Discussion led by Qi An

Outline
- Introduction
- Four approximations to DP
- Variational Bayesian inference
- Optimal cluster label reordering
- Experimental results
- Discussion

Introduction The Dirichlet process (DP) is suitable for many density estimation and data clustering applications. Gibbs sampling solutions, however, are not efficient enough to scale to large problems. The truncated stick-breaking approximation enables variational inference, but it is formulated in the space of explicit, non-exchangeable cluster labels.

Introduction This paper:
- proposes an improved VB algorithm based on integrating out the mixture weights
- compares the stick-breaking representation against the finite symmetric Dirichlet approximation
- maintains an optimal ordering of cluster labels in the stick-breaking VB algorithm

Approximations to DP Truncated stick-breaking (TSB) representation. The joint distribution can be expressed as shown below.
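The equation on this slide is not preserved in the transcript. A standard form of the truncated stick-breaking construction and joint distribution (my reconstruction, following the usual DP mixture setup with truncation level T, stick weights v, cluster parameters η, and assignments z; the slide's exact notation may differ):

\[ \pi_i(\mathbf{v}) = v_i \prod_{j=1}^{i-1} (1 - v_j), \qquad v_i \sim \mathrm{Beta}(1, \alpha), \qquad v_T = 1 \]

\[ p(\mathbf{x}, \mathbf{z}, \mathbf{v}, \boldsymbol{\eta}) = \prod_{n=1}^{N} p(x_n \mid \eta_{z_n})\, \pi_{z_n}(\mathbf{v}) \prod_{i=1}^{T-1} p(v_i \mid \alpha) \prod_{i=1}^{T} p(\eta_i) \]

Setting v_T = 1 truncates the infinite stick-breaking construction at T components.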

Approximations to DP Finite symmetric Dirichlet (FSD) approximation. The joint distribution can be expressed as shown below. The essential difference from the TSB representation is that the cluster labels remain exchangeable under this formulation.
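Again the equation is missing from the transcript; a standard form of the FSD prior and joint with K components is (my reconstruction):

\[ \boldsymbol{\pi} \sim \mathrm{Dir}\!\left(\tfrac{\alpha}{K}, \ldots, \tfrac{\alpha}{K}\right), \qquad p(\mathbf{x}, \mathbf{z}, \boldsymbol{\pi}, \boldsymbol{\eta}) = \prod_{n=1}^{N} p(x_n \mid \eta_{z_n})\, \pi_{z_n}\; p(\boldsymbol{\pi} \mid \alpha) \prod_{k=1}^{K} p(\eta_k) \]

As K → ∞, the partitions induced by this finite mixture converge in distribution to those of the DP mixture.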

The Dirichlet process is most naturally defined over the space of partitions, while both TSB and FSD are defined over the space of cluster labels. Moreover, TSB and FSD themselves live in different label spaces: TSB labels carry a size-biased ordering and are not exchangeable, whereas FSD labels are exchangeable.

Marginalization In the variational Bayesian approximation, we assume a factorized form for the posterior distribution. However, this is not a good assumption, since changes in π have a considerable impact on z. If we instead integrate out π, the joint distribution over cluster labels is available in closed form for both the TSB and FSD representations, as shown below.
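The marginal distributions themselves are missing from the transcript; the standard Beta and Dirichlet integrals give the following (my reconstruction, writing N_i for the number of points assigned to cluster i, N_{>i} = Σ_{j>i} N_j, and N for the total number of points).

For the TSB representation:
\[ p(\mathbf{z} \mid \alpha) = \prod_{i=1}^{T-1} \frac{\alpha\, \Gamma(1 + N_i)\, \Gamma(\alpha + N_{>i})}{\Gamma(1 + \alpha + N_i + N_{>i})} \]

For the FSD representation:
\[ p(\mathbf{z} \mid \alpha) = \frac{\Gamma(\alpha)}{\Gamma(\alpha + N)} \prod_{k=1}^{K} \frac{\Gamma(\alpha/K + N_k)}{\Gamma(\alpha/K)} \]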

VB inference We can then apply VB inference to each of the four approximations (TSB and FSD, each with or without the mixture weights marginalized out). The lower bound and the approximate posteriors for TSB and FSD are given below; depending on whether we marginalize, v and π may be integrated out.
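The equations are missing from the transcript; in standard VB form (my reconstruction), the lower bound on the log marginal likelihood is

\[ \log p(\mathbf{x}) \;\ge\; \mathbb{E}_{q}\!\left[\log p(\mathbf{x}, \mathbf{z}, \boldsymbol{\theta})\right] - \mathbb{E}_{q}\!\left[\log q(\mathbf{z}, \boldsymbol{\theta})\right] \]

with fully factorized posteriors such as

\[ q(\mathbf{z}, \mathbf{v}, \boldsymbol{\eta}) = \prod_n q(z_n) \prod_{i=1}^{T-1} q(v_i) \prod_{i=1}^{T} q(\eta_i) \quad \text{(TSB)}, \qquad q(\mathbf{z}, \boldsymbol{\pi}, \boldsymbol{\eta}) = \prod_n q(z_n)\; q(\boldsymbol{\pi}) \prod_{k=1}^{K} q(\eta_k) \quad \text{(FSD)} \]

In the collapsed versions, the q(v_i) and q(π) factors disappear because v and π have been integrated out.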

Gaussian approximation For the collapsed approximations, the exact update for q(z_ij) seems intractable due to the exponentially large space of assignments of all other {z_ij}. By the central limit theorem, the cluster counts are approximately Gaussian under q, and a second-order Taylor expansion reduces the required expectations to simple functions of the counts' means and variances, as sketched below.
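Concretely (a standard sketch of the CLT plus second-order Taylor trick, not the paper's exact derivation): under q, a count such as N_k = Σ_{m≠i} z_mk is a sum of many independent indicator variables, hence approximately Gaussian with

\[ \mathbb{E}_q[N_k] = \sum_{m \ne i} q(z_{mk} = 1), \qquad \mathrm{Var}_q[N_k] = \sum_{m \ne i} q(z_{mk} = 1)\bigl(1 - q(z_{mk} = 1)\bigr) \]

and for a smooth function f of the count,

\[ \mathbb{E}_q[f(N_k)] \;\approx\; f\!\left(\mathbb{E}_q[N_k]\right) + \tfrac{1}{2}\, \mathrm{Var}_q[N_k]\; f''\!\left(\mathbb{E}_q[N_k]\right) \]

so each update for q(z_ij) needs only the means and variances of the counts rather than a sum over all joint assignments.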

Optimal cluster label reordering For the TSB representation, the prior assumes a particular ordering of the clusters (earlier sticks are larger on average). The authors claim the optimal relabelling of the clusters is given by ordering the cluster sizes in decreasing order.
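A minimal sketch of this reordering step (hypothetical names, not the paper's code; resp stands for the N×T matrix of variational responsibilities q(z_n = i)):

    import numpy as np

    def reorder_clusters(resp):
        # Permute cluster columns so that expected cluster sizes
        # E[N_i] = sum_n q(z_n = i) appear in decreasing order.
        expected_sizes = resp.sum(axis=0)
        order = np.argsort(-expected_sizes)  # indices of largest clusters first
        return resp[:, order], order

Applying this permutation after each VB iteration (together with the matching permutation of the q(v_i) and q(η_i) factors) keeps the labels aligned with the size-biased ordering that the stick-breaking prior expects.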

Experimental results (these slides contained figures and plots that are not preserved in the transcript)

Discussion There is very little difference between variational Bayesian inference in the reordered stick-breaking representation and in the finite mixture model with symmetric Dirichlet priors. Label reordering is important for the stick-breaking representation. Variational approximations are much more efficient computationally than Gibbs sampling, with almost no loss in accuracy.