SMEM Algorithm for Mixture Models


1 SMEM Algorithm for Mixture Models
N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton
Neural Computation, Vol. 12, No. 9, pp. 2109-2128, September 2000
Cho, Dong-Yeon

2 Abstract
- SMEM algorithm: a split-and-merge expectation-maximization algorithm for overcoming the local maxima problem in parameter estimation of finite mixture models.
- Performs simultaneous split-and-merge operations using a new criterion for efficiently selecting the split-and-merge candidates.
- Applied to Gaussian mixtures and mixtures of factor analyzers.
- Evaluated on synthetic and real data, including image compression and pattern recognition problems.

3 Introduction
- Mixture density models
  - Normal mixtures
  - More sophisticated mixture density models: mixtures of latent variable models (probabilistic PCA, factor analysis)
  - Parameters can be estimated with the EM algorithm in the maximum likelihood framework.
- Local maxima problem
- Deterministic annealing EM (DAEM) algorithm
  - Uses a modified posterior probability parameterized by a temperature.
  - Not very efficient at avoiding local maxima for mixture models.

4 Idea of Performing Split-and-Merge Operations
- A discrete move that simultaneously merges two components in an overpopulated region and splits a component in an underpopulated region.
- Related applications
  - Clustering and vector quantization
  - Bayesian normal mixture analysis: split-and-merge operations combined with an MCMC method
- The proposed method is applicable only to mixture models.

5 EM Algorithm
- Data
  - Complete data Z = (X, Y)
  - X: observed data (incomplete data); Y: unobserved data
- Joint probability density p(X, Y; Θ)
  - Θ: parameters of the density to be estimated
- MLE of Θ
  - Maximization of the incomplete-data log-likelihood
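The incomplete-data log-likelihood that the MLE maximizes was not transcribed; in standard notation consistent with the definitions above:

```latex
\mathcal{L}(\Theta; X) = \log p(X; \Theta)
                       = \log \sum_{Y} p(X, Y; \Theta)
```

(For continuous Y the sum becomes an integral.)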

6 Characteristic of the EM Algorithm
- Iteratively maximizes the expectation of the complete-data log-likelihood function.
  - E-step: compute Q(Θ | Θ(t))
  - M-step: find the Θ maximizing Q(Θ | Θ(t))
- The convergence of the EM steps is theoretically guaranteed.
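The Q function and the two steps, reconstructed in standard EM notation (the slide's equations were lost in transcription):

```latex
% E-step: expected complete-data log-likelihood
Q(\Theta \mid \Theta^{(t)}) =
  \mathbb{E}\!\left[\log p(X, Y; \Theta) \,\middle|\, X, \Theta^{(t)}\right]
% M-step: parameter update
\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})
```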

7 Split-and-Merge EM Algorithm
- Split-and-merge operation
  - The pdf of a mixture of M density models:
    p(x; Θ) = Σ_{m=1}^{M} α_m p_m(x; θ_m)
  - α_m: mixing proportion of the mth model (α_m ≥ 0, Σ_m α_m = 1)
  - p_m(x; θ_m): d-dimensional density model corresponding to the mth model
  - Θ = {(α_m, θ_m), m = 1, …, M}
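As a concrete illustration, a minimal NumPy/SciPy sketch of evaluating this mixture density for the Gaussian case (the helper name is ours, not from the slides):

```python
from scipy.stats import multivariate_normal

def mixture_pdf(x, alphas, params):
    """p(x; Theta) = sum_m alpha_m p_m(x; theta_m) for a Gaussian mixture."""
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for a, (mu, cov) in zip(alphas, params))
```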

8
- After the EM algorithm has converged:
  - Merge models i and j to produce a model i′.
  - Split model k into two models j′ and k′.
- Initialization
  - Initial parameters for the merged model i′
  - Initial parameters for models j′ and k′
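The initialization equations did not survive transcription; the following is a reconstruction consistent with the paper's description, where * marks converged estimates and ε₁, ε₂ are small random perturbations:

```latex
% Merge: mixing-proportion-weighted combination of the converged estimates
\alpha_{i'} = \alpha_i^* + \alpha_j^*, \qquad
\theta_{i'} = \frac{\alpha_i^*\, \theta_i^* + \alpha_j^*\, \theta_j^*}
                   {\alpha_i^* + \alpha_j^*}
% Split: halve the mixing proportion and perturb the component parameters
\alpha_{j'} = \alpha_{k'} = \tfrac{1}{2}\,\alpha_k^*, \qquad
\theta_{j'} = \theta_k^* + \epsilon_1, \quad
\theta_{k'} = \theta_k^* + \epsilon_2
```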

9
An example of initialization in a two-dimensional Gaussian case.
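A minimal NumPy sketch of these initializations for the two-dimensional Gaussian case; the det(Σ)^(1/d)·I covariance reset and the noise scale are assumptions based on the paper's Gaussian recipe, not details given on this slide:

```python
import numpy as np

def merge_init(alpha_i, mu_i, cov_i, alpha_j, mu_j, cov_j):
    """Merged component i': mixing-proportion-weighted combination."""
    alpha = alpha_i + alpha_j
    mu = (alpha_i * mu_i + alpha_j * mu_j) / alpha
    cov = (alpha_i * cov_i + alpha_j * cov_j) / alpha
    return alpha, mu, cov

def split_init(alpha_k, mu_k, cov_k, noise=0.1, rng=np.random):
    """Split component k into j' and k': halve the weight, perturb the
    means, and reset both covariances to det(cov_k)^(1/d) * I (assumed)."""
    d = mu_k.shape[0]
    cov = np.linalg.det(cov_k) ** (1.0 / d) * np.eye(d)
    mu_j = mu_k + noise * rng.standard_normal(d)
    mu_k2 = mu_k + noise * rng.standard_normal(d)
    return (alpha_k / 2, mu_j, cov), (alpha_k / 2, mu_k2, cov)
```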

10 Partial EM Steps
- Modified posterior probability
- The parameters for models i′, j′, and k′ can be re-estimated consistently without affecting the other models.
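A reconstruction of the modified posterior: responsibilities are renormalized over the three new components only, and scaled so that they inherit the total posterior mass that components i, j, and k held at convergence (consistent with the slide's claim that the other models are unaffected):

```latex
P(m \mid x; \Theta) =
  \frac{\alpha_m\, p_m(x; \theta_m)}
       {\sum_{m' \in \{i',\, j',\, k'\}} \alpha_{m'}\, p_{m'}(x; \theta_{m'})}
  \cdot \sum_{m'' \in \{i,\, j,\, k\}} P(m'' \mid x; \Theta^*),
  \qquad m \in \{i', j', k'\}
```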

11 SMEM Algorithm
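The algorithm box on this slide was lost in transcription; the Python-style sketch below reconstructs the loop described on the surrounding slides. All helper functions (run_em, rank_candidates, init_split_merge, partial_em) are illustrative placeholders, not names from the paper:

```python
def smem(theta, C_max=5):
    """Sketch of the SMEM loop (helpers are placeholders)."""
    theta, Q_star = run_em(theta)               # 1. usual EM until convergence
    improved = True
    while improved:
        improved = False
        for i, j, k in rank_candidates(theta)[:C_max]:  # 2. sorted {i,j,k}_c
            cand = init_split_merge(theta, i, j, k)     # 3. merge i,j; split k
            cand = partial_em(cand, i, j, k)            #    re-estimate i',j',k' only
            cand, Q_new = run_em(cand)                  #    then full EM
            if Q_new > Q_star:                          # 4. accept if Q improves
                theta, Q_star = cand, Q_new
                improved = True
                break
    return theta                                        # 5. no candidate helped: stop
```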

12
- Cmax = M(M-1)(M-2)/2
  - We have confirmed experimentally that Cmax = 5 may be enough, because the split-and-merge criteria work well.
- The SMEM algorithm monotonically increases the Q function value.
  - If the Q function value does not increase for any candidate c = 1, …, Cmax, the algorithm stops.
- The total number of mixture components is unchanged.
- The global convergence properties of the EM algorithm are maintained.
- Intuitively, a simultaneous split and merge can be viewed as a way of tunneling through low-likelihood barriers, thereby eliminating many poor local optima.

13 Split-and-Merge Criteria
- Merge criterion
  - When there are many data points, each of which has almost equal posterior probability for two components, these two components are candidates to be merged.
  - Two components with a large merge criterion are good candidates for a merge.
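The merge criterion itself, reconstructed in the paper's notation: the inner product of the two components' posterior-probability vectors over the N data points:

```latex
J_{\mathrm{merge}}(i, j; \Theta^*) =
  \mathbf{P}_i(\Theta^*)^{\top} \mathbf{P}_j(\Theta^*),
\qquad
\mathbf{P}_i(\Theta^*) =
  \bigl(P(i \mid x_1; \Theta^*), \ldots, P(i \mid x_N; \Theta^*)\bigr)^{\top}
```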

14 Split Criterion
- Local Kullback divergence
  - The distance between two distributions: the local data density around the kth model and the density of the kth model specified by the current parameter estimate.
- When the weights are equal, that is, P(k|x; Θ*) = 1/M, the split criterion can be viewed as a likelihood ratio test.
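The split criterion, reconstructed from the definition above: the local Kullback divergence between the empirical density around component k and the model density of component k, where f_k weights each data point by its posterior:

```latex
J_{\mathrm{split}}(k; \Theta^*) =
  \int f_k(x; \Theta^*) \,\log \frac{f_k(x; \Theta^*)}{p_k(x; \theta_k^*)} \, dx,
\qquad
f_k(x; \Theta^*) =
  \frac{\sum_{n=1}^{N} \delta(x - x_n)\, P(k \mid x_n; \Theta^*)}
       {\sum_{n=1}^{N} P(k \mid x_n; \Theta^*)}
```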

15 Sorting Candidates
- The merge candidates are sorted based on the merge criterion.
- For each sorted merge candidate {i, j}_c, the split candidates, excluding {i, j}_c, are sorted as {k}_c based on the split criterion.
- Combining these results and renumbering them gives {i, j, k}_c, c = 1, …, M(M-1)(M-2)/2.
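A minimal NumPy sketch of this ranking; it assumes post is an N×M matrix of posteriors P(m|x_n) and split_score holds J_split per component, both computed elsewhere (the names are ours):

```python
from itertools import combinations

def rank_candidates(post, split_score):
    """Rank (i, j, k) triples: merge pairs by descending
    J_merge = P_i . P_j, then split candidates k not in {i, j}
    by descending J_split."""
    M = post.shape[1]
    pairs = sorted(combinations(range(M), 2),
                   key=lambda ij: -(post[:, ij[0]] @ post[:, ij[1]]))
    triples = []
    for i, j in pairs:
        for k in sorted((m for m in range(M) if m not in (i, j)),
                        key=lambda m: -split_score[m]):
            triples.append((i, j, k))
    return triples  # M(M-1)(M-2)/2 triples in total
```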

16 Application to Density Estimation by Mixture of Gaussians
- Synthetic data
  - Mixture of Gaussians, each with a mean vector and covariance matrix.
  - The split-and-merge operations not only appropriately assign the number of Gaussians to each local data region, but can also improve the Gaussian parameters themselves.

18 Real Data
- Facial images processed into 20-dimensional feature vectors.
- Data size: 10³ for training, 10³ for test.
- 10 different initializations using the k-means clustering algorithm.
- M = 5 and a diagonal covariance matrix for each Gaussian.
- Results reported as log-likelihood / sample size.

19 Trajectories of Log-Likelihood
- The successive split-and-merge operations improved the log-likelihood for both the training and test data.

20 The Number of EM-Steps
- The count includes not only the partial and full EM steps for accepted operations, but also the EM steps for rejected ones.
- The SMEM algorithm was about 8.7 times slower than the original EM algorithm.
- The average rank of the accepted split-and-merge candidates was 1.8 (std = 0.9), which indicates that the proposed split-and-merge criteria worked very well.

21 Application to Dimensionality Reduction Using Mixture of Factor Analyzers
- Factor analyzers
  - In a single factor analyzer (FA), an observed p-dimensional variable x is generated as a linear transformation of a lower q-dimensional latent variable z ~ N(0, I) plus additive Gaussian noise v ~ N(0, Ψ), where Ψ is a diagonal matrix.
  - Factor loading matrix: W ∈ ℝ^(p×q)
  - Mean vector: μ
  - The pdf of the observed data under an FA model follows from these assumptions.
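The FA generative model and the resulting observed-data pdf, reconstructed in standard notation:

```latex
x = W z + \mu + v, \qquad
z \sim \mathcal{N}(0, I_q), \quad v \sim \mathcal{N}(0, \Psi)
% marginalizing over z gives the observed-data density
p(x) = \mathcal{N}\!\bigl(x;\ \mu,\ W W^{\top} + \Psi\bigr)
```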

22 Mixture of Factor Analyzers
- A mixture of M FAs.
- The MFA model can perform clustering and dimensionality reduction simultaneously.
- Complete-data log-likelihood.
- The SMEM algorithm is straightforwardly applicable to the parameter estimation of the MFA model.
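The MFA density, reconstructed in the notation above; whether the noise covariance Ψ is shared across components or per component varies by formulation, and a shared diagonal Ψ (as in the common MFA formulation) is assumed here:

```latex
p(x; \Theta) = \sum_{m=1}^{M} \alpha_m\,
  \mathcal{N}\!\bigl(x;\ \mu_m,\ W_m W_m^{\top} + \Psi\bigr)
```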

23 Demonstration

24 Practical Applications
- Image compression
  - An MFA model can be used for block transform image coding.
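A sketch of the idea, assuming an already fitted MFA: each image block is assigned to its most responsible component and encoded as that component's low-dimensional factor projection. The posterior-mean encode/decode shown here is standard FA inference; the paper's actual coding scheme may differ in detail:

```python
import numpy as np

def encode_block(x, alphas, mus, Ws, Psi):
    """Pick the most responsible component for block x and
    project x onto its q-dimensional factor space."""
    scores = []
    for a, mu, W in zip(alphas, mus, Ws):
        C = W @ W.T + np.diag(Psi)          # component covariance W W^T + Psi
        diff = x - mu
        logdet = np.linalg.slogdet(C)[1]
        # log alpha_m + log N(x; mu_m, C_m), dropping the constant term
        scores.append(np.log(a) - 0.5 * (logdet + diff @ np.linalg.solve(C, diff)))
    m = int(np.argmax(scores))
    W, mu = Ws[m], mus[m]
    z = W.T @ np.linalg.solve(W @ W.T + np.diag(Psi), x - mu)  # E[z | x, m]
    return m, z

def decode_block(m, z, mus, Ws):
    """Reconstruct the block from its component index and factor code."""
    return Ws[m] @ z + mus[m]
```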

25
Figure: 15.8×10³, 10.1×10³, 7.3×10³

26 Application to Pattern Recognition
- Once an MFA model is fitted to each class, we can compute the posterior probability for each data point.
- Optimal class i* for x.
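The decision rule in standard Bayes-classifier notation, where p(x | C_i) is the MFA density fitted to class C_i:

```latex
i^* = \arg\max_i P(C_i \mid x)
    = \arg\max_i \frac{p(x \mid C_i)\, P(C_i)}
                      {\sum_j p(x \mid C_j)\, P(C_j)}
```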

27
- Digit recognition task (10 classes)
  - 16-dimensional data (Glucksman's features)
  - Data size: 200 per class for training and 200 per class for test
- 3NN: 88.3%
- SS (CLAFIC)

28 Conclusion
- Simultaneous split-and-merge operations
  - A way of tunneling through low-likelihood barriers, thereby eliminating many non-global optima.
  - The SMEM algorithm outperforms the standard EM algorithm, and can therefore be very useful in practice.
  - Applicable to a wide variety of mixture models.
- Future work
  - By introducing probability measures over models, the split-and-merge operations can also be used to determine the appropriate number of components within the Bayesian framework.

