SMEM Algorithm for Mixture Models


1 SMEM Algorithm for Mixture Models
N. Ueda, R. Nakano, Z. Ghahramani, and G. E. Hinton
Neural Computation, Vol. 12, No. 9, pp. 2109-2128, September 2000
Cho, Dong-Yeon

2 Abstract
- SMEM algorithm: a split-and-merge expectation-maximization algorithm for overcoming the local maxima problem in parameter estimation of finite mixture models.
- Performs simultaneous split-and-merge operations using a new criterion for efficiently selecting the split-and-merge candidates.
- Applied to Gaussian mixtures and mixtures of factor analyzers.
- Evaluated on synthetic and real data, including image compression and pattern recognition problems.

3 Introduction
- Mixture density models
  - Normal mixtures
  - More sophisticated mixture density models: mixtures of latent variable models (probabilistic PCA, factor analysis)
  - Parameters can be estimated with the EM algorithm in the maximum likelihood framework.
- Local maxima problem
- Deterministic annealing EM (DAEM) algorithm
  - Uses a modified posterior probability parameterized by a temperature.
  - Not very efficient at avoiding local maxima for mixture models.

4 Idea of Performing Split-and-Merge Operations
- A discrete move that simultaneously merges two components in an overpopulated region and splits a component in an underpopulated region.
- Related applications
  - Clustering and vector quantization
  - Bayesian normal mixture analysis: split-and-merge operations combined with an MCMC method
- The proposed method is applicable only to mixture models.

5 EM Algorithm
- Data
  - Complete data Z = (X, Y)
  - X: observed data (incomplete data); Y: unobserved data
- Joint probability density p(X, Y; Θ)
  - Θ: parameters of the density to be estimated
- MLE of Θ
  - Maximization of the incomplete-data log-likelihood
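The incomplete-data log-likelihood that the MLE maximizes was not transcribed; in standard notation consistent with the definitions above:

```latex
\mathcal{L}(\Theta; X) = \log p(X; \Theta)
                       = \log \sum_{Y} p(X, Y; \Theta)
```

(For continuous Y the sum becomes an integral.)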

6 Characteristic of the EM Algorithm
- Iteratively maximizes the expectation of the complete-data log-likelihood function.
  - E-step: compute Q(Θ | Θ(t))
  - M-step: find the Θ maximizing Q(Θ | Θ(t))
- The convergence of the EM steps is theoretically guaranteed.
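The Q function and the two steps, reconstructed in standard EM notation (the slide's equations were lost in transcription):

```latex
% E-step: expected complete-data log-likelihood
Q(\Theta \mid \Theta^{(t)}) =
  \mathbb{E}\!\left[\log p(X, Y; \Theta) \,\middle|\, X, \Theta^{(t)}\right]
% M-step: parameter update
\Theta^{(t+1)} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{(t)})
```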

7 Split-and-Merge EM Algorithm
- Split-and-merge operation
  - The pdf of a mixture of M density models:
    p(x; Θ) = Σ_{m=1}^{M} α_m p_m(x; θ_m)
  - α_m: mixing proportion of the mth model (α_m ≥ 0, Σ_m α_m = 1)
  - p_m(x; θ_m): d-dimensional density model corresponding to the mth model
  - Θ = {(α_m, θ_m), m = 1, …, M}
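As a concrete illustration, a minimal NumPy/SciPy sketch of evaluating this mixture density for the Gaussian case (the helper name is ours, not from the slides):

```python
from scipy.stats import multivariate_normal

def mixture_pdf(x, alphas, params):
    """p(x; Theta) = sum_m alpha_m p_m(x; theta_m) for a Gaussian mixture."""
    return sum(a * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for a, (mu, cov) in zip(alphas, params))
```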

8
- After the EM algorithm has converged:
  - Merge models i and j to produce a model i′.
  - Split model k into two models j′ and k′.
- Initialization
  - Initial parameters for the merged model i′
  - Initial parameters for models j′ and k′
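The initialization equations did not survive transcription; the following is a reconstruction consistent with the paper's description, where * marks converged estimates and ε₁, ε₂ are small random perturbations:

```latex
% Merge: mixing-proportion-weighted combination of the converged estimates
\alpha_{i'} = \alpha_i^* + \alpha_j^*, \qquad
\theta_{i'} = \frac{\alpha_i^*\, \theta_i^* + \alpha_j^*\, \theta_j^*}
                   {\alpha_i^* + \alpha_j^*}
% Split: halve the mixing proportion and perturb the component parameters
\alpha_{j'} = \alpha_{k'} = \tfrac{1}{2}\,\alpha_k^*, \qquad
\theta_{j'} = \theta_k^* + \epsilon_1, \quad
\theta_{k'} = \theta_k^* + \epsilon_2
```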

9
An example of initialization in a two-dimensional Gaussian case.
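A minimal NumPy sketch of these initializations for the two-dimensional Gaussian case; the det(Σ)^(1/d)·I covariance reset and the noise scale are assumptions based on the paper's Gaussian recipe, not details given on this slide:

```python
import numpy as np

def merge_init(alpha_i, mu_i, cov_i, alpha_j, mu_j, cov_j):
    """Merged component i': mixing-proportion-weighted combination."""
    alpha = alpha_i + alpha_j
    mu = (alpha_i * mu_i + alpha_j * mu_j) / alpha
    cov = (alpha_i * cov_i + alpha_j * cov_j) / alpha
    return alpha, mu, cov

def split_init(alpha_k, mu_k, cov_k, noise=0.1, rng=np.random):
    """Split component k into j' and k': halve the weight, perturb the
    means, and reset both covariances to det(cov_k)^(1/d) * I (assumed)."""
    d = mu_k.shape[0]
    cov = np.linalg.det(cov_k) ** (1.0 / d) * np.eye(d)
    mu_j = mu_k + noise * rng.standard_normal(d)
    mu_k2 = mu_k + noise * rng.standard_normal(d)
    return (alpha_k / 2, mu_j, cov), (alpha_k / 2, mu_k2, cov)
```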

10 Partial EM Steps
- Modified posterior probability
- The parameters for models i′, j′, and k′ can be re-estimated consistently without affecting the other models.
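A reconstruction of the modified posterior: responsibilities are renormalized over the three new components only, and scaled so that they inherit the total posterior mass that components i, j, and k held at convergence (consistent with the slide's claim that the other models are unaffected):

```latex
P(m \mid x; \Theta) =
  \frac{\alpha_m\, p_m(x; \theta_m)}
       {\sum_{m' \in \{i',\, j',\, k'\}} \alpha_{m'}\, p_{m'}(x; \theta_{m'})}
  \cdot \sum_{m'' \in \{i,\, j,\, k\}} P(m'' \mid x; \Theta^*),
  \qquad m \in \{i', j', k'\}
```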

11 SMEM Algorithm
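The algorithm box on this slide was lost in transcription; the Python-style sketch below reconstructs the loop described on the surrounding slides. All helper functions (run_em, rank_candidates, init_split_merge, partial_em) are illustrative placeholders, not names from the paper:

```python
def smem(theta, C_max=5):
    """Sketch of the SMEM loop (helpers are placeholders)."""
    theta, Q_star = run_em(theta)               # 1. usual EM until convergence
    improved = True
    while improved:
        improved = False
        for i, j, k in rank_candidates(theta)[:C_max]:  # 2. sorted {i,j,k}_c
            cand = init_split_merge(theta, i, j, k)     # 3. merge i,j; split k
            cand = partial_em(cand, i, j, k)            #    re-estimate i',j',k' only
            cand, Q_new = run_em(cand)                  #    then full EM
            if Q_new > Q_star:                          # 4. accept if Q improves
                theta, Q_star = cand, Q_new
                improved = True
                break
    return theta                                        # 5. no candidate helped: stop
```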

12
- Cmax = M(M-1)(M-2)/2
  - We have confirmed experimentally that Cmax = 5 may be enough, because the split-and-merge criteria work well.
- The SMEM algorithm monotonically increases the Q function value.
  - If the Q function value does not increase for any candidate c = 1, …, Cmax, the algorithm stops.
- The total number of mixture components is unchanged.
- The global convergence properties of the EM algorithm are maintained.
- Intuitively, a simultaneous split and merge can be viewed as a way of tunneling through low-likelihood barriers, thereby eliminating many poor local optima.

13 Split-and-Merge Criteria
- Merge criterion
  - When there are many data points, each of which has almost equal posterior probability for two components, these two components are candidates to be merged.
  - Two components with a large merge criterion are good candidates for a merge.
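The merge criterion itself, reconstructed in the paper's notation: the inner product of the two components' posterior-probability vectors over the N data points:

```latex
J_{\mathrm{merge}}(i, j; \Theta^*) =
  \mathbf{P}_i(\Theta^*)^{\top} \mathbf{P}_j(\Theta^*),
\qquad
\mathbf{P}_i(\Theta^*) =
  \bigl(P(i \mid x_1; \Theta^*), \ldots, P(i \mid x_N; \Theta^*)\bigr)^{\top}
```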

14 Split Criterion
- Local Kullback divergence
  - The distance between two distributions: the local data density around the kth model and the density of the kth model specified by the current parameter estimate.
- When the weights are equal, that is, P(k|x; Θ*) = 1/M, the split criterion can be viewed as a likelihood ratio test.
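The split criterion, reconstructed from the definition above: the local Kullback divergence between the empirical density around component k and the model density of component k, where f_k weights each data point by its posterior:

```latex
J_{\mathrm{split}}(k; \Theta^*) =
  \int f_k(x; \Theta^*) \,\log \frac{f_k(x; \Theta^*)}{p_k(x; \theta_k^*)} \, dx,
\qquad
f_k(x; \Theta^*) =
  \frac{\sum_{n=1}^{N} \delta(x - x_n)\, P(k \mid x_n; \Theta^*)}
       {\sum_{n=1}^{N} P(k \mid x_n; \Theta^*)}
```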

15 Sorting Candidates
- The merge candidates are sorted based on the merge criterion.
- For each sorted merge candidate {i, j}_c, the split candidates, excluding {i, j}_c, are sorted as {k}_c based on the split criterion.
- Combining these results and renumbering them gives {i, j, k}_c, c = 1, …, M(M-1)(M-2)/2.
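A minimal NumPy sketch of this ranking; it assumes post is an N×M matrix of posteriors P(m|x_n) and split_score holds J_split per component, both computed elsewhere (the names are ours):

```python
from itertools import combinations

def rank_candidates(post, split_score):
    """Rank (i, j, k) triples: merge pairs by descending
    J_merge = P_i . P_j, then split candidates k not in {i, j}
    by descending J_split."""
    M = post.shape[1]
    pairs = sorted(combinations(range(M), 2),
                   key=lambda ij: -(post[:, ij[0]] @ post[:, ij[1]]))
    triples = []
    for i, j in pairs:
        for k in sorted((m for m in range(M) if m not in (i, j)),
                        key=lambda m: -split_score[m]):
            triples.append((i, j, k))
    return triples  # M(M-1)(M-2)/2 triples in total
```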

16 Application to Density Estimation by Mixture of Gaussians
- Synthetic data
  - Mixture of Gaussians, each with a mean vector and covariance matrix.
  - The split-and-merge operations not only appropriately assign the number of Gaussians to each local data region, but can also improve the Gaussian parameters themselves.

18 Real Data
- Facial images processed into 20-dimensional feature vectors.
- Data size: 10³ for training, 10³ for test.
- 10 different initializations using the k-means clustering algorithm.
- M = 5 and a diagonal covariance matrix for each Gaussian.
- Results reported as log-likelihood / sample size.

19 Trajectories of Log-Likelihood
- The successive split-and-merge operations improved the log-likelihood for both the training and test data.

20 The Number of EM-Steps
- The count includes not only the partial and full EM steps for accepted operations, but also the EM steps for rejected ones.
- The SMEM algorithm was about 8.7 times slower than the original EM algorithm.
- The average rank of the accepted split-and-merge candidates was 1.8 (std = 0.9), which indicates that the proposed split-and-merge criteria worked very well.

21 Application to Dimensionality Reduction Using Mixture of Factor Analyzers
- Factor analyzers
  - In a single factor analyzer (FA), an observed p-dimensional variable x is generated as a linear transformation of a lower q-dimensional latent variable z ~ N(0, I) plus additive Gaussian noise v ~ N(0, Ψ), where Ψ is a diagonal matrix.
  - Factor loading matrix: W ∈ ℝ^(p×q)
  - Mean vector: μ
  - The pdf of the observed data under an FA model follows from these assumptions.
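The FA generative model and the resulting observed-data pdf, reconstructed in standard notation:

```latex
x = W z + \mu + v, \qquad
z \sim \mathcal{N}(0, I_q), \quad v \sim \mathcal{N}(0, \Psi)
% marginalizing over z gives the observed-data density
p(x) = \mathcal{N}\!\bigl(x;\ \mu,\ W W^{\top} + \Psi\bigr)
```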

22 Mixture of Factor Analyzers
- A mixture of M FAs.
- The MFA model can perform clustering and dimensionality reduction simultaneously.
- Complete-data log-likelihood.
- The SMEM algorithm is straightforwardly applicable to the parameter estimation of the MFA model.
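The MFA density, reconstructed in the notation above; whether the noise covariance Ψ is shared across components or per component varies by formulation, and a shared diagonal Ψ (as in the common MFA formulation) is assumed here:

```latex
p(x; \Theta) = \sum_{m=1}^{M} \alpha_m\,
  \mathcal{N}\!\bigl(x;\ \mu_m,\ W_m W_m^{\top} + \Psi\bigr)
```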

23 Demonstration

24 Practical Applications
- Image compression
  - An MFA model can be used for block transform image coding.
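A sketch of the idea, assuming an already fitted MFA: each image block is assigned to its most responsible component and encoded as that component's low-dimensional factor projection. The posterior-mean encode/decode shown here is standard FA inference; the paper's actual coding scheme may differ in detail:

```python
import numpy as np

def encode_block(x, alphas, mus, Ws, Psi):
    """Pick the most responsible component for block x and
    project x onto its q-dimensional factor space."""
    scores = []
    for a, mu, W in zip(alphas, mus, Ws):
        C = W @ W.T + np.diag(Psi)          # component covariance W W^T + Psi
        diff = x - mu
        logdet = np.linalg.slogdet(C)[1]
        # log alpha_m + log N(x; mu_m, C_m), dropping the constant term
        scores.append(np.log(a) - 0.5 * (logdet + diff @ np.linalg.solve(C, diff)))
    m = int(np.argmax(scores))
    W, mu = Ws[m], mus[m]
    z = W.T @ np.linalg.solve(W @ W.T + np.diag(Psi), x - mu)  # E[z | x, m]
    return m, z

def decode_block(m, z, mus, Ws):
    """Reconstruct the block from its component index and factor code."""
    return Ws[m] @ z + mus[m]
```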

25
Figure: 15.8×10³, 10.1×10³, 7.3×10³

26 Application to Pattern Recognition
- Once an MFA model is fitted to each class, we can compute the posterior probability for each data point.
- Optimal class i* for x.
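The decision rule in standard Bayes-classifier notation, where p(x | C_i) is the MFA density fitted to class C_i:

```latex
i^* = \arg\max_i P(C_i \mid x)
    = \arg\max_i \frac{p(x \mid C_i)\, P(C_i)}
                      {\sum_j p(x \mid C_j)\, P(C_j)}
```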

27
- Digit recognition task (10 classes)
  - 16-dimensional data (Glucksman's features)
  - Data size: 200 per class for training and 200 per class for test
- 3NN: 88.3%
- SS (CLAFIC)

28 Conclusion
- Simultaneous split-and-merge operations
  - A way of tunneling through low-likelihood barriers, thereby eliminating many non-global optima.
  - The SMEM algorithm outperforms the standard EM algorithm, and can therefore be very useful in practice.
  - Applicable to a wide variety of mixture models.
- Future work
  - By introducing probability measures over models, the split-and-merge operations can also be used to determine the appropriate number of components within the Bayesian framework.

