Slide 1: CHAPTER 7: Clustering. Eick: K-Means and EM (modified Alpaydin transparencies and new transparencies added). Last updated: February 25, 2014.

Slide 2: k-Means Clustering. Problem: find k reference vectors $M = \{m_1, \dots, m_k\}$ (representatives/centroids) and a 1-of-k coding scheme ($X \to M$), represented by $b_i^t$, that minimize the reconstruction error $E = \sum_t \sum_i b_i^t \, \lVert x^t - m_i \rVert^2$. Both the $b_i^t$ and M have to be found so that E is minimized. Step 1: find the best $b_i^t$: $b_i^t = 1$ if $\lVert x^t - m_i \rVert = \min_j \lVert x^t - m_j \rVert$, and $b_i^t = 0$ otherwise. Step 2: find M: $m_i = \sum_t b_i^t x^t / \sum_t b_i^t$. Problem: $b_i^t$ depends on M, so only an iterative solution can be found. (Significantly different from the book.)
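The two-step iteration can be illustrated with a minimal sketch (an added illustration, not the notes' reference implementation; the function name kmeans, the random-subset initialization, and the convergence test are assumptions):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Hypothetical minimal k-means: alternate Step 1 (find b_i^t) and Step 2 (find m_i)."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=k, replace=False)]          # initial centroids (assumed scheme)
    for _ in range(n_iter):
        # Step 1: b_i^t = 1 for the nearest centroid, 0 otherwise (stored here as a label)
        dists = np.linalg.norm(X[:, None, :] - M[None, :, :], axis=2)   # n x k distances
        labels = dists.argmin(axis=1)
        # Step 2: m_i = sum_t b_i^t x^t / sum_t b_i^t (keep the old centroid if a cluster empties)
        new_M = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else M[i]
                          for i in range(k)])
        if np.allclose(new_M, M):     # centroids stopped moving -> converged
            break
        M = new_M
    return M, labels
```

Calling kmeans(X, 3) on an n x d numpy array returns the k centroids and the hard assignments; because $b_i^t$ depends on M, the loop simply alternates the two steps until the centroids stop moving.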

Slide 3: k-Means Clustering (figure slide from the Alpaydin lecture notes; the algorithm illustration is not reproduced in this transcript).

Slide 4: k-Means Clustering Reformulated. Given: a dataset X, k (and a random seed). Problem: find k vectors $M = \{m_1, \dots, m_k\} \subseteq \mathbb{R}^d$ and a 1-of-k coding scheme $b_i^t$ ($b_i^t: X \to M$) that minimize the error $E = \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \, \lVert x^t - m_i \rVert^2$, subject to $b_i^t \in \{0, 1\}$ and $\sum_{i=1}^{k} b_i^t = 1$ for $t = 1, \dots, N$ (the "coding scheme").

Slide 5: k-Means Clustering Generalized. Given: a dataset X, k, and a distance function d. Problem: find k vectors $M = \{m_1, \dots, m_k\} \subseteq \mathbb{R}^d$ and a 1-of-k coding scheme $b_i^t$ ($b_i^t: X \to M$) that minimize the error $E = \sum_{t=1}^{N} \sum_{i=1}^{k} b_i^t \, d(x^t, m_i)$, subject to $b_i^t \in \{0, 1\}$ and $\sum_{i=1}^{k} b_i^t = 1$ for $t = 1, \dots, N$ (the "coding scheme"); the squared Euclidean distance of the standard formulation is replaced by the given distance function d.
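To make the role of the distance function d concrete, here is a hedged sketch of one well-known generalization (k-medians, chosen for illustration rather than taken from the slides): Step 1 assigns objects by Manhattan (L1) distance, and Step 2 uses coordinate-wise medians, which minimize the per-coordinate L1 error:

```python
import numpy as np

def kmedians(X, k, n_iter=100, seed=0):
    """Hypothetical k-medians: k-means generalized to d = Manhattan (L1) distance."""
    rng = np.random.default_rng(seed)
    M = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 1: assign every x^t to the representative with the smallest L1 distance
        dists = np.abs(X[:, None, :] - M[None, :, :]).sum(axis=2)       # n x k
        labels = dists.argmin(axis=1)
        # Step 2: the coordinate-wise median minimizes the summed L1 distance per cluster
        M = np.array([np.median(X[labels == i], axis=0) if np.any(labels == i) else M[i]
                      for i in range(k)])
    return M, labels
```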

Slide 6: k-Means Complexity: O(k·n·t·d). k-Means performs t iterations of E/M-steps; in each iteration the following computations have to be done. E-step: 1. Find the nearest centroid for each object; because there are n objects and k centroids, n·k distances have to be computed and compared, and computing one distance is O(d), yielding O(k·n·d). 2. Assign objects to clusters / update clusters: O(n). M-step: compute the centroid of each cluster: O(n·d). We obtain O(t·(k·n·d + n + n·d)) = O(t·k·n·d).
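For example (with hypothetical numbers): n = 100,000 objects, k = 10 clusters, d = 50 attributes, and t = 20 iterations give roughly $t \cdot k \cdot n \cdot d = 20 \cdot 10 \cdot 100{,}000 \cdot 50 = 10^9$ elementary distance-coordinate operations, so reducing d (e.g., by feature selection or PCA) or k has a direct multiplicative effect on the running time.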

Slide 7: Semiparametric Density Estimation. Parametric: assume a single model for p(x | C_i) (Chapters 4 and 5). Semiparametric: p(x | C_i) is a mixture of densities; there are multiple possible explanations/prototypes. Nonparametric: no model; the data speaks for itself (Chapter 8).

Slide 8: Mixture Densities. $p(x) = \sum_{i=1}^{k} p(x \mid G_i) \, P(G_i)$, where the $G_i$ are the components/groups/clusters, the $P(G_i)$ are the mixture proportions (priors), and the $p(x \mid G_i)$ are the component densities. Gaussian mixture: $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$ with parameters $\Phi = \{P(G_i), \mu_i, \Sigma_i\}_{i=1}^{k}$, to be estimated from an unlabeled sample $X = \{x^t\}_t$ (unsupervised learning). Idea: use a density function for each cluster, or use multiple density functions for a single class.
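As a small sketch of evaluating such a mixture density (the two components and their parameters below are made-up illustration values, not parameters from the slides):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical 2-component Gaussian mixture: Phi = {P(G_i), mu_i, Sigma_i}
priors = [0.6, 0.4]
components = [multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, 0.3], [0.3, 1.0]]),
              multivariate_normal(mean=[4.0, 2.0], cov=[[0.5, 0.0], [0.0, 2.0]])]

def mixture_pdf(x):
    # p(x) = sum_i p(x | G_i) * P(G_i)
    return sum(P * g.pdf(x) for P, g in zip(priors, components))

print(mixture_pdf(np.array([1.0, 0.5])))      # density of the mixture at one point
```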

Slide 9: Expectation-Maximization (EM) Clustering. Assumptions: the dataset X was created by a mixture of k multivariate Gaussians $G_1, \dots, G_k$, together with their priors $P(G_1), \dots, P(G_k)$. Example (2-D dataset): $p(x \mid G_1) \sim N((2,3), \Sigma_1)$, $P(G_1) = 0.5$; $p(x \mid G_2) \sim N((-4,1), \Sigma_2)$, $P(G_2) = 0.4$; $p(x \mid G_3) \sim N((0,-9), \Sigma_3)$, $P(G_3) = 0.1$. EM's task: given X, reconstruct the k Gaussians and their priors; each Gaussian is a model of a cluster in the dataset. Forming of clusters: assign $x^t$ to the cluster $G_r$ that has the maximum value of $p(x^t \mid G_r) \, P(G_r)$; alternatively, use the normalized posteriors $h_r^t = p(x^t \mid G_r) P(G_r) / \sum_j p(x^t \mid G_j) P(G_j)$ as soft memberships (soft EM).
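A sketch of this generative assumption and of the hard assignment rule for the 2-D example (the covariance matrices $\Sigma_1, \Sigma_2, \Sigma_3$ are left open on the slide, so identity matrices are assumed here):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
priors = np.array([0.5, 0.4, 0.1])
means  = [np.array([2.0, 3.0]), np.array([-4.0, 1.0]), np.array([0.0, -9.0])]
covs   = [np.eye(2), np.eye(2), np.eye(2)]      # assumed; the slide leaves Sigma_1..Sigma_3 open

# Generative assumption: pick a component G_i by its prior, then draw x from N(mu_i, Sigma_i)
comp = rng.choice(3, size=500, p=priors)
X = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comp])

# Forming of clusters (hard assignment): x -> argmax_r p(x | G_r) * P(G_r)
scores = np.column_stack([priors[r] * multivariate_normal(means[r], covs[r]).pdf(X)
                          for r in range(3)])
labels = scores.argmax(axis=1)
```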

Slide 10: Performance Measure of EM. EM maximizes the log likelihood of the mixture model: $\mathcal{L}(\Phi \mid X) = \log \prod_t p(x^t \mid \Phi) = \sum_t \log \sum_i p(x^t \mid G_i) \, P(G_i)$. We choose the $\Phi$ that makes X most likely. Assume hidden variables z which, when known, make the optimization much simpler: $z_i^t$ encodes whether object $x^t$ belongs to cluster $G_i$; $z_i^t$ is not known and has to be estimated, and $h_i^t$ denotes its estimator. EM maximizes the log likelihood of the sample, but instead of maximizing $\mathcal{L}(\Phi \mid X)$ it maximizes the complete-data likelihood $\mathcal{L}(\Phi \mid X, Z)$, in terms of x and z.
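A hedged sketch of computing $\mathcal{L}(\Phi \mid X)$ for given mixture parameters; the use of logsumexp for numerical stability is an added implementation detail, and the parameter layout (priors, means, covs) follows the earlier sketches:

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def log_likelihood(X, priors, means, covs):
    """L(Phi | X) = sum_t log sum_i P(G_i) p(x^t | G_i), computed stably in log space."""
    log_terms = np.column_stack([np.log(priors[i]) +
                                 multivariate_normal(means[i], covs[i]).logpdf(X)
                                 for i in range(len(priors))])
    return logsumexp(log_terms, axis=1).sum()
```

For instance, log_likelihood(X, priors, means, covs) with the arrays from the previous sketch evaluates how well that model explains the generated sample.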

Slide 11: E- and M-steps. Iterate the two steps: 1. E-step: estimate z given X and the current $\Phi$: $Q(\Phi \mid \Phi^l) = E[\mathcal{L}(\Phi \mid X, Z) \mid X, \Phi^l]$, where z holds the assignments of samples to clusters and $\Phi^l$ is the current model. 2. M-step: find the new $\Phi$ given z, X, and the old $\Phi$: $\Phi^{l+1} = \arg\max_{\Phi} Q(\Phi \mid \Phi^l)$. An increase in Q increases the incomplete likelihood: $\mathcal{L}(\Phi^{l+1} \mid X) \ge \mathcal{L}(\Phi^{l} \mid X)$. Remark: $z_i^t$ denotes the unknown membership of the t-th example.

Slide 12: EM in Gaussian Mixtures. The hidden label $z_i^t$ indicates whether $x^t$ was generated by component $G_i$; assume $p(x \mid G_i) \sim \mathcal{N}(\mu_i, \Sigma_i)$. E-step: estimate the mixture (membership) probabilities for $X = \{x^1, \dots, x^N\}$: $h_i^t = \dfrac{p(x^t \mid G_i, \Phi^l) \, P(G_i)}{\sum_j p(x^t \mid G_j, \Phi^l) \, P(G_j)}$. M-step: obtain $\Phi^{l+1}$, namely $(P(G_i), m_i, S_i)^{l+1}$ for $i = 1, \dots, k$: $P(G_i) = \frac{1}{N} \sum_t h_i^t$, $m_i^{l+1} = \frac{\sum_t h_i^t x^t}{\sum_t h_i^t}$, $S_i^{l+1} = \frac{\sum_t h_i^t (x^t - m_i^{l+1})(x^t - m_i^{l+1})^T}{\sum_t h_i^t}$; the estimated soft labels are used in place of the unknown labels. Remarks: $h_i^t$ plays the role of $b_i^t$ in k-Means, and $h_i^t$ acts as an estimator for the unknown label $z_i^t$.
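The E- and M-step formulas above can be sketched as follows (a minimal illustration under simplified assumptions: random-subset initialization, a fixed number of iterations, and a small ridge term added to keep the covariance matrices $S_i$ invertible):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=50, seed=0):
    """Minimal EM sketch for a Gaussian mixture, following the formulas above."""
    n, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, size=k, replace=False)]                  # assumed initialization
    covs = np.array([np.cov(X, rowvar=False) + 1e-6 * np.eye(d) for _ in range(k)])
    priors = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: h_i^t = P(G_i) p(x^t | G_i) / sum_j P(G_j) p(x^t | G_j)
        h = np.column_stack([priors[i] * multivariate_normal(means[i], covs[i]).pdf(X)
                             for i in range(k)])
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate P(G_i), m_i, S_i from the soft labels h_i^t
        Nk = h.sum(axis=0)
        priors = Nk / n
        means = (h.T @ X) / Nk[:, None]
        covs = np.array([((h[:, i, None] * (X - means[i])).T @ (X - means[i])) / Nk[i]
                         + 1e-6 * np.eye(d)                          # ridge term (added assumption)
                         for i in range(k)])
    return priors, means, covs, h
```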

Slide 13: EM Means Expectation-Maximization. 1. E-step (Expectation): determine the objects' memberships in clusters (from the model). 2. M-step (Maximization): build the model by maximum likelihood estimation (from the cluster memberships). Computations in the two steps (k-Means / EM): 1. compute $b_i^t$ / $h_i^t$; 2. compute the centroid / the (prior, mean, covariance matrix). Remark: k-Means uses a simple form of the EM procedure!

Slide 14: Commonalities between k-Means and EM. 1. They start with random clusters and rely on a two-step approach to optimize the objective function using the EM procedure (see the previous transparency). 2. Both use the same optimization scheme for an objective function $f(a_1, \dots, a_m, b_1, \dots, b_k)$: we optimize over the a-values (keeping the b-values fixed) and then over the b-values (keeping the a-values fixed), until convergence is reached. Consequently, both algorithms only find a local optimum of the objective function and are sensitive to initialization. 3. Both assume the number of clusters k is known.

Slide 15: Differences between k-Means and EM I. 1. k-Means is distance-based and relies on nearest-centroid (1-NN) queries to form clusters. EM is density-based/probabilistic; EM usually works with multivariate Gaussians but can be generalized to other probability distributions. 2. k-Means minimizes the squared distance of an object to its cluster prototype (usually the centroid). EM maximizes the log-likelihood of the sample given the model ($p(X \mid \Phi)$); the models are assumed to be mixtures of k Gaussians together with their priors. 3. k-Means is a hard clustering algorithm, EM is a soft clustering algorithm: $h_i^t \in [0, 1]$.

Slide 16: Differences between k-Means and EM II. 4. k-Means cluster models are just k centroids; EM models are k triples of priors, means, and covariance matrices. 5. EM directly deals with dependencies between attributes in its density estimation approach: the degree to which an object x belongs to a cluster c depends on the product of c's prior and c's component density, which is a function of the Mahalanobis distance between x and c's mean; therefore, EM clusters do not depend on the units of measurement or on the orientation of the attributes in space (see the sketch below). 6. The distance metric can be viewed as an input parameter of k-means, and generalizations of k-means that use different distance functions have been proposed. EM implicitly relies on the Mahalanobis distance function, which is part of its density estimation approach.
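A short sketch illustrating point 5: the Mahalanobis distance that EM relies on is unchanged when an attribute is re-expressed in different units (the data and the scaling factor below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0.0, 0.0], [[2.0, 0.8], [0.8, 1.0]], size=500)   # made-up cluster

def mahalanobis(x, mean, cov):
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

x = np.array([1.0, 2.0])
d1 = mahalanobis(x, X.mean(axis=0), np.cov(X, rowvar=False))

# Re-express attribute 0 in different units (e.g., meters -> millimeters)
S = np.diag([1000.0, 1.0])
Xs = X @ S
d2 = mahalanobis(S @ x, Xs.mean(axis=0), np.cov(Xs, rowvar=False))

print(d1, d2)    # identical up to floating-point error: the distance ignores the units
```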

Slide 17: Mixture of Mixtures. In classification, the input comes from a mixture of classes (supervised). If each class is in turn a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures: $p(x \mid C_i) = \sum_{j=1}^{k_i} p(x \mid G_{ij}) \, P(G_{ij})$ and $p(x) = \sum_{i=1}^{K} p(x \mid C_i) \, P(C_i)$.

Slide 18: Choosing k. Defined by the application, e.g., image quantization. Plot the data (after PCA) and check for clusters. Incremental (leader-cluster) algorithm: add one cluster at a time until an "elbow" appears (in the reconstruction error / log likelihood / intergroup distances). Manually check the clusters for meaning. Run with multiple k values and compare the results (see the sketch below).
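A brief sketch of the last suggestion, running k-means for several values of k and comparing the reconstruction errors to look for an elbow (scikit-learn's KMeans and its inertia_ attribute are used here as an assumed convenience; any k-means implementation would do):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Made-up data with three visible groups
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(100, 2)) for c in ([0, 0], [5, 5], [0, 5])])

# Reconstruction error (within-cluster sum of squares) for k = 1..8:
# look for the "elbow" where the error stops dropping sharply (here around k = 3).
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```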

