1 K Means Clustering, Nearest Cluster and Gaussian Mixture. Presented by Kuei-Hsien, 2005.06.23.
2 Clustering algorithms are used to find groups of "similar" data points among the input patterns. K means clustering is an effective algorithm for extracting a given number of clusters of patterns from a training set. Once training is done, the cluster locations can be used to classify data into distinct classes. K Means Clustering
3 K Means Training Flow Chart
Initialize the cluster centers by randomly selecting the user-specified number of centers from the training set.
Classify the entire training set: for each pattern X_i in the training set, find the nearest cluster center C* and classify X_i as a member of C*.
For each cluster k, recompute its center as the mean of the cluster:
$M_k = \frac{1}{N_k} \sum_{j=1}^{N_k} X_{jk}$
where M_k is the new mean, N_k is the number of training patterns in cluster k, and X_{jk} is the j-th pattern belonging to cluster k.
Loop until the change in cluster means is less than the amount specified by the user.
4 If the number of cluster centers is less than the number specified, split each cluster center into two clusters along the input dimension with the highest standard deviation:
$\sigma_{ik} = \sqrt{\frac{1}{N_k} \sum_{j=1}^{N_k} (X_{ij} - M_{ik})^2}$
where X_{ij} is the i-th dimension of the j-th pattern in cluster k, M_{ik} is the i-th dimension of the center of cluster k, and N_k is the number of training patterns in cluster k. Store the k cluster centers.
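The splitting rule above can be sketched in NumPy. This is a minimal illustration, not the slides' implementation: the function name `split_cluster` is hypothetical, and offsetting the two new centers by one standard deviation along the chosen dimension is an assumption.

```python
import numpy as np

def split_cluster(points, center):
    """Split one cluster center into two along its highest-deviation dimension.

    Sketch only: the +/- one-standard-deviation offset is an assumption.
    """
    # Per-dimension standard deviation within the cluster (the sigma_ik formula).
    sd = np.sqrt(((points - center) ** 2).mean(axis=0))
    i = sd.argmax()                      # input dimension with the highest deviation
    offset = np.zeros_like(center)
    offset[i] = sd[i]
    # Two new centers, pushed apart along dimension i.
    return center - offset, center + offset
```

An elongated cluster is thus split along its long axis, which tends to halve the within-cluster variance more effectively than splitting along an arbitrary dimension.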
5 1. Ask the user how many clusters they'd like (e.g. k = 5).
2. Randomly select k cluster center locations.
3. Each datapoint finds out which center it's closest to (thus each center "owns" a set of datapoints).
4. Each center finds the centroid of the points it owns...
5. ...and jumps there.
6. Repeat steps 3 to 5 until terminated.
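The six steps above can be sketched as a short NumPy routine. This is a minimal sketch under assumptions: the function name, tolerance, and epoch cap are not from the slides.

```python
import numpy as np

def k_means(X, k, tol=1e-4, max_epochs=100, rng=None):
    """Minimal K means sketch: returns (centers, labels)."""
    rng = np.random.default_rng(rng)
    # Step 2: randomly pick k training patterns as the initial centers.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_epochs):
        # Step 3: assign each pattern to its nearest center (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Steps 4-5: move each center to the centroid of the points it owns;
        # a center that owns no points is left where it is.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Step 6: stop once the centers barely move.
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```

On two well-separated blobs this converges in a handful of iterations, with each returned center landing near one blob's mean.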
6 Stop Condition - Splitting & Merging
Splitting: the SD of a cluster is large and the change in SD is less than a specified value.
Merging: the means of two clusters are too close.
Loop until the change of SD in all clusters is less than a value specified by the user, or until a specified number of epochs has been reached.
7 K Means Test Flow Chart
For each pattern X, associate X with the cluster Y closest to X using the Euclidean distance:
$d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$
Stop condition (training of parameters): loop until the change in all cluster means is less than the amount specified by the user.
8 Commonly tunable parameters for K means
- Number of initial clusters: randomly chosen
- Number of cluster centers: 2 ~ √N
- Criteria for splitting and merging cluster centers: number of epochs or percent of SD change
- Stop conditions: the stop condition for binary splitting is less important than that for the overall clustering.
K Means Clustering End
9 Nearest Cluster The nearest-cluster architecture can be viewed as a condensed version of the K nearest neighbor (KNN) architecture. This architecture can often deliver performance close to that of KNN, while reducing computation time and memory requirements.
10 Nearest Cluster The nearest-cluster architecture involves partitioning the training set into a few clusters, each of which stores a set of per-class probabilities. For a given cluster, these values estimate the posterior probabilities of all possible classes for the region of the input space in the vicinity of the cluster. During classification, an input is associated with the nearest cluster, and the posterior probability estimates for that cluster are used to classify the input.
11 Nearest Cluster Training Flow Chart
Perform K means clustering on the data set. For each cluster, generate a probability for each class according to:
$P_{jk} = \frac{N_{jk}}{N_k}$
where P_{jk} is the probability for class j within cluster k, N_{jk} is the number of class-j patterns belonging to cluster k, and N_k is the total number of patterns belonging to cluster k.
12 Nearest Cluster Test Flow Chart
For each input pattern X, find the nearest cluster C_k using the Euclidean distance measure:
$d(X, Y) = \sqrt{\sum_{i=1}^{m} (x_i - y_i)^2}$
where Y is a cluster center and m is the number of dimensions in the input patterns. Use the probabilities P_{jk} for all classes j stored with C_k, and classify pattern X into the class j with the highest probability.
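The training and test flow charts for the nearest-cluster architecture amount to a frequency count followed by a nearest-center lookup. A minimal sketch, assuming the cluster assignment has already been produced by K means; the helper names are hypothetical.

```python
import numpy as np

def cluster_class_probs(cluster_labels, class_labels, k, n_classes):
    """Training: P[j, c] = fraction of cluster c's patterns carrying class j (P_jk)."""
    P = np.zeros((n_classes, k))
    for c in range(k):
        in_c = class_labels[cluster_labels == c]
        for j in range(n_classes):
            P[j, c] = np.mean(in_c == j) if len(in_c) else 0.0
    return P

def classify(x, centers, P):
    """Test: assign x to the nearest center, return that cluster's most probable class."""
    c = np.linalg.norm(centers - x, axis=1).argmin()   # nearest cluster (Euclidean)
    return int(P[:, c].argmax())                        # class with highest P_jk
```

Compared with KNN, only the k centers and the small P table need to be stored, which is the memory saving the slides describe.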
13 Example: two classes, five clusters.
In cluster 1: P_class1 = 100%, P_class2 = 0%
In cluster 2: P_class1 = 99.5%, P_class2 = 0.5%
In cluster 3: P_class1 = 50%, P_class2 = 50%
In cluster 4: P_class1 = 15%, P_class2 = 85%
In cluster 5: P_class1 = 65%, P_class2 = 35%
Nearest Cluster End
14 Gaussian Mixture The Gaussian mixture architecture estimates a probability density function (PDF) for each class, and then performs classification based on Bayes' rule:
$P(C_j \mid X) = \frac{P(X \mid C_j)\, P(C_j)}{P(X)}$
where P(X | C_j) is the PDF of class j evaluated at X, P(C_j) is the prior probability of class j, and P(X) is the overall PDF evaluated at X.
15 Gaussian Mixture Unlike the unimodal Gaussian architecture, which assumes P(X | C_j) to be a single Gaussian, the Gaussian mixture model estimates P(X | C_j) as a weighted average of multiple Gaussians:
$P(X \mid C_j) = \sum_{k} w_k G_k(X)$
where w_k is the weight of the k-th Gaussian G_k, and the weights sum to one. One such PDF model is produced for each class.
16 Gaussian Mixture Each Gaussian component is defined as:
$G_k(X) = \frac{1}{(2\pi)^{d/2} |V_k|^{1/2}} \exp\!\left(-\frac{1}{2}(X - M_k)^T V_k^{-1} (X - M_k)\right)$
where M_k is the mean of the Gaussian, V_k is its covariance matrix, and d is the dimension of X.
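The component density above translates directly into code. A minimal NumPy sketch; the function name is hypothetical, and no numerical safeguards (e.g. for near-singular covariances) are included.

```python
import numpy as np

def gaussian_pdf(x, mean, cov):
    """Evaluate the d-dimensional Gaussian G(x; M, V) defined on this slide."""
    d = len(mean)
    diff = x - mean
    # Normalization constant (2*pi)^(d/2) * |V|^(1/2).
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov))
    # Quadratic form diff^T V^{-1} diff, via a linear solve instead of an explicit inverse.
    return np.exp(-0.5 * diff @ np.linalg.solve(cov, diff)) / norm
```

At the mean of a 1-d standard Gaussian this returns 1/sqrt(2*pi), matching the closed form.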
17 Gaussian Mixture Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components and the weights indicating the contribution of each Gaussian to the approximation of P(X | C j ).
18 Composition of Gaussian Mixture
Class 1 is modeled as a weighted sum of five Gaussians: (G_1, w_1), (G_2, w_2), (G_3, w_3), (G_4, w_4), (G_5, w_5).
Free variables: the means μ_i, the covariances V_i, and the weights w_i. We use the EM (estimate-maximize) algorithm to approximate these variables.
19 Gaussian Mixture These parameters are tuned using an iterative procedure called the estimate-maximize (EM) algorithm, which aims at maximizing the likelihood of the training set under the estimated PDF. The likelihood function L for each class j can be defined as the product of the class PDF over that class's N_j training patterns:
$L_j = \prod_{p=1}^{N_j} P(X_p \mid C_j)$
20 Gaussian Mixture Training Flow Chart (1)
Initialize the Gaussian means μ_i, i = 1, …, G, using the K means clustering algorithm.
Initialize the covariance matrices V_i using the distance to the nearest cluster.
Initialize the weights π_i = 1/G so that all Gaussians are equally likely.
Present each pattern X of the training set and model each of the classes as a weighted sum of Gaussians:
$P(X) = \sum_{i=1}^{G} \pi_i\, G(X; \mu_i, V_i)$
where G is the number of Gaussians, the π_i are the weights, and
$G(X; \mu_i, V_i) = \frac{1}{(2\pi)^{d/2} |V_i|^{1/2}} \exp\!\left(-\frac{1}{2}(X - \mu_i)^T V_i^{-1} (X - \mu_i)\right)$
where V_i is the covariance matrix.
21 Gaussian Mixture Training Flow Chart (2)
Compute the responsibility of Gaussian i for each training pattern X_p:
$\tau_{ip} = \frac{\pi_i\, G(X_p; \mu_i, V_i)}{\sum_{j=1}^{G} \pi_j\, G(X_p; \mu_j, V_j)}$
Iteratively update the weights, means and covariances:
$\pi_i' = \frac{1}{N} \sum_{p=1}^{N} \tau_{ip}, \qquad \mu_i' = \frac{\sum_p \tau_{ip} X_p}{\sum_p \tau_{ip}}, \qquad V_i' = \frac{\sum_p \tau_{ip} (X_p - \mu_i')(X_p - \mu_i')^T}{\sum_p \tau_{ip}}$
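One full EM iteration (E step computing the responsibilities τ_ip, then the M step re-estimating weights, means and covariances) can be sketched as below. A minimal sketch under assumptions: the function name is hypothetical and no guards against singular covariances are included.

```python
import numpy as np

def em_step(X, weights, means, covs):
    """One EM iteration for a Gaussian mixture; returns updated (weights, means, covs)."""
    N, d = X.shape
    G = len(weights)
    # E step: tau[p, i] proportional to pi_i * G(X_p; mu_i, V_i).
    tau = np.empty((N, G))
    for i in range(G):
        diff = X - means[i]
        norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(covs[i]))
        quad = np.einsum('nd,dk,nk->n', diff, np.linalg.inv(covs[i]), diff)
        tau[:, i] = weights[i] * np.exp(-0.5 * quad) / norm
    tau /= tau.sum(axis=1, keepdims=True)   # normalize over the G Gaussians
    # M step: the three update formulas from this slide.
    Nk = tau.sum(axis=0)
    new_weights = Nk / N
    new_means = (tau.T @ X) / Nk[:, None]
    new_covs = []
    for i in range(G):
        diff = X - new_means[i]
        new_covs.append((tau[:, i, None] * diff).T @ diff / Nk[i])
    return new_weights, new_means, np.array(new_covs)
```

Repeating this step drives the means toward the data concentrations while the weights track each Gaussian's share of the responsibilities.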
22 Gaussian Mixture Training Flow Chart (3)
Recompute τ_ip using the new weights, means and covariances. Stop training when the change in the likelihood falls below a user-specified threshold, or when the number of epochs reaches the specified value. Otherwise, continue the iterative updates.
23 Gaussian Mixture Test Flow Chart
Present each input pattern X and compute the confidence for each class j:
$P(C_j)\, P(X \mid C_j)$
where P(C_j) is the prior probability of class C_j, estimated by counting the number of training patterns of each class. Classify pattern X as the class with the highest confidence.
Gaussian Mixture End
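The test flow chart reduces to scoring each class by prior times class-conditional density and taking the argmax. A minimal sketch, assuming each class's mixture PDF is available as a callable; the function name and this interface are assumptions, not the slides' implementation.

```python
import numpy as np

def classify_bayes(x, class_models, priors):
    """Confidence for class j is P(C_j) * P(x | C_j); return (best class, confidences).

    class_models[j] is any callable returning the class-j mixture density at x;
    priors[j] is the class prior estimated from training-pattern counts.
    """
    conf = np.array([model(x) * prior for model, prior in zip(class_models, priors)])
    return int(conf.argmax()), conf
```

Because P(X) in Bayes' rule is the same for every class, it can be dropped from the comparison, which is why the confidence is just the prior-weighted density.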