K Means Clustering, Nearest Cluster and Gaussian Mixture

Presentation transcript:

K Means Clustering, Nearest Cluster and Gaussian Mixture. Good morning, everyone. Today I will introduce three kinds of nonparametric classifiers in sequence: K means clustering, the nearest cluster, and the Gaussian mixture. Presented by Kuei-Hsien, 2005.06.23

K Means Clustering Clustering algorithms are used to find groups of “similar” data points among the input patterns. K means clustering is an effective algorithm for extracting a given number of clusters of patterns from a training set. Once done, the cluster locations can be used to classify data into distinct classes. The first algorithm I introduce is K means clustering.

K Means Training Flow Chart Initialize the user-specified number of cluster centers by randomly selecting them from the training set. Classify the entire training set: for each pattern Xi in the training set, find the nearest cluster center C* and classify Xi as a member of C*. For each cluster, recompute its center as the mean of the cluster: M_k = (1/N_k) Σ_j X_jk, where M_k is the new mean, N_k is the number of training patterns in cluster k, and X_jk is the j-th pattern belonging to cluster k. How do we train the parameters of K means clustering? We follow the training flow chart: first, sum over all the data in cluster k to obtain the new mean; then use the new centers to re-classify the entire training set, and recompute the centers again. Loop until the change in cluster means is less than the amount specified by the user.
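As an illustration, here is a minimal sketch of the training loop just described, written in Python with NumPy; the function name, the convergence tolerance, and the epoch limit are my own choices, not part of the original slides.

```python
import numpy as np

def kmeans(X, k, tol=1e-4, max_epochs=100, rng=None):
    """Minimal K means: X is an (N, d) array, k the number of clusters."""
    rng = np.random.default_rng(rng)
    # Initialize the centers by randomly selecting k patterns from the training set.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(max_epochs):
        # Assign each pattern to its nearest center (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of the patterns it owns.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        # Stop when the change in cluster means falls below the tolerance.
        if np.linalg.norm(new_centers - centers) < tol:
            centers = new_centers
            break
        centers = new_centers
    return centers, labels
```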

Store the k cluster centers. If the number of cluster centers is less than the number specified, split a cluster center into two by finding the input dimension with the highest deviation: σ_i = sqrt( (1/N_k) Σ_j (X_ij - M_ik)^2 ), where X_ij is the i-th dimension of the j-th pattern in cluster k, M_ik is the i-th dimension of the cluster center, and N_k is the number of training patterns in cluster k. In words, σ_i for each dimension is computed from the sum of squared differences between X_ij and the cluster mean along that dimension.
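A possible sketch of this binary split, continuing in Python; offsetting the center in both directions along the highest-deviation dimension is my own illustrative choice of perturbation, since the slides do not specify how the two new centers are placed.

```python
import numpy as np

def split_center(cluster_points, center):
    """Split one cluster center into two along its highest-deviation dimension."""
    # Per-dimension standard deviation of the cluster's patterns about its center.
    sigma = np.sqrt(((cluster_points - center) ** 2).mean(axis=0))
    i = sigma.argmax()                      # dimension with the highest deviation
    offset = np.zeros_like(center)
    offset[i] = sigma[i]
    # Return two new centers displaced in opposite directions along dimension i.
    return center + offset, center - offset
```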

Ask the user how many clusters they’d like (e.g. k=5). Randomly select k cluster center locations. Each datapoint finds out which center it is closest to (thus each center “owns” a set of datapoints). Each cluster finds the centroid of the points it owns and moves its center there. Using the new centers, re-classify the entire training set: for each pattern in the training set, find the new nearest cluster center and classify the pattern as a member of that cluster. Repeat steps 3 to 5 until terminated.

Stop Condition - Splitting & Merging. Splitting: a cluster’s SD is large and its SD change is less than a specified value. Merging: the means of two clusters are too close. In the first case, we may find a cluster with the greatest SD whose SD change is less than a specified value; we could then consider splitting this cluster into two partitions. In the second case, two cluster means are too close, and we could consider merging them. The criterion for splitting is to find the greatest σ_i among the clusters; the criterion for merging is to find the shortest distance between two cluster centers. Loop until the change of SD in all clusters is less than a value specified by the user, or until a specified number of epochs has been reached.
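A minimal sketch of the merge criterion, i.e. finding the closest pair of cluster centers; the decision threshold for actually merging is an assumed parameter, not given in the slides.

```python
import numpy as np

def closest_center_pair(centers):
    """Return the pair of cluster centers with the shortest Euclidean distance."""
    k = len(centers)
    best = (None, None, np.inf)
    for a in range(k):
        for b in range(a + 1, k):
            d = np.linalg.norm(centers[a] - centers[b])
            if d < best[2]:
                best = (a, b, d)
    return best  # (index_a, index_b, distance); merge if the distance is below a threshold
```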

Stop Condition - training of parameters. Loop until the change in all cluster means is less than the amount specified by the user. K Means Test Flow Chart: for each pattern X, associate X with the closest cluster center Y using the Euclidean distance d(X, Y) = sqrt( Σ_i (X_i - Y_i)^2 ), which is just the straight-line distance between two points. We use this distance to assign the data X to a distinct cluster.
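A brief sketch of this test step in Python; it assumes the centers array produced by the training sketch above.

```python
import numpy as np

def assign_to_cluster(x, centers):
    """Assign pattern x to the cluster whose center is nearest in Euclidean distance."""
    dists = np.linalg.norm(centers - x, axis=1)
    return int(dists.argmin())
```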

Commonly tunable parameters for K means: the initial cluster centers (randomly chosen from the training set); the number of cluster centers (highly problem dependent, often estimated as between 2 and √N, where N is the size of the training set); the criteria for splitting and merging cluster centers; and the stop conditions (typically a percent-SD-change threshold and a maximum number of training epochs). The stop condition for each binary split is less important than the stop condition for the overall clustering, so if the training set is large, pick a faster binary-split stopping condition and a slower stopping condition for the overall clustering. K Means Clustering End

Nearest Cluster The nearest-cluster architecture can be viewed as a condensed version of the K nearest neighbor architecture. This architecture can often deliver performance close to that of KNN, while reducing computation time and memory requirements.

Nearest Cluster The nearest-cluster architecture involves a partitioning of the training set into a few clusters, with each cluster storing the relative frequency of each class among its training patterns. For a given cluster, these values estimate the posterior probabilities of all possible classes for the region of the input space in the vicinity of the cluster. During classification, an input is associated with the nearest cluster, and the posterior probability estimates for that cluster are used to classify the input.

Nearest Cluster Training Flow Chart Perform K means clustering on the data set. For each cluster, generate a probability for each class according to P_jk = N_jk / N_k, where P_jk is the probability for class j within cluster k, N_jk is the number of class-j patterns belonging to cluster k, and N_k is the total number of patterns belonging to cluster k.
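A minimal sketch of this training step in Python; it assumes the labels array returned by the K means sketch above and integer class labels 0..n_classes-1.

```python
import numpy as np

def cluster_class_probabilities(cluster_labels, class_labels, k, n_classes):
    """P[c, j] = fraction of patterns in cluster c that belong to class j (P_jk)."""
    P = np.zeros((k, n_classes))
    for c in range(k):
        members = class_labels[cluster_labels == c]
        if len(members) > 0:
            counts = np.bincount(members, minlength=n_classes)
            P[c] = counts / len(members)
    return P
```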

Nearest Cluster Test Flow Chart For each input pattern X, find the nearest cluster C_k using the Euclidean distance measure d(X, Y) = sqrt( Σ_{i=1..m} (X_i - Y_i)^2 ), where Y is a cluster center and m is the number of dimensions in the input patterns. Use the probabilities P_jk for all classes j stored with C_k, and classify pattern X into the class j with the highest probability.
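Putting the two nearest-cluster steps together, a brief classification sketch; it reuses the hypothetical helpers defined above.

```python
import numpy as np

def nearest_cluster_classify(x, centers, P):
    """Classify x: find the nearest cluster center, then take its most probable class."""
    k = int(np.linalg.norm(centers - x, axis=1).argmin())  # nearest cluster
    return int(P[k].argmax())                              # class with the highest P_jk
```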

Nearest Cluster (example figure: two classes partitioned into five clusters, with per-cluster class probabilities) In cluster 1: Pclass1=100%, Pclass2=0%. In cluster 2: Pclass1=99.5%, Pclass2=0.5%. In cluster 3: Pclass1=50%, Pclass2=50%. In cluster 4: Pclass1=15%, Pclass2=85%. In cluster 5: Pclass1=65%, Pclass2=35%. Nearest Cluster End

Gaussian Mixture The Gaussian mixture architecture estimates a probability density function (PDF) for each class, and then performs classification based on Bayes’ rule: P(Cj | X) = P(X | Cj) P(Cj) / P(X), where P(X | Cj) is the PDF of class j evaluated at X, P(Cj) is the prior probability of class j, and P(X) is the overall PDF evaluated at X.

Gaussian Mixture Unlike the unimodal Gaussian architecture, which assumes P(X | Cj) to be a single Gaussian, the Gaussian mixture model estimates P(X | Cj) as a weighted average of multiple Gaussians: P(X | Cj) = Σ_k w_k G_k(X), where w_k is the weight of the k-th Gaussian G_k and the weights sum to one. One such PDF model is produced for each class.

Gaussian Mixture Each Gaussian component is defined as G_k(X) = (2π)^(-d/2) |V_k|^(-1/2) exp( -(1/2) (X - M_k)^T V_k^(-1) (X - M_k) ), where M_k is the mean of the Gaussian, V_k is its covariance matrix, and d is the dimensionality of X.
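A minimal sketch of evaluating one Gaussian component in Python with NumPy, following the usual multivariate normal formula; the function name is illustrative.

```python
import numpy as np

def gaussian_pdf(X, mean, cov):
    """Evaluate a d-dimensional Gaussian density at each row of X (shape (N, d))."""
    d = len(mean)
    diff = X - mean
    inv = np.linalg.inv(cov)
    # Mahalanobis term (X - M)^T V^{-1} (X - M) for each pattern.
    expo = -0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(cov) ** (-0.5)
    return norm * np.exp(expo)
```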

Gaussian Mixture Free parameters of the Gaussian mixture model consist of the means and covariance matrices of the Gaussian components and the weights indicating the contribution of each Gaussian to the approximation of P(X | Cj).

Composition of Gaussian Mixture (example figure: the PDF of class 1 composed of five weighted Gaussians, G1,w1 through G5,w5). The free variables are the means μi, the covariances Vi, and the weights wk; we use the EM (estimate-maximize) algorithm to estimate these variables.

Gaussian Mixture These parameters are tuned using an iterative procedure called the estimate-maximize (EM) algorithm, which aims at maximizing the likelihood of the training set under the estimated PDF. The likelihood function L for each class j can be defined as the product of the mixture density over that class’s training patterns: L_j = Π_p P(X_p | Cj).

Gaussian Mixture Training Flow Chart (1) Initialize the Gaussian means μi, i=1,…,G using the K means clustering algorithm. Initialize the covariance matrices Vi using the distance to the nearest cluster. Initialize the weights πi = 1/G so that all Gaussians are equally likely. Present each pattern X of the training set and model each of the classes as a weighted sum of Gaussians: P(X | class) = Σ_{i=1..G} πi G(X; μi, Vi), where G is the number of Gaussians, the πi’s are the weights, and G(X; μi, Vi) is the Gaussian component defined earlier with covariance matrix Vi.
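A sketch of this initialization step in Python, assuming the kmeans function sketched earlier; setting each covariance to a scaled identity based on the within-cluster spread is my own simplification of “the distance to the nearest cluster”.

```python
import numpy as np

def init_gmm(X, G, rng=None):
    """Initialize means, covariances, and weights for a G-component mixture."""
    centers, labels = kmeans(X, G, rng=rng)          # from the earlier K means sketch
    d = X.shape[1]
    means = centers
    covs = np.empty((G, d, d))
    for i in range(G):
        members = X[labels == i]
        # Simplification: isotropic covariance from the average squared spread about the center.
        spread = ((members - means[i]) ** 2).mean() if len(members) else 1.0
        covs[i] = spread * np.eye(d)
    weights = np.full(G, 1.0 / G)                    # all Gaussians equally likely
    return means, covs, weights
```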

Gaussian Mixture Training Flow Chart (2) Compute the responsibilities τ_ip, defined as the probability of component i given pattern X_p: τ_ip = πi G(X_p; μi, Vi) / Σ_j πj G(X_p; μj, Vj). The denominator is the mixture probability of X_p within the class, and the numerator is the PDF of component i multiplied by its weight. Then iteratively update the weights, means and covariances: πi = (1/N) Σ_p τ_ip, μi = (Σ_p τ_ip X_p) / (Σ_p τ_ip), Vi = (Σ_p τ_ip (X_p - μi)(X_p - μi)^T) / (Σ_p τ_ip). After obtaining the τ_ip, we use them to estimate new weights, means and covariances, and then use the new weights, means and covariances to estimate new τ_ip.
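A compact sketch of one EM iteration in Python, following the standard Gaussian mixture updates; it relies on the gaussian_pdf helper sketched earlier and omits numerical safeguards.

```python
import numpy as np

def em_step(X, means, covs, weights):
    """One estimate-maximize iteration for a Gaussian mixture fitted to X (shape (N, d))."""
    N, G = len(X), len(weights)
    # E-step: responsibilities tau[p, i] = P(component i | X_p).
    tau = np.column_stack([weights[i] * gaussian_pdf(X, means[i], covs[i]) for i in range(G)])
    tau /= tau.sum(axis=1, keepdims=True)
    # M-step: re-estimate weights, means, and covariances from the responsibilities.
    Nk = tau.sum(axis=0)
    weights = Nk / N
    means = (tau.T @ X) / Nk[:, None]
    covs = np.empty_like(covs)
    for i in range(G):
        diff = X - means[i]
        covs[i] = (tau[:, i, None] * diff).T @ diff / Nk[i]
    return means, covs, weights
```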

Gaussian Mixture Training Flow Chart (3) Recompute τ_ip using the new weights, means and covariances. Stop training if the change in the likelihood between iterations falls below a specified threshold, or if the number of epochs reaches the specified value. Otherwise, continue the iterative updates.

Gaussian Mixture Test Flow Chart Present each input pattern X and compute the confidence for each class j: P(X | Cj) P(Cj), where P(Cj) is the prior probability of class Cj, estimated by counting the number of class-j training patterns: P(Cj) = N_j / N. Classify pattern X as the class with the highest confidence. In other words, the PDF of each class is multiplied by the prior probability of that class. Gaussian Mixture End
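Finally, a brief sketch of the test step: one mixture per class, scored by P(X | Cj) P(Cj); the per-class list of parameters is an assumed data layout, and gaussian_pdf is the helper sketched earlier.

```python
import numpy as np

def gmm_classify(x, class_models, priors):
    """class_models[j] = (means, covs, weights) for class j; priors[j] = P(Cj)."""
    x = np.atleast_2d(x)
    confidences = []
    for j, (means, covs, weights) in enumerate(class_models):
        # P(x | Cj): weighted sum of the class's Gaussian components.
        pdf = sum(w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs))
        confidences.append(float(pdf) * priors[j])
    return int(np.argmax(confidences))  # class with the highest confidence
```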

Thanks for your attention !!