Clustering (3) Center-based algorithms Fuzzy k-means


Clustering (3) Center-based algorithms Fuzzy k-means Self-organizing maps Evaluation of clustering results Figures and equations from Data Clustering by Gan et al.

Center-based clustering Center-based methods have an objective function that defines how good a solution is; the goal is to minimize that objective; they are efficient for large/high-dimensional datasets; the clusters are assumed to be convex-shaped; the cluster center is representative of the cluster; some model-based clustering methods, e.g. Gaussian mixtures, are also center-based.

Center-based clustering K-means clustering. Let C1, C2, …, Ck be k disjoint clusters. The error is defined as the sum, over all clusters, of the squared distances from the member points to their cluster center: E = Σ_{i=1..k} Σ_{x∈Ci} d(x, mi)², where mi is the center of cluster Ci.

Center-based clustering The k-means algorithm alternates two steps until the assignments no longer change: (1) assign every point to its nearest cluster center; (2) recompute each center as the mean of the points assigned to it (see the sketch below).
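A minimal NumPy sketch of this alternating procedure; the function name, initialization by random sampling, and the stopping rule are illustrative choices, not taken from the slides:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: assign each point to its nearest center,
    then recompute every center as the mean of its points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # assignment step: nearest center for every point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # update step: mean of the points assigned to each center
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # assignments (hence centers) have stabilized
        centers = new_centers
    return labels, centers
```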

Center-based clustering Understanding k-means as an optimization procedure: the objective function is P(W, Q) = Σ_{l=1..k} Σ_{i=1..n} w_il · d(x_i, q_l), where W = (w_il) is the n × k assignment matrix and Q = {q_1, …, q_k} are the cluster centers. Minimize P(W, Q) subject to Σ_{l=1..k} w_il = 1 for every i, and w_il ∈ {0, 1}.

Center-based clustering The solution is obtained by iteratively solving two sub-problems: (1) fix Q = Q̂ and minimize P(W, Q̂) over W, i.e. assign each point to its closest center; (2) fix W = Ŵ and minimize P(Ŵ, Q) over Q, i.e. set each center to the mean (centroid) of the points assigned to it.

Center-based clustering In terms of optimization, the k-means procedure is greedy. Every iteration decreases the value of the objective function; the algorithm converges to a local minimum after a finite number of iterations. Results depend on the initialization values. The computational complexity is proportional to the size of the dataset, so it is efficient on large data. The clusters identified are mostly ball-shaped. It works only on numerical data.

Center-based clustering A variant of k-means to save computing time: the compare-means algorithm (there are many such variants). By the triangle inequality, d(x, mi) + d(x, mj) ≥ d(mi, mj), so d(x, mj) ≥ d(mi, mj) − d(x, mi). Therefore, if d(mi, mj) ≥ 2·d(x, mi), then d(x, mj) ≥ d(x, mi). In every iteration, the small number of between-mean distances is computed first. Then, for every x, its distance to the closest known mean is compared against the between-mean distances to decide which of the d(x, mj) really need to be computed; a sketch of this pruned assignment step is shown below.
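A minimal sketch of one assignment pass with this pruning rule (the function name and looping structure are mine; a practical implementation would reuse bounds across iterations):

```python
import numpy as np

def assign_compare_means(X, centers):
    """Assign each point to its nearest center, skipping distance
    computations ruled out by d(m_i, m_j) >= 2*d(x, m_i)."""
    k = len(centers)
    # between-mean distances, computed once per iteration
    mm = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    labels = np.empty(len(X), dtype=int)
    for n, x in enumerate(X):
        best = 0
        best_d = np.linalg.norm(x - centers[0])
        for j in range(1, k):
            if mm[best, j] >= 2.0 * best_d:
                continue  # d(x, m_j) >= d(x, m_best): no need to compute it
            d = np.linalg.norm(x - centers[j])
            if d < best_d:
                best, best_d = j, d
        labels[n] = best
    return labels
```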

Center-based clustering Automated selection of k? The x-means algorithm, based on AIC/BIC. Fit a family of models Mj at different values of k and score each one, e.g. with BIC(Mj) = lj(D) − (pj/2)·log n, where lj(D) is the log-likelihood of the data given the jth model and pj is its number of free parameters. We have to assume a model to get the likelihood; the convenient choice is Gaussian.

Center-based clustering Under the identical spherical Gaussian assumption (n is the sample size, k the number of centroids, d the dimension, and μ(i) the centroid associated with xi), the shared variance σ² is estimated from the residuals Σ_i ||xi − μ(i)||², and the likelihood is that of the n points under spherical Gaussians centered at their μ(i) with this common variance (times the class proportions). The number of parameters is pj = (k − 1) + k·d + 1: the class probabilities plus the parameters for the means and the shared variance.
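A hedged sketch of this score (the exact normalization of the variance estimate and the BIC constant vary between write-ups; the function name is mine):

```python
import numpy as np

def spherical_gaussian_bic(X, labels, centers):
    """BIC-style score for a k-centroid solution under one shared
    spherical Gaussian per centroid (an x-means-style criterion)."""
    n, d = X.shape
    k = len(centers)
    # pooled variance of the residuals around the assigned centroids
    resid = X - centers[labels]
    var = (resid ** 2).sum() / (n * d)
    # Gaussian log-likelihood with that variance ...
    loglik = -0.5 * n * d * np.log(2 * np.pi * var) - 0.5 * n * d
    # ... plus the log class proportions of each cluster
    counts = np.bincount(labels, minlength=k)
    nz = counts > 0
    loglik += (counts[nz] * np.log(counts[nz] / n)).sum()
    # parameters: (k-1) class probabilities + k*d means + 1 shared variance
    p = (k - 1) + k * d + 1
    return loglik - 0.5 * p * np.log(n)
```

Fitting k-means for several values of k and keeping the k with the highest score is the basic idea behind the automated selection.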

Center-based clustering K-harmonic means --- insensitive to initialization. K-means error: E = Σ_i min_j d(x_i, m_j)². K-harmonic means error: E = Σ_i k / Σ_j (1 / d(x_i, m_j)²), i.e. the harmonic mean of the squared distances to all k centers, so every center contributes to every point's error.
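A small sketch that evaluates both error functions for a given set of centers (function names are mine):

```python
import numpy as np

def kmeans_error(X, centers):
    """Sum over points of the squared distance to the closest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

def kharmonic_error(X, centers):
    """Sum over points of the harmonic mean of the squared distances to all
    centers; every center contributes, which softens the dependence on
    initialization."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    k = centers.shape[0]
    return (k / (1.0 / np.maximum(d2, 1e-12)).sum(axis=1)).sum()
```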

Center-based clustering K-modes algorithm for categorical data. Let x be a d-vector with categorical attributes. For a group of x's, the mode is defined as the vector q that minimizes D(X, q) = Σ_i d(x_i, q), where d(x_i, q) = Σ_{j=1..d} δ(x_ij, q_j) is the simple matching distance (δ = 0 if the two categories are equal and 1 otherwise). The objective function is similar to the one for the original k-means, with modes in place of means.
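A sketch of the matching distance and the attribute-wise mode (names are mine; taking the most frequent category per attribute minimizes the summed matching distance):

```python
import numpy as np

def matching_distance(x, q):
    """Simple matching distance: number of attributes where x and q differ."""
    return int(np.sum(np.asarray(x) != np.asarray(q)))

def mode_vector(X_cat):
    """Attribute-wise most frequent category: the vector q that minimizes
    the summed matching distance to the rows of X_cat."""
    mode = []
    for col in np.asarray(X_cat).T:
        values, counts = np.unique(col, return_counts=True)
        mode.append(values[counts.argmax()])
    return np.array(mode)
```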

Center-based clustering K-prototypes algorithm for mixed-type data. Between any two points, the distance is defined as the squared Euclidean distance over the numerical attributes plus γ times the simple matching distance over the categorical attributes; γ is a parameter that balances the continuous and categorical variables. The cost function to minimize is the sum of these distances from every point to the prototype of its cluster.

Fuzzy k-means Soft clustering --- an observation can be assigned to multiple clusters. With n samples and c partitions, the fuzzy c-partition matrix U (c × n) has entries uij ∈ [0, 1] with Σ_i uij = 1 for every sample j. If we take the cluster of maximum membership for every sample, we get back a hard partition.

Fuzzy k-means The objective function is J_q(U, V) = Σ_{i=1..k} Σ_{j=1..n} uij^q · d²(xj, vi), where q > 1 controls the “fuzziness”, vi is the centroid of cluster i, uij is the degree of membership of xj in cluster i, and k is the number of clusters.

Fuzzy k-means
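A minimal sketch of the standard alternating updates for this objective (memberships from the current centroids, centroids from the current memberships); the function name and defaults are mine, and the update formulas are the usual fuzzy c-means ones rather than the exact notation of the slide:

```python
import numpy as np

def fuzzy_kmeans(X, c, q=2.0, n_iter=100, seed=0, eps=1e-12):
    """Fuzzy c-means: alternate membership and centroid updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=c, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared distances d2[i, j] from centroid i to point j
        d2 = ((X[None, :, :] - centers[:, None, :]) ** 2).sum(axis=2) + eps
        # membership update: u_ij proportional to d_ij^(-2/(q-1)),
        # normalized so that each column (sample) sums to 1
        w = d2 ** (-1.0 / (q - 1.0))
        U = w / w.sum(axis=0, keepdims=True)
        # centroid update: weighted mean of the data with weights u_ij^q
        Uq = U ** q
        centers = (Uq @ X) / Uq.sum(axis=1, keepdims=True)
    return U, centers
```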

Self-organizing maps A constrained version of K-means clustering: the prototypes are encouraged to lie on a one- or two-dimensional manifold in the feature space – a “constrained topological map”.

Self-organizing maps Set up a two-dimensional rectangular grid of K prototypes mj ∈ Rp (usually initialized on the two-dimensional principal component plane). Loop over the observation data points xi - find the prototype mj closest to xi in Euclidean distance - for all neighbors mk of mj (within distance r in the 2D grid), move mk toward xi via the update mk ← mk + α(xi − mk) Once the model is fit, the observations are mapped down onto the two-dimensional grid; a sketch follows.
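A compact sketch of this online procedure (the grid size, the schedules for r and α, and the function name are illustrative assumptions):

```python
import numpy as np

def som_fit(X, grid=(5, 5), n_epochs=20, r0=2.0, alpha0=0.5, seed=0):
    """Online SOM: move the winning prototype and its 2-D grid
    neighbours toward each observation; r and alpha shrink over time."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    coords = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    protos = X[rng.choice(len(X), size=rows * cols, replace=False)].astype(float)
    for epoch in range(n_epochs):
        frac = epoch / max(n_epochs - 1, 1)
        r = r0 * (1.0 - frac) + 0.5 * frac           # neighbourhood radius decays
        alpha = alpha0 * (1.0 - frac) + 0.01 * frac  # learning rate decays
        for x in X[rng.permutation(len(X))]:
            win = np.linalg.norm(protos - x, axis=1).argmin()
            # grid neighbours of the winner within distance r
            near = np.linalg.norm(coords - coords[win], axis=1) <= r
            protos[near] += alpha * (x - protos[near])
    return protos, coords

# mapping: each observation is assigned to the grid cell of its closest prototype
```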

Self-organizing maps SOM moves the prototypes closer to the data while also maintaining a smooth two-dimensional spatial relationship between the prototypes - a constrained version of K-means clustering. If r is small enough, SOM becomes K-means, training on one data point at a time. Both r and α decrease over the iterations.

Self-organizing maps 5 × 5 grid of prototypes

Self-organizing maps

Self-organizing maps

Self-organizing maps Is the constraint reasonable?

Evaluation of clustering results

Evaluation External criteria approach: comparing the clustering result (C) with a pre-specified partition (P). For all pairs of samples, count M = a + b + c + d:

                               In same cluster in P    In different clusters in P
  In same cluster in C                   a                          b
  In different clusters in C             c                          d
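A straightforward counting routine for these four quantities (the function name is mine; the Rand index (a + d)/M noted at the end is one standard statistic built from them, given as an example rather than taken from the slides):

```python
from itertools import combinations

def pair_counts(labels_C, labels_P):
    """Count sample pairs by agreement between clustering C and partition P."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_C)), 2):
        same_C = labels_C[i] == labels_C[j]
        same_P = labels_P[i] == labels_P[j]
        if same_C and same_P:
            a += 1          # together in both C and P
        elif same_C:
            b += 1          # together in C, apart in P
        elif same_P:
            c += 1          # apart in C, together in P
        else:
            d += 1          # apart in both
    return a, b, c, d

# example index: Rand = (a + d) / (a + b + c + d)
```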

Evaluation Monte Carlo methods based on the null hypothesis H0 (randomly generated partitions), or bootstrap resampling, are needed to assess the significance of such indices.

Evaluation External criteria: an alternative is to compare the proximity matrix Q with the given partition P. Define a matrix Y based on P, e.g. Yij = 1 if xi and xj belong to the same cluster of P and 0 otherwise; the agreement between Q and Y is then used as the index.

Evaluation Internal criteria: evaluate the clustering structure using features of the dataset itself (mostly the proximity matrix of the data). Example: for hierarchical clustering, Pc is the cophenetic matrix, whose ijth element is the proximity level at which the two data points xi and xj are first joined into the same cluster; P is the proximity matrix.

Evaluation Cophenetic correlation coefficient index (CPCC): the correlation between the corresponding entries of the proximity matrix P and the cophenetic matrix Pc, taken over all pairs i < j. CPCC is in [-1, 1]; a higher value indicates better agreement between the hierarchy and the original proximities.
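A minimal sketch of this index as a plain Pearson correlation over the upper-triangular entries (the function name is mine):

```python
import numpy as np

def cpcc(P, Pc):
    """Correlation between proximity and cophenetic matrices (pairs i < j)."""
    iu = np.triu_indices_from(P, k=1)
    return np.corrcoef(P[iu], Pc[iu])[0, 1]
```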

Evaluation Relative criteria: choose the best result out of a set according to a predefined criterion. Example: the modified Hubert's Γ statistic, Γ = (1/M) Σ_{i<j} P(i, j)·Q(i, j), where P is the proximity matrix of the data, Q(i, j) is the distance between the centroids of the clusters containing xi and xj, and M = n(n − 1)/2 is the number of pairs. A high value indicates compact clusters.
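A direct sketch of this statistic (the function name and the use of Euclidean distances for both P and Q are assumptions):

```python
import numpy as np

def modified_hubert_gamma(X, labels, centers):
    """Average over pairs of P(i, j) * Q(i, j), where P is the data proximity
    and Q the distance between the centroids of the points' clusters."""
    n = len(X)
    P = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rep = centers[labels]                      # centroid representing each point
    Q = np.linalg.norm(rep[:, None, :] - rep[None, :, :], axis=2)
    iu = np.triu_indices(n, k=1)
    return (P[iu] * Q[iu]).sum() / (n * (n - 1) / 2)
```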