
1 Clustering (3)
Center-based algorithms; Fuzzy k-means; Self-organizing maps; Evaluation of clustering results. (Figures and equations from Data Clustering by Gan et al.)

2 Center-based clustering
These methods have objective functions that define how good a solution is; the goal is to minimize the objective function. They are efficient for large/high-dimensional datasets. The clusters are assumed to be convex-shaped, and the cluster center is representative of the cluster. Some model-based clustering methods, e.g. Gaussian mixtures, are also center-based.

3 Center-based clustering
K-means clustering. Let $C_1, C_2, \ldots, C_k$ be k disjoint clusters. The error is defined as the sum of squared distances from each point to its cluster center: $E = \sum_{i=1}^{k} \sum_{x \in C_i} \|x - \mu(C_i)\|^2$, where $\mu(C_i)$ is the centroid (mean) of cluster $C_i$.

4 Center-based clustering
The k-means algorithm:
1. Choose k initial cluster centers (e.g., k data points picked at random).
2. Assignment step: assign each point to its nearest cluster center.
3. Update step: recompute each center as the mean of the points assigned to it.
4. Repeat steps 2-3 until the assignments no longer change.
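As a concrete reference, a minimal NumPy sketch of this loop (Lloyd's algorithm); the random-point initialization and the empty-cluster handling are illustrative choices, not prescribed by the slides:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's algorithm: alternate assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (empty clusters keep their previous center).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged to a local minimum of the objective
        centers = new_centers
    return centers, labels
```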

5 Center-based clustering
Understanding k-means as an optimization procedure. The objective function is $P(W, Q) = \sum_{l=1}^{k} \sum_{i=1}^{n} w_{il}\, \|x_i - q_l\|^2$. Minimize $P(W, Q)$ subject to $\sum_{l=1}^{k} w_{il} = 1$ and $w_{il} \in \{0, 1\}$, where W is the n x k assignment matrix and $Q = \{q_1, \ldots, q_k\}$ are the cluster centers.

6 Center-based clustering
The solution is found by iteratively solving two sub-problems: (1) fix $Q = \hat{Q}$ and minimize $P(W, \hat{Q})$ over W, i.e., assign each point to its nearest center; (2) fix $W = \hat{W}$ and minimize $P(\hat{W}, Q)$ over Q, i.e., set each center $q_l$ to the mean of the points assigned to cluster l.

7 Center-based clustering
In terms of optimization, the k-means procedure is greedy. Every iteration decreases the value of the objective function; the algorithm converges to a local minimum after a finite number of iterations. Results depend on the initialization values. The computational complexity is proportional to the size of the dataset, so it is efficient on large data. The clusters identified are mostly ball-shaped. It works only on numerical data.

8 Center-based clustering
A variant of k-means that saves computing time: the compare-means algorithm (one of many). It is based on the triangle inequality: $d(x, m_i) + d(x, m_j) \geq d(m_i, m_j)$, so $d(x, m_j) \geq d(m_i, m_j) - d(x, m_i)$. Hence, if $d(m_i, m_j) \geq 2\,d(x, m_i)$, then $d(x, m_j) \geq d(x, m_i)$ and $m_j$ cannot be closer to x than $m_i$. In every iteration, the small number of between-mean distances ($k(k-1)/2$ of them) is computed first. Then, for every x, its distance to the closest known mean is compared against the between-mean distances to find which of the $d(x, m_j)$ really need to be computed. A sketch of the pruned assignment step follows.
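A sketch of the pruned assignment step under these assumptions (function and variable names are mine); only the triangle-inequality test derived above is used to skip distance computations:

```python
import numpy as np

def assign_compare_means(X, means):
    """Assignment step with the compare-means pruning test: if
    d(m_best, m_j) >= 2 * d(x, m_best), cluster j cannot be closer,
    so d(x, m_j) is never computed."""
    mm = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=2)  # between-mean distances
    labels = np.empty(len(X), dtype=int)
    for i, x in enumerate(X):
        best = 0
        d_best = np.linalg.norm(x - means[0])
        for j in range(1, len(means)):
            if mm[best, j] >= 2.0 * d_best:
                continue  # pruned by the triangle inequality
            d = np.linalg.norm(x - means[j])
            if d < d_best:
                best, d_best = j, d
        labels[i] = best
    return labels
```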

9 Center-based clustering
Automated selection of k? The x-means algorithm is based on AIC/BIC. A family of models at different k is scored, e.g. with BIC: $BIC(M_j) = \hat{l}_j(D) - \frac{p_j}{2} \log n$, where $\hat{l}_j(D)$ is the log-likelihood of the data given the jth model and $p_j$ is the number of parameters. We have to assume a model to get the likelihood; the convenient one is Gaussian.

10 Center-based clustering
Under the identical spherical Gaussian assumption (n is the sample size; k is the number of centroids), the shared variance is estimated as $\hat{\sigma}^2 = \frac{1}{n-k} \sum_i \|x_i - \mu_{(i)}\|^2$, where $\mu_{(i)}$ is the centroid associated with $x_i$. The log-likelihood is $\hat{l}(D) = \sum_{j=1}^{k} n_j \log \frac{n_j}{n} - \frac{nd}{2} \log(2\pi\hat{\sigma}^2) - \frac{1}{2\hat{\sigma}^2} \sum_i \|x_i - \mu_{(i)}\|^2$. The number of parameters is $p = (k-1) + kd + 1$, where d is the dimension: k - 1 class probabilities, kd centroid coordinates, and one shared variance.
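A hedged sketch of this score for a fitted k-means solution (`bic_spherical_kmeans` is my name). It uses the per-dimension ML variance estimate for simplicity; the x-means paper uses a slightly different (n - k) correction, so constants may differ from any particular reference:

```python
import numpy as np

def bic_spherical_kmeans(X, centers, labels):
    """BIC score (higher is better) for a k-means fit, modelling each cluster
    as an identical spherical Gaussian with one shared variance."""
    n, d = X.shape
    k = len(centers)
    # Shared per-dimension ML variance estimate (assumes resid2 > 0).
    resid2 = ((X - centers[labels]) ** 2).sum()
    sigma2 = resid2 / (n * d)
    counts = np.bincount(labels, minlength=k)
    # Hard-assignment log-likelihood: log mixing weights + Gaussian densities;
    # with sigma2 = resid2/(n*d), the residual term collapses to n*d/2.
    log_lik = (np.sum(counts[counts > 0] * np.log(counts[counts > 0] / n))
               - 0.5 * n * d * np.log(2 * np.pi * sigma2)
               - 0.5 * n * d)
    # Parameter count: (k-1) mixing weights + k*d means + 1 variance.
    p = (k - 1) + k * d + 1
    return log_lik - 0.5 * p * np.log(n)
```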

11 Center-based clustering
K-harmonic means --- insensitive to initialization. K-means error: $E_{KM} = \sum_{i=1}^{n} \min_j \|x_i - m_j\|^2$. K-harmonic means error: $E_{KHM} = \sum_{i=1}^{n} \frac{k}{\sum_{j=1}^{k} 1/\|x_i - m_j\|^2}$, i.e., the harmonic mean of the squared distances to all centers replaces the minimum.
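A one-function sketch of this error, assuming squared Euclidean distances; the small eps guards against division by zero when a point coincides with a center:

```python
import numpy as np

def khm_objective(X, means, eps=1e-12):
    """K-harmonic means error: for each point, the harmonic mean of its
    squared distances to all k centers, summed over the points."""
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)  # shape (n, k)
    k = means.shape[0]
    return np.sum(k / np.sum(1.0 / (d2 + eps), axis=1))
```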

12 Center-based clustering
K-modes algorithm for categorical data. Let x be a d-vector with categorical attributes. For a group of x's, the mode is defined as the vector q that minimizes $D(q) = \sum_{x} d(x, q)$, where $d(x, q) = \sum_{j=1}^{d} \delta(x_j, q_j)$ and $\delta(a, b) = 0$ if $a = b$, 1 otherwise (the simple matching distance). The objective function is similar to the one for the original k-means, with modes in place of means.
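A small sketch of the matching distance and the resulting mode (function names are mine); taking, per attribute, the most frequent category in the group minimizes the summed matching distance:

```python
import numpy as np

def matching_distance(x, q):
    """Simple matching distance: number of attributes where x and q differ."""
    return int(np.sum(x != q))

def mode_of(group):
    """Mode of a group of categorical vectors: the most frequent
    category in each attribute column."""
    group = np.asarray(group)
    q = []
    for col in group.T:
        vals, counts = np.unique(col, return_counts=True)
        q.append(vals[counts.argmax()])  # most frequent category per attribute
    return np.array(q)
```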

13 Center-based clustering
K-prototypes algorithm for mixed-type data. Between any two points, the distance is defined as $d(x, y) = \sum_{j \in \text{numeric}} (x_j - y_j)^2 + \gamma \sum_{j \in \text{categorical}} \delta(x_j, y_j)$, where $\gamma$ is a parameter to balance between the continuous and categorical variables. The cost function to minimize is the sum of these distances from each point to its cluster prototype.
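A direct transcription of this distance into code, assuming the numeric and categorical attributes are passed as separate arrays (names and example values are mine):

```python
import numpy as np

def kprototypes_distance(x_num, x_cat, y_num, y_cat, gamma):
    """Mixed-type distance: squared Euclidean on the numeric part plus
    gamma times the simple matching distance on the categorical part."""
    return ((x_num - y_num) ** 2).sum() + gamma * np.sum(x_cat != y_cat)

# Hypothetical usage: two mixed-type points, gamma = 0.5.
d = kprototypes_distance(np.array([1.0, 2.0]), np.array(["a", "x"]),
                         np.array([1.5, 2.0]), np.array(["a", "y"]), gamma=0.5)
```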

14 Fuzzy k-means
Soft clustering --- an observation can be assigned to multiple clusters. With n samples and c partitions, the fuzzy c-partition matrix U (c x n) has entries $u_{ij} \in [0, 1]$ with $\sum_{i=1}^{c} u_{ij} = 1$ for every sample j. If we take the maximum-membership cluster for every sample, we get back a hard partition.

15 Fuzzy k-means
The objective function is $J_q(U, V) = \sum_{j=1}^{n} \sum_{i=1}^{k} u_{ij}^q\, \|x_j - v_i\|^2$, with q > 1 controlling the "fuzziness". $v_i$ is the centroid of cluster i, $u_{ij}$ is the degree of membership of $x_j$ belonging to cluster i, and k is the number of clusters.

16 Fuzzy k-means
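This slide's figure (not reproduced in the transcript) presumably shows the alternating updates; below is a minimal NumPy sketch of the standard fuzzy c-means iteration under squared Euclidean distance, where centroids are $u^q$-weighted means and memberships follow $u_{ij} \propto \|x_j - v_i\|^{-2/(q-1)}$. Function and parameter names are mine:

```python
import numpy as np

def fuzzy_kmeans(X, k, q=2.0, n_iter=100, seed=0, eps=1e-12):
    """Standard fuzzy c-means: alternate centroid and membership updates."""
    rng = np.random.default_rng(seed)
    U = rng.random((k, len(X)))
    U /= U.sum(axis=0)  # memberships of each sample sum to 1
    for _ in range(n_iter):
        W = U ** q
        V = (W @ X) / W.sum(axis=1, keepdims=True)             # weighted centroids
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2) + eps
        inv = d2 ** (-1.0 / (q - 1.0))                          # inverse-distance ratios
        U = inv / inv.sum(axis=0)                               # membership update
    return V, U
```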

17 Self-organizing maps
A constrained version of K-means clustering: the prototypes are encouraged to lie in a one- or two-dimensional manifold in the feature space -- "a constrained topological map".

18 Self-organizing maps
Set up a two-dimensional rectangular grid of K prototypes $m_j \in R^p$ (usually on the two-dimensional principal component plane). Loop over the observations $x_i$:
- find the prototype $m_j$ closest to $x_i$ in Euclidean distance;
- for all neighbors $m_k$ of $m_j$ (within distance r in the 2D grid), move $m_k$ toward $x_i$ via the update $m_k \leftarrow m_k + \alpha (x_i - m_k)$.
Once the model is fit, the observations are mapped down onto the two-dimensional grid.
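A minimal sketch of this training loop, assuming random initialization from data points (the slide suggests the principal component plane instead), an L1 grid neighborhood, and linear decay schedules for r and alpha; all of these choices are illustrative:

```python
import numpy as np

def train_som(X, grid=(5, 5), n_iter=2000, r0=2.0, alpha0=0.5, seed=0):
    """Online SOM training on a 2-D rectangular grid of prototypes.
    Both the neighborhood radius r and learning rate alpha shrink over time."""
    rng = np.random.default_rng(seed)
    gy, gx = np.meshgrid(range(grid[0]), range(grid[1]), indexing="ij")
    coords = np.column_stack([gy.ravel(), gx.ravel()])          # grid positions
    M = X[rng.choice(len(X), size=len(coords))].astype(float)   # init prototypes
    for t in range(n_iter):
        frac = t / n_iter
        r, alpha = r0 * (1 - frac), alpha0 * (1 - frac)         # decay schedules
        x = X[rng.integers(len(X))]                             # one observation
        j = np.argmin(((M - x) ** 2).sum(axis=1))               # closest prototype
        # Move all grid neighbors of the winner (within radius r) toward x.
        near = np.abs(coords - coords[j]).sum(axis=1) <= r      # L1 grid distance
        M[near] += alpha * (x - M[near])
    return M, coords
```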

19 Self-organizing maps
SOM moves the prototypes closer to the data, but also maintains a smooth two-dimensional spatial relationship between the prototypes -- a constrained version of K-means clustering. If r is small enough that each neighborhood contains only the winning prototype, SOM becomes K-means, training on one data point at a time. Both r and $\alpha$ decrease over the iterations.

20 Self-organizing maps 5 × 5 grid of prototypes

21 Self-organizing maps

22 Self-organizing maps

23 Self-organizing maps Is the constraint reasonable?

24 Evaluation of clustering results

25 Evaluation
External criteria approach: comparing the clustering result (C) with a pre-specified partition (P). Count all pairs of samples (M = a + b + c + d pairs in total):

                            In same cluster in P    In different clusters in P
In same cluster in C                 a                          b
In different clusters in C           c                          d
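A sketch of counting a, b, c, d directly from two label vectors, together with the pair-counting indices usually built from them (Rand and Jaccard; the slide itself does not show the index formulas, so these are standard definitions, and the function name is mine):

```python
from itertools import combinations

def pair_counts(C, P):
    """a = pair together in both C and P; b = together in C only;
    c = together in P only; d = apart in both."""
    a = b = c = d = 0
    for i, j in combinations(range(len(C)), 2):
        same_c, same_p = C[i] == C[j], P[i] == P[j]
        if same_c and same_p:
            a += 1
        elif same_c:
            b += 1
        elif same_p:
            c += 1
        else:
            d += 1
    return a, b, c, d

# Hypothetical labelings for illustration.
C = [0, 0, 1, 1, 2]
P = [0, 0, 1, 2, 2]
a, b, c, d = pair_counts(C, P)
M = a + b + c + d               # = n(n-1)/2
rand = (a + d) / M              # Rand index
jaccard = a / (a + b + c)       # Jaccard index
```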

26 Evaluation
Monte Carlo methods based on H0 (random generation of partitions), or the bootstrap, are needed to assess the significance of these indices.

27 Evaluation
External criteria: an alternative is to compare the proximity matrix Q with the given partition P. Define the matrix Y based on P: $Y_{ij} = 1$ if $x_i$ and $x_j$ belong to the same cluster in P, and 0 otherwise; the agreement between Y and Q can then be measured, e.g., by a correlation statistic.

28 Evaluation
Internal criteria: evaluate the clustering structure using features of the dataset itself (mostly the proximity matrix of the data). Example: for hierarchical clustering, Pc is the cophenetic matrix, whose ijth element is the proximity level at which the two data points $x_i$ and $x_j$ are first joined into the same cluster; P is the proximity matrix.

29 Evaluation
Cophenetic correlation coefficient index: CPCC is the sample correlation between the M = n(n-1)/2 pairwise entries of the proximity matrix P and the cophenetic matrix Pc. CPCC is in [-1, 1]; a higher value indicates better agreement between the hierarchy and the original proximities.
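A short sketch of computing the CPCC with SciPy, which exposes the cophenetic matrix directly; the synthetic data and the choice of average linkage are only for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

X = np.random.default_rng(0).normal(size=(30, 4))
Y = pdist(X)                       # condensed proximity (distance) matrix P
Z = linkage(Y, method="average")   # hierarchical clustering
cpcc, coph_dists = cophenet(Z, Y)  # correlation between P and the cophenetic matrix Pc
print(round(cpcc, 3))
```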

30 Evaluation
Relative criteria: choose the best result out of a set according to a predefined criterion. Example: the modified Hubert's $\Gamma$ statistic, $\Gamma = \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} P(i, j)\, Q(i, j)$, where P is the proximity matrix of the data and Q(i, j) is the distance between the centers of the clusters that contain $x_i$ and $x_j$. A high value indicates compact, well-separated clusters.
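A minimal NumPy sketch of the statistic as defined above (`modified_hubert_gamma` is my name; the O(n^2) pairwise matrices are fine for small n):

```python
import numpy as np

def modified_hubert_gamma(X, labels, centers):
    """Modified Hubert's Gamma: average over all pairs of the product of the
    point-point distance and the distance between the pair's cluster centers."""
    n = len(X)
    P = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    cx = centers[labels]  # each point's cluster center
    Q = np.linalg.norm(cx[:, None, :] - cx[None, :, :], axis=2)
    iu = np.triu_indices(n, k=1)  # the M = n(n-1)/2 distinct pairs
    return (P[iu] * Q[iu]).mean()
```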

