Download presentation

Presentation is loading. Please wait.

Published byArmani Tams Modified about 1 year ago

1
1 Clustering

2
2 Partitioning methods Hierarchical methods Density-based Grid-based Model-based Clustering Techniques

3
3 Data Matrix x 11 … x 1f … x 1p... x i1 … x jf … x ip... x j1 … x jf … x jp Dissimilarity Matrix 0 d (2,1) 0 d (3,1) d (3,2) 0... d (n,1) d (n,2) … 0 d(i, j) – difference or dissimilarity between objects Types of Data

4
Interval Scaled Variables Standardization mean absolute deviation s f s f = 1/n ( |x 1f - m f |+ |x 2f - m f | + … + |x nf - m f | ) standardized measurement z if = x 1f - m f / s f Compute dissimilarity between objects Euclidean distance d(i, j) = √ |x i1 - x j1 | 2 + | x i2 - x j1 | 2 + … + | x ip - x jp | 2 Manhattan (city-block) distance d(i, j) = |x i1 - x j1 |+ | x i2 - x j1 | + … + | x ip - x jp | Minkowski distance d(i, j) = ( |x i1 - x j1 | p + | x i2 - x j1 | p + … + | x ip - x jp | p ) 1/p

5
Binary Variables There are only two states: 0 (absent) or 1 (present). Ex. smoker: yes or no. Computing dissimilarity between binary variables: Dissimilarity matrix (contingency table) if all attributes have the same weight Object i 10Sum 1qrq+r 0sts+t Sumq+sr+tp d(i, j) = r+s / q+r+s asymmetric attributes d(i, j) = r+s / q+r+s+t symmetric attributes

6
Nominal Variables Generalization of binary variable where it can take more than two states. Ex. color: red, green, blue. d(i, j) = p - m / p m – number of matches p – total number of attributes Weights can be used: assign greater weight to the matches in variables having a larger number of states. Ordinal Variables Resemble nominal variables except the states are ordered in meaningful sequence. Ex. medal: gold, silver, bronze. Replace x if by r if {1, …, M f } The value of f for the ith object is x if, and f has M f ordered states, representing the ranking 1, …, M f. Replace each x if by its corresponding rank.

7
Variables of Mixed Types p (f) (f) ij d ij f=1 d(i, j) = p (f) ij f=1 (f) where the indicator ij = 0 if either x if or x jf is missing, or x if = x jf = 0 (f) and variable f is asymmetric binary; otherwise ij = 1. The contribution of variable f to the dissimilarity is dependent on its type: (f) (f) If f is binary or nominal: d ij = 0 if x if = x if ; otherwise d ij = 1. (f) |x if - x jf | If f is interval-based: d ij =, where h runs max h x hf – mix h x hf over all non missing objects for variable f. If f is ordinal or ratio-scaled: compute the ranks r if and r if -1 z if = M f - 1 and treat z if as interval-scaled.

8
1. Partitioning (k-number of clusters) 2. Hierarchical (hierarchical decomposition of objects) TV – trees of order k Given: set of N - vectors Goal:divide these points into maximum I disjoint clusters so that points in each cluster are similar with respect to maximal number of coordinates (called active dimensions). TV-tree of order 2: (two clusters per node) Procedure: Divide set of N points into Z clusters maximizing the total number of active dimensions. For each cluster repeat the same procedure. Density-based methods Can find clusters of arbitrary shape. Can grow (given cluster) as long as density in the neighborhood exceeds some threshold (for each point, neighborhood of given radius contains minimum some number of points). Clustering Methods

9
Partitioning methods 1. K-means method (n objects to k clusters) Cluster similarity measured in regard to mean value of objects ina cluster (cluster’s center of gravity) Select randomly k-points (call them means) Assign each object to nearest mean Compute new mean for each cluster Repeat until criterion function converges K E = | p - m i | 2 i=1 p Ci This method is sensitive to outliers. 2. K-medoids method Instead of mean, take a medoid (most centrally located object in a cluster) Squared error criterion We try to minimize

10
Hierarchical Methods Agglomerative hierarchical clustering (bottom-up strategy) Each object placed in a separate cluster, and then we merge these clusters until certain termination conditions are satisfied. Divisive hierarchical clustering (top-down strategy) Distance between clusters: Minimum distance:d min (C i, C j ) = min p Ci, p’ Cj | p – p’ | Maximum distance:d max (C i, C j ) = max p Ci, p’ Cj | p – p’ | Mean distance:d mean (C i, C j ) = | m i – m j | Average distance:d avg (C i, C j ) = 1/n i n j p Ci p’ Cj | p – p’ |

11
Cluster: K m = {t m1, …, t mn } N Centroid:C m = t mi / N i=1 N Radius:R m = (t mi - C m ) 2 / N i=1 N Diameter: D m = (t mi - t mj ) 2 / N (N-1) i=1 j=1 Distance Between Clusters Single Link (smallest distance) Dis(K i, K j ) = min{Dis(t i, t j ) : t i K i, t j K j } Complete Link (largest distance) Dis(K i, K j ) = max{Dis(t i, t j ) : t i K i, t j K j } Average Dis(K i, K j ) = mean{Dis(t i, t j ) : t i K i, t j K j } Centroid Distance Dis(K i, K j ) = Dis(C i, C j ) Cluster 1 Cluster 2

12
Hierarchical Algorithms Single Link Technique (find maximal connected components in a graph) Distances (threshold) AB CD E AB CD 1 1 AB CD Dendogram ABCDE Threshold level

13
Complete Link Technique (looks for cliques – maximal graphs in which there is an edge between any two vertices) Distances (threshold) 1234 …. AB CD 1 1 E B C A D 1 1 E 3 Dendogram EABDC (5, {EABCD}) (3, {EAB}, {DC}) (1, {AB}, {DC}, {E}) (0, {E}, {A}, {B}, {C}, {D})

14
Partitioning Algorithms Minimum Spanning Tree (MST) Given: n – points k – clusters Algorithm: Start with complete graph Remove largest inconsistent edge (its weight is much larger than average weight of all adjacent edges) Repeat

15
Squared Error Cluster: K i = {t i1, …, t in } Center of cluster: C i N Squared Error:SE Ki = ||t ij – C i || 2 j=1 Collection of clusters:K = {K 1, …, K k } k Squared Error for K:SE k = SE Ki i=1 Given:k – number of clusters threshold Algorithm: Repeat Choose k points randomly (called centers) Assign each item to the cluster which has the closest center Calculate new center for each cluster Calculate squared error Until Difference between old error and new one is below specified threshold

16
CURE (Clustering Using Representatives) Idea: handling clusters of different shapes Algorithm: Constant number of points are chosen from each cluster These points are shrunk toward the cluster’s centroid Clusters with closest pair of representative points are merged Center

17
Examples related to clustering

Similar presentations

© 2017 SlidePlayer.com Inc.

All rights reserved.

Ads by Google