SEEM4630 2011-2012 Tutorial 4 – Clustering


1 SEEM4630 2011-2012 Tutorial 4 – Clustering

2 What is Cluster Analysis?
• Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
• A good clustering method produces high-quality clusters:
  - high intra-class similarity: cohesive within clusters
  - low inter-class similarity: distinctive between clusters

3 Notion of a Cluster can be Ambiguous
How many clusters?
[Figure: panels labelled "Two Clusters", "Four Clusters", and "Six Clusters"]

4 K-Means Clustering
[Slide body not recovered; surviving annotations: "fixed", "Euclidean Distance etc."]

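5 K-Means Clustering: Example
• Given:
  - Mean of cluster $K_i$: $m_i = (t_{i1} + t_{i2} + \dots + t_{im})/m$, where $t_{i1}, \dots, t_{im}$ are the $m$ points in the cluster
  - Data: {2, 4, 10, 12, 3, 20, 30, 11, 25}
  - K = 2
• Solution (each step assigns every point to the nearest mean, then recomputes the means):
  - m1 = 2, m2 = 4 → K1 = {2, 3}, K2 = {4, 10, 12, 20, 30, 11, 25}
  - m1 = 2.5, m2 = 16 → K1 = {2, 3, 4}, K2 = {10, 12, 20, 30, 11, 25}
  - m1 = 3, m2 = 18 → K1 = {2, 3, 4, 10}, K2 = {12, 20, 30, 11, 25}
  - m1 = 4.75, m2 = 19.6 → K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}
  - m1 = 7, m2 = 25 → K1 = {2, 3, 4, 10, 11, 12}, K2 = {20, 30, 25}
• The last step leaves the clusters unchanged, so the algorithm stops. In the first step, 3 is equidistant from both means (2 and 4) and is placed in K1.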
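The iteration in this example can be reproduced with a short script. This is a minimal sketch for one-dimensional data only; the function name `kmeans_1d`, the `max_iter` cap, and the rule that ties go to the lower-numbered cluster are assumptions made for illustration, chosen so that 3 lands in K1 as in the slide.

```python
def kmeans_1d(data, centroids, max_iter=100):
    """Plain k-means on a list of numbers (illustrative sketch).

    Ties are broken in favour of the lower-indexed centroid, because
    min() returns the first minimum it encounters.
    """
    centroids = list(centroids)
    clusters = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_clusters = [[] for _ in centroids]
        for x in data:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(x - centroids[i]))
            new_clusters[nearest].append(x)
        # Stop when the assignment no longer changes.
        if new_clusters == clusters:
            break
        clusters = new_clusters
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        centroids = [sum(c) / len(c) if c else m
                     for c, m in zip(clusters, centroids)]
    return clusters, centroids


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(data, [2, 4])
print([sorted(c) for c in clusters])  # [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(centroids)                      # [7.0, 25.0]
```

Starting from m1 = 2 and m2 = 4, the sketch passes through the same intermediate means (2.5 and 16, 3 and 18, 4.75 and 19.6) before settling at 7 and 25.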

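6 K-Means Clustering: Evaluation
• Evaluation: Sum of Squared Error (SSE)
  $$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(x, m_i)^2$$
  where $x$ is a data point in cluster $C_i$ and $m_i$ is the centroid of cluster $C_i$.
• Given several candidate clusterings, choose the one with the smallest error.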
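As a rough illustration, SSE can be computed for two of the clusterings from the K = 2 example on the previous slide. The function name `sse` is an assumption, and the sketch handles one-dimensional data only, taking each centroid to be the cluster mean.

```python
def sse(clusters):
    """Sum of squared distances from each point to its cluster mean (1-D sketch)."""
    total = 0.0
    for points in clusters:
        centroid = sum(points) / len(points)
        total += sum((x - centroid) ** 2 for x in points)
    return total


first_step = [[2, 3], [4, 10, 12, 20, 30, 11, 25]]
final_step = [[2, 3, 4, 10, 11, 12], [20, 25, 30]]
print(sse(first_step))  # 514.5
print(sse(final_step))  # 150.0
```

The final clustering has the smaller SSE, so it would be preferred over the first assignment under this criterion.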

7 Limitations of K-means
• It is hard to determine a good value of K and a good set of initial K centroids.
• K-means has problems when the data contains outliers.
  - Outliers can be handled better by hierarchical clustering and density-based clustering.
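One way to see the outlier problem is to watch how far a single extreme value drags a cluster mean. The sketch below reuses the cluster {20, 25, 30} from the earlier example; the outlier value 100 is hypothetical and not part of the tutorial data.

```python
def mean(points):
    """Arithmetic mean, used by k-means as the cluster centroid."""
    return sum(points) / len(points)


cluster = [20, 25, 30]
with_outlier = cluster + [100]  # 100 is a made-up outlier for illustration

print(mean(cluster))       # 25.0
print(mean(with_outlier))  # 43.75
```

With the outlier included, the centroid sits at 43.75, which is farther from every original member of the cluster than the original centroid was.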

8 Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram
  - A tree-like diagram that records the sequence of merges or splits

9 Strengths of Hierarchical Clustering
• No particular number of clusters has to be assumed
  - Any desired number of clusters can be obtained by "cutting" the dendrogram at the proper level
• Partition direction
  - Agglomerative: starting with single elements and aggregating them into clusters
  - Divisive: starting with the complete data set and dividing it into partitions

10 Agglomerative Hierarchical Clustering
• The basic algorithm is straightforward:
  1. Compute the proximity matrix
  2. Let each data point be a cluster
  3. Repeat
  4.   Merge the two closest clusters
  5.   Update the proximity matrix
  6. Until only a single cluster remains
• The key operation is the computation of the proximity of two clusters
  - Different approaches to defining the distance between clusters distinguish the different algorithms
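The six steps can be sketched in a few lines. This is a naive illustration, not an efficient implementation: instead of storing and updating a proximity matrix, it recomputes cluster distances in every round. The names `agglomerative` and `single_link`, the choice of single link as the default, and the sample points are all assumptions.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_link(a, b):
    """Cluster distance = smallest pairwise Euclidean distance."""
    return min(dist(p, q) for p in a for q in b)


def agglomerative(points, linkage=single_link):
    """Return the list of merges, in the order they happen."""
    clusters = [[p] for p in points]  # step 2: every point is its own cluster
    merges = []
    while len(clusters) > 1:          # steps 3 and 6: repeat until one cluster remains
        # Steps 1 and 5: proximities are recomputed here rather than cached.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        merged = clusters[i] + clusters[j]  # step 4: merge the two closest clusters
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


points = [(0, 0), (0, 1), (5, 0), (5, 3)]  # hypothetical sample data
merges = agglomerative(points)
for left, right in merges:
    print(left, "+", right)
```

For these four points the two nearby pairs are merged first, and the two resulting clusters are joined last. The recorded merge order is what a dendrogram would draw.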

11 Hierarchical Clustering
• Define Inter-Cluster Similarity
  - Min
  - Max
  - Group Average
  - Distance between Centroids
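The four definitions can be written as small functions over clusters of points, here expressed as Euclidean distances rather than similarities. The function names and the two sample clusters are assumptions chosen so that each measure gives a different value.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def d_min(a, b):
    """Min (single link): closest pair of points across the two clusters."""
    return min(dist(p, q) for p in a for q in b)


def d_max(a, b):
    """Max (complete link): farthest pair of points across the two clusters."""
    return max(dist(p, q) for p in a for q in b)


def d_group_average(a, b):
    """Group average: mean of all pairwise distances across the two clusters."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def d_centroid(a, b):
    """Distance between the two cluster centroids."""
    return dist(centroid(a), centroid(b))


a = [(0, 0)]          # hypothetical cluster with one point
b = [(3, 0), (0, 4)]  # hypothetical cluster with two points
for measure in (d_min, d_max, d_group_average, d_centroid):
    print(measure.__name__, measure(a, b))
```

For these clusters the pairwise distances are 3 and 4, so the measures come out as 3, 4, 3.5, and about 2.5 respectively. The spread shows why the choice of definition can change which pair of clusters is merged next.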

12 Hierarchical Clustering: Min or Single Link
[Figure: sequence of proximity matrices (Euclidean distance) over items I1 to I6; the numeric entries were not recovered. Clusters after each merge:]
  1. I1, I2, {I3, I6}, I4, I5
  2. I1, {I2, I5}, {I3, I6}, I4
  3. I1, {I2, I5, I3, I6}, I4
  4. I1, {I2, I5, I3, I6, I4}

