Published by Toni Bonny. Modified over 3 years ago.

1
SEEM4630 2011-2012 Tutorial 4 – Clustering

2
What is Cluster Analysis?
Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups.
A good clustering method will produce high-quality clusters:
- high intra-class similarity: cohesive within clusters
- low inter-class similarity: distinctive between clusters

3
Notion of a Cluster can be Ambiguous
How many clusters? The same set of points can be read as two clusters, four clusters, or six clusters.

4
K-Means Clustering
The number of clusters K is fixed; closeness is measured by Euclidean distance, etc.

5
K-Means Clustering: Example
Given: the mean of cluster $K_i$ is $m_i = (t_{i1} + t_{i2} + \dots + t_{im})/m$, where $t_{i1}, \dots, t_{im}$ are the $m$ points currently in $K_i$.
Data: {2, 4, 10, 12, 3, 20, 30, 11, 25}, K = 2
Solution:
1. m1 = 2, m2 = 4: K1 = {2, 3} and K2 = {4, 10, 12, 20, 30, 11, 25}
2. m1 = 2.5, m2 = 16: K1 = {2, 3, 4} and K2 = {10, 12, 20, 30, 11, 25}
3. m1 = 3, m2 = 18: K1 = {2, 3, 4, 10} and K2 = {12, 20, 30, 11, 25}
4. m1 = 4.75, m2 = 19.6: K1 = {2, 3, 4, 10, 11, 12} and K2 = {20, 30, 25}
5. m1 = 7, m2 = 25: K1 = {2, 3, 4, 10, 11, 12} and K2 = {20, 30, 25}
The clusters no longer change after step 5, so the algorithm stops.
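The iterations of this example can be replayed with a few lines of Python. This is a minimal sketch, not part of the original tutorial: the function name `kmeans_1d` is made up for illustration, and it assumes that a point equidistant from two means (here the point 3, between 2 and 4) goes to the lower-numbered cluster, which is what the worked solution does.

```python
def kmeans_1d(data, means, max_iter=100):
    """K-means on plain numbers, starting from the given initial means."""
    means = list(means)
    history = []  # (means, clusters) for every iteration
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        # min() returns the first minimum, so ties go to the lower index.
        clusters = [[] for _ in means]
        for x in data:
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        history.append((means, clusters))
        # Update step: recompute each mean (keep the old one if a cluster is empty).
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:  # nothing moved, so the clustering is stable
            break
        means = new_means
    return means, clusters, history


data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
means, clusters, history = kmeans_1d(data, means=[2, 4])

for step, (m, c) in enumerate(history, start=1):
    print(step, m, [sorted(k) for k in c])
```

Each printed row corresponds to one line of the solution: the means used for the assignment, followed by the two resulting clusters.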

6
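K-Means Clustering: Evaluation
Evaluation: Sum of Squared Error (SSE)
$$\mathrm{SSE} = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}(m_i, x)^2$$
where $x$ is a data point in cluster $C_i$ and $m_i$ is the centroid of cluster $C_i$.
Given several clusterings, choose the one with the smallest error.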
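As a rough illustration of SSE, the sketch below scores two clusterings of the data set from the K-means example: the first assignment ({2, 3} versus the rest) and the final one. The helper name `sse` is an assumption made for this example, and the centroid of each cluster is taken to be its arithmetic mean.

```python
def sse(clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((x - centroid) ** 2 for x in cluster)
    return total


initial = [[2, 3], [4, 10, 12, 20, 30, 11, 25]]
final = [[2, 3, 4, 10, 11, 12], [20, 30, 25]]

sse_initial = sse(initial)
sse_final = sse(final)
print(sse_initial, sse_final)  # 514.5 150.0
```

The final clustering has the smaller error, so it is the one to prefer under this criterion.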

7
Limitations of K-means
- It is hard to determine a good K value and good initial K centroids.
- K-means has problems when the data contains outliers.
- Outliers can be handled better by hierarchical clustering and density-based clustering.
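The sensitivity to the initial centroids can be seen on a small example. The sketch below is illustrative only: the nine data points and both sets of starting centroids are invented for this purpose, and `kmeans_1d` and `sse` are hypothetical helper names. It runs the same K-means procedure twice with K = 3 and compares the resulting errors.

```python
def kmeans_1d(data, means, max_iter=100):
    """Plain K-means on numbers; returns the clusters once they stop changing."""
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in means]
        for x in data:
            # Ties go to the lower-numbered cluster (an assumption of this sketch).
            nearest = min(range(len(means)), key=lambda i: abs(x - means[i]))
            clusters[nearest].append(x)
        new_means = [sum(c) / len(c) if c else m for c, m in zip(clusters, means)]
        if new_means == means:
            break
        means = new_means
    return clusters


def sse(clusters):
    """Sum of squared error, using each cluster's mean as its centroid."""
    total = 0.0
    for c in clusters:
        if not c:
            continue
        m = sum(c) / len(c)
        total += sum((x - m) ** 2 for x in c)
    return total


# Hypothetical data: three well-separated groups of three points each.
data = [1, 2, 3, 11, 12, 13, 21, 22, 23]

good = kmeans_1d(data, means=[2, 12, 22])  # one starting centroid per group
poor = kmeans_1d(data, means=[1, 2, 3])    # all starting centroids in one group

print(good, sse(good))
print(poor, sse(poor))
```

With the second initialisation the procedure settles on {1}, {2, 3} and one large cluster holding the other six points, with a much higher SSE than the natural three groups, even though K is the same in both runs.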

8
Hierarchical Clustering
- Produces a set of nested clusters organized as a hierarchical tree
- Can be visualized as a dendrogram: a tree-like diagram that records the sequences of merges or splits

9
Strengths of Hierarchical Clustering
- Do not have to assume any particular number of clusters: any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level
Partition direction
- Agglomerative: starting with single elements and aggregating them into clusters
- Divisive: starting with the complete data set and dividing it into partitions

10
Agglomerative Hierarchical Clustering
Basic algorithm is straightforward:
1. Compute the proximity matrix
2. Let each data point be a cluster
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
Key operation is the computation of the proximity of two clusters. There are different approaches to defining the distance between clusters.
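The six steps above can be sketched in Python as follows. Several things here are assumptions made for illustration rather than part of the tutorial: the function name `agglomerate`, the five one-dimensional points, and the choice of the minimum (single-link) rule for updating the proximity matrix in step 5. The proximity matrix is stored as a dictionary keyed by pairs of clusters.

```python
from itertools import combinations


def agglomerate(points):
    """Return the list of merges as (merged cluster, distance at which it formed)."""
    # Step 2: every point starts as its own cluster (a tuple of point indices).
    clusters = [(i,) for i in range(len(points))]
    # Step 1: proximity matrix between all pairs of singleton clusters.
    prox = {
        frozenset((a, b)): abs(points[a[0]] - points[b[0]])
        for a, b in combinations(clusters, 2)
    }
    merges = []
    # Steps 3 and 6: repeat until only a single cluster remains.
    while len(clusters) > 1:
        # Step 4: find and merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: prox[frozenset(p)])
        merged = a + b
        merges.append((merged, prox[frozenset((a, b))]))
        # Step 5: update the proximity matrix (single-link rule, assumed here).
        rest = [c for c in clusters if c not in (a, b)]
        for c in rest:
            prox[frozenset((merged, c))] = min(
                prox[frozenset((a, c))], prox[frozenset((b, c))]
            )
        clusters = rest + [merged]
    return merges


points = [1, 2, 6, 7, 15]  # hypothetical data
merges = agglomerate(points)
for cluster, distance in merges:
    print([points[i] for i in cluster], distance)
```

Swapping `min` for `max` in step 5 would give the complete-link variant; the rest of the loop stays the same.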

11
Hierarchical Clustering: Define Inter-Cluster Similarity
- Min
- Max
- Group Average
- Distance between Centroids
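The four definitions can be written down compactly for two clusters of points. The sketch below is one possible reading, assuming Euclidean distance between points; the function names and the two small 2-D clusters are hypothetical and chosen only to make the numbers easy to check.

```python
import math
from itertools import product


def min_link(a, b):
    """Min (single link): distance between the two closest points."""
    return min(math.dist(p, q) for p, q in product(a, b))


def max_link(a, b):
    """Max (complete link): distance between the two farthest points."""
    return max(math.dist(p, q) for p, q in product(a, b))


def group_average(a, b):
    """Group average: mean of all pairwise distances between the clusters."""
    return sum(math.dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


def centroid_distance(a, b):
    """Distance between the centroids (coordinate-wise means) of the clusters."""
    def centroid(cluster):
        return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
    return math.dist(centroid(a), centroid(b))


# Hypothetical clusters.
A = [(0, 0), (0, 1)]
B = [(3, 0), (4, 0)]

for measure in (min_link, max_link, group_average, centroid_distance):
    print(measure.__name__, round(measure(A, B), 3))
```

On the same pair of clusters the four measures generally give different values, which is why the choice of measure changes which clusters get merged first.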

12
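Hierarchical Clustering: Min or Single Link

Proximity matrix (Euclidean distance):

|    | I1   | I2   | I3   | I4   | I5   | I6   |
|----|------|------|------|------|------|------|
| I1 | 0.00 | 0.24 | 0.22 | 0.37 | 0.34 | 0.23 |
| I2 | 0.24 | 0.00 | 0.15 | 0.20 | 0.14 | 0.25 |
| I3 | 0.22 | 0.15 | 0.00 | 0.15 | 0.28 | 0.11 |
| I4 | 0.37 | 0.20 | 0.15 | 0.00 | 0.29 | 0.22 |
| I5 | 0.34 | 0.14 | 0.28 | 0.29 | 0.00 | 0.39 |
| I6 | 0.23 | 0.25 | 0.11 | 0.22 | 0.39 | 0.00 |

After merging I3 and I6 (distance 0.11):

|          | I1   | I2   | {I3, I6} | I4   | I5   |
|----------|------|------|----------|------|------|
| I1       | 0.00 | 0.24 | 0.22     | 0.37 | 0.34 |
| I2       | 0.24 | 0.00 | 0.15     | 0.20 | 0.14 |
| {I3, I6} | 0.22 | 0.15 | 0.00     | 0.15 | 0.28 |
| I4       | 0.37 | 0.20 | 0.15     | 0.00 | 0.29 |
| I5       | 0.34 | 0.14 | 0.28     | 0.29 | 0.00 |

After merging I2 and I5 (distance 0.14):

|          | I1   | {I2, I5} | {I3, I6} | I4   |
|----------|------|----------|----------|------|
| I1       | 0.00 | 0.24     | 0.22     | 0.37 |
| {I2, I5} | 0.24 | 0.00     | 0.15     | 0.20 |
| {I3, I6} | 0.22 | 0.15     | 0.00     | 0.15 |
| I4       | 0.37 | 0.20     | 0.15     | 0.00 |

After merging {I2, I5} and {I3, I6} (distance 0.15):

|                  | I1   | {I2, I5, I3, I6} | I4   |
|------------------|------|------------------|------|
| I1               | 0.00 | 0.22             | 0.37 |
| {I2, I5, I3, I6} | 0.22 | 0.00             | 0.15 |
| I4               | 0.37 | 0.15             | 0.00 |

After merging {I2, I5, I3, I6} and I4 (distance 0.15):

|                      | I1   | {I2, I5, I3, I6, I4} |
|----------------------|------|----------------------|
| I1                   | 0.00 | 0.22                 |
| {I2, I5, I3, I6, I4} | 0.22 | 0.00                 |

[Dendrogram: leaves in the order 3, 6, 2, 5, 4, 1; height axis from 0 to 0.2]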
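The merge sequence of this single-link example can be checked with a short script. It is a sketch under a few assumptions: the I6 column of the first matrix is filled in by symmetry from the I6 row, the variable names are invented, and a tie between two candidate merges (both at 0.15 here) is broken by taking the first pair in listing order, which happens to reproduce the order used in the slide.

```python
from itertools import combinations

labels = ["I1", "I2", "I3", "I4", "I5", "I6"]
D = [
    [0.00, 0.24, 0.22, 0.37, 0.34, 0.23],
    [0.24, 0.00, 0.15, 0.20, 0.14, 0.25],
    [0.22, 0.15, 0.00, 0.15, 0.28, 0.11],
    [0.37, 0.20, 0.15, 0.00, 0.29, 0.22],
    [0.34, 0.14, 0.28, 0.29, 0.00, 0.39],
    [0.23, 0.25, 0.11, 0.22, 0.39, 0.00],
]


def single_link(a, b):
    """Min distance between any item of cluster a and any item of cluster b."""
    return min(D[i][j] for i in a for j in b)


clusters = [(i,) for i in range(len(labels))]
merges = []  # (distance, labels of the merged cluster)

while len(clusters) > 1:
    # Pick the closest pair of clusters; min() keeps the first pair on ties.
    i, j = min(
        combinations(range(len(clusters)), 2),
        key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
    )
    distance = single_link(clusters[i], clusters[j])
    # Merge in place so the new cluster keeps the position of its first part.
    clusters[i] = clusters[i] + clusters[j]
    del clusters[j]
    merges.append((distance, [labels[k] for k in clusters[i]]))

for distance, members in merges:
    print(distance, members)
```

The printed distances are the heights at which the dendrogram joins its branches: 0.11, 0.14, 0.15, 0.15 and 0.22.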
