Published by Ingrid Cawood. Modified over 2 years ago.

1
DATA MINING CLUSTERING ANALYSIS

2
Data Mining (by R.S.K. Baber)
CLUSTERING
Example: Suppose we have 9 balls of three different colours, and we want to cluster them into three groups by colour. Balls of the same colour are clustered into one group.
[Figure: the balls grouped by colour]
Concept
Definition (Cluster, Cluster analysis)

3
CLUSTERING
Which is a good cluster?
Data structures in data mining / clustering
Types of data in cluster analysis
Types of clustering
K-means: Concept, Algorithm, Example, Comments

4
The K-Means Clustering Method: Example (K=2)
[Figure: scatter plot of the objects, both axes from 0 to 10]
Arbitrarily choose K objects as the initial cluster centers.
Assign each object to the most similar center.

5
The K-Means Clustering Method: Example (K=2)
[Figure: two scatter plots of the objects, axes from 0 to 10]
Arbitrarily choose K objects as the initial cluster centers.
Assign each object to the most similar center.
Update the cluster means.
Reassign the objects to the updated centers.

7
The K-Means Clustering Method: How does it work?
Suppose we have 8 points:
A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
and we want 3 clusters. The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
The distance function between two points a = (x1, y1) and b = (x2, y2) is the Manhattan distance: d(a, b) = |x2 - x1| + |y2 - y1|.
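The distance function defined on this slide can be written as a small helper. This is a minimal sketch; the names `manhattan`, `points` and `initial_centers` are illustrative and do not come from the slides.

```python
def manhattan(a, b):
    """Manhattan distance d(a, b) = |x2 - x1| + |y2 - y1| for 2-D points."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)

# The eight points and the three initial centers given on the slide.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
initial_centers = [points["A1"], points["A4"], points["A7"]]

# Distance between the first two initial centers, A1 and A4.
print(manhattan(points["A1"], points["A4"]))  # 5
```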

8
The K-Means Clustering Method: Iteration # 1
Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2)

Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
A1(2, 10)
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

9
The K-Means Clustering Method: Iteration # 1
Distances from the point A1(2, 10) to each mean, using d(a, b) = |x2 - x1| + |y2 - y1| with the point as (x1, y1) and the mean as (x2, y2):

Point (2, 10), mean 1 (2, 10): d(point, mean1) = |2 - 2| + |10 - 10| = 0 + 0 = 0
Point (2, 10), mean 2 (5, 8): d(point, mean2) = |5 - 2| + |8 - 10| = 3 + 2 = 5
Point (2, 10), mean 3 (1, 2): d(point, mean3) = |1 - 2| + |2 - 10| = 1 + 8 = 9

10
The K-Means Clustering Method: Iteration # 1
Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2)

Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
A1(2, 10)    0             5             9             1
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

11
The K-Means Clustering Method: Iteration # 1
Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2)

Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
A1(2, 10)    0             5             9             1
A2(2, 5)     5             6             4             3
A3(8, 4)     12            7             9             2
A4(5, 8)     5             0             10            2
A5(7, 5)     10            5             9             2
A6(6, 4)     10            5             7             2
A7(1, 2)     9             10            0             3
A8(4, 9)     3             2             10            2
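The assignment step behind this table can be sketched as follows: compute the distance from each point to every mean and pick the nearest. The function name `assign` and the dictionary layout are assumptions made for illustration, not part of the slides.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [(2, 10), (5, 8), (1, 2)]  # initial centers: A1, A4, A7

def assign(points, means):
    """Map each point name to (list of distances, 1-based nearest cluster)."""
    table = {}
    for name, p in points.items():
        dists = [manhattan(p, m) for m in means]
        # index() returns the first minimum, so ties go to the lower cluster number.
        table[name] = (dists, dists.index(min(dists)) + 1)
    return table

table = assign(points, means)
for name, (dists, cluster) in table.items():
    print(name, dists, cluster)
```

The tie-breaking rule (lowest cluster number wins) is a choice of this sketch; no point in iteration 1 is equidistant from its two nearest means, so it does not affect the table.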

12
The K-Means Clustering Method: Iteration # 1
New clusters:
Cluster 1: (2, 10)
Cluster 2: (8, 4) (5, 8) (7, 5) (6, 4) (4, 9)
Cluster 3: (2, 5) (1, 2)
New means:
Cluster 1: contains only the point A1(2, 10), which was the old mean, so the cluster center remains the same.
Cluster 2: ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
Cluster 3: ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
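The update step on this slide is a coordinate-wise average of the points in each cluster. A minimal sketch, with the helper name `mean_point` chosen for illustration:

```python
def mean_point(cluster):
    """Coordinate-wise arithmetic mean of a non-empty list of 2-D points."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

# Clusters produced by the first assignment step.
clusters = {
    1: [(2, 10)],
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    3: [(2, 5), (1, 2)],
}

new_means = {k: mean_point(members) for k, members in clusters.items()}
print(new_means)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```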

13
The K-Means Clustering Method: After Iteration 1
[Figure: clusters after iteration 1]

14
The K-Means Clustering Method: After Iterations 2 & 3
[Figure: clusters after iterations 2 and 3]
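The later iterations can be reproduced by repeating the assign and update steps until the assignments stop changing. The sketch below follows the slides' setup (Manhattan distance for assignment, arithmetic mean for the update). The function name `k_means`, the `max_iter` cap and the stopping rule are assumptions of this sketch, not something the slides specify.

```python
def manhattan(a, b):
    """Manhattan distance between two 2-D points."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])

def mean_point(cluster):
    """Coordinate-wise arithmetic mean of a non-empty list of 2-D points."""
    xs, ys = zip(*cluster)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def k_means(points, centers, max_iter=100):
    """Repeat assign/update until labels stop changing (0-based cluster labels)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda k: manhattan(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the means would not move
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster ends up empty
                centers[k] = mean_point(members)
    return labels, centers

points = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]  # A1..A8
labels, centers = k_means(points, [(2, 10), (5, 8), (1, 2)])
print(labels)   # [0, 2, 1, 0, 1, 1, 2, 0]
print(centers)
```

Under these assumptions the run settles on the clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7}.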
