# DATA MINING CLUSTERING ANALYSIS. Data Mining (by R.S.K. Baber)

## Presentation transcript

DATA MINING CLUSTERING ANALYSIS

Data Mining (by R.S.K. Baber), slide 2: CLUSTERING

Example: suppose we have 9 balls of three different colours. We want to cluster the balls into three groups, one per colour, so that balls of the same colour end up in the same group. (Figure: the balls grouped by colour.)

- Concept
- Definition (Cluster, Cluster analysis)

Slide 3: CLUSTERING

- Which is a good cluster?
- Data structures in data mining / clustering
- Types of data in cluster analysis
- Types of clustering
- K-means:
  - Concept
  - Algorithm
  - Example
  - Comments

Slide 4: The K-Means Clustering Method

Example (K = 2): arbitrarily choose K objects as the initial cluster centers, then assign each object to the most similar center. (Figure: scatter plot with both axes running from 0 to 10.)

Slide 5: The K-Means Clustering Method

Example (K = 2), continued:

- Arbitrarily choose K objects as the initial cluster centers.
- Assign each object to the most similar center.
- Update the cluster means.
- Reassign each object to the most similar updated center.

(Figure: scatter plots with both axes running from 0 to 10, showing the clusters before and after the update.)
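The assign, update, reassign cycle on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the function name `k_means`, the default Euclidean distance (`math.dist`), the `max_iter` cap, the rule for empty clusters, and the four toy points are all assumptions made for the example.

```python
from math import dist  # Euclidean distance (Python 3.8+); an assumed default


def k_means(points, centers, distance=dist, max_iter=100):
    """Alternate assignment and mean-update steps until assignments stop changing."""
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the most similar (nearest) center per point.
        new_assignment = [
            min(range(len(centers)), key=lambda k: distance(p, centers[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the means would not move
        assignment = new_assignment
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(old)  # assumption: an empty cluster keeps its center
        centers = new_centers
    return centers, assignment


# Hypothetical toy data with K = 2; these points are not from the slides.
points = [(1, 1), (1, 2), (9, 9), (8, 9)]
centers, labels = k_means(points, centers=[(1, 1), (9, 9)])
print(centers, labels)
```

Passing a different `distance` callable lets the same loop run with another similarity measure.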

Slide 7: The K-Means Clustering Method

How does it work?

- Suppose we have 8 points, A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9), and 3 clusters initially.
- The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
- The distance function between two points a = (x1, y1) and b = (x2, y2) is d(a, b) = |x2 – x1| + |y2 – y1| (the Manhattan distance).
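As a small sketch, the setup on this slide can be written down directly in Python. The names `manhattan`, `points`, and `initial_centers` are illustrative choices, not taken from the slides. The coordinates and the distance formula are.

```python
def manhattan(a, b):
    """d(a, b) = |x2 - x1| + |y2 - y1| for a = (x1, y1) and b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)


# The eight points and the three initial cluster centers from the slide.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
initial_centers = [points["A1"], points["A4"], points["A7"]]

# Distance from A1 to the second initial center: |5 - 2| + |8 - 10| = 5
print(manhattan(points["A1"], points["A4"]))
```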

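Slide 8: The K-Means Clustering Method

Iteration # 1, with Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2). The table to be filled in:

| Point | Dist Mean 1 | Dist Mean 2 | Dist Mean 3 | Cluster |
|---|---|---|---|---|
| A1(2, 10) | | | | |
| A2(2, 5) | | | | |
| A3(8, 4) | | | | |
| A4(5, 8) | | | | |
| A5(7, 5) | | | | |
| A6(6, 4) | | | | |
| A7(1, 2) | | | | |
| A8(4, 9) | | | | |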

Slide 9: The K-Means Clustering Method

Iteration # 1, distances from the point A1 = (x1, y1) = (2, 10) to each mean (x2, y2), using d(a, b) = |x2 – x1| + |y2 – y1|:

- Mean 1 = (2, 10): d(point, mean1) = |2 – 2| + |10 – 10| = 0 + 0 = 0
- Mean 2 = (5, 8): d(point, mean2) = |5 – 2| + |8 – 10| = 3 + 2 = 5
- Mean 3 = (1, 2): d(point, mean3) = |1 – 2| + |2 – 10| = 1 + 8 = 9

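Slide 10: The K-Means Clustering Method

Iteration # 1, with Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2). First row filled in:

| Point | Dist Mean 1 | Dist Mean 2 | Dist Mean 3 | Cluster |
|---|---|---|---|---|
| A1(2, 10) | 0 | 5 | 9 | 1 |
| A2(2, 5) | | | | |
| A3(8, 4) | | | | |
| A4(5, 8) | | | | |
| A5(7, 5) | | | | |
| A6(6, 4) | | | | |
| A7(1, 2) | | | | |
| A8(4, 9) | | | | |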

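Slide 11: The K-Means Clustering Method

Iteration # 1, with Mean 1 = (2, 10), Mean 2 = (5, 8), Mean 3 = (1, 2). Completed table:

| Point | Dist Mean 1 | Dist Mean 2 | Dist Mean 3 | Cluster |
|---|---|---|---|---|
| A1(2, 10) | 0 | 5 | 9 | 1 |
| A2(2, 5) | 5 | 6 | 4 | 3 |
| A3(8, 4) | 12 | 7 | 9 | 2 |
| A4(5, 8) | 5 | 0 | 10 | 2 |
| A5(7, 5) | 10 | 5 | 9 | 2 |
| A6(6, 4) | 10 | 5 | 7 | 2 |
| A7(1, 2) | 9 | 10 | 0 | 3 |
| A8(4, 9) | 3 | 2 | 10 | 2 |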
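The completed iteration-1 table can be reproduced with a short script. This is a sketch under stated assumptions: the variable names are illustrative, and the tie-breaking rule (the lower-numbered mean wins) is an assumption that never comes into play for these eight points.

```python
def manhattan(a, b):
    """Distance used in the slides: |x2 - x1| + |y2 - y1|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
means = [(2, 10), (5, 8), (1, 2)]  # Mean 1, Mean 2, Mean 3

table = {}
for name, p in points.items():
    dists = [manhattan(p, m) for m in means]
    # Cluster numbers are 1-based, as in the slide; on a tie, index() picks
    # the lower-numbered mean (an assumption, not stated in the slides).
    cluster = dists.index(min(dists)) + 1
    table[name] = (dists, cluster)
    print(name, p, dists, cluster)
```

Each printed row should match the corresponding row of the table: the three distances followed by the assigned cluster.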

Slide 12: The K-Means Clustering Method

Iteration # 1, new clusters:

- Cluster 1: (2, 10)
- Cluster 2: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9)
- Cluster 3: (2, 5), (1, 2)

New means:

- Cluster 1 contains only the point A1(2, 10), which was the old mean, so the cluster center remains the same.
- Cluster 2: ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
- Cluster 3: ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
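The mean update is a coordinate-wise average of each cluster's members. The sketch below recomputes the three new means from the clusters listed on this slide. The names `clusters` and `new_means` are illustrative only.

```python
# Cluster membership after iteration 1, as listed on the slide.
clusters = {
    1: [(2, 10)],
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    3: [(2, 5), (1, 2)],
}

# New mean of a cluster = (average of x coordinates, average of y coordinates).
new_means = {
    k: (sum(x for x, _ in pts) / len(pts), sum(y for _, y in pts) / len(pts))
    for k, pts in clusters.items()
}
print(new_means)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```

These means then serve as the centers for the next assignment step.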

Slide 13: The K-Means Clustering Method

After Iteration 1: (figure not included in this transcript)

Slide 14: The K-Means Clustering Method

After Iteration 2 & 3: (figures not included in this transcript)
