
1 DATA MINING CLUSTERING ANALYSIS

2 CLUSTERING
Data Mining (by R.S.K. Baber)
Example: suppose we have 9 balls of three different colours, and we want to cluster them into three groups, one per colour. Balls of the same colour are clustered into the same group, as shown in the slide's figure (not included in this transcript).
Concept definitions: cluster, cluster analysis.

3 CLUSTERING
- Which is a good cluster?
- Data structures in data mining / clustering
- Types of data in cluster analysis
- Types of clustering
- K-means:
  - Concept
  - Algorithm
  - Example
  - Comments

4 The K-Means Clustering Method
Example (K = 2):
- Arbitrarily choose K objects as the initial cluster centers.
- Assign each object to the most similar center.

5 The K-Means Clustering Method
Example (K = 2):
- Arbitrarily choose K objects as the initial cluster centers.
- Assign each object to the most similar center.
- Update the cluster means.
- Reassign the objects, and repeat.

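The steps above (choose initial centers, assign, update the means, reassign) can be sketched in a few lines of Python. This is a minimal illustration of the basic K-means loop, not the author's code: the function name `kmeans`, the default Euclidean distance from `math.dist` (Python 3.8+) and the six sample points are assumptions chosen for illustration. The `dist` parameter can be swapped for another distance function.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], centers: Sequence[Point],
           dist: Callable[[Point, Point], float] = math.dist,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Basic K-means sketch: returns (cluster index per point, final centers)."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each object to the most similar (nearest) center;
        # ties go to the lowest center index.
        new_labels = [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no object changed cluster, so the clustering is stable
        labels = new_labels
        # Update each center to the mean of the objects assigned to it.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centers


# Hypothetical data; K = 2, with the first two objects chosen arbitrarily as centers.
data = [(1, 1), (1.5, 2), (1, 0), (8, 8), (9, 8), (8, 9)]
labels, centers = kmeans(data, centers=data[:2])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centers)
```

Taking the first K objects as initial centers is just one arbitrary choice; K-means only converges to a local optimum, so different initial centers can produce a different final clustering.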

7 The K-Means Clustering Method
How does it work?
- Suppose we have 8 points, A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2) and A8(4, 9), which we want to group into 3 clusters.
- The initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
- The distance between two points a = (x1, y1) and b = (x2, y2) is the Manhattan distance d(a, b) = |x2 - x1| + |y2 - y1|.
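The setup above translates directly into data plus a distance function. A minimal Python sketch; the names `POINTS`, `INITIAL_CENTERS` and `manhattan` are illustrative choices, not taken from the slides.

```python
# The eight example points and the three initial centers (A1, A4, A7).
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}
INITIAL_CENTERS = [POINTS["A1"], POINTS["A4"], POINTS["A7"]]


def manhattan(a, b):
    """d(a, b) = |x2 - x1| + |y2 - y1| for a = (x1, y1), b = (x2, y2)."""
    (x1, y1), (x2, y2) = a, b
    return abs(x2 - x1) + abs(y2 - y1)


# Distances from A1 to each initial center.
print([manhattan(POINTS["A1"], c) for c in INITIAL_CENTERS])  # [0, 5, 9]
```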

8 The K-Means Clustering Method
Iteration #1:
Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
             (2, 10)       (5, 8)        (1, 2)
A1(2, 10)
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

9 The K-Means Clustering Method
Iteration #1: distances from point A1(2, 10) to each mean, using ρ(a, b) = |x2 - x1| + |y2 - y1|
- Point (x1, y1) = (2, 10), mean1 (x2, y2) = (2, 10): ρ(point, mean1) = |2 - 2| + |10 - 10| = 0 + 0 = 0
- Point (x1, y1) = (2, 10), mean2 (x2, y2) = (5, 8): ρ(point, mean2) = |5 - 2| + |8 - 10| = 3 + 2 = 5
- Point (x1, y1) = (2, 10), mean3 (x2, y2) = (1, 2): ρ(point, mean3) = |1 - 2| + |2 - 10| = 1 + 8 = 9

10 The K-Means Clustering Method
Iteration #1:
Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
             (2, 10)       (5, 8)        (1, 2)
A1(2, 10)    0             5             9             1
A2(2, 5)
A3(8, 4)
A4(5, 8)
A5(7, 5)
A6(6, 4)
A7(1, 2)
A8(4, 9)

11 The K-Means Clustering Method
Iteration #1:
Point        Dist Mean 1   Dist Mean 2   Dist Mean 3   Cluster
             (2, 10)       (5, 8)        (1, 2)
A1(2, 10)    0             5             9             1
A2(2, 5)     5             6             4             3
A3(8, 4)     12            7             9             2
A4(5, 8)     5             0             10            2
A5(7, 5)     10            5             9             2
A6(6, 4)     10            5             7             2
A7(1, 2)     9             10            0             3
A8(4, 9)     3             2             10            2
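The whole iteration #1 table can be reproduced mechanically: compute each point's distance to the three means and pick the nearest. A minimal sketch; `manhattan` and `assign_to_means` are hypothetical helper names, and clusters are numbered from 1 to match the slide.

```python
POINTS = [
    ("A1", (2, 10)), ("A2", (2, 5)), ("A3", (8, 4)), ("A4", (5, 8)),
    ("A5", (7, 5)), ("A6", (6, 4)), ("A7", (1, 2)), ("A8", (4, 9)),
]
MEANS = [(2, 10), (5, 8), (1, 2)]  # means used in iteration #1


def manhattan(a, b):
    """Manhattan distance |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def assign_to_means(points, means):
    """Return (name, point, distances, cluster) rows; clusters are numbered from 1."""
    rows = []
    for name, p in points:
        dists = [manhattan(p, m) for m in means]
        cluster = dists.index(min(dists)) + 1  # nearest mean; the first one wins a tie
        rows.append((name, p, dists, cluster))
    return rows


for name, p, dists, cluster in assign_to_means(POINTS, MEANS):
    print(f"{name}{p}", *dists, cluster)
```

The first printed row is `A1(2, 10) 0 5 9 1`, and the remaining rows follow the table line by line.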

12 The K-Means Clustering Method
Iteration #1:
New clusters:
- Cluster 1: (2, 10)
- Cluster 2: (8, 4), (5, 8), (7, 5), (6, 4), (4, 9)
- Cluster 3: (2, 5), (1, 2)
New means:
- Cluster 1: it contains only the point A1(2, 10), which was the old mean, so the cluster center remains the same.
- Cluster 2: ((8 + 5 + 7 + 6 + 4)/5, (4 + 8 + 5 + 4 + 9)/5) = (30/5, 30/5) = (6, 6)
- Cluster 3: ((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)
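The update step is a coordinate-wise average of each cluster's members. A minimal sketch with a hypothetical helper `cluster_mean`, applied to the clusters produced by iteration #1:

```python
def cluster_mean(members):
    """Centroid of a list of (x, y) points: the mean of each coordinate."""
    xs, ys = zip(*members)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


# Clusters produced by iteration #1.
clusters = {
    1: [(2, 10)],
    2: [(8, 4), (5, 8), (7, 5), (6, 4), (4, 9)],
    3: [(2, 5), (1, 2)],
}
new_means = {k: cluster_mean(v) for k, v in clusters.items()}
print(new_means)  # {1: (2.0, 10.0), 2: (6.0, 6.0), 3: (1.5, 3.5)}
```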

13 The K-Means Clustering Method
After iteration 1: (figure not included in this transcript)

14 The K-Means Clustering Method
After iterations 2 and 3: (figure not included in this transcript)
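The remaining iterations can be recomputed by repeating the assign and update steps until a pass leaves every point in the same cluster. A minimal sketch, assuming the same Manhattan distance and mean (centroid) update used in iteration #1; `run_kmeans` is a hypothetical helper name, and its clusters are indexed from 0 rather than 1.

```python
POINTS = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}


def manhattan(a, b):
    """Manhattan distance |x2 - x1| + |y2 - y1|."""
    return abs(b[0] - a[0]) + abs(b[1] - a[1])


def run_kmeans(points, means, max_iter=10):
    """Repeat assign/update until no point changes cluster.

    Returns one (clusters, new_means) entry per iteration in which the
    assignment changed; clusters hold point names, indexed from 0.
    """
    means = [tuple(m) for m in means]
    labels = None
    history = []
    for _ in range(max_iter):
        # Assignment step: nearest mean by Manhattan distance.
        new_labels = {name: min(range(len(means)), key=lambda j: manhattan(p, means[j]))
                      for name, p in points.items()}
        if new_labels == labels:
            break  # converged: this pass changed nothing
        labels = new_labels
        clusters = [[n for n in points if labels[n] == j] for j in range(len(means))]
        # Update step: centroid of each cluster (an empty cluster keeps its old mean).
        means = [
            (sum(points[n][0] for n in c) / len(c), sum(points[n][1] for n in c) / len(c))
            if c else means[j]
            for j, c in enumerate(clusters)
        ]
        history.append((clusters, means))
    return history


history = run_kmeans(POINTS, [POINTS["A1"], POINTS["A4"], POINTS["A7"]])
for i, (clusters, means) in enumerate(history, start=1):
    rounded = [(round(x, 2), round(y, 2)) for x, y in means]
    print(f"Iteration {i}: clusters={clusters} means={rounded}")
```

Running it prints three iterations. Iteration 2 moves A8 into cluster 1, and iteration 3 also moves A4 there, giving the clusters {A1, A4, A8}, {A3, A5, A6} and {A2, A7} with means of about (3.67, 9), (7, 4.33) and (1.5, 3.5). A fourth pass reassigns nothing, so the loop stops.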


