
1 K-Medoids Clustering: Partition Around Medoids Algorithm (PAM) An Example & Implementation using Python — by Sanjay Goswami — Stay Home, Stay Safe © 2020, Sanjay Goswami

2 Example: Suppose that the data mining task is to cluster points, with (x, y) representing a location, into three clusters (k = 3), where the points are: A1(2, 10), A2(2, 5), A3(8, 4), B1(5, 8), B2(7, 5), B3(6, 4), C1(1, 2) and C2(4, 9). The distance function is the Euclidean distance. Suppose that initially we assign A1, B1 and C1 as the centre of each cluster, respectively. Use k-medoids clustering to group the data points into three clusters.
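As a starting point for the Python implementation discussed at the end of the slides, the example data and the Euclidean distance can be written down directly. The following is a minimal sketch; the variable names are mine, not from the slides.

from math import dist  # Euclidean distance between two points (Python 3.8+)

# The eight points of the example, keyed by label
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}

# Initial medoids given for the clusters K1, K2 and K3
initial_medoids = {"K1": "A1", "K2": "B1", "K3": "C1"}

# Example distance used on the next slide: d(A1, A3)
print(round(dist(points["A1"], points["A3"]), 2))  # 8.49 (the slides truncate to 8.48)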

3 Solution (continued):
Randomly choose 3 points (because the number of clusters is 3) as the initial cluster centres, also called medoids or representative points. In our example A1, B1 and C1 are given. Let the clusters be denoted K1, K2 and K3.

Iteration 1: Find the distance of each non-medoid point to these medoids, using the Euclidean distance function. For example:
d(A1, A3) = sqrt( (2 - 8)^2 + (10 - 4)^2 ) = sqrt( 36 + 36 ) = sqrt( 72 ) = 8.48
Similarly, d(A1, C2) = sqrt( (2 - 4)^2 + (10 - 9)^2 ) = sqrt( 4 + 1 ) = sqrt( 5 ) = 2.23
and so on.

          A1      A2      A3      B1      B2      B3      C1      C2
K1(A1)    0       5.00    8.48    -       7.07    7.21    -       2.23
K2(B1)    -       4.24    5.00    0       3.60    4.12    -       1.41
K3(C1)    -       3.16    7.28    -       6.70    5.38    0       7.61
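The table above can be reproduced with a few lines of Python. This is a sketch under the same assumed data structures as before; small differences in the second decimal place (e.g. 8.49 vs. 8.48) come from the slides truncating rather than rounding.

from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}
medoids = ["A1", "B1", "C1"]  # iteration-1 medoids of K1, K2 and K3

# One row per medoid: distance to every point, with medoid-to-medoid cells marked
for m in medoids:
    cells = []
    for label, p in points.items():
        if label == m:
            cells.append("  0  ")                  # distance of a medoid to itself
        elif label in medoids:
            cells.append("  -  ")                  # other medoids are never assigned
        else:
            cells.append(f"{dist(points[m], p):5.2f}")
    print(f"K({m})", " ".join(cells))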

4 Solution (continued):
Now find, in each column, the minimum distance from that non-medoid point to the medoids. (The '0' and '-' entries are not considered, because those are distances from a medoid to itself or to another medoid.)

          A1      A2      A3      B1      B2      B3      C1      C2
K1(A1)    0       5.00    8.48    -       7.07    7.21    -       2.23
K2(B1)    -       4.24    5.00    0       3.60    4.12    -       1.41
K3(C1)    -       3.16    7.28    -       6.70    5.38    0       7.61

Clearly, the points A3, B2, B3 and C2 belong to cluster K2 with B1 as medoid (centre), because they are closest to B1. Similarly, point A2 belongs to cluster K3 with C1 as medoid. Point A1 forms cluster K1 on its own, as its medoid.

K1: A1+ (medoid)
K2: B1+ (medoid), A3, B2, B3, C2
K3: C1+ (medoid), A2
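The same assignment can be computed directly, again as a sketch with the assumed names from the earlier snippets.

from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}
medoids = ["A1", "B1", "C1"]

# Assign every non-medoid point to its nearest medoid (the column-wise minimum above)
clusters = {m: [m] for m in medoids}
for label, p in points.items():
    if label not in medoids:
        nearest = min(medoids, key=lambda m: dist(points[m], p))
        clusters[nearest].append(label)

print(clusters)
# {'A1': ['A1'], 'B1': ['B1', 'A3', 'B2', 'B3', 'C2'], 'C1': ['C1', 'A2']}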

5 Solution (continued):
Finally, at the end of iteration 1, calculate the absolute error (cost):
E1 = ( d(B1, A3) + d(B1, B2) + d(B1, B3) + d(B1, C2) ) + d(C1, A2)
   = ( 5.00 + 3.60 + 4.12 + 1.41 ) + ( 3.16 ) = 17.29 (from the table) ... (1)

Iteration 2: In the next iteration, the algorithm tries new medoids in place of the existing ones. Randomly select one non-medoid point from any cluster, make it the new medoid of that cluster, and recalculate the cost. Let the randomly selected point be B3(6, 4) from cluster K2. The dissimilarity of each non-medoid point to the medoids A1(2, 10), B3(6, 4) and C1(1, 2) is calculated and tabulated as before. For example:
d(B3, A3) = sqrt( (6 - 8)^2 + (4 - 4)^2 ) = sqrt( 4 + 0 ) = sqrt( 4 ) = 2.0
and similarly for all the others.

          A1      A2      A3      B1      B2      B3      C1      C2
K1(A1)    0       5.00    8.48    3.60    7.07    -       -       2.23
K2(B3)    -       4.12    2.00    4.12    1.41    0       -       5.38
K3(C1)    -       3.16    7.28    7.21    6.70    -       0       7.61
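The cost of a given choice of medoids is simply the sum of each non-medoid point's distance to its nearest medoid, so both E1 and the cost of the new medoid set can be computed with one helper. A sketch, using the same assumed data as before; the values differ from the slides by a few hundredths because the slides truncate the intermediate distances.

from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}

def total_cost(medoids):
    """Sum of distances from every non-medoid point to its nearest medoid."""
    return sum(min(dist(points[m], p) for m in medoids)
               for label, p in points.items() if label not in medoids)

print(round(total_cost(["A1", "B1", "C1"]), 2))  # E1: 17.31 (17.29 with the slides' truncated values)
print(round(total_cost(["A1", "B3", "C1"]), 2))  # E2: 12.42 (12.40 on the slides)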

6 Solution (continued):
Each point is assigned to the cluster whose medoid it is least dissimilar to. (Again, the '0' and '-' entries are not considered, because those are distances from a medoid to itself or to another medoid.)

          A1      A2      A3      B1      B2      B3      C1      C2
K1(A1)    0       5.00    8.48    3.60    7.07    -       -       2.23
K2(B3)    -       4.12    2.00    4.12    1.41    0       -       5.38
K3(C1)    -       3.16    7.28    7.21    6.70    -       0       7.61

So the points B1 and C2 go to cluster K1 with A1 as medoid, and the points A3 and B2 go to cluster K2 with B3 as medoid. Cluster K3 contains the points A2 and C1 (its medoid).

7 Solution (continued): The cost (absolute error) is now
E2 = ( d(A1, B1) + d(A1, C2) ) + ( d(B3, A3) + d(B3, B2) ) + d(C1, A2)
   = ( 3.60 + 2.23 ) + ( 2.00 + 1.41 ) + ( 3.16 ) = 12.40 ... (2)

Swap cost = present cost - previous cost = 12.40 - 17.29 = -4.89 < 0. Since the swap cost is less than zero, i.e. the absolute error is reduced, the swap of medoids is accepted. Hence A1(2, 10), B3(6, 4) and C1(1, 2) are the final medoids, and the clusters are:

K1: A1+ (medoid), B1, C2
K2: B3+ (medoid), A3, B2
K3: C1+ (medoid), A2

We stop here for simplicity, but the clustering would continue in the same way for more iterations: each medoid is replaced in turn by each non-medoid point, and we check whether the swap is worthwhile (whether it reduces the cost).
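Putting the pieces together, the swap procedure described above can be sketched as a small greedy loop: try every non-medoid point as a replacement for each medoid and keep a swap only if it lowers the total cost. This is my own illustration of the idea, not the author's original code, and it omits the efficiency tricks of a full PAM implementation.

from math import dist

points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "B1": (5, 8),
    "B2": (7, 5), "B3": (6, 4), "C1": (1, 2), "C2": (4, 9),
}

def total_cost(medoids):
    """Sum of distances from every non-medoid point to its nearest medoid."""
    return sum(min(dist(points[m], p) for m in medoids)
               for label, p in points.items() if label not in medoids)

medoids = ["A1", "B1", "C1"]     # the initial medoids of K1, K2 and K3
improved = True
while improved:                  # repeat until no swap lowers the cost any further
    improved = False
    for i in range(len(medoids)):
        for candidate in points:
            if candidate in medoids:
                continue
            trial = medoids[:i] + [candidate] + medoids[i + 1:]  # swap medoid i for candidate
            if total_cost(trial) < total_cost(medoids):
                medoids = trial
                improved = True

print("medoids after convergence:", medoids, "cost:", round(total_cost(medoids), 2))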

8 Implementation using Python, Scikit, Numpy… and Screenshots

9-11 [Screenshots of the Python implementation; the images are not included in this transcript.]
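Since the screenshots themselves are not recoverable from the transcript, here is an illustrative stand-in for a library-based implementation. It uses NumPy and the KMedoids estimator from the scikit-learn-extra package; that choice of package and its parameters are my assumption, not necessarily what the original slides showed.

import numpy as np
from sklearn_extra.cluster import KMedoids  # pip install scikit-learn-extra

# The eight points of the worked example
X = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
              [7, 5], [6, 4], [1, 2], [4, 9]])

# k = 3 clusters, Euclidean distance, PAM-style swap optimisation
km = KMedoids(n_clusters=3, metric="euclidean", method="pam", random_state=0)
labels = km.fit_predict(X)

print("cluster labels:", labels)
print("medoids:")
print(km.cluster_centers_)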

