# Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ

## Presentation on theme: "Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ"— Presentation transcript:

Cluster Algorithms Adriano Joaquim de O Cruz ©2006 UFRJ adriano@nce.ufrj.br

K-means

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 3 K-means algorithm Based on the Euclidean distances among elements of the cluster Based on the Euclidean distances among elements of the cluster Centre of the cluster is the mean value of the objects in the cluster. Centre of the cluster is the mean value of the objects in the cluster. Classifies objects in a hard way. Each object belongs to a single cluster. Classifies objects in a hard way. Each object belongs to a single cluster.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 4 K-means algorithm Consider n (X={x 1, x 2,..., x n }) objects and k clusters. Consider n (X={x 1, x 2,..., x n }) objects and k clusters. Each object x i is defined by l characteristics x i =(x i1, x i2,..., x il ). Each object x i is defined by l characteristics x i =(x i1, x i2,..., x il ). Consider A a set of k clusters (A={A 1, A 2,..., A k }). Consider A a set of k clusters (A={A 1, A 2,..., A k }).

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 5 K-means properties The union of all clusters makes the Universe The union of all clusters makes the Universe No element belongs to more than one cluster No element belongs to more than one cluster There is no empty cluster There is no empty cluster

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 6 Membership function

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 7 Membership matrix U Matrix containing the values of inclusion of each element into each cluster (0 or 1). Matrix containing the values of inclusion of each element into each cluster (0 or 1). Matrix has c (clusters) lines and n (elements) columns. Matrix has c (clusters) lines and n (elements) columns. The sum of all elements in the column must be equal to one (element belongs only to one cluster The sum of all elements in the column must be equal to one (element belongs only to one cluster The sum of each line must be less than n e grater than 0. No empty cluster, or cluster containing all elements. The sum of each line must be less than n e grater than 0. No empty cluster, or cluster containing all elements.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 8 Matrix examples X1X2X3 X4X5X6 Two examples of clustering. What do the clusters represent?

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 9 Matrix examples cont. X1X2X3 X4X5X6 U1 and U2 are the same matrices.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 10 How many clusters? The cardinality of any hard k-partition of n elements is The cardinality of any hard k-partition of n elements is

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 11 How many clusters (example)? Consider the matrix U2 (k=3, n=6) Consider the matrix U2 (k=3, n=6)

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 12 K-means inputs and outputs Inputs: the number of clusters c and a database containing n objects with l characteristics each. Inputs: the number of clusters c and a database containing n objects with l characteristics each. Output: A set of k clusters that minimises the square-error criterion. Output: A set of k clusters that minimises the square-error criterion.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 13 Number of Clusters

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 14 K-means algorithm v1 Arbitrarily assigns each object to a cluster (matrix U). Repeat Update the cluster centres; Update the cluster centres; Reassign objects to the clusters to which the objects are most similar; Reassign objects to the clusters to which the objects are most similar; Until no change;

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 15 K-means algorithm v2 Arbitrarily choose c objects as the initial cluster centres. Repeat Reassign objects to the clusters to which the objects are most similar. Reassign objects to the clusters to which the objects are most similar. Update the cluster centres. Until no change

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 16 Algorithm details The algorithm tries to minimise the function The algorithm tries to minimise the function d ie is the distance between the element x e (m characteristics) and the centre of the cluster i (v i ) d ie is the distance between the element x e (m characteristics) and the centre of the cluster i (v i )

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 17 Cluster Centre The centre of the cluster i (v i ) is an l characteristics vector. The centre of the cluster i (v i ) is an l characteristics vector. The jth co-ordinate is calculated as The jth co-ordinate is calculated as

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 18 Detailed Algorithm Choose c (number of clusters). Choose c (number of clusters). Set error ( > 0) and step (r=0). Set error ( > 0) and step (r=0). Arbitrarily set matrix U (r). Do not forget, each element belongs to a single cluster, no empty cluster and no cluster has all elements. Arbitrarily set matrix U (r). Do not forget, each element belongs to a single cluster, no empty cluster and no cluster has all elements.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 19 Detailed Algorithm cont. Repeat Calculate the centre of the clusters v i (r) Calculate the distance d i (r) of each point to the centre of the clusters Generate U (r+1) recalculating all characteristic functions using the equations Until ||U (r+1) -U (r) || < Until ||U (r+1) -U (r) || <

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 20 Matrix norms Consider a matrix U of n lines and n columns: Consider a matrix U of n lines and n columns: Column norm Column norm Line norm Line norm

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 21 K-means problems? Suitable when clusters are compact clouds well separated. Suitable when clusters are compact clouds well separated. Scalable because computational complexity is O(nkr). Scalable because computational complexity is O(nkr). Necessity of choosing c is disadvantage. Necessity of choosing c is disadvantage. Not suitable for nonconvex shapes. Not suitable for nonconvex shapes. It is sensitive to noise and outliers because they influence the means. It is sensitive to noise and outliers because they influence the means. Depends on the initial allocation. Depends on the initial allocation.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 22 Examples of results

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 23 K-means: Actual Data

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 24 K-means: Results

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 25 K-medoids

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 26 K-medoids Algorithm 1

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 27 K-medoids methods K-means is sensitive to outliers since an object with an extremely large value may distort the distribution of data. K-means is sensitive to outliers since an object with an extremely large value may distort the distribution of data. Instead of taking the mean value the most centrally object (medoid) is used as reference point. Instead of taking the mean value the most centrally object (medoid) is used as reference point. The algorithm minimizes the sum of dissimilarities between each object and the medoid (similar to k-means) The algorithm minimizes the sum of dissimilarities between each object and the medoid (similar to k-means)

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 28 K-medoids strategies Find k-medoids arbitrarily. Find k-medoids arbitrarily. Each remaining object is clustered with the medoid to which is the most similar. Each remaining object is clustered with the medoid to which is the most similar. Then iteratively replaces one of the medoids by a non-medoid as long as the quality of the clustering is improved. Then iteratively replaces one of the medoids by a non-medoid as long as the quality of the clustering is improved. The quality is measured using a cost function that measures the average dissimilarity between the objects and the medoid of its cluster. The quality is measured using a cost function that measures the average dissimilarity between the objects and the medoid of its cluster.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 29 Reassignment costs Each time a reassignment occurs a difference in square-error J is contributed. Each time a reassignment occurs a difference in square-error J is contributed. The cost function J calculates the total cost of replacing a current medoid by a non-medoid. The cost function J calculates the total cost of replacing a current medoid by a non-medoid. If the total cost is negative then m j is replaced by m random, otherwise the replacement is not accepted. If the total cost is negative then m j is replaced by m random, otherwise the replacement is not accepted.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 30Source Algorithm presented in: Finding groups in Data: An Introduction to clusters analysis, L. Kaufman and P. J. Rousseeuw, John Wiley & Sons Algorithm presented in: Finding groups in Data: An Introduction to clusters analysis, L. Kaufman and P. J. Rousseeuw, John Wiley & Sons

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 31Phases Build phase: an initial clustering is obtained by the successive selection of representative objects until k (number of clusters) objects have been found. Build phase: an initial clustering is obtained by the successive selection of representative objects until k (number of clusters) objects have been found. Swap phase: it is attempted to improve the set of the k representative objects. Swap phase: it is attempted to improve the set of the k representative objects.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 32 Build Phase The first object is the one for which the sum of dissimilarities to all objects is as small as possible. The first object is the one for which the sum of dissimilarities to all objects is as small as possible. This is most centrally located object. This is most centrally located object. At each subsequent step another object that decreases the objective function is selected, for instance average dissimilarity. At each subsequent step another object that decreases the objective function is selected, for instance average dissimilarity.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 33 Build Phase next steps - I Consider an object e i which has not yet been selected. Consider an object e i which has not yet been selected. Consider a nonselected object e j ; calculate the difference C ji, between its dissimilarity D j (d(e n,e j )) with the most similar previously selected object e n, and its dissimilarity d(e i,e j ) with object e i. Consider a nonselected object e j ; calculate the difference C ji, between its dissimilarity D j (d(e n,e j )) with the most similar previously selected object e n, and its dissimilarity d(e i,e j ) with object e i. If C ij = D j - d(e i,e j ) is positive object e j will contribute to select object e i, If C ij = D j - d(e i,e j ) is positive object e j will contribute to select object e i, C ij = max(D j - d(e i,e j ), 0) C ij = max(D j - d(e i,e j ), 0)

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 34 Build Phase next steps - II Calculate the total gain obtained by selecting object e i ; G i = sum j (C ji ) Calculate the total gain obtained by selecting object e i ; G i = sum j (C ji ) Choose the not yet selected object e i which maximizes G i = sum j (C ji ) Choose the not yet selected object e i which maximizes G i = sum j (C ji ) The process continues until k objects have been found. The process continues until k objects have been found.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 35 Swap Phase It is attempted to improve the set of representative elements. It is attempted to improve the set of representative elements. Consider all pairs of elements (i,h) for which e i has been selected and e h has not. Consider all pairs of elements (i,h) for which e i has been selected and e h has not. What is the effect of swapping e i and e h ? What is the effect of swapping e i and e h ? Consider the objective function as the sum of dissimilarities between each element and the most similar representative object. Consider the objective function as the sum of dissimilarities between each element and the most similar representative object.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 36 Swap Phase: possibility a Consider a non selected object e j and calculate its contribution C jih to the swap: Consider a non selected object e j and calculate its contribution C jih to the swap: If e j is more distant from both e i and e h than from one of the other representatives, e.g. e k, so C ijh = 0 If e j is more distant from both e i and e h than from one of the other representatives, e.g. e k, so C ijh = 0 So e j belongs to object e k, sometimes referred as the medoid m k and the swap will not change the quality of the clustering So e j belongs to object e k, sometimes referred as the medoid m k and the swap will not change the quality of the clustering Remember: positive contributions decrease the quality of the clustering. Remember: positive contributions decrease the quality of the clustering.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 37 eheh Swap Phase possibility a Object e j belongs to medoid e k (i<>k). If e i is replaced by e h and e j is still closer to e k, then C jih = 0. Object e j belongs to medoid e k (i<>k). If e i is replaced by e h and e j is still closer to e k, then C jih = 0. ekek eiei ejej

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 38 Swap Phase possibility b.1 If e j is not further from e i than from any one of the other representative (d(e i,e j )=D j ), two situations must be considered: If e j is not further from e i than from any one of the other representative (d(e i,e j )=D j ), two situations must be considered: e j is closer to e h than to the second closest representative e n, d(e j,e h ) < d(e j,e n ) then C jih =d(e j, e h )-d(e j,e i ). e j is closer to e h than to the second closest representative e n, d(e j,e h ) < d(e j,e n ) then C jih =d(e j, e h )-d(e j,e i ). Contribution C jih can either positive or negative. Contribution C jih can either positive or negative. if element e j is closer to e i than to e h the contribution is positive, the swap is not favourable. if element e j is closer to e i than to e h the contribution is positive, the swap is not favourable.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 39 eheh Swap Phase possibility b.1.+ Object e j belongs to medoid e i. If e i is replaced by e h and e j is close to e i than e h the contribution is positive. C jih > 0 Object e j belongs to medoid e i. If e i is replaced by e h and e j is close to e i than e h the contribution is positive. C jih > 0 enen eiei ejej

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 40 eheh Swap Phase possibility b.1.- Object e j belongs to medoid e i. If e i is replaced by e h and e j is not closer to e i than e h the contribution is negative. C jih <0 Object e j belongs to medoid e i. If e i is replaced by e h and e j is not closer to e i than e h the contribution is negative. C jih <0 ekek eiei ejej

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 41 Swap Phase possibility b.2 e j is at least as distant from e h than from the second closest representative d(e j,e h ) >= d(e j,e n ) then C jih = d(e j,e n ) - d(e j,e i ) e j is at least as distant from e h than from the second closest representative d(e j,e h ) >= d(e j,e n ) then C jih = d(e j,e n ) - d(e j,e i ) The contribution is always positive because it is not advantageous to replace e i by an e h further away from e j than from the second best closest representative object. The contribution is always positive because it is not advantageous to replace e i by an e h further away from e j than from the second best closest representative object.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 42 enen Swap Phase possibility b.2 Object e j belongs to medoid e i. If e i is replaced by e h and e j is further from e h than e n, the contribution is always positive. C jih > 0 Object e j belongs to medoid e i. If e i is replaced by e h and e j is further from e h than e n, the contribution is always positive. C jih > 0 eheh eiei ejej

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 43 Swap Phase possibility c e j more distant from e i than from at least one of the other representative objects (e n ) but closer to e h than to any representative object, then C jih = d(e j,e h ) - d(e j,e i ) e j more distant from e i than from at least one of the other representative objects (e n ) but closer to e h than to any representative object, then C jih = d(e j,e h ) - d(e j,e i ) enen eheh eiei ejej

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 44 K-medoids Algorithm 2

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 45 K-medoid algorithm Arbitrarily choose k objects as the initial medoids. Repeat Assign each remaining object to the cluster with the nearest medoid; Randomly select a nonmedoid object, m random ; Compute the total cost J of swapping m j with m random ; If J<0 then swap o j with o random ; Until no change

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 46 Replacing medoids case 1 Object p belongs to medoid m j. If m j is replaced by m random and p is closest to one of m i (i<>j), then reassigns p to m i Object p belongs to medoid m j. If m j is replaced by m random and p is closest to one of m i (i<>j), then reassigns p to m i mimi mjmj m random p

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 47 m random Replacing medoids case 2 Object p belongs to medoid m j. If m j is replaced by m random and p is closest to m random, then reassigns p to m random Object p belongs to medoid m j. If m j is replaced by m random and p is closest to m random, then reassigns p to m random mimi mjmj p

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 48 m random Replacing medoids case 3 Object p belongs to medoid m i (i<>j). If m j is replaced by m random and p is still close to m i, then does not change. Object p belongs to medoid m i (i<>j). If m j is replaced by m random and p is still close to m i, then does not change. mimi mjmj p

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 49 m random Replacing medoids case 4 Object p belongs to medoid m i (i<>j). If m j is replaced by m random and p is closest to m random,then reassigns p to m random. Object p belongs to medoid m i (i<>j). If m j is replaced by m random and p is closest to m random,then reassigns p to m random. mimi mjmj p

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 50Comparisons? K-medoids is more robust than k-means in presence of noise and outliers. K-medoids is more robust than k-means in presence of noise and outliers. K-means is less costly in terms of processing time. K-means is less costly in terms of processing time.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 51 Fuzzy C-means

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 52 Fuzzy C-means Fuzzy version of K-means Fuzzy version of K-means Elements may belong to more than one cluster Elements may belong to more than one cluster Values of characteristic function range from 0 to 1. Values of characteristic function range from 0 to 1. It is interpreted as the degree of membership of an element to a cluster relative to all other clusters. It is interpreted as the degree of membership of an element to a cluster relative to all other clusters.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 53 Fuzzy C-means setup Consider n (X={x 1, x 2,..., x n }) objects and c clusters. Consider n (X={x 1, x 2,..., x n }) objects and c clusters. Each object x i is defined by l characteristics x i =(x i1, x i2,..., x il ). Each object x i is defined by l characteristics x i =(x i1, x i2,..., x il ). Consider A a set of k clusters (A={A 1, A 2,..., A k }). Consider A a set of k clusters (A={A 1, A 2,..., A k }).

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 54 Fuzzy C-means properties The union of all clusters makes the Universe The union of all clusters makes the Universe There is no empty cluster There is no empty cluster

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 55 Membership function

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 56 Membership matrix U Matrix containing the values of inclusion of each element into each cluster [0,1]. Matrix containing the values of inclusion of each element into each cluster [0,1]. Matrix has c (clusters) lines and n (elements) columns. Matrix has c (clusters) lines and n (elements) columns. The sum of all elements in the column must be equal to one. The sum of all elements in the column must be equal to one. The sum of each line must be less than n e grater than 0. No empty cluster, or cluster containing all elements. The sum of each line must be less than n e grater than 0. No empty cluster, or cluster containing all elements.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 57 Matrix examples X1X2X3 X4X5X6 Two examples of clustering. What do the clusters represent?

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 58 Fuzzy C-means algorithm v1 Arbitrarily assigns each object to a cluster (matrix U). Repeat Update the cluster centres; Update the cluster centres; Reassign objects to the clusters to which the objects are most similar; Reassign objects to the clusters to which the objects are most similar; Until no change;

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 59 Fuzzy C-means algorithm v2 Arbitrarily choose c objects as the initial cluster centres. Repeat Reassign objects to the clusters to which the objects are most similar. Reassign objects to the clusters to which the objects are most similar. Update the cluster centres. Until no change

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 60 Algorithm details The algorithm tries to minimise the function, m is the nebulisation factor. The algorithm tries to minimise the function, m is the nebulisation factor. d ie is the distance between the element x e (l characteristics) and the centre of the cluster i (v i ) d ie is the distance between the element x e (l characteristics) and the centre of the cluster i (v i )

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 61 Nebulisation factor m is the nebulisation factor. m is the nebulisation factor. This value has a range [1, ) This value has a range [1, ) If m=1 the the system is crisp. If m=1 the the system is crisp. If m the all the membership values tend to 1/c If m the all the membership values tend to 1/c The most common values are 1.25 and 2.0 The most common values are 1.25 and 2.0

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 62 Cluster Centre The centre of the cluster i (v i ) is a l characteristics vector. The centre of the cluster i (v i ) is a l characteristics vector. The jth co-ordinate is calculated as The jth co-ordinate is calculated as

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 63 Detailed Algorithm Choose c (number of clusters). Choose c (number of clusters). Set error ( > 0), nebulisation factor (m) and step (r=0). Set error ( > 0), nebulisation factor (m) and step (r=0). Arbitrarily set matrix U (r). Do not forget, each element belongs to a single cluster, no empty cluster and no cluster has all elements. Arbitrarily set matrix U (r). Do not forget, each element belongs to a single cluster, no empty cluster and no cluster has all elements.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 64 Detailed Algorithm cont. Repeat Calculate the centre of the clusters v i (r) Calculate the distance d i (r) of each point to the centre of the clusters Generate U (r+1) recalculating all characteristic functions(How?) Until ||U (r+1) -U (r) || < Until ||U (r+1) -U (r) || <

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 65 How to recalculate? If there is any distance greater than zero then membership grade is the weighted average of the distances to all centers. If there is any distance greater than zero then membership grade is the weighted average of the distances to all centers. else the element belongs to this cluster and no one else.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 66 Example of clustering result

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 67 Example of clustering result

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 68 Possibilistic Clustering

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 69 Membership function

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 70 Membership function The membership degree is the representativity of typicality of the datum x for the cluster i. The membership degree is the representativity of typicality of the datum x for the cluster i.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 71 Algorithm details The algorithm tries to minimise the function, m is the nebulisation factor. The algorithm tries to minimise the function, m is the nebulisation factor. The first sum is the usual and the second rewards high memberships. The first sum is the usual and the second rewards high memberships.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 72 How to calculate?

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 73 Detailed Algorithm Choose c (number of clusters) and m. Choose c (number of clusters) and m. Set error ( > 0) and step (r=0). Set error ( > 0) and step (r=0). Execute FCM Execute FCM

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 74 Detailed Algorithm cont. For 2 times Initialize U (0) and the centre of the clusters v i (0) with previous results Initialize k and r=0 Repeat Calculate the distance d i (r) of each point to the centre of the clusters Generate U (r+1) recalculating all characteristic functions using the equations Until ||U (r+1) -U (r) || < Until ||U (r+1) -U (r) || < End FOR

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 75 Gustafson-Kessel Algorithm

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 76 Gustafson-Kessel method This method (GK) is fuzzy clustering method similar to the Fuzzy C-means (FCM). This method (GK) is fuzzy clustering method similar to the Fuzzy C-means (FCM). The difference is the way the distance is calculated. The difference is the way the distance is calculated. FCM uses Euclidean distances FCM uses Euclidean distances GK uses Mahalanobis distances GK uses Mahalanobis distances

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 77 Gustafson-Kessel method Mahalanobis distance is calculated as Mahalanobis distance is calculated as The matrices A i are given by The matrices A i are given by

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 78 Gustafson-Kessel method The Fuzzy Covariance Matrix is The Fuzzy Covariance Matrix is

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 79 GK comments The clusters are hyperellipsoids on the l. The clusters are hyperellipsoids on the l. The hyperellipsoids have aproximately the same size. The hyperellipsoids have aproximately the same size. In order to be possible to calculate S -1 the number of samples n must be at least equal to the number of dimensions l plus 1. In order to be possible to calculate S -1 the number of samples n must be at least equal to the number of dimensions l plus 1.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 80 Results of GK

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 81 Gath-Geva method It is also known as Gaussian Mixture Decomposition. It is also known as Gaussian Mixture Decomposition. It is similar to the FCM method It is similar to the FCM method The Gauss distance is used instead of Euclidean distance. The Gauss distance is used instead of Euclidean distance. The clusters do not have a definite shape anymore and have various sizes. The clusters do not have a definite shape anymore and have various sizes.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 82 Gath-Geva Method Gauss distance is given by Gauss distance is given by A i =S i -1 A i =S i -1

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 83 Gath-Geva Method The term P i is the probability of a sample belong to a cluster. The term P i is the probability of a sample belong to a cluster.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 84 Gath-Geva Comments Pi is a parameter that influences the size of a cluster. Pi is a parameter that influences the size of a cluster. Bigger clusters attract more elements. Bigger clusters attract more elements. The exponential term makes more difficult to avoid local minima. The exponential term makes more difficult to avoid local minima. Usually another clustering method is used to initialise the partition matrix U. Usually another clustering method is used to initialise the partition matrix U.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 85 GG Results – Random Centers

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 86 GG Results – Centers FCM

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 87 Clustering based on Equivalence Relations A relation crisp R on a universe X can be thought as a relation from X to X A relation crisp R on a universe X can be thought as a relation from X to X R is an equivalence relation if it has the following three properties: R is an equivalence relation if it has the following three properties: Reflexivity (x i, x i ) R Reflexivity (x i, x i ) R Symmetry (x i, x j ) R (x j, x i ) R Symmetry (x i, x j ) R (x j, x i ) R Transitivity (x i, x j ) R and (x j, x k ) R (x i, x k ) R Transitivity (x i, x j ) R and (x j, x k ) R (x i, x k ) R

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 88 Crisp tolerance relation R is a tolerance relation if it has the following two properties: R is a tolerance relation if it has the following two properties: Reflexivity (x i, x i ) R Reflexivity (x i, x i ) R Symmetry (x i, x j ) R (x j, x i ) R Symmetry (x i, x j ) R (x j, x i ) R

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 89 Composition of Relations XYZ RS T=R°S

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 90 Composition of Crisp Relations The operation ° is similar to matrix multiplication.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 91 Transforming Relations A tolerance relation can be transformed into a equivalence relation by at most (n-1) compositions with itself. A tolerance relation can be transformed into a equivalence relation by at most (n-1) compositions with itself. n is the cardinality of the set R. n is the cardinality of the set R.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 92 Example of crisp classification Let X={1,2,3,4,5,6,7,8,9,10} Let X={1,2,3,4,5,6,7,8,9,10} Let R be defined as the relation for the identical remainder after dividing each element by 3. Let R be defined as the relation for the identical remainder after dividing each element by 3. This relation is an equivalence relation This relation is an equivalence relation

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 93 Relation Matrix

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 94 Crisp Classification Consider equivalent columns. Consider equivalent columns. It is possible to group the elements in the following classes It is possible to group the elements in the following classes R 0 = {3, 6, 9} R 0 = {3, 6, 9} R 1 = {1, 4, 7, 10} R 1 = {1, 4, 7, 10} R 2 = {2, 5, 8} R 2 = {2, 5, 8}

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 95 Clustering and Fuzzy Equivalence Relations A relation fuzzy R on a universe X can be thought as a relation from X to X A relation fuzzy R on a universe X can be thought as a relation from X to X R is an equivalence relation if it has the following three properties: R is an equivalence relation if it has the following three properties: Reflexivity: (x i, x i ) R or (x i, x i ) = 1 Reflexivity: (x i, x i ) R or (x i, x i ) = 1 Symmetry: (x i, x j ) R (x j, x i ) R or Symmetry: (x i, x j ) R (x j, x i ) R or (x i, x j ) = (x j, x i ) (x i, x j ) = (x j, x i ) Transitivity: (x i, x j ) and (x j, x k ) R (x i, x k ) R or if (x i, x j ) = 1 and (x j, x k ) = 2 then (x i, x k ) = and >=min( 1, 2 ) Transitivity: (x i, x j ) and (x j, x k ) R (x i, x k ) R or if (x i, x j ) = 1 and (x j, x k ) = 2 then (x i, x k ) = and >=min( 1, 2 )

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 96 Fuzzy tolerance relation R is a tolerance relation if it has the following two properties: R is a tolerance relation if it has the following two properties: Reflexivity (x i, x i ) R Reflexivity (x i, x i ) R Symmetry (x i, x j ) R (x j, x i ) R Symmetry (x i, x j ) R (x j, x i ) R

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 97 Composition of Fuzzy Relations The operation ° is similar to matrix multiplication.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 98 Distance Relation Let X be a set of data on l. Let X be a set of data on l. The distance function is a tolerance relation that can be transformed into a equivalence. The distance function is a tolerance relation that can be transformed into a equivalence. The relation R can be defined by the Minkowski distance formula. The relation R can be defined by the Minkowski distance formula. is a constant that ensures that R [0,1] and is equal to the inverse of the largest distance in X. is a constant that ensures that R [0,1] and is equal to the inverse of the largest distance in X.

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 99 Example of Fuzzy classification Let X={(0,0),(1,1),(2,3),(3,1),(4,0)} be a set of points in 2. Let X={(0,0),(1,1),(2,3),(3,1),(4,0)} be a set of points in 2. Set q=2, Euclidean distances. Set q=2, Euclidean distances. The largest distance is 4 ( x 1, x 5 ), so =0.25. The largest distance is 4 ( x 1, x 5 ), so =0.25. The relation R can be calculated by the equation The relation R can be calculated by the equation

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 100 Points to be classified

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 101 Tolerance matrix The matrix calculated by the equation is The matrix calculated by the equation is The is a tolerance relation that needs to be transformed into a equivalence relation The is a tolerance relation that needs to be transformed into a equivalence relation

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 102 Equivalence matrix The matrix transformed is The matrix transformed is

*@2006 Adriano Cruz *NCE e IM - UFRJ Cluster 103 Results of clustering Taking -cuts of fuzzy equivalent relation at various values of =0.44, 0.5, 0.65 and 1.0 we get the following classes: Taking -cuts of fuzzy equivalent relation at various values of =0.44, 0.5, 0.65 and 1.0 we get the following classes: R.44 =[{x 1,x 2,x 3,x 4,x 5 }] R.44 =[{x 1,x 2,x 3,x 4,x 5 }] R.55 =[{x 1,x 2,x 4,x 5 }{x 3 }] R.55 =[{x 1,x 2,x 4,x 5 }{x 3 }] R.65 =[{x 1,x 2 },{x 3 },{x 4,x 5 }] R.65 =[{x 1,x 2 },{x 3 },{x 4,x 5 }] R 1.0 =[{x 1 },{x 2 },{x 3 },{x 4 },{x 5 }] R 1.0 =[{x 1 },{x 2 },{x 3 },{x 4 },{x 5 }]