A genetic approach to the automatic clustering problem. Authors: Lin Yu Tseng, Shiueng Bien Yang. Graduate: Chien-Ming Hsiao.

2 Outline: Motivation; Objective; Introduction; The basic concept of the genetic strategy; The genetic clustering algorithm; The heuristic to find a good clustering; Conclusion; Personal Opinion

3 Motivation: Some clustering algorithms require the user to provide the number of clusters as input. It is not easy for the user to guess how many clusters there should be; in general, the user has no idea of the number of clusters. The clustering result may then be poor, especially when the number of clusters is large and not easy to guess.

4 Objective: Propose a genetic clustering algorithm that automatically searches for a proper number of clusters and classifies the objects into these clusters.

5 Introduction: Clustering methods are either hierarchical (agglomerative or divisive methods) or non-hierarchical, such as the K-means algorithm. K-means is an iterative hill-climbing algorithm, so the solution it obtains depends on the initial clustering.
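To illustrate why the result of K-means depends on the initial clustering, here is a minimal Lloyd-style K-means sketch in Python. It is not part of CLUSTERING itself, and the toy points and both sets of initial centers are made up for illustration.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Lloyd-style K-means: alternately assign each point to its nearest
    center and move each center to the mean of its assigned points."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: recompute each center as the mean of its members.
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center unchanged
        if new_centers == centers:
            break  # no change: the hill climb has reached a local optimum
        centers = new_centers
    return labels, centers

points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
labels_a, _ = kmeans(points, [(0, 0), (1, 0)])
labels_b, _ = kmeans(points, [(0, 0), (21, 0)])
```

Started from centers (0, 0) and (1, 0), this converges to labels [0, 0, 1, 1, 1, 1]; started from (0, 0) and (21, 0), it converges to [0, 0, 0, 1, 1, 1]. Both are local optima for the same data.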

6 The basic concept of the genetic strategy

7 The genetic clustering algorithm: The algorithm CLUSTERING consists of two stages. Stage 1, the nearest-neighbor algorithm, groups data that are close to one another, reducing the data to a moderate size suitable for the genetic clustering algorithm. Stage 2, the genetic clustering algorithm, groups these small clusters into larger clusters. A heuristic strategy is then used to find a good clustering.

8 The nearest-neighbor algorithm: The distance used to link objects is based on the average of the nearest-neighbor distances. Steps: 1. For each object $O_i$, find the distance between $O_i$ and its nearest neighbor.

9 The nearest-neighbor algorithm: Steps (continued): 2. Compute $d_{av}$, the average of the nearest-neighbor distances found in step 1. 3. View the $n$ objects as nodes of a graph and compute the adjacency matrix $A_{n \times n}$.

10 The nearest-neighbor algorithm: Steps (continued): 4. Find the connected components of this graph. Let the data sets represented by these connected components be denoted by $B_1, B_2, \ldots, B_m$, and the center of each set by $V_i$, $1 \le i \le m$.
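The four steps of the nearest-neighbor stage can be sketched as follows. This is a minimal sketch under several assumptions: Euclidean distance, an adjacency rule $d(O_i, O_j) \le u \cdot d_{av}$ with a hypothetical scaling parameter `u` (the exact rule is not shown in this transcript), and centers $V_i$ computed as component means. The function name is also hypothetical.

```python
import math

def nearest_neighbor_stage(objects, u=1.0):
    """Sketch of stage 1 (hypothetical helper; needs at least two objects).

    ASSUMPTION: objects i and j are adjacent when their distance is at most
    u * d_av; `u` is a hypothetical parameter, not taken from the slides.
    """
    n = len(objects)
    # Step 1: distance from each object to its nearest neighbor.
    nn = [min(math.dist(objects[i], objects[j]) for j in range(n) if j != i)
          for i in range(n)]
    # Step 2: average nearest-neighbor distance d_av.
    d_av = sum(nn) / n
    # Step 3: adjacency matrix A (n x n) of the object graph.
    A = [[i != j and math.dist(objects[i], objects[j]) <= u * d_av
          for j in range(n)] for i in range(n)]
    # Step 4: connected components B_1..B_m via depth-first search.
    seen, components = [False] * n, []
    for start in range(n):
        if seen[start]:
            continue
        stack, comp = [start], []
        seen[start] = True
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if A[i][j] and not seen[j]:
                    seen[j] = True
                    stack.append(j)
        components.append(sorted(comp))
    # Center V_i of each component, assumed to be the mean of its members.
    centers = [tuple(sum(coord) / len(comp) for coord in zip(*[objects[i] for i in comp]))
               for comp in components]
    return components, centers
```

For example, objects [(0, 0), (0, 1), (10, 0), (10, 1), (10, 2)] with `u = 1.0` give the components [[0, 1], [2, 3, 4]] (object indices) with centers (0.0, 0.5) and (10.0, 1.0).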

11 The genetic algorithm: an initialization step followed by iterative generations, each consisting of a reproduction phase, a crossover phase, and a mutation phase.

12 The genetic algorithm: Initialization step: A population of $N$ strings is randomly generated. The length of each string is $m$, the number of sets obtained in the first stage. Each string represents a subset of $\{B_1, B_2, \ldots, B_m\}$: if $B_i$ is in this subset, the $i$th position of the string is 1; otherwise, it is 0. Each $B_i$ in the subset is used as a seed to generate a cluster.
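A sketch of the initialization step, assuming each bit is drawn uniformly at random. The function names are hypothetical, and the indices are 0-based, whereas the slides number the sets from 1. The slides do not say how a string with no 1 bits (no seeds) is handled.

```python
import random

def init_population(N, m, rng=random):
    """Generate N random binary strings of length m, one bit per set B_i.
    ASSUMPTION: each bit is 0 or 1 with equal probability."""
    return [[rng.randint(0, 1) for _ in range(m)] for _ in range(N)]

def seeds_of(string):
    """0-based indices i whose bit is 1, i.e. the sets B_i used as seeds."""
    return [i for i, bit in enumerate(string) if bit == 1]

population = init_population(50, 8, rng=random.Random(0))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible; the module-level functions are used otherwise.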

13 The genetic algorithm

14 How to generate a set of clusters from the seeds: Let $T = \{T_1, T_2, \ldots, T_s\}$ be the subset corresponding to a string. The initial clusters $C_i$ are the $T_i$, and the initial centers $S_i$ of the clusters are the corresponding centers $V_i$, for $i = 1, 2, \ldots, s$. The size of cluster $C_i$ is $|C_i| = |T_i|$ for $i = 1, 2, \ldots, s$, where $|T_i|$ denotes the number of objects belonging to $T_i$.

15 The genetic algorithm: The sets $B_i$ in $\{B_1, B_2, \ldots, B_m\} - T$ are taken one by one, and the distance between the center $V_i$ of the taken $B_i$ and the center $S_j$ of each cluster $C_j$ is calculated. When $B_i$ is classified into the cluster $C_j$, the center $S_j$ and the size of $C_j$ are recomputed.
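Slides 14 and 15 together describe how a string is turned into clusters. The sketch below assumes that each remaining $B_i$ joins the cluster whose center $S_j$ is nearest to $V_i$, that the sets are processed in index order, and that $S_j$ is recomputed as the size-weighted mean of $S_j$ and $V_i$; none of these details is confirmed by the transcript. The helper name is hypothetical, and the string must contain at least one seed.

```python
import math

def clusters_from_seeds(string, sets, centers):
    """Build clusters from the seeds encoded in a binary string (sketch).

    sets[i] lists the objects of B_i and centers[i] is its center V_i.
    ASSUMPTIONS: a non-seed B_i joins the cluster with the nearest center,
    and the new center is the size-weighted mean of S_j and V_i.
    """
    seeds = [i for i, bit in enumerate(string) if bit == 1]
    # Initial clusters C_j = T_j, centers S_j = V_j, sizes |C_j| = |T_j|.
    members = [list(sets[i]) for i in seeds]
    S = [tuple(centers[i]) for i in seeds]
    size = [len(sets[i]) for i in seeds]
    for i in range(len(sets)):
        if string[i] == 1:
            continue  # seeds already form their own clusters
        # Distance from V_i to each cluster center S_j; pick the nearest.
        j = min(range(len(S)), key=lambda k: math.dist(centers[i], S[k]))
        b = len(sets[i])
        # Recompute S_j and |C_j| after adding B_i to C_j.
        S[j] = tuple((size[j] * s + b * v) / (size[j] + b)
                     for s, v in zip(S[j], centers[i]))
        size[j] += b
        members[j].extend(sets[i])
    return members, S, size
```

With sets [[(0, 0), (0, 1)], [(1, 0)], [(10, 0), (10, 1)], [(11, 0)]] and the string [1, 0, 1, 0], the two non-seed sets join the nearer seed, giving two clusters of size 3.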

16 The genetic algorithm: Reproduction phase: The intra-distance of cluster $C_i$; the inter-distance between $C_i$ and the set of all other clusters; the fitness function of a string $R$, which is computed from these distances.
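The formulas on this slide are missing from the transcript. Purely as an assumption, the sketch below uses one plausible form in which a weight $w$ (assumed to be the same $w$ that the heuristic on slides 18 and 19 varies) balances separation against compactness, with Euclidean distances: $\text{Fitness}(R) = \sum_{i=1}^{s} \left( w \cdot D_{inter}(C_i) - D_{intra}(C_i) \right)$. It is illustrative only, not the paper's definition.

```python
import math

def fitness(clusters, centers, w=1.0):
    """Hypothetical fitness of a string R from its clusters and centers.

    ASSUMED form (not taken from the slides):
        Fitness(R) = sum_i ( w * D_inter(C_i) - D_intra(C_i) )
    D_intra(C_i): sum of distances from the members of C_i to its center S_i.
    D_inter(C_i): sum of distances from S_i to every other cluster center.
    """
    total = 0.0
    for i, (members, s_i) in enumerate(zip(clusters, centers)):
        d_intra = sum(math.dist(o, s_i) for o in members)
        d_inter = sum(math.dist(s_i, s_j) for j, s_j in enumerate(centers) if j != i)
        total += w * d_inter - d_intra
    return total
```

For two clusters [(0, 0), (0, 2)] and [(10, 0), (10, 2)] with centers (0, 1) and (10, 1), each cluster contributes $10w - 2$, so the fitness is 16.0 for $w = 1$ and 36.0 for $w = 2$.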

17 The genetic algorithm: Crossover phase: Two random numbers $p$ and $q$ in $[1, m]$ are generated to decide which pieces of the strings are interchanged. The crossover operator is applied with probability $p_c$. Mutation phase: Each chosen bit is changed from 0 to 1 or from 1 to 0.
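The crossover and mutation phases can be sketched as two-point crossover plus per-bit flipping. Reading $p$ and $q$ as the inclusive ends of the swapped segment, and applying the mutation probability independently to each bit, are assumptions; the defaults 0.8 and 0.05 are the crossover and mutation rates reported on the Experiments slide.

```python
import random

def crossover(a, b, pc=0.8, rng=random):
    """Two-point crossover: with probability pc, swap the bits between
    positions p and q (1-based, inclusive; assumed interpretation)."""
    a, b = a[:], b[:]  # work on copies so the parents are not modified
    if rng.random() < pc:
        m = len(a)
        p, q = sorted((rng.randint(1, m), rng.randint(1, m)))
        # The 1-based inclusive range [p, q] is the Python slice [p - 1:q].
        a[p - 1:q], b[p - 1:q] = b[p - 1:q], a[p - 1:q]
    return a, b

def mutate(s, pm=0.05, rng=random):
    """Flip each bit (0 -> 1 or 1 -> 0) independently with probability pm."""
    return [1 - bit if rng.random() < pm else bit for bit in s]
```

Because $p \le q$ after sorting, a crossover that fires always swaps at least one position.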

18 The heuristic strategy to find a good clustering: $D_1(w)$ estimates the closeness of the clusters in the clustering; $D_2(w)$ estimates the compactness of the clusters in the clustering.

19 The heuristic strategy to find a good clustering: The values of $w$ are chosen from $[w_1, w_2]$ by a kind of binary search. The heuristic finds the greatest jump in the values of $D_1(w)$ and the greatest jump in the values of $D_2(w)$. Based on these jumps, it decides which clustering is a good one.
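The transcript does not spell out the binary search or how the jumps are turned into a decision. As a small hypothetical helper, the sketch below only locates the greatest jump between consecutive $D(w)$ values for $w$ sorted in increasing order; the sample values are invented for illustration.

```python
def greatest_jump(ws, values):
    """Return (w_before, w_after, jump) for the largest absolute change
    between consecutive values; ws must be sorted in increasing order."""
    if len(ws) != len(values) or len(ws) < 2:
        raise ValueError("need at least two (w, value) pairs of equal length")
    k = max(range(len(values) - 1), key=lambda i: abs(values[i + 1] - values[i]))
    return ws[k], ws[k + 1], abs(values[k + 1] - values[k])

# Invented D(w) values sampled at w in [1, 3].
w_lo, w_hi, jump = greatest_jump([1.0, 1.5, 2.0, 2.5, 3.0], [5.0, 4.8, 4.7, 2.0, 1.9])
```

Here the greatest jump (about 2.7) lies between $w = 2.0$ and $w = 2.5$. The same helper would be applied separately to the $D_1(w)$ and $D_2(w)$ sequences.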

20 Experiments: The population size is 50, the crossover rate is 80%, and the mutation rate is 5%. $[w_1, w_2] = [1, 3]$, where $w_1$ is the smallest and $w_2$ the largest value of $w$. Three data sets were used.

21 Fig. (a): The first data set consists of three groups of points on the plane; the densities of the three groups are not the same. Fig. (b), (c): K-means algorithm. Fig. (d): complete-link method. Fig. (e): single-link method.

22 Fig. (a): The original data set with five groups of points. Fig. (b), (c), and (d): K-means algorithm. Fig. (e): the clustering obtained by CLUSTERING, complete-link, single-link, and K-means.

25 Conclusion and Personal Opinion: The experimental results show that CLUSTERING is effective. It can automatically search for a proper number of clusters.

