
1 Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology). A novel genetic algorithm for automatic clustering. Advisor: Dr. Hsu. Graduate: Sheng-Hsuan Wang. Authors: Gautam Garai, B.B. Chaudhuri. Department of Information Management. Pattern Recognition Letters 25 (2004) 173-187.

2 Outline: Motivation; Objective; Introduction; Basic concept of Classical Genetic Algorithm; Clustering with Genetic Algorithm; Experimental results; Discussion and conclusion; Personal opinions; Review.

3 Motivation: Clustering has some open problems: finding clusters automatically; handling a cluster that is confined fully or partly within another cluster; handling clusters that are present in noisy data.

4 Objective: A new genetically guided algorithm for solving the clustering problem as a two-phase process: the Cluster Decomposition Algorithm (CDA), then the Hierarchical Cluster Merging Algorithm (HCMA), used together with the Adjacent Cluster Checking Algorithm (ACCA).

5 Introduction: Clustering methods can broadly be classified into two categories: hierarchical (agglomerative, divisive) and non-hierarchical (e.g., k-means).

6 Introduction: Some researchers have used GAs based on a split-and-merge method to define clusters, e.g., Tseng and Yang (2001). Other algorithms: DBSCAN, CURE, Chameleon.

7 Introduction: The Genetically based Clustering Algorithm (GCA) is basically a two-stage split-and-merge algorithm for finding clusters: splitting of clusters with CDA; cluster merging with HCMA; adjacency checking between two fragmented clusters with ACCA.

8 Basic concept of Classical Genetic Algorithm (flowchart): Encoding schemes, then fitness evaluation, then testing the end of the algorithm. If YES: halt. If NO: parent selection, crossover operators, mutation operators, and back to fitness evaluation.
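The flowchart on this slide can be sketched as a short loop. This is only an illustration of a generic classical GA, not the authors' implementation: the function name `run_ga`, the binary tournament selection, the fixed generation budget used as the stopping test, and all default parameter values are assumptions made for the example.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=40,
           p_crossover=0.8, p_mutation=0.02, seed=0):
    """Generic classical GA over bit strings (illustrative; needs n_bits >= 2)."""
    rng = random.Random(seed)
    # Encoding: each individual is a list of 0/1 genes.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    # End-of-algorithm test (assumed): a fixed generation budget.
    for _ in range(generations):
        # Fitness evaluation.
        scores = [fitness(ind) for ind in population]

        def pick_parent():
            # Parent selection (assumed): binary tournament.
            i, j = rng.sample(range(pop_size), 2)
            return population[i] if scores[i] >= scores[j] else population[j]

        next_gen = []
        while len(next_gen) < pop_size:
            child1, child2 = pick_parent()[:], pick_parent()[:]
            # Crossover operator: single-point, applied with probability p_crossover.
            if rng.random() < p_crossover:
                cut = rng.randint(1, n_bits - 1)
                child1, child2 = (child1[:cut] + child2[cut:],
                                  child2[:cut] + child1[cut:])
            # Mutation operator: independent bit flips.
            for child in (child1, child2):
                for g in range(n_bits):
                    if rng.random() < p_mutation:
                        child[g] ^= 1
                next_gen.append(child)
        population = next_gen[:pop_size]
        # Remember the best individual seen so far.
        best = max(population + [best], key=fitness)
    return best

# Toy usage: maximize the number of 1-bits in a 12-bit string.
best = run_ga(sum, 12)
print(best, sum(best))
```

The toy fitness (`sum`) is only there to make the loop runnable; in the clustering setting the fitness would score a candidate merging of fragmented clusters.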

9 Clustering with Genetic Algorithm: n vectors X = {x_1, x_2, …, x_n} are to be clustered into k groups. The clustering approach has two steps: the Cluster Decomposition Algorithm (CDA) and the Hierarchical Cluster Merging Algorithm (HCMA).

10 Splitting of clusters with CDA: CDA first decomposes the entire data set into m fragmented clusters.

11 The progress of the CDA process
Step 1. For each object x_i, find the nearest neighbor x_j.
Step 2. Compute d_av.

12 The progress of the CDA process (continued)
Step 3. Consider x_i as the center of a circular region with radius r.
Step 4. Set p = 1.
Step 5. Extract B_p and modify the data set X such that X = X - B_p.
Step 6. Terminate the algorithm if […]. Otherwise, set p = p + 1 and go to step 5.
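The six CDA steps can be sketched roughly as follows. Several details are not preserved in this transcript, so the sketch fills them with loudly labeled assumptions: d_av is taken to be the mean nearest-neighbor distance, the radius is taken to be r = u * d_av, a group B_p is grown by repeatedly absorbing every remaining point within r of a point already in the group, and the loop stops when X is empty. The function name `cda_sketch` is hypothetical.

```python
import math

def cda_sketch(points, u=2.0):
    """Split points into fragmented groups (illustrative; needs >= 2 points)."""
    n = len(points)
    # Step 1: nearest-neighbor distance of every object.
    nn = [min(math.dist(points[i], points[j]) for j in range(n) if j != i)
          for i in range(n)]
    # Step 2 (assumption): d_av is the mean nearest-neighbor distance.
    d_av = sum(nn) / n
    # Step 3 (assumption): the circular region has radius r = u * d_av.
    r = u * d_av
    remaining = set(range(n))   # the data set X, as point indices
    groups = []                 # B_1, B_2, ...
    # Step 6 (assumption): stop once X is empty.
    while remaining:
        start = min(remaining)
        remaining.remove(start)
        group, frontier = [start], [start]
        # Step 5 (assumption): grow B_p by chaining points that lie within r.
        while frontier:
            i = frontier.pop()
            near = [j for j in remaining
                    if math.dist(points[i], points[j]) <= r]
            for j in near:
                remaining.remove(j)   # X = X - B_p
            group.extend(near)
            frontier.extend(near)
        groups.append(sorted(group))
    return groups

# Two well-separated triples of points (made-up data).
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
print(cda_sketch(pts))  # → [[0, 1, 2], [3, 4, 5]]
```

Under these assumptions a larger u gives a larger r and therefore fewer, bigger groups, which matches the remark in the parameter settings that m is inversely proportional to r.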

13 Cluster merging with HCMA: The second stage merges the fragmented clusters B_i. [Figure: individuals P_i shown as binary strings, e.g., 1010110010 and 0111100101, with the string length labeled m.]

14 Cluster merging with HCMA: HCMA consists of all three phases of the classical genetic algorithm (CGA). P_a and P_b are chosen randomly from the pool of individuals. Crossover is applied with probability […], using a single-point crossover operation. The mutation probability is adaptive.

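15 Cluster merging with HCMA (example): [Figure: the individual p_i = 0111100101 splits the fragmented clusters into B_1 = {1, 2, 3, 4, 7, 9} (bit positions holding 1) and B_0 = {0, 5, 6, 8} (bit positions holding 0). Members of B_0 are merged into members of B_1 until B_0 is null, giving the clusters C_i: {1, 6}, {2, 8}, {3, 5}, {4}, {7, 0}, {9}.]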
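The example on this slide can be reproduced with a small decoding routine. This is a sketch under assumptions: the merge rule (each B_0 cluster joins the B_1 cluster whose seed is nearest) is assumed rather than taken from the transcript, the function name `decode_and_merge` is hypothetical, and the seed coordinates are made up so that the result matches the slide.

```python
import math

def decode_and_merge(chromosome, seeds):
    """Decode a bit string into merged clusters (illustrative sketch).

    Bits equal to '1' mark fragmented clusters kept as merge targets (B_1);
    bits equal to '0' mark clusters to be absorbed (B_0). Needs at least one '1'.
    """
    b1 = [i for i, bit in enumerate(chromosome) if bit == "1"]
    b0 = [i for i, bit in enumerate(chromosome) if bit == "0"]
    merged = {i: [i] for i in b1}
    # Merge until B_0 is empty.
    while b0:
        i = b0.pop(0)
        # Assumed rule: join the B_1 cluster with the nearest seed.
        target = min(b1, key=lambda j: math.dist(seeds[i], seeds[j]))
        merged[target].append(i)
    return merged

# Hypothetical seed positions for fragmented clusters 0..9.
seeds = [(40, 1), (0, 0), (10, 0), (20, 0), (30, 0),
         (20, 1), (0, 1), (40, 0), (10, 1), (50, 0)]
print(decode_and_merge("0111100101", seeds))
# → {1: [1, 6], 2: [2, 8], 3: [3, 5], 4: [4], 7: [7, 0], 9: [9]}
```

With this encoding the number of 1-bits in an individual fixes the number of clusters after merging, so the GA searches over both which fragments survive and how many clusters result.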

16 Cluster merging with HCMA: Let the seed of the fragmented cluster B_i be […]. The center S_j of each C_j: […]. The fitness function: […].

17 Adjacency checking between two fragmented clusters: The ACCA is used along with HCMA if one cluster is confined fully or partly within another cluster, or if clusters are present in noisy data. The ACCA uses two thresholds to decide whether to merge a pair of clusters: T_b, the threshold of boundary points, and T_d, the threshold of data density difference.

18 The progress of the ACCA process
Step 1. Define a suitable value of the radius r'.
Step 2. Select two fragmented clusters […] that satisfy the merging condition.
Step 3. Count the number of boundary points of […] that reside within radius r'. Let it be N_b, and let the object density of […] be […].
Step 4. If […], then […] are adjacent to each other.
Step 5. Terminate the algorithm.

19 Experimental results: Parameter setting. Population size = 50. The number of clusters, m, is inversely proportional to the value of r. 2 <= u <= 4. k is pre-specified by the user. Crossover probability […]. Initial mutation probability […]. G_max = 100 in each cycle; 30 runs. T_b = 4; T_d = 0.4.

20 Cluster partitioning in R^2 feature space

21 Cluster partitioning in R^2 feature space

22 Cluster partitioning in R^2 feature space: The noise is represented as the third cluster.

23 Cluster separation in Iris data: 4-D Iris dataset.

24 Discussion and conclusion: GCA is composed of two algorithms, CDA and HCMA. Merging ends after several GA cycles, when k clusters are found. With ACCA, clusters are identified accurately when a cluster is either partly or fully enclosed by another cluster, or when noise is present.

25 Personal opinions: GCA may be usable on a SOM 2-D map for automatic clustering.

26 Review: Using GCA for automatic clustering. Split: CDA. Merge: HCMA + ACCA.

