Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE FINLAND Pasi Fränti and.


1 Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems UNIVERSITY OF JOENSUU DEPARTMENT OF COMPUTER SCIENCE FINLAND Pasi Fränti and Olli Virmajoki to be presented at: Data Mining 2003

2 Problem setup Given N data vectors X = {x_1, x_2, …, x_N}, partition the data set into M clusters. 1. Clustering: find the location of the clusters. 2. Vector quantization: approximate the original data by a set of code vectors.
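The vector quantization view can be made concrete with the mean squared error (MSE) that the conclusions refer to. The following is a minimal sketch, assuming squared Euclidean distance and normalization by the number of vectors only (some definitions also divide by the dimension d). The names `sq_dist`, `nearest` and `mse` are illustrative and do not come from the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest(x, codebook):
    """Index of the code vector closest to x."""
    return min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))


def mse(data, codebook):
    """Mean squared error when every vector is mapped to its nearest code vector.

    Assumption: the error is averaged over the N vectors only.
    """
    total = sum(sq_dist(x, codebook[nearest(x, codebook)]) for x in data)
    return total / len(data)


# Toy data: N = 4 vectors in d = 2 dimensions, M = 2 code vectors.
data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
codebook = [(0.0, 1.0), (10.0, 1.0)]
error = mse(data, codebook)
```

The partition is implied by the nearest-code-vector mapping, so the codebook alone is enough to evaluate a solution.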

3 Agglomerative clustering PNN: Pairwise Nearest Neighbor method. Merges two clusters. Preserves the hierarchy of clusters. IS: Iterative Shrinking method. Removes one cluster. Repartitions the data vectors of the removed cluster.
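The difference between the two steps can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the PNN cost is the standard increase in squared error when two clusters are merged, while the IS removal cost here is a simplified version that moves each vector of the removed cluster to the nearest remaining centroid and keeps the centroids fixed while the cost is evaluated. All function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def pnn_merge_cost(cluster_a, cluster_b):
    """Increase in total squared error caused by merging two clusters (PNN)."""
    na, nb = len(cluster_a), len(cluster_b)
    return na * nb / (na + nb) * sq_dist(centroid(cluster_a), centroid(cluster_b))


def is_removal_cost(clusters, a):
    """Simplified IS cost of removing cluster a (assumption: centroids held fixed)."""
    cents = [centroid(c) for c in clusters]
    cost = 0.0
    for x in clusters[a]:
        # Distance to the closest centroid among the clusters that remain.
        best = min(sq_dist(x, cents[j]) for j in range(len(clusters)) if j != a)
        cost += best - sq_dist(x, cents[a])
    return cost


def shrink_once(clusters):
    """Remove the cheapest cluster and repartition its vectors among the rest."""
    a = min(range(len(clusters)), key=lambda j: is_removal_cost(clusters, j))
    remaining = [list(c) for j, c in enumerate(clusters) if j != a]
    rem_cents = [centroid(c) for c in remaining]
    for x in clusters[a]:
        q = min(range(len(remaining)), key=lambda j: sq_dist(x, rem_cents[j]))
        remaining[q].append(x)
    return remaining


clusters = [
    [(0.0, 0.0), (0.0, 2.0)],
    [(1.0, 1.0)],
    [(10.0, 0.0), (10.0, 2.0)],
]
shrunk = shrink_once(clusters)
```

Unlike a PNN merge, the vectors of the removed cluster may end up in different clusters, which is why IS does not preserve the cluster hierarchy.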

4 Iterative Shrinking

5 Iterative Shrinking algorithm (IS)

6 Local optimization of the IS Finding secondary cluster: Removal cost of single vector:

7 Generalization to the case of unknown number of clusters Measure the variance-ratio F-test for every intermediate clustering, M = 1..N. Select the clustering with the minimum F-ratio as the final clustering. No additional computing is needed, except the calculation of the F-ratio.
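A sketch of the selection step follows. The exact F-ratio formula is not given in the transcript, so this assumes the within-cluster variance divided by the between-cluster variance (the reciprocal of the common Calinski-Harabasz form), which is consistent with selecting the minimum. Under this assumed form the ratio is undefined for M = 1 and M = N, so the function rejects those cases. The function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


def f_ratio(clusters):
    """Assumed variance ratio: (SSW / (N - M)) / (SSB / (M - 1)); smaller is better."""
    m = len(clusters)
    all_vectors = [x for c in clusters for x in c]
    n = len(all_vectors)
    if not 1 < m < n:
        raise ValueError("this form of the ratio needs 1 < M < N")
    grand = centroid(all_vectors)
    # Within-cluster sum of squares.
    ssw = sum(sq_dist(x, centroid(c)) for c in clusters for x in c)
    # Between-cluster sum of squares, weighted by cluster size.
    ssb = sum(len(c) * sq_dist(centroid(c), grand) for c in clusters)
    return (ssw / (n - m)) / (ssb / (m - 1))


# Two candidate clusterings of the same four vectors.
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
poor = [[(0.0, 0.0), (0.0, 2.0), (10.0, 0.0)], [(10.0, 2.0)]]
selected = min([poor, good], key=f_ratio)
```

In an agglomerative run the candidates would be the intermediate clusterings produced along the way, so the only extra work is evaluating the ratio for each of them.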

8 Example for Data set 3

9 Example for Data set 4

10 Genetic algorithm
Generate S initial solutions.
REPEAT T times
  Select best solutions to survive.
  Generate new solutions by crossover.
  Fine-tune solutions.
END-REPEAT
Output the best solution found.
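The loop above can be sketched as a generic skeleton. This is a minimal illustration, assuming that lower fitness is better, that half of the population survives each round, and that parents are drawn uniformly from the survivors; none of these choices is stated in the slides. The `fitness`, `crossover` and `fine_tune` callables are placeholders supplied by the caller, and the toy usage optimizes plain numbers rather than clusterings.

```python
import random


def genetic_algorithm(initial, fitness, crossover, fine_tune, generations, rng):
    """Run the select / crossover / fine-tune loop and return the best solution.

    Assumptions: lower fitness is better, at least two initial solutions,
    and the better half of the population survives each generation.
    """
    population = sorted(initial, key=fitness)
    size = len(population)
    n_survivors = max(2, size // 2)
    for _ in range(generations):
        # Select the best solutions to survive.
        survivors = population[:n_survivors]
        children = []
        while len(survivors) + len(children) < size:
            # Generate a new solution by crossover, then fine-tune it.
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(fine_tune(crossover(parent_a, parent_b)))
        population = sorted(survivors + children, key=fitness)
    return population[0]


# Toy usage: solutions are numbers, the optimum is 3.
def toy_fitness(x):
    return (x - 3.0) ** 2


initial = [0.0, 10.0, 4.0, 8.0]
best = genetic_algorithm(
    initial,
    fitness=toy_fitness,
    crossover=lambda a, b: (a + b) / 2.0,  # average of the two parents
    fine_tune=lambda x: x,                 # no fine-tuning in the toy case
    generations=2,
    rng=random.Random(0),
)
```

Because the survivors are carried over unchanged, the best solution can never get worse from one generation to the next.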

11 Illustration of crossover [Figure: solution A + solution B = crossover result]

12 GAIS algorithm

13 Effect of crossover

14 Convergence of GA with F-ratio

15 Image datasets
Bridge (256 × 256): d = 16, N = 4096, M = 256
Miss America (360 × 288): d = 16, N = 6480, M = 256
House (256 × 256): d = 3, N = 34112 *, M = 256

16 Synthetic data sets
Data set S1: d = 2, N = 5000, M = 15
Data set S2: d = 2, N = 5000, M = 15
Data set S3: d = 2, N = 5000, M = 15
Data set S4: d = 2, N = 5000, M = 15

17 Comparison with image data [Table annotations: popular methods; previous GA; NEW!; simplest of the good ones]

18 Comparison with synthetic data [Table annotations: most separable clusters; most overlapping between clusters]

19 What does it cost? (Bridge)
Random: ~0 s
K-means: 8 s
SOM: 6 minutes
GA-PNN: 13 minutes
GAIS (short): ~1 hour
GAIS (long): ~3 days

20 Conclusions Slower but better clustering algorithm. Best known clustering algorithm for minimizing MSE. Thank you!

