Genetic Algorithm Using Iterative Shrinking for Solving Clustering Problems

Pasi Fränti and Olli Virmajoki
Department of Computer Science, University of Joensuu, Finland
To be presented at: Data Mining 2003

Problem setup
Given N data vectors $X = \{x_1, x_2, \dots, x_N\}$, partition the data set into M clusters. The problem has two interpretations:
1. Clustering: find the locations of the clusters.
2. Vector quantization: approximate the original data by a set of M code vectors, one per cluster.
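
The conclusions rank methods by MSE, so a concrete distortion function helps. The Python sketch below is minimal and rests on stated assumptions: squared Euclidean distance, each vector mapped to its nearest code vector, and the total error divided by N only (some papers also divide by the dimension d). The names `squared_distance`, `nearest_centroid` and `mse` are illustrative, not taken from the original work.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(x: Vector, c: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def nearest_centroid(x: Vector, centroids: List[Vector]) -> int:
    """Index of the code vector closest to x (optimal partition for fixed centroids)."""
    return min(range(len(centroids)), key=lambda j: squared_distance(x, centroids[j]))


def mse(data: List[Vector], centroids: List[Vector]) -> float:
    """Mean squared error when every vector is represented by its nearest code vector.

    Assumption: the total squared error is divided by N only.
    """
    total = sum(squared_distance(x, centroids[nearest_centroid(x, centroids)]) for x in data)
    return total / len(data)


data = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]
print(mse(data, centroids))  # 1.0
```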

Agglomerative clustering
PNN (Pairwise Nearest Neighbor method): merges two clusters at a time and preserves the hierarchy of clusters.
IS (Iterative Shrinking method): removes one cluster at a time and repartitions the data vectors of the removed cluster among the remaining clusters.
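
The slide does not state the PNN merge criterion. A common choice, assumed here, is Ward's cost: the exact increase in total squared error when clusters a and b merge, $\frac{n_a n_b}{n_a + n_b}\lVert c_a - c_b \rVert^2$. The naive Python sketch below (hypothetical names `merge_cost` and `pnn`) repeatedly merges the cheapest pair; it is O(N^3) and only illustrates the merge rule.

```python
from typing import List, Optional, Tuple

Vector = Tuple[float, ...]
Cluster = Tuple[int, Vector]  # (number of vectors, centroid)


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_cost(n_a: int, c_a: Vector, n_b: int, c_b: Vector) -> float:
    """Increase in total squared error when clusters a and b are merged (Ward's cost)."""
    return n_a * n_b / (n_a + n_b) * sq_dist(c_a, c_b)


def pnn(data: List[Vector], m: int) -> List[Cluster]:
    """Naive PNN: start from singleton clusters and merge the cheapest pair until m remain."""
    clusters: List[Cluster] = [(1, tuple(x)) for x in data]
    while len(clusters) > m:
        best: Optional[Tuple[float, int, int]] = None
        # Examine every pair of clusters and keep the one with the smallest merge cost.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (n_a, c_a), (n_b, c_b) = clusters[i], clusters[j]
        n = n_a + n_b
        # The merged centroid is the size-weighted mean of the two centroids.
        merged = tuple((n_a * a + n_b * b) / n for a, b in zip(c_a, c_b))
        del clusters[j]  # j > i, so index i is unaffected
        clusters[i] = (n, merged)
    return clusters


data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(sorted(pnn(data, 2)))  # [(2, (0.0, 0.5)), (2, (5.0, 5.5))]
```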

Iterative Shrinking

Iterative Shrinking algorithm (IS)

Local optimization of the IS: finding the secondary cluster; removal cost of a single vector.
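
The formulas of this slide were not preserved. The Python sketch below shows one plausible reading, assuming (a) the secondary cluster q of a vector x in cluster a is the other cluster whose squared error grows least when x joins it, $\frac{n_q}{n_q + 1}\lVert x - c_q \rVert^2$, and (b) the removal cost of x is that growth minus $\lVert x - c_a \rVert^2$, summed over the cluster to rank the candidates for removal. The names `secondary_cluster`, `removal_cost` and `iterative_shrinking` are hypothetical, and the brute-force loops are far slower than a practical implementation.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    return tuple(sum(col) / len(points) for col in zip(*points))


def secondary_cluster(x: Vector, a: int, cents: Dict[int, Vector], sizes: Dict[int, int]) -> int:
    """Cluster other than a whose squared error grows least if x is added to it."""
    return min((j for j in cents if j != a),
               key=lambda j: sizes[j] / (sizes[j] + 1) * sq_dist(x, cents[j]))


def removal_cost(a: int, clusters: Dict[int, List[Vector]],
                 cents: Dict[int, Vector], sizes: Dict[int, int]) -> float:
    """Estimated change in total squared error if cluster a is removed.

    Centroids are not updated between vectors, so this is an approximation
    when several vectors move to the same secondary cluster.
    """
    cost = 0.0
    for x in clusters[a]:
        q = secondary_cluster(x, a, cents, sizes)
        cost += sizes[q] / (sizes[q] + 1) * sq_dist(x, cents[q]) - sq_dist(x, cents[a])
    return cost


def iterative_shrinking(data: List[Vector], m: int) -> List[List[Vector]]:
    """Start with one cluster per vector; remove the cheapest cluster until m remain."""
    clusters: Dict[int, List[Vector]] = {i: [tuple(x)] for i, x in enumerate(data)}
    while len(clusters) > m:
        cents = {k: centroid(v) for k, v in clusters.items()}
        sizes = {k: len(v) for k, v in clusters.items()}
        a = min(clusters, key=lambda k: removal_cost(k, clusters, cents, sizes))
        # Repartition the removed cluster: each vector goes to its own secondary cluster.
        for x in clusters.pop(a):
            clusters[secondary_cluster(x, a, cents, sizes)].append(x)
    return list(clusters.values())


data = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
print(sorted(sorted(c) for c in iterative_shrinking(data, 2)))
```

On this toy input the two tight pairs end up as the two clusters.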

Generalization to an unknown number of clusters
Compute the variance-ratio F-test value for every intermediate clustering, M = 1..N. Because IS passes through every number of clusters from N down to 1, these clusterings are produced anyway.
Select the clustering with the minimum F-ratio as the final clustering.
No additional computation is needed except the calculation of the F-ratio.
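
The slide does not define the F-ratio. The sketch below assumes a within-to-between form, F = M · SSW / SSB, where SSW is the squared error inside the clusters and SSB is the size-weighted squared distance of the cluster centroids from the overall mean; smaller is better, matching "minimum F-ratio". In this form the ratio is undefined at M = 1 (SSB = 0) and trivially zero at M = N (SSW = 0), so the hypothetical `select_by_f_ratio` only searches a caller-chosen range. In practice the candidates would be the intermediate clusterings recorded during IS; here two are written by hand.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]
Clustering = List[List[Vector]]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> Vector:
    return tuple(sum(col) / len(points) for col in zip(*points))


def f_ratio(clustering: Clustering) -> float:
    """Within-to-between variance ratio, assumed here to be M * SSW / SSB (smaller is better)."""
    grand = mean([x for cluster in clustering for x in cluster])
    ssw = ssb = 0.0
    for cluster in clustering:
        c = mean(cluster)
        ssw += sum(sq_dist(x, c) for x in cluster)
        ssb += len(cluster) * sq_dist(c, grand)
    return len(clustering) * ssw / ssb


def select_by_f_ratio(candidates: Dict[int, Clustering],
                      m_min: int, m_max: int) -> Tuple[int, Clustering]:
    """Return (M, clustering) with the smallest F-ratio among m_min <= M <= m_max."""
    best_m = min((m for m in candidates if m_min <= m <= m_max),
                 key=lambda m: f_ratio(candidates[m]))
    return best_m, candidates[best_m]


square_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
square_b = [(x + 10.0, y) for x, y in square_a]
candidates = {
    2: [square_a, square_b],
    3: [square_a[:2], square_a[2:], square_b],
}
print(select_by_f_ratio(candidates, 2, 3)[0])  # 2
```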

Example for Data set 3

Example for Data set 4

Genetic algorithm
Generate S initial solutions.
REPEAT T times
    Select the best solutions to survive.
    Generate new solutions by crossover.
    Fine-tune the solutions.
END-REPEAT
Output the best solution found.
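
The GA loop maps directly onto code. The Python skeleton below follows the slide's steps, but the representation (a solution is a list of M code vectors), the elitist selection of the better half, the placeholder `random_crossover`, and the two K-means iterations used for fine-tuning are all assumptions for illustration, not the published parameters. GAIS builds its crossover on IS instead of the placeholder used here.

```python
import random
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]
Solution = List[Vector]  # M code vectors


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Vector, cents: Solution) -> int:
    return min(range(len(cents)), key=lambda k: sq_dist(x, cents[k]))


def mse(data: List[Vector], cents: Solution) -> float:
    return sum(sq_dist(x, cents[nearest(x, cents)]) for x in data) / len(data)


def kmeans_step(data: List[Vector], cents: Solution) -> Solution:
    """One K-means iteration: nearest-centroid partition, then centroid update."""
    groups: List[List[Vector]] = [[] for _ in cents]
    for x in data:
        groups[nearest(x, cents)].append(x)
    # An empty cluster keeps its old code vector.
    return [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, cents)]


def random_crossover(p1: Solution, p2: Solution, rng: random.Random) -> Solution:
    """Placeholder crossover: pick M code vectors at random from both parents."""
    return rng.sample(p1 + p2, len(p1))


def genetic_algorithm(data: List[Vector], m: int, s: int = 10, t: int = 20,
                      crossover: Callable[[Solution, Solution, random.Random], Solution] = random_crossover,
                      kmeans_iters: int = 2, seed: int = 0) -> Solution:
    """GA skeleton; requires s >= 2 so that two parents can always be drawn."""
    rng = random.Random(seed)
    # Generate S initial solutions (random data vectors as code vectors).
    population = [rng.sample(data, m) for _ in range(s)]
    for _ in range(t):  # REPEAT T times
        # Select the best solutions to survive (better half, at least two).
        population.sort(key=lambda sol: mse(data, sol))
        survivors = population[: max(2, s // 2)]
        children: List[Solution] = []
        while len(survivors) + len(children) < s:
            p1, p2 = rng.sample(survivors, 2)
            child = crossover(p1, p2, rng)  # generate a new solution by crossover
            for _ in range(kmeans_iters):   # fine-tune with a few K-means iterations
                child = kmeans_step(data, child)
            children.append(child)
        population = survivors + children
    return min(population, key=lambda sol: mse(data, sol))  # output the best solution


square_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
data = square_a + [(x + 10.0, y) for x, y in square_a]
best = genetic_algorithm(data, m=2)
print(sorted(best), mse(data, best))
```

Passing a different `crossover` function is the only change needed to swap in another recombination scheme.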

Illustration of crossover
[Figure: two parent solutions combined (+) into a new solution (=) by crossover]

GAIS algorithm
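
A minimal sketch of an IS-based crossover, assuming the offspring is formed by pooling the 2M code vectors of both parents, partitioning the data by nearest pooled code vector, and then shrinking back to M clusters with IS, where each vector of a removed cluster moves to the cluster whose squared error grows least. The function names `shrink` and `is_crossover` are hypothetical; the actual GAIS details may differ, and the offspring would still be fine-tuned afterwards, for example with K-means iterations.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points: List[Vector]) -> Vector:
    return tuple(sum(col) / len(points) for col in zip(*points))


def shrink(clusters: Dict[int, List[Vector]], m: int) -> Dict[int, List[Vector]]:
    """Iterative shrinking: remove the cluster with the smallest estimated cost until m remain."""
    while len(clusters) > m:
        cents = {k: centroid(v) for k, v in clusters.items()}
        sizes = {k: len(v) for k, v in clusters.items()}

        def added_error(x: Vector, j: int) -> float:
            # Growth of cluster j's squared error if x joins it.
            return sizes[j] / (sizes[j] + 1) * sq_dist(x, cents[j])

        def secondary(x: Vector, a: int) -> int:
            return min((j for j in clusters if j != a), key=lambda j: added_error(x, j))

        def cost(a: int) -> float:
            return sum(added_error(x, secondary(x, a)) - sq_dist(x, cents[a])
                       for x in clusters[a])

        a = min(clusters, key=cost)
        for x in clusters.pop(a):
            clusters[secondary(x, a)].append(x)
    return clusters


def is_crossover(data: List[Vector], parent1: List[Vector], parent2: List[Vector]) -> List[Vector]:
    """Pool both parents' code vectors, partition the data, then shrink back to M clusters."""
    pooled = list(parent1) + list(parent2)
    clusters: Dict[int, List[Vector]] = {}
    for x in data:
        j = min(range(len(pooled)), key=lambda k: sq_dist(x, pooled[k]))
        clusters.setdefault(j, []).append(x)  # pooled vectors that attract no data are dropped
    clusters = shrink(clusters, len(parent1))
    return [centroid(points) for points in clusters.values()]


square_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
data = square_a + [(x + 10.0, y) for x, y in square_a]
child = is_crossover(data, [(0.0, 0.0), (10.0, 0.0)], [(1.0, 1.0), (11.0, 1.0)])
print(sorted(child))  # [(0.5, 0.5), (10.5, 0.5)]
```

If duplicate parent vectors leave fewer than M non-empty clusters, this sketch returns fewer than M code vectors; a full implementation would need to handle that case.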

Effect of crossover

Convergence of GA with F-ratio

Image data sets
Bridge (256 × 256): d = 16, N = 4096, M = 256
Miss America (360 × 288): d = 16, N = 6480, M = 256
House (256 × 256): d = 3, N = *, M = 256
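
The listed sizes are consistent with splitting each grayscale image into non-overlapping 4 × 4 pixel blocks: 256 × 256 / 16 = 4096 and 360 × 288 / 16 = 6480 vectors of dimension 16, while d = 3 for House presumably corresponds to RGB color vectors. That block layout is an assumption inferred from these numbers, not stated on the slide. The hypothetical helper `image_to_blocks` below builds such vectors.

```python
from typing import List, Sequence, Tuple


def image_to_blocks(pixels: Sequence[Sequence[int]], b: int = 4) -> List[Tuple[int, ...]]:
    """Split a grayscale image (rows of pixel values) into non-overlapping b x b blocks,
    each flattened row by row into a vector of dimension d = b * b."""
    h, w = len(pixels), len(pixels[0])
    if h % b or w % b:
        raise ValueError("image size must be a multiple of the block size")
    vectors = []
    for top in range(0, h, b):
        for left in range(0, w, b):
            vectors.append(tuple(pixels[r][c]
                                 for r in range(top, top + b)
                                 for c in range(left, left + b)))
    return vectors


bridge_like = [[0] * 256 for _ in range(256)]         # 256 x 256 placeholder image
vecs = image_to_blocks(bridge_like)
print(len(vecs), len(vecs[0]))                        # 4096 16
miss_america_like = [[0] * 360 for _ in range(288)]   # 360 wide, 288 high
print(len(image_to_blocks(miss_america_like)))        # 6480
```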

Synthetic data sets
Data set S1: d = 2, N = 5000, M = 15
Data set S2: d = 2, N = 5000, M = 15
Data set S3: d = 2, N = 5000, M = 15
Data set S4: d = 2, N = 5000, M = 15

Comparison with image data
[Figure; annotations: popular methods, previous GA, NEW!, simplest of the good ones]

Comparison with synthetic data
[Figure; annotations: most separable clusters, most overlapping clusters]

What does it cost? Running times for Bridge:
Random: ~0 s
K-means: 8 s
SOM: 6 minutes
GA-PNN: 13 minutes
GAIS (short): ~1 hour
GAIS (long): ~3 days

Conclusions
A slower but better clustering algorithm.
The best known clustering algorithm for minimizing MSE.
Thank you!