Random Swap algorithm Pasi Fränti 24.4.2018

Definitions and data
Set of N data points: X = {x1, x2, …, xN}
Partition of the data: P = {p1, p2, …, pN}, where pi is the cluster label of xi
Set of k cluster prototypes (centroids): C = {c1, c2, …, ck}

Clustering problem Objective function: Optimality of partition: Optimality of centroid:
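The formulas on this slide were lost in the transcript; the following is a reconstruction of the standard squared-error formulation used by k-means and random swap (mean squared error form), consistent with the definitions and pseudocode on the surrounding slides, not a verbatim copy of the slide:

    % Objective function: mean squared error of the partition
    f(C,P) = \frac{1}{N} \sum_{i=1}^{N} \lVert x_i - c_{p_i} \rVert^2
    % Optimality of partition: each point is assigned to its nearest centroid
    p_i = \arg\min_{1 \le j \le k} \lVert x_i - c_j \rVert^2
    % Optimality of centroid: each centroid is the mean of its cluster
    c_j = \frac{\sum_{i : p_i = j} x_i}{\lvert \{ i : p_i = j \} \rvert}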

K-means algorithm
X = data set, C = cluster centroids, P = partition

K-Means(X, C) → (C, P)
REPEAT
  Cprev ← C;
  FOR i = 1 TO N DO pi ← FindNearest(xi, C);        (optimal partition)
  FOR j = 1 TO k DO cj ← Average of xi with pi = j;  (optimal centroids)
UNTIL C = Cprev
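As a concrete illustration, here is a minimal NumPy sketch of the loop above. It is not the course's reference implementation; the function and variable names (kmeans, find-nearest step, P) are chosen for this example only.

    import numpy as np

    def kmeans(X, C, max_iters=100):
        """Minimal k-means: X is (N, d) data, C is (k, d) initial centroids (float)."""
        for _ in range(max_iters):
            # Optimal partition: assign each point to its nearest centroid.
            dists = np.linalg.norm(X[:, None, :] - C[None, :, :], axis=2)
            P = dists.argmin(axis=1)
            # Optimal centroids: move each centroid to the mean of its points.
            C_prev = C.copy()
            for j in range(len(C)):
                members = X[P == j]
                if len(members) > 0:
                    C[j] = members.mean(axis=0)
            if np.allclose(C, C_prev):   # UNTIL C = Cprev
                break
        return C, P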

Problems of k-means

Swapping strategy

Pigeon hole principle. CI = Centroid index: P. Fränti, M. Rezaei and Q. Zhao, "Centroid index: cluster level similarity measure", Pattern Recognition, 47 (9), 3034-3045, September 2014.
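The centroid index counts, at the cluster level, how many prototypes are allocated to the wrong region when two solutions are compared. Below is a minimal NumPy sketch of CI as described in the cited paper (map every centroid of one solution to its nearest centroid in the other, count the centroids that receive no mapping, and take the maximum over both directions); treat it as an illustrative reading of the measure, not the authors' reference code.

    import numpy as np

    def ci_oneway(A, B):
        """Orphan count: centroids of B that no centroid of A maps to."""
        dists = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
        nearest = dists.argmin(axis=1)           # for each a in A, nearest b in B
        counts = np.bincount(nearest, minlength=len(B))
        return int(np.sum(counts == 0))          # orphans in B

    def centroid_index(A, B):
        """CI = max of the two one-way orphan counts; CI = 0 means same cluster structure."""
        return max(ci_oneway(A, B), ci_oneway(B, A))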

Pigeon hole principle S2

Aim of the swap S2

Random Swap algorithm

Steps of the swap:
1. Random swap
2. Re-allocate vectors from old cluster
3. Create new cluster
(A code sketch of these steps follows.)
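A minimal sketch of the random swap loop, reusing the kmeans helper from the earlier example. The acceptance test and the two k-means iterations follow the description on these slides, but the variable names and the mse helper are illustrative assumptions, not the official implementation (see the pseudo code link in the references). For brevity this sketch re-partitions globally via kmeans, whereas the slides describe a local re-partition of only the affected clusters.

    import numpy as np

    def mse(X, C, P):
        """Mean squared error of a clustering solution."""
        return float(np.mean(np.sum((X - C[P]) ** 2, axis=1)))

    def random_swap(X, k, T=5000, kmeans_iters=2, rng=np.random.default_rng()):
        # Initial solution: k random data points as prototypes, fine-tuned by k-means.
        C = X[rng.choice(len(X), k, replace=False)].copy()
        C, P = kmeans(X, C, max_iters=kmeans_iters)
        best = mse(X, C, P)
        for _ in range(T):
            C_new = C.copy()
            # 1. Random swap: replace a random prototype by a random data vector.
            j = rng.integers(k)
            C_new[j] = X[rng.integers(len(X))]
            # 2.-3. Re-allocate vectors, create new cluster, fine-tune by k-means.
            C_new, P_new = kmeans(X, C_new, max_iters=kmeans_iters)
            cand = mse(X, C_new, P_new)
            # Accept the swap only if the objective improves.
            if cand < best:
                C, P, best = C_new, P_new, cand
        return C, P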

Swap

Local re-partition Re-allocate vectors Create new cluster

Iterate by k-means 1st iteration

Iterate by k-means 2nd iteration

Iterate by k-means 3rd iteration

Iterate by k-means 16th iteration

Iterate by k-means 17th iteration

Iterate by k-means 18th iteration

Iterate by k-means 19th iteration

Final result 25 iterations

Extreme example

Dependency on initial solution

Data sets

Data sets

Data sets visualized (images):
Bridge: 4x4 blocks, 16-d
House: RGB color, 3-d
Miss America: 4x4 blocks frame differential, 16-d
Europe: differential coordinates, 2-d

Data sets visualized Artificial

Data sets visualized Artificial G2-2-30 G2-2-50 G2-2-70 A1 A2 A3

Time complexity

Efficiency of the random swap
Total time to find correct clustering: time per iteration × number of iterations
Time complexity of a single iteration:
  Swap: O(1)
  Remove cluster: 2kN/k = O(N)
  Add cluster: 2N = O(N)
  Centroids: 2N/k + 2N/k + 2 = O(N/k)
  K-means: IkN = O(IkN)  (bottleneck!)

Efficiency of the random swap
Total time to find correct clustering: time per iteration × number of iterations
Time complexity of a single iteration:
  Swap: O(1)
  Remove cluster: 2kN/k = O(N)
  Add cluster: 2N = O(N)
  Centroids: 2N/k + 2N/k + 2 = O(N/k)
  (Fast) K-means: 4N = O(N), 2 iterations only!
T. Kaukoranta, P. Fränti and O. Nevalainen, "A fast exact GLA based on code vector activity detection", IEEE Trans. on Image Processing, 9 (8), 1337-1342, August 2000.

Estimated and observed steps Bridge N=4096, k=256, N/k=16, 8

Processing time profile

Processing time profile

Effect of K-means iterations (curves for 1 to 5 iterations), Bridge

Effect of K-means iterations (curves for 1 to 5 iterations), Birch2

How many swaps?

Three types of swaps:
Trial swap
Accepted swap: MSE improves
Successful swap: CI improves
Example: before swap CI=2, MSE=20.12; after accepted swap CI=2, MSE=20.09; after successful swap CI=1, MSE=15.87
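A tiny helper capturing the distinction (the function name and the ordering of the checks are illustrative assumptions): every attempted swap is a trial; in this sketch a swap is reported successful when the centroid index decreases, and accepted when only the MSE decreases.

    def classify_swap(mse_before, mse_after, ci_before, ci_after):
        """Return 'successful', 'accepted' or 'trial' for one random swap attempt."""
        if ci_after < ci_before:
            return "successful"   # cluster-level structure improved (CI decreased)
        if mse_after < mse_before:
            return "accepted"     # objective improved, structure unchanged
        return "trial"            # attempted but rejected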

Accepted and successful swaps

Number of swaps needed Example with 35 clusters CI=4 CI=9

Number of swaps needed Example from image quantization

Statistical observations N=5000, k=15, d=2, N/k=333, α=4.1

Statistical observations N=5000, k=15, d=2, N/k=333, α=4.8

Theoretical estimation

Probability of good swap
Select a proper prototype to remove: there are k clusters in total: p_removal = 1/k
Select a proper new location: there are N choices: p_add = 1/N
Only k are significantly different: p_add = 1/k
Both happen at the same time: k² significantly different swaps
Probability of each different swap is p_swap = 1/k²
Open question: how many of these are good?
p = (α/k)·(α/k) = O((α/k)²)

Expected number of iterations Probability of not finding good swap: Estimated number of iterations:
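The formulas on this slide did not survive the transcript; the standard derivation they refer to (geometric failure probability of repeated independent trials, and the iteration count needed to push the failure probability down to a given level q) is, as a reconstruction:

    % Probability that none of the T independent trial swaps is good:
    q = (1 - p)^{T}
    % Iterations needed so that the failure probability is at most q:
    T = \frac{\ln q}{\ln(1 - p)} \approx \frac{\ln(1/q)}{p}

The approximation on the right also gives the upper bound referred to on the later slide "Bounds for the iterations", since ln(1-p) ≤ -p.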

Probability of failure (q) depending on T

Observed probability (%) of failure N=5000, k=15, d=2, N/k=333, α=4.5

Observed probability (%) of failure N=1024, k=16, d=32-128, N/k=64, α=1.1

Bounds for the iterations Upper limit: Lower limit similarly; resulting in:

Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations:

Expected time complexity
Linear dependency on N
Quadratic dependency on k (with a large number of clusters, it can be too slow)
Logarithmic dependency on w (close to constant)
Inverse dependency on α (the higher the dimensionality, the faster the method)
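Reading the four dependencies above together, and taking "inverse" as 1/α, the expected total time behaves roughly as follows; this combined form is inferred from the listed dependencies, not quoted from the slide:

    t(N,k) = O\!\left( \frac{N \, k^{2} \, \ln w}{\alpha} \right)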

Linear dependency on N N<100,000, k=100, d=2, N/k=1000, α=3.1

Quadratic dependency on k N<100,000, k=100, d=2, N/k=1000, α=3.1

Logarithmic dependency on w

Theory vs. reality

Neighborhood size

How much is α? (data set S1) Voronoi neighbors vs. neighbors by distance.
2-dim: average number of Voronoi neighbors is 2(3k-6)/k = 6 - 12/k
Upper limit in D dimensions: 2k^(D/2)/k = O(2k^(D/2-1))

Observed number of neighbors Data set S2

Estimate α
Five iterations of random swap clustering. For each pair of prototypes A and B:
1. Calculate the half point HP = (A+B)/2
2. Find the nearest prototype C for HP
3. If C=A or C=B, they are potential neighbors
Analyze potential neighbors:
1. Calculate all vector distances across A and B
2. Select the nearest pair (a,b)
3. If d(a,b) < min( d(a,C(a)), d(b,C(b)) ) then accept
α = number of pairs found / k
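A minimal NumPy sketch of the estimation procedure above, written from the slide's description; the function name (estimate_alpha) and the use of a precomputed partition P are assumptions of this example, and the five random-swap warm-up iterations are assumed to have been run beforehand.

    import numpy as np

    def estimate_alpha(X, C, P):
        """Estimate neighborhood size alpha from prototypes C and partition P."""
        k = len(C)
        pairs = 0
        for a_idx in range(k):
            for b_idx in range(a_idx + 1, k):
                # Potential neighbors: half point between A and B must be
                # nearest to either A or B among all prototypes.
                hp = (C[a_idx] + C[b_idx]) / 2
                nearest = np.linalg.norm(C - hp, axis=1).argmin()
                if nearest not in (a_idx, b_idx):
                    continue
                # Analyze: nearest cross-cluster pair (a, b) must be closer to
                # each other than each is to its own prototype.
                A, B = X[P == a_idx], X[P == b_idx]
                if len(A) == 0 or len(B) == 0:
                    continue
                cross = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
                i, j = np.unravel_index(cross.argmin(), cross.shape)
                d_ab = cross[i, j]
                d_a = np.linalg.norm(A[i] - C[a_idx])
                d_b = np.linalg.norm(B[j] - C[b_idx])
                if d_ab < min(d_a, d_b):
                    pairs += 1
        return pairs / k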

Observed values of α

Optimality

Multiple optima (= plateaus): very similar results (<0.3% difference in MSE), but CI-values significantly high (9%). Finds one of the near-optimal solutions.

Experiments

Time-versus-distortion N=4096, k=256, d=16, N/k=16, α=5.4

Time-versus-distortion N=6480, k=256, d=16, N/k=25, α=17.1

Time-versus-distortion N=100,000, k=100, d=2, N/k=1000, α=5.8

Time-versus-distortion N=100,000, k=100, d=2, N/k=1000, α=3.1

Time-versus-distortion N=169,673, k=256, d=2, N/k=663, α=6.3

Time-versus-distortion N=6500, k=8, d=2, N/k=821, α=2.3

Variation of results

Variation of results

Comparison of algorithms:
k-means (KM)
k-means++
repeated k-means (RKM)
x-means
agglomerative clustering (AC)
random swap (RS)
global k-means
genetic algorithm

D. Arthur and S. Vassilvitskii, "K-means++: the advantages of careful seeding", ACM-SIAM Symp. on Discrete Algorithms (SODA'07), New Orleans, LA, 1027-1035, January 2007.
D. Pelleg and A. Moore, "X-means: extending k-means with efficient estimation of the number of clusters", Int. Conf. on Machine Learning (ICML'00), Stanford, CA, USA, June 2000.
P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), 773-777, May 2000.
A. Likas, N. Vlassis and J.J. Verbeek, "The global k-means clustering algorithm", Pattern Recognition, 36, 451-461, 2003.
P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, January 2000.

Processing time

Clustering quality

Conclusions

What we learned?
Random swap is an efficient algorithm. It does not converge to a sub-optimal result.
The expected processing time has:
Linear O(N) dependency on the size of the data
Quadratic O(k²) dependency on the number of clusters
Inverse O(1/α) dependency on the neighborhood size
Logarithmic O(log w) dependency on the number of swaps

References
P. Fränti, "Efficiency of random swap clustering", Journal of Big Data, 5:13, 1-29, 2018.
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.
Pseudo code: http://cs.uef.fi/pages/franti/research/rs.txt

Supporting material
Implementations available (C, Matlab, Java, Javascript, R and Python): http://www.uef.fi/web/machine-learning/software
Interactive animation: http://cs.uef.fi/sipu/animator/
Clusterator: http://cs.uef.fi/sipu/clusterator