1
K-MEANS ALGORITHM Jelena Vukovic 53/07

2
Introduction
Basic idea of the k-means algorithm
Detailed explanation
Most common problems of the algorithm
Applications
Possible improvements
Elektrotehnički fakultet u Beogradu

3
Basic principles of the algorithm
Given a set of points $(x_1, x_2, \ldots, x_n)$
Partition the n points into k sets (n > k): $S = (S_1, S_2, \ldots, S_k)$
The goal is to minimize the within-cluster sum of squares:
$$\underset{S}{\arg\min} \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$$
where $\mu_i$ is the mean of the points in $S_i$
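To make the objective concrete, here is a minimal sketch that computes the within-cluster sum of squares for a partition that is already given. The function name `wcss` and the sample points are illustrative assumptions, not part of the original slides.

```python
def wcss(clusters):
    """Within-cluster sum of squares.

    `clusters` is a list of non-empty clusters; each cluster is a list of
    points, and each point is a tuple of numbers of the same dimension.
    """
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        # mu_i: coordinate-wise mean of the points in this cluster
        mean = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        # add the squared Euclidean distance of every point to its cluster mean
        total += sum((p[d] - mean[d]) ** 2 for p in points for d in range(dim))
    return total


# Hypothetical example data: two clusters of two points each
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(10.0, 10.0), (10.0, 12.0)]]
print(wcss(clusters))  # 4.0
```

Each of the four points lies at distance 1 from its cluster mean, so the sum is 4.0. k-means searches for the partition that makes this value as small as possible.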

4
The algorithm
Choose the number of means (k) and initialize them
Iterate:
1. Assign each point to the nearest mean
2. Move each mean to the center of its cluster
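The two iterated steps can be sketched in plain Python as follows. This is an illustrative sketch, not the presenter's implementation: the function names, the choice of initial means (k distinct points sampled from the data), the stopping rule (no point changes cluster) and the handling of empty clusters are all assumptions.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Assumption: the initial means are k points drawn at random from the data
    means = rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest mean
        new_assignment = [
            min(range(k), key=lambda i: squared_distance(p, means[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the means would not move
        assignment = new_assignment
        # Step 2: move each mean to the center of its cluster
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:  # assumption: an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignment


# Hypothetical example data: two well-separated groups of three points
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, assignment = k_means(points, k=2)
print(sorted(means))
print(assignment)
```

Which cluster gets index 0 depends on the random start, so only the grouping of the points is meaningful, not the label numbers.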

5
The algorithm
[Figure: assign points to nearest mean; move means]

6
The algorithm
The complexity is O(n * k * I * d)
  n: number of points
  k: number of clusters
  I: number of iterations
  d: number of attributes
[Figure: re-assign points]
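As a rough worked example of the O(n * k * I * d) bound: every iteration compares each of the n points with each of the k means, and each comparison touches d attributes. The helper name and the input sizes below are made up for illustration.

```python
def coordinate_operations(n, k, iterations, d):
    """Approximate number of per-coordinate operations spent on distances."""
    # n points x k means x d attributes per iteration, repeated I times
    return n * k * iterations * d


# Hypothetical sizes: 10,000 points, 5 clusters, 20 iterations, 3 attributes
print(coordinate_operations(n=10_000, k=5, iterations=20, d=3))  # 3000000
```

The count grows linearly in each of the four factors, so doubling the number of points or the number of clusters doubles the work.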

7
The algorithm

8
K nearest neighbors
Very similar in name and in its use of distance, though it classifies labeled points rather than clustering unlabeled ones
The class of a point is decided by simple majority among its k closest neighbors
In k-means the Euclidean distance measure is used
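For comparison with k-means, the majority-vote rule of k nearest neighbors can be sketched as below. The function name, the training data and the use of squared Euclidean distance are illustrative assumptions.

```python
from collections import Counter


def knn_classify(query, labeled_points, k):
    """Return the majority label among the k labeled points closest to `query`.

    `labeled_points` is a list of (point, label) pairs.
    """
    # sort by squared Euclidean distance to the query and keep the k closest
    nearest = sorted(
        labeled_points,
        key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled training data
training = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"),
]
print(knn_classify((0.5, 0.5), training, k=3))  # a
```

Unlike k-means, this needs labels for the training points, and the k here counts neighbors rather than clusters.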

9
Some limitations of the algorithm
The number of clusters needs to be known in advance
The result is sensitive to the initial positions of the means
Problems appear when clusters have different:
  Shapes
  Sizes
  Densities

10
Initial centroids problem
Random distribution (the most common)
Multiple runs
Testing on a data sample
Analyzing the data
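The "multiple runs" remedy can be sketched as: repeat k-means from several random starts and keep the run with the lowest within-cluster sum of squares. The sketch below is self-contained and illustrative only; the function names, the number of runs and the sample data are assumptions.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def run_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (wcss, means)."""
    means = rng.sample(points, k)  # random initial centroids taken from the data
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, means[i]))
            clusters[nearest].append(p)
        # move each mean to its cluster center; an empty cluster keeps its mean
        new_means = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else means[i]
            for i, members in enumerate(clusters)
        ]
        if new_means == means:
            break  # the means stopped moving
        means = new_means
    wcss = sum(min(sq_dist(p, m) for m in means) for p in points)
    return wcss, means


def best_of_runs(points, k, runs=10, seed=0):
    """Run k-means `runs` times and keep the result with the lowest WCSS."""
    rng = random.Random(seed)
    return min((run_once(points, k, rng) for _ in range(runs)), key=lambda r: r[0])


# Hypothetical data: corners of a 4 x 1 rectangle
corners = [(0, 0), (0, 1), (4, 0), (4, 1)]
best_wcss, best_means = best_of_runs(corners, k=2)
print(best_wcss, sorted(best_means))
```

On this data a single run can get stuck: if both initial centroids fall on the same short side, the points split into top and bottom rows with a sum of squares of 16.0, while the left and right split gives 1.0. Repeating the run makes it very likely that at least one start finds the better split.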

11
Different density
[Figure: original points; 3 clusters]

12
Non-globular shapes
[Figure: original points; 2 clusters]

13
Pros and cons
Pros:
  Simple to implement
  Fast
  Not highly demanding of resources
Cons:
  K needs to be known
  Ellipsoid shape is assumed
  Requires some knowledge about the data in advance
  May run through many iterations without significant changes in the clusters

14
Applications of the algorithm
Many different uses:
  Computer vision
  Market segmentation
  Geostatistics
  Astronomy
  etc.

15
Improvements
Pre-process the data in order to better estimate k
Run multiple iterations in parallel with different centroid initializations
Ignore possible errors to avoid non-standard cluster shapes

16
Thank you!
