# K-MEANS ALGORITHM Jelena Vukovic 53/07

K-MEANS ALGORITHM Jelena Vukovic 53/07 jeca.zr@gmail.com

- Introduction
- Basic idea of the k-means algorithm
- Detailed explanation
- Most common problems of the algorithm
- Applications
- Possible improvements

Elektrotehnički fakultet u Beogradu

Basic principles of the algorithm

- Given a set of points $(x_1, x_2, \ldots, x_n)$
- Partition the $n$ points into $k$ sets $(S_1, S_2, \ldots, S_k)$, with $n > k$
- The goal is to minimize the within-cluster sum of squares, $\sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2$, where $\mu_i$ is the mean of the points in $S_i$
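The within-cluster sum of squares can be computed directly for any candidate partition. A minimal sketch in plain Python, assuming points are tuples of floats; the helper names `cluster_mean` and `wcss` and the sample data are illustrative and not taken from the slides:

```python
from math import dist  # Euclidean distance, available since Python 3.8


def cluster_mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Sum, over all clusters, of squared distances from each point to its cluster mean."""
    total = 0.0
    for points in clusters:
        mu = cluster_mean(points)
        total += sum(dist(p, mu) ** 2 for p in points)
    return total


# Two ways to split the same four points into k = 2 sets (made-up data).
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]

print(wcss(good))  # 4.0
print(wcss(bad))   # 100.0
```

The partition that groups nearby points has the much smaller value, which is the quantity k-means tries to reduce.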

The algorithm

Choose the number of means (k) and initialize their positions, then iterate until the assignments stop changing:

1. Assign each point to the nearest mean
2. Move each mean to the center of its cluster
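The two steps can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function name `kmeans`, the `initial_means` parameter, the stopping rule (stop when no label changes) and the handling of empty clusters are all choices made for this sketch.

```python
from math import dist  # Euclidean distance, available since Python 3.8


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_means, max_iter=100):
    """Alternate the assign and move steps; k is len(initial_means)."""
    means = list(initial_means)
    k = len(means)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to the index of its nearest mean.
        new_labels = [min(range(k), key=lambda i: dist(p, means[i])) for p in points]
        if new_labels == labels:
            break  # assumption: stop once no assignment changes
        labels = new_labels
        # Step 2: move each mean to the center of the points assigned to it.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its previous mean
                means[i] = mean(members)
    return means, labels


# Made-up data: two visibly separate groups, with both starting means in the first group.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, labels = kmeans(points, initial_means=[(0.0, 0.0), (0.0, 1.0)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Passing the starting means explicitly keeps the run deterministic; how to choose them is a separate question.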

The algorithm (figure): assign points to the nearest mean, then move the means.

The algorithm

The complexity is O(n * k * I * d), where:

- n is the number of points
- k is the number of clusters
- I is the number of iterations
- d is the number of attributes

(Figure: re-assign points.)
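As a rough worked example with assumed, purely illustrative values ($n = 10^4$ points, $k = 5$ clusters, $I = 20$ iterations, $d = 3$ attributes): each iteration computes $n \cdot k$ point-to-mean distances, and each distance costs on the order of $d$ operations, so the total work is roughly

$$
n \cdot k \cdot I \cdot d = 10^4 \cdot 5 \cdot 20 \cdot 3 = 3 \times 10^6 .
$$

Doubling any one of the four factors doubles this estimate, which is why the cost grows linearly in the number of points.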

K nearest neighbors

- A similar, distance-based algorithm, used for classification rather than clustering
- The decision is made by a simple majority vote among the k closest neighbors
- In k-means, the Euclidean distance measure is used
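For comparison, the majority-vote rule of k nearest neighbors can be sketched as follows. The function name `knn_classify`, the `(point, label)` input format and the sample data are assumptions made for this illustration. Euclidean distance is used here as well, and ties are resolved by whichever label `Counter` encountered first.

```python
from collections import Counter
from math import dist  # Euclidean distance, available since Python 3.8


def knn_classify(query, labeled_points, k):
    """Return the most common label among the k labeled points closest to query."""
    nearest = sorted(labeled_points, key=lambda pl: dist(query, pl[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up labeled data: three points of class "a", two of class "b".
training = [
    ((0.0, 0.0), "a"),
    ((0.0, 1.0), "a"),
    ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"),
    ((5.0, 6.0), "b"),
]

print(knn_classify((0.5, 0.5), training, k=3))  # a
print(knn_classify((5.0, 5.5), training, k=3))  # b
```

Unlike k-means, this needs labels for the existing points, and k counts neighbors rather than clusters.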

Some limitations of the algorithm

- The number of clusters needs to be known in advance
- The result depends on the initial positions of the means
- Problems appear when clusters have different:
  - Shapes
  - Sizes
  - Densities

Initial centroids problem

- Random initialization (the most common choice)
- Multiple runs
- Testing on a data sample
- Analyzing the data
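Random initialization and multiple runs are often combined: run the algorithm several times from different random starting means and keep the run with the lowest within-cluster sum of squares. A minimal sketch, assuming the starting means are drawn from the data points themselves; the names `kmeans` and `best_of_runs`, the default of 10 runs and the seed handling are choices made for this illustration.

```python
import random
from math import dist  # Euclidean distance, available since Python 3.8


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, k, rng, max_iter=100):
    """One k-means run from k randomly chosen data points; returns (cost, means, labels)."""
    means = rng.sample(points, k)  # assumption: initial means are sampled from the data
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda i: dist(p, means[i])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = mean(members)
    # Within-cluster sum of squares of the final assignment.
    cost = sum(dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))
    return cost, means, labels


def best_of_runs(points, k, runs=10, seed=0):
    """Repeat k-means with different random starts and keep the lowest-cost run."""
    rng = random.Random(seed)
    results = [kmeans(points, k, rng) for _ in range(runs)]
    return min(results, key=lambda r: r[0])


# Made-up one-dimensional data with two obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
cost, means, labels = best_of_runs(points, k=2, runs=5, seed=0)
print(cost, sorted(means))
```

More runs cost proportionally more time but reduce the chance of keeping a poor local optimum.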

Different density (figure panels: original points; 3 clusters)

Non-globular shapes (figure panels: original points; 2 clusters)

Pros and cons

Pros:

- Simple to implement
- Fast
- Not highly demanding of resources

Cons:

- K needs to be known
- An ellipsoid cluster shape is assumed
- Requires some knowledge about the data in advance
- Many loop iterations are possible without significant changes in the clusters

Applications of the algorithm

Many different uses:

- Computer vision
- Market segmentation
- Geostatistics
- Astronomy
- etc.

Improvements

- Pre-process the data in order to better estimate k
- Run multiple instances in parallel with different centroid initializations
- Ignore possible errors to avoid non-standard cluster shapes

Thank you!