1
K-MEANS ALGORITHM
Jelena Vukovic, 53/07
jeca.zr@gmail.com

2
- Introduction
- Basic idea of the k-means algorithm
- Detailed explanation
- Most common problems of the algorithm
- Applications
- Possible improvements
Elektrotehnički fakultet u Beogradu 2/16

3
Basic principles of the algorithm
- Given a set of points (x_1, x_2, …, x_n)
- Partition the n points into k sets (S_1, S_2, …, S_k), with n > k
- The goal is to minimize the within-cluster sum of squares: the sum, over i = 1, …, k and over all x_j in S_i, of ||x_j - µ_i||^2
- µ_i is the mean of the points in S_i
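The objective above can be checked numerically. The following is a minimal sketch, assuming points are tuples of floats and a partition is a list of non-empty lists of points; the helper names `mean_point` and `wcss` and the toy data are illustrative and not from the slides.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def wcss(clusters):
    """Within-cluster sum of squares of a partition (list of clusters)."""
    total = 0.0
    for cluster in clusters:
        mu = mean_point(cluster)  # the mean of the points in this cluster
        total += sum(dist(x, mu) ** 2 for x in cluster)
    return total


# Hypothetical one-dimensional data: two tight groups around 1 and 11.
good = [[(0.0,), (2.0,)], [(10.0,), (12.0,)]]  # groups kept together
bad = [[(0.0,), (10.0,)], [(2.0,), (12.0,)]]   # groups mixed up

print(wcss(good))  # 4.0
print(wcss(bad))   # 100.0
```

The partition that keeps nearby points together has the smaller sum, which is the quantity k-means tries to minimize.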

4
The algorithm
- Choose the number of means (k) and their initial positions
- Iterate until the assignments stop changing:
  1. Assign each point to the nearest mean
  2. Move each mean to the center of its cluster
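The two steps can be sketched in plain Python. This is a minimal illustration rather than a reference implementation; it assumes that the initial means are k randomly chosen input points, that an empty cluster keeps its previous mean, and that iteration stops when no assignment changes. The names `k_means`, `max_iter` and `seed` are hypothetical.

```python
import random
from math import dist


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Return (means, labels), where labels[j] is the cluster index of points[j]."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: start from k of the input points
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest mean.
        new_labels = [
            min(range(k), key=lambda i: dist(p, means[i])) for p in points
        ]
        if new_labels == labels:  # nothing moved, so the algorithm has converged
            break
        labels = new_labels
        # Step 2: move each mean to the center of its cluster.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # assumption: an empty cluster keeps its old mean
                means[i] = mean_point(members)
    return means, labels


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, labels = k_means(data, k=2)
print(labels)
```

On this toy data the first three points end up sharing one label and the last three the other; which group receives index 0 depends on the random initialization.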

5
The algorithm
[Figure: assign points to the nearest mean; move the means]

6
The algorithm
- The complexity is O(n * k * I * d), where:
  - n: number of points
  - k: number of clusters
  - I: number of iterations
  - d: number of attributes
[Figure: re-assign points]
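To get a feel for the bound, the product can be evaluated for concrete sizes. The sketch below is only back-of-the-envelope arithmetic; the function name `kmeans_cost` and the example numbers are assumptions chosen for illustration.

```python
def kmeans_cost(n, k, iterations, d):
    """Rough count of coordinate-level operations: n * k * I * d."""
    return n * k * iterations * d


# Hypothetical workload: 10,000 points, 5 clusters, 20 iterations, 3 attributes.
print(kmeans_cost(n=10_000, k=5, iterations=20, d=3))  # 3000000

# The bound is linear in each factor, so doubling the points doubles the cost.
print(kmeans_cost(n=20_000, k=5, iterations=20, d=3))  # 6000000
```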

7
The algorithm

8
K nearest neighbors
- A similarly named but different algorithm: k-NN is a supervised classifier, while k-means is an unsupervised clustering method
- In k-NN the decision is made by a simple majority vote among the k closest neighbors
- In k-means the Euclidean distance measure is used
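For comparison, the majority-vote rule can be sketched as follows. This is a minimal illustration, assuming labelled training data given as (point, label) pairs and Euclidean distance; the name `knn_classify` and the toy data are hypothetical.

```python
from collections import Counter
from math import dist


def knn_classify(query, labelled_points, k=3):
    """Label a query point by majority vote among its k nearest neighbors."""
    nearest = sorted(labelled_points, key=lambda item: dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled data: class "a" near the origin, class "b" near (5, 5).
training = [
    ((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
    ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b"),
]

print(knn_classify((1, 1), training))  # a
print(knn_classify((5, 4), training))  # b
```

Unlike k-means, this rule needs labels for the training points, and k counts neighbors rather than clusters.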

9
Some limitations of the algorithm
- The number of clusters needs to be known in advance
- The result depends on the initial positions of the means
- Problems appear when clusters have different:
  - Shapes
  - Sizes
  - Densities

10
Initial centroids problem
- Random initialization (the most common)
- Multiple runs
- Testing on a data sample
- Analyzing the data
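The "multiple runs" option can be sketched as repeating k-means from different random initializations and keeping the run with the lowest within-cluster sum of squares. This is one possible reading of the slide, not a prescribed procedure; the names `best_of_runs`, `runs` and `seed` are hypothetical, and the initial means are assumed to be sampled from the input points.

```python
import random
from math import dist


def mean_point(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, rng, max_iter=100):
    """One k-means run; initial means are k input points drawn with rng."""
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: dist(p, means[i])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = mean_point(members)
    return means, labels


def wcss(points, means, labels):
    """Within-cluster sum of squares for a finished run."""
    return sum(dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))


def best_of_runs(points, k, runs=10, seed=0):
    """Repeat k-means with different random starts and keep the best run."""
    rng = random.Random(seed)
    best = None
    for _ in range(runs):
        means, labels = k_means(points, k, rng)
        score = wcss(points, means, labels)
        if best is None or score < best[0]:
            best = (score, means, labels)
    return best


# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
score, means, labels = best_of_runs(data, k=2)
print(round(score, 3), labels)
```

The runs are independent of each other, so they could also be executed in parallel; the sequential loop here is only for brevity.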

11
Different density
[Figure: original points; 3 clusters]

12
Non-globular shapes
[Figure: original points; 2 clusters]

13
Pros and cons
Pros
- Simple to implement
- Fast
- Not computationally demanding
Cons
- K needs to be known
- Globular (roughly spherical) cluster shape is assumed
- Requires some knowledge about the data in advance
- May run for many iterations without significant changes in the clusters

14
Applications of the algorithm
- Many different uses:
  - Computer vision
  - Market segmentation
  - Geostatistics
  - Astronomy
  - etc.

15
Improvements
- Pre-process the data in order to better estimate k
- Run multiple instances in parallel with different centroid initializations
- Ignore possible errors to avoid non-standard cluster shapes

16
Thank you!
