Published by Kylee Moxham. Modified over 3 years ago.

1
Data Set used

2
K Means

3
K Means Clusters
1. K Means begins with a user-specified number of clusters, K.
2. It randomly places the K centroids on the data set.
3. It finds all the points closest to each centroid and makes them clusters.
4. It changes the centroid of each cluster to the mean of the points in that cluster.
5. It repeats steps 3 and 4 until the change in the centroids is minimal.
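The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Matlab code used for the slides: the function names, the `tol` and `max_iter` parameters, and the choice to seed the centroids from K randomly chosen data points are all assumptions made for the example.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Basic K-means on a list of numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    # Step 2: pick k distinct data points as the initial centroids (an assumption).
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Step 4: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # empty cluster: leave centroid in place
        # Step 5: stop once no centroid moved by more than tol (squared distance).
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, k=2)
print(sorted(centroids))
```

Leaving an empty cluster's centroid where it is was picked only to keep the sketch short; re-seeding it from a random point is another common choice.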

4
K Means Implementation Issues
If K is too small, the algorithm did not converge (no stable clusters). Further investigation of this is needed.
If K is too small, some clusters were null.

5
K-Means Matlab code

6
Ease of Doing Business vs Paying Taxes

7
Interesting case
The border points are clearly defined by distance, not density.
For each point we ask, “What is the closest centroid?”
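The "closest centroid" question can be written as a one-line lookup. The sketch below uses two made-up centroids and a hypothetical helper, `closest_centroid`, to show that the border depends only on distance: points on either side of x = 5 switch cluster no matter how many points lie near them.

```python
import math

def closest_centroid(point, centroids):
    """Return the index of the centroid nearest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

# Two illustrative centroids; the border between them is the line x = 5.
centroids = [(0.0, 0.0), (10.0, 0.0)]
print(closest_centroid((4.9, 3.0), centroids))  # 0
print(closest_centroid((5.1, 3.0), centroids))  # 1
```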

8
Why we like it
It is relatively straightforward in concept and implementation.
Good for globular data.
We can specify the number of clusters.

9
Why we don’t like it
Subject to initialization problems and heterogeneous results.
Not good for non-globular data (but can find clusters given a large enough K).
Sensitive to outliers (cleaning the data set helps).
Data must have the notion of a “center”.
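The sensitivity to outliers comes from the centroid being a mean. A small example with invented one-dimensional values shows how one extreme point drags the mean, while the median barely moves:

```python
from statistics import mean, median

values = [1.0, 2.0, 3.0, 4.0, 5.0]   # illustrative data, not from the slides
with_outlier = values + [100.0]      # the same data plus one outlier

print(mean(values), mean(with_outlier))      # the mean jumps from 3.0 to about 19.17
print(median(values), median(with_outlier))  # the median moves from 3.0 to 3.5
```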

10
Variations
Bisecting K-means
K-median
K-medoid
Several others
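As a rough illustration of how a variation changes the centroid step, K-medoid uses an actual data point as the cluster center instead of the mean. The helper name `medoid` and the sample cluster below are assumptions for the example.

```python
import math

def medoid(points):
    """Return the member of points with the smallest total distance to all the others."""
    return min(points, key=lambda p: sum(math.dist(p, q) for q in points))

# Four corners of a unit square plus one far-away outlier (made-up data).
cluster = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (50.0, 50.0)]
print(medoid(cluster))  # (1.0, 1.0)
```

The mean of this cluster would be (10.4, 10.4), which is far from every point in it; the medoid stays on the square.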

11
DBSCAN Algo
Pick a point P and find the distance of every other point P' from P.
if (Dist < K_Factor) P' is in the same cluster as P.
else if (Dist = K_Factor) P' is a border point.
else assign P' to a new cluster.
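One possible reading of this procedure is sketched below: any two points closer than `k_factor` end up in the same cluster, and clusters grow outward from a starting point P. The function name is hypothetical, and the border-point case (Dist = K_Factor) is not treated separately. Note that this follows the slide's distance-threshold rule only; standard DBSCAN additionally requires a minimum number of neighbours before a point can extend a cluster.

```python
import math

def threshold_clusters(points, k_factor):
    """Label points so that any two points closer than k_factor share a cluster."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        # Start a new cluster at an unlabelled point P and grow it outward.
        labels[start] = next_label
        frontier = [start]
        while frontier:
            p = frontier.pop()
            for q in range(len(points)):
                if labels[q] is None and math.dist(points[p], points[q]) < k_factor:
                    labels[q] = next_label
                    frontier.append(q)
        next_label += 1
    return labels

pts = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(threshold_clusters(pts, 1.5))  # [0, 0, 0, 1, 1]
print(threshold_clusters(pts, 0.5))  # [0, 1, 2, 3, 4]
print(threshold_clusters(pts, 20))   # [0, 0, 0, 0, 0]
```

The three calls show how strongly the result depends on the threshold: a small value isolates every point and a large one merges everything.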

12
Snapshots
For K_Factor = 20

13
For K_Factor = 10

14
For K_Factor = 120

15
Calculation of K-Factor

16
Issues faced
When adding a new point P' to the present cluster, the whole cluster of P' has to be merged with the present cluster.
No lower bound on the number of clusters.
Choice of K Factor.
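Merging whole clusters is the kind of operation a disjoint-set (union-find) structure handles cheaply. The sketch below is one way it could be done, not what the slides' implementation used; the class and method names are assumptions.

```python
class DisjointSet:
    """Tracks which cluster each point index belongs to and merges clusters."""

    def __init__(self, n):
        self.parent = list(range(n))  # every point starts in its own cluster

    def find(self, x):
        """Return the representative of x's cluster."""
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the whole cluster of b into the cluster of a."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

ds = DisjointSet(5)
ds.union(0, 1)   # cluster {0, 1}
ds.union(3, 4)   # cluster {3, 4}
ds.union(1, 3)   # adding point 3 pulls its whole cluster in: {0, 1, 3, 4}
print(ds.find(0) == ds.find(4))  # True
print(ds.find(0) == ds.find(2))  # False
```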

17
Further Enhancements
The calculation of the K-Factor and the clustering could be integrated.
Dynamic programming could be used, since many computations are repeated.
Static vs dynamic data.
