Published by Kayla Rodriguez. Modified over 3 years ago.

1
K-Means Clustering Algorithm, Mining Lab., 2004-10-27

2
Content: Clustering; K-Means via EM

3
Clustering (1/2)
What is clustering? Clustering algorithms divide a data set into natural groups (clusters). Instances in the same cluster are similar to each other: they share certain properties. Example: customer segmentation.
Clustering vs. classification: classification is supervised learning; clustering is unsupervised learning, with no target variable to be predicted.

4
Clustering (2/2): Categorization of Clustering Methods
Partitioning methods: K-Means / K-medoids / PAM / CLARA / CLARANS
Hierarchical methods: CURE / CHAMELEON / BIRCH
Density-based methods: DBSCAN / OPTICS
Grid-based methods: STING / CLIQUE / WaveCluster
Model-based methods: EM / COBWEB / Bayesian / Neural
Model-based clustering is also known as statistical clustering or probability-based clustering.

5
K-Means (1) Algorithm
Step 0: Select K objects as initial centroids.
Step 1 (Assignment): For each object, compute the distances to the K centroids and assign the object to the cluster whose centroid is closest.
Step 2 (New Centroids): Compute a new centroid for each cluster.
Step 3 (Convergence): Stop if the change in the centroids is less than the selected convergence criterion. Otherwise, repeat from Step 1.
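The four steps can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `tol` and `max_iter` parameters, and the rule that an empty cluster keeps its old centroid are all assumptions made here. The six points are the ones used on the calculation slide; the two initial centroids are picked by hand for the example.

```python
import math

def kmeans(points, initial_centroids, tol=1e-9, max_iter=100):
    """Lloyd-style K-Means on tuples of numbers.

    points: list of equal-length numeric tuples.
    initial_centroids: the K starting centroids (Step 0).
    tol: convergence threshold on the largest centroid movement (assumed).
    """
    centroids = [tuple(c) for c in initial_centroids]
    for _ in range(max_iter):
        # Step 1 (assignment): each object goes to its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)

        # Step 2 (new centroids): coordinate-wise mean of each cluster.
        # An empty cluster keeps its previous centroid (a choice made here).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else old
            for members, old in zip(clusters, centroids)
        ]

        # Step 3 (convergence): stop when no centroid moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, clusters

points = [(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0)]
# Hypothetical starting centroids: two of the objects, chosen by hand.
centroids, clusters = kmeans(points, initial_centroids=[(4, 4), (0, 2)])
print(centroids)
print(clusters)
```

Euclidean distance is used because the comparison slide describes K-Means as based on it. Different initial centroids can lead to a different final grouping, so practical use often involves several restarts.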

6
K-Means (2) simple example
[Figure: Input Data, Random Centroids, then three rounds of Assignment followed by New Centroids & (check)]

7
K-Means (3) weakness to outliers (noise)

8
K-Means (4) Calculation
Without an outlier:
0. Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0)
1. Clusters: {(3,4), (4,4), (4,2)} and {(0,2), (1,1), (1,0)}
2. Clusters: {(3,4), (4,4), (4,2)} and {(0,2), (1,1), (1,0)} (unchanged)
With an outlier:
0. Data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0), (100,0)
1. Clusters: {(0,2), (1,1), (1,0), (3,4), (4,4), (4,2)} and {(100,1)}
2. Clusters: {(0,2), (1,1), (1,0), (3,4), (4,4), (4,2)} and {(100,1)} (unchanged)
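The two groupings on this slide can be checked with a short sketch: compute the centroid of each stated cluster, run one assignment step, and see whether the clusters stay the same. The helper names `centroid` and `reassign` are hypothetical. The slide lists the outlier as (100, 0) in the data and as (100, 1) in the clusters; the sketch assumes (100, 0).

```python
import math

def centroid(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))

def reassign(points, centroids):
    """One assignment step: group points by their closest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)),
                key=lambda k: math.dist(p, centroids[k]))
        clusters[j].append(p)
    return clusters

upper = [(3, 4), (4, 4), (4, 2)]
lower = [(0, 2), (1, 1), (1, 0)]
outlier = (100, 0)  # assumption: value taken from the data listing

# Case 1: no outlier. Centroids of the two natural groups.
c1 = [centroid(upper), centroid(lower)]
case1 = reassign(upper + lower, c1)

# Case 2: with the outlier. One centroid sits on the outlier itself,
# the other is the mean of all six remaining points.
c2 = [centroid(upper + lower), centroid([outlier])]
case2 = reassign(upper + lower + [outlier], c2)

print(c1, case1)
print(c2, case2)
```

In both cases the assignment step returns the clusters it started from, so each grouping is stable under the algorithm. In the second case the outlier occupies a cluster alone and the two natural groups are merged around a centroid that lies between them, which is the weakness the previous slide names.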

9
K-Means (5) comparison with EM
K-Means: Hard clustering; an instance belongs to only one cluster. Based on Euclidean distance. Not robust to outliers or to differences in value range.
EM: Soft clustering; an instance belongs to several clusters, each with a membership probability. Based on probability density. Can handle both numeric and nominal attributes.
Example: K-Means assigns instance I to exactly one of C1 and C2; EM gives I a membership of 0.7 in C1 and 0.3 in C2.
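The difference between hard and soft assignment can be illustrated with a small sketch. It is not a full EM implementation: it shows only a single membership computation with fixed cluster parameters, under the simplifying assumptions of spherical Gaussian clusters, a shared standard deviation `sigma`, and equal prior weights. The function names, the two centers, and the instance are hypothetical.

```python
import math

def hard_assignment(x, centers):
    """K-Means style: index of the single closest center."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))

def soft_memberships(x, centers, sigma=1.0):
    """EM-style membership probabilities for one instance.

    Assumes spherical Gaussians with a shared sigma and equal priors,
    so each membership is proportional to exp(-d^2 / (2 * sigma^2)).
    """
    sq = [math.dist(x, c) ** 2 for c in centers]
    m = min(sq)  # subtract the smallest exponent for numerical stability
    weights = [math.exp(-(d - m) / (2 * sigma ** 2)) for d in sq]
    total = sum(weights)
    return [w / total for w in weights]

centers = [(0.0, 0.0), (2.0, 0.0)]  # hypothetical centers of C1 and C2
x = (0.8, 0.0)                      # hypothetical instance I
hard = hard_assignment(x, centers)
soft = soft_memberships(x, centers)
print(hard, soft)
```

The hard assignment returns one cluster index, while the soft memberships form a probability vector over all clusters that sums to 1. A full EM procedure would alternate this membership step with re-estimating the cluster parameters.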
