
1. K-Means Clustering Algorithm (Mining Lab)

2. Contents: Clustering; K-Means; K-Means vs. EM

3. Clustering (1/2)
What is clustering? Clustering algorithms divide a data set into natural groups (clusters). Instances in the same cluster are similar to each other in that they share certain properties, e.g. customer segmentation.
Clustering vs. Classification: classification is supervised learning; clustering is unsupervised learning, with no target variable to be predicted.

4. Clustering (2/2)
Categorization of clustering methods:
- Partitioning methods: K-Means / K-Medoids / PAM / CLARA / CLARANS
- Hierarchical methods: CURE / CHAMELEON / BIRCH
- Density-based methods: DBSCAN / OPTICS
- Grid-based methods: STING / CLIQUE / WaveCluster
- Model-based methods: EM / COBWEB / Bayesian / neural (also called model-based, statistical, or probability-based clustering)

5. K-Means (1) Algorithm
Step 0: Select K objects as initial centroids.
Step 1 (Assignment): For each object, compute its distance to each of the K centroids and assign it to the cluster whose centroid is closest.
Step 2 (New Centroids): Compute a new centroid for each cluster.
Step 3 (Convergence): Stop if the change in the centroids is less than the selected convergence criterion; otherwise repeat from Step 1.
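The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the slide authors' code; the function name, tolerance, and random initialization are assumptions.

```python
import numpy as np

def k_means(X, k, tol=1e-4, max_iter=100, seed=0):
    """Minimal K-Means sketch following Steps 0-3 above."""
    rng = np.random.default_rng(seed)
    # Step 0: select k objects as initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 1 (Assignment): distance of every object to every centroid,
        # then assign each object to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2 (New Centroids): mean of the objects in each cluster.
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 3 (Convergence): stop once the centroids barely move.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```

On the six points used in the calculation slide below, any choice of two initial centroids converges to the same two natural clusters.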

6. K-Means (2) Simple Example
Input data → random centroids → assignment → new centroids and (check) assignment, repeated until the assignments no longer change.

7. K-Means (3) Weakness: sensitivity to outliers (noise)

8. K-Means (4) Calculation
Run 1, data: (4,4), (3,4), (4,2), (0,2), (1,1), (1,0); K = 2.
Iterations 1 and 2 both produce the clusters {(3,4), (4,4), (4,2)} and {(0,2), (1,1), (1,0)}, so the algorithm has converged.
Run 2, same data plus the outlier (100,0); K = 2.
Iterations 1 and 2 both produce the clusters {(0,2), (1,1), (1,0), (3,4), (4,4), (4,2)} and {(100,0)}: the outlier captures one centroid by itself and forces the six remaining points into a single cluster.
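Run 2 can be reproduced with a short NumPy loop. This is a hypothetical sketch (the starting centroids (4,4) and (0,2) are an assumption), showing how the outlier (100,0) ends up alone in its own cluster.

```python
import numpy as np

# The six points from run 1 plus the outlier (100, 0); K = 2.
X = np.array([(4, 4), (3, 4), (4, 2), (0, 2), (1, 1), (1, 0), (100, 0)],
             dtype=float)
centroids = X[[0, 3]]          # start from (4, 4) and (0, 2)
for _ in range(10):            # a few iterations are enough to converge here
    # Assign every point to its nearest centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    # Recompute each cluster's centroid.
    centroids = np.array([X[labels == j].mean(axis=0) for j in range(2)])
```

The result matches the slide: one cluster holds all six original points, the other holds only (100,0), whose centroid is the outlier itself.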

9. K-Means (5) Comparison with EM
K-Means: hard clustering; an instance belongs to exactly one cluster. Based on Euclidean distance. Not robust to outliers or to differing value ranges.
EM: soft clustering; an instance belongs to several clusters, each with a membership probability. Based on probability density. Can handle both numeric and nominal attributes.
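The hard-vs.-soft distinction can be made concrete with a toy example. The point, cluster centers, unit variances, and equal mixing weights below are all made-up assumptions, not values from the slides; the snippet contrasts a K-Means hard label with one EM-style responsibility computation under a spherical Gaussian mixture.

```python
import numpy as np

point = np.array([2.0, 2.0])
means = np.array([[1.0, 1.0], [4.0, 4.0]])   # two hypothetical cluster centers

# K-Means: hard assignment to the single closest centroid.
hard = np.linalg.norm(means - point, axis=1).argmin()

# EM (E-step): membership probabilities from spherical unit-variance
# Gaussians with equal mixing weights (an assumption for this sketch).
log_p = -0.5 * ((means - point) ** 2).sum(axis=1)
soft = np.exp(log_p) / np.exp(log_p).sum()
```

Here `hard` is a single cluster index, while `soft` is a probability vector over both clusters that sums to 1, with most of the mass on the nearer center.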
