Unsupervised Learning: Clustering
Rong Jin

Outline
- Unsupervised learning
- K-means for clustering
- Expectation-Maximization (EM) algorithm for clustering

Unsupervised vs. Supervised Learning
- Supervised learning: every training example is labeled.
- Unsupervised learning: no data is labeled, but we can still discover the structure of the data.
- Semi-supervised learning: a mixture of labeled and unlabeled data. Can you think of ways to use the unlabeled data to improve prediction?

Unsupervised Learning
- Clustering
- Visualization
- Density estimation
- Outlier/novelty detection
- Data compression

Clustering/Density Estimation
[Figure: clusters in a scatter plot of income ($$$) vs. age]

Clustering for Visualization

Image Compression http://www.ece.neu.edu/groups/rpl/kmeans/
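The linked demo compresses an image by clustering its pixel colors with K-means and storing only a small palette plus a per-pixel palette index. A minimal sketch of that idea (not the demo's code; function and variable names are mine):

```python
import numpy as np

def quantize_colors(pixels, palette):
    """Map each RGB pixel to its nearest palette color (the lossy compression step).

    pixels:  (N, 3) float array of RGB values
    palette: (k, 3) float array of cluster-center colors found by K-means
    Returns the quantized pixels and the per-pixel palette indices.
    """
    # Distance from every pixel to every palette color.
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
    idx = dists.argmin(axis=1)          # index of the nearest palette color
    return palette[idx], idx
```

The saving comes from storage: instead of 24 bits per pixel, the compressed image needs only log2(k) bits per pixel plus the k palette entries.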

K-means for Clustering
Key to clustering:
- Find the cluster centers
- Determine the appropriate clusters (very, very hard)
K-means:
- Start with a random guess of cluster centers
- Determine the membership of each data point
- Adjust the cluster centers

K-means
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each center moves to the centroid of the points it owns.
5. Repeat steps 3-4 until the assignments stop changing.

Any computational problem? Each iteration makes a full pass over the data, costing O(Nk) distance computations, where N is the number of points and k the number of clusters.
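The steps above can be sketched in a few lines of Python (a minimal illustration of Lloyd's algorithm, not the lecture's code; NumPy is assumed):

```python
import numpy as np

def kmeans(points, k, n_iters=100, seed=0):
    """K-means: alternate point assignment and centroid updates."""
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k cluster center locations (here: k random points).
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: each data point finds the center it is closest to.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: each center moves to the centroid of the points it owns
        # (a center that owns no points keeps its old location).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):   # step 5: stop at convergence
            break
        centers = new_centers
    return centers, labels
```

Note that the distance matrix in step 3 is exactly the O(Nk) per-iteration cost the slide asks about.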

Improve K-means
- Group points by region: KD-tree, SR-tree
- Key difference:
  - Find the closest center for each rectangle
  - Assign all the points within a rectangle to one cluster
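A related tree-based acceleration can be sketched with SciPy's KD-tree (assuming SciPy is available; note this simpler variant builds the tree over the k centers to speed up each point's nearest-center search, whereas the lecture's version builds the tree over the data and assigns whole rectangles of points at once):

```python
import numpy as np
from scipy.spatial import cKDTree

def assign_with_kdtree(points, centers):
    """Nearest-center assignment via a KD-tree over the centers.

    Brute force needs O(N*k) distance computations per iteration; a KD-tree
    query over the k centers cuts the per-point search cost when k is large.
    """
    tree = cKDTree(centers)
    _, labels = tree.query(points)   # index of the closest center per point
    return labels
```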


Improved K-means

Gaussian Mixture Model for Clustering
- Assume the data are generated from a mixture of Gaussian distributions.
- Each Gaussian distribution has a center mu_i and a variance sigma_i^2 (ignored here: the covariance is assumed known).
- For each data point, determine its membership (which Gaussian generated it).

Learning a Gaussian Mixture (with known covariance)
- Probability of the data
- Log-likelihood of the unlabeled data
- Find the optimal parameters
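The slide's equations did not survive extraction. For a mixture of k Gaussians with known shared covariance they standardly read as follows (a reconstruction, not the slide's exact notation):

```latex
% Probability of a single point under the mixture
p(x_n) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma^2 I)

% Log-likelihood of the unlabeled data
\ell(\theta) = \sum_{n=1}^{N} \log \sum_{j=1}^{k} \pi_j \, \mathcal{N}(x_n \mid \mu_j, \sigma^2 I)

% Optimal parameters maximize the log-likelihood
\theta^{*} = \arg\max_{\theta} \ \ell(\theta), \qquad \theta = \{\pi_j, \mu_j\}_{j=1}^{k}
```

The log of a sum does not decompose, which is why the likelihood has no closed-form maximizer and EM is used instead.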

Learning a Gaussian Mixture (with known covariance)
- E-step: compute each point's expected membership (responsibility) under each Gaussian, given the current parameters.
- M-step: re-estimate the centers and mixing weights from the responsibility-weighted points.
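The two steps can be sketched for 1-D data under the slide's known-covariance assumption (a minimal illustration, not the lecture's code; the deterministic initialization and variable names are mine):

```python
import numpy as np

def em_gmm(x, k, sigma=1.0, n_iters=50):
    """EM for a 1-D Gaussian mixture with known, shared variance sigma^2."""
    mu = np.linspace(x.min(), x.max(), k)   # spread initial centers over the data range
    pi = np.full(k, 1.0 / k)                # uniform initial mixing weights
    for _ in range(n_iters):
        # E-step: responsibility r[n, j] ∝ pi_j * N(x_n | mu_j, sigma^2).
        # The 1/sqrt(2*pi*sigma^2) factor is shared, so it cancels in the
        # normalization and is omitted.
        log_p = -0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        r = pi * np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate centers and weights from the weighted points.
        nk = r.sum(axis=0)                      # effective points per Gaussian
        mu = (r * x[:, None]).sum(axis=0) / nk  # responsibility-weighted means
        pi = nk / len(x)                        # updated mixing weights
    return mu, pi
```

Compared with K-means, each point contributes to every center in proportion to its responsibility rather than belonging to exactly one.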

Gaussian Mixture Example: Start

After 1st Iteration

After 2nd Iteration

After 3rd Iteration

After 4th Iteration

After 5th Iteration

After 6th Iteration

After 20th Iteration
