2 Unsupervised Learning: Clustering Rong Jin

3 Outline
- Unsupervised learning
- K-means for clustering
- The Expectation-Maximization (EM) algorithm for clustering

4 Unsupervised vs. Supervised Learning
- Supervised learning: every training example is labeled.
- Unsupervised learning: no training example is labeled, but we can still discover the structure of the data.
- Semi-supervised learning: a mixture of labeled and unlabeled training data. Can you think of ways to use the unlabeled data to improve prediction?

5 Unsupervised Learning
- Clustering
- Visualization
- Density estimation
- Outlier/novelty detection
- Data compression

6 Clustering/Density Estimation (figure only; axes "$$$" vs. age)

7 Clustering for Visualization

8 Image Compression http://www.ece.neu.edu/groups/rpl/kmeans/
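The linked demo compresses an image by clustering its pixel colors with k-means. A minimal sketch of that idea, assuming scikit-learn and Pillow are available; the file names are placeholders, not from the slides:

```python
# Sketch: image compression by k-means color quantization, the idea behind
# the linked demo. Assumes scikit-learn and Pillow; "photo.png" is a
# placeholder file name.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

img = np.asarray(Image.open("photo.png").convert("RGB"), dtype=np.float64)
pixels = img.reshape(-1, 3)                    # one RGB row per pixel

kmeans = KMeans(n_clusters=16, n_init=10).fit(pixels)

# Replace every pixel by the center of its cluster: 16 colors instead of 2^24.
quantized = kmeans.cluster_centers_[kmeans.labels_].reshape(img.shape)
Image.fromarray(quantized.astype(np.uint8)).save("photo_quantized.png")
```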

9 K-means for Clustering
- The key to clustering: find the cluster centers and determine the appropriate clusters (very, very hard in general).
- K-means: start with a random guess of the cluster centers, determine the membership of each data point, then adjust the cluster centers.

10-14 K-means (one step added per slide)
1. Ask the user how many clusters they'd like (e.g. k = 5).
2. Randomly guess k cluster-center locations.
3. Each data point finds out which center it is closest to (thus each center "owns" a set of data points).
4. Each center finds the centroid of the points it owns.
Any computational problem? Is the complexity O(N), where N is the number of points?
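A minimal NumPy sketch of steps 1-4 above; this is illustrative, not the course's code, and it omits empty-cluster handling:

```python
# Minimal k-means following steps 1-4 on the slides. Illustrative only.
# X: (N, d) data array. Empty clusters are not handled.
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly guess k center locations (here: k distinct data points).
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: each data point finds the center it is closest to.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: each center moves to the centroid of the points it owns.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # centers stopped moving
            break
        centers = new_centers
    return centers, labels
```

On the slide's complexity question: each iteration computes all N x k point-to-center distances, so an iteration costs O(Nkd) rather than O(N); that per-iteration cost is what the KD-tree idea on the next slide attacks.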

15 Improving K-means
- Group points by region, using a KD-tree or an SR-tree.
- Key difference: find the closest center for each rectangle, then assign all the points within a rectangle to one cluster (a sketch follows the next slide).

16 Improved K-means: find the closest center for each rectangle and assign all the points within that rectangle to one cluster.
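The slide's rectangle-at-a-time assignment (the "filtering" idea) requires a full KD-tree over the data points; as a simpler sketch in the same spirit, a spatial index over the centers already speeds up the assignment step. This assumes SciPy and reuses the hypothetical kmeans() sketched above:

```python
# Simplified stand-in for the slide's KD-tree acceleration: index the k
# centers so step 3 beats brute force. Assumes SciPy is installed.
import numpy as np
from scipy.spatial import cKDTree

def assign_with_kdtree(X, centers):
    # Index the k centers once; each of the N points then finds its
    # nearest center in roughly O(log k) time instead of O(k).
    tree = cKDTree(centers)
    _, labels = tree.query(X)
    return labels
```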

17-25 Improved K-means (figure-only slides)

26 Gaussian Mixture Model for Clustering
- Assume the data are generated from a mixture of Gaussian distributions.
- Each Gaussian distribution has a center μ_i and a variance σ_i (ignored here).
- For each data point, determine its membership in each Gaussian.
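In standard notation (a reconstruction; the mixing weights π_j are not spelled out on the slide), the model is

```latex
% Mixture of k Gaussians; \pi_j are mixing weights, \mu_j the centers.
p(x) = \sum_{j=1}^{k} \pi_j \, \mathcal{N}\!\left(x \mid \mu_j, \sigma_j^2 I\right),
\qquad \pi_j \ge 0, \quad \sum_{j=1}^{k} \pi_j = 1 .
```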

27 Learning a Gaussian Mixture (with known covariance)
- The probability of the data under the model.
- The log-likelihood of the unlabeled data.
- Find the optimal parameters by maximizing this log-likelihood.
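The slide's equations did not survive as text; in the standard form, with the covariance fixed at σ²I, they are

```latex
% Log-likelihood of unlabeled data x_1, ..., x_N and the learning objective.
\ell(\mu, \pi) = \sum_{n=1}^{N} \log \sum_{j=1}^{k}
    \pi_j \, \mathcal{N}\!\left(x_n \mid \mu_j, \sigma^2 I\right),
\qquad
(\hat\mu, \hat\pi) = \arg\max_{\mu,\,\pi} \; \ell(\mu, \pi).
```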

28 Learning a Gaussian Mixture (with known covariance): alternate an E-step and an M-step.
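A sketch of one EM iteration for this model (shared known covariance σ²I); an illustration of the E-step/M-step alternation, not the slide's exact equations:

```python
# One EM iteration for a Gaussian mixture with known shared covariance
# sigma2 * I. Illustrative sketch. X: (N, d); mu: (k, d); pi: (k,).
import numpy as np

def em_step(X, mu, pi, sigma2):
    # E-step: responsibility r[n, j] of component j for point n,
    # proportional to pi[j] * exp(-||x_n - mu_j||^2 / (2 * sigma2)).
    sq_dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
    log_r = np.log(pi)[None, :] - sq_dists / (2.0 * sigma2)
    log_r -= log_r.max(axis=1, keepdims=True)   # stabilize the exponentials
    r = np.exp(log_r)
    r /= r.sum(axis=1, keepdims=True)
    # M-step: re-estimate each center as the responsibility-weighted mean,
    # and each mixing weight as the average responsibility.
    Nk = r.sum(axis=0)                          # effective cluster sizes
    mu_new = (r.T @ X) / Nk[:, None]
    pi_new = Nk / len(X)
    return mu_new, pi_new, r
```

The E-step is a soft version of k-means step 3 (each point spreads its membership over all centers), and the M-step mirrors step 4 with responsibility-weighted centroids.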

29 Gaussian Mixture Example: Start

30 After First Iteration

31 After 2nd Iteration

32 After 3rd Iteration

33 After 4th Iteration

34 After 5th Iteration

35 After 6th Iteration

36 After 20th Iteration

