1 Clustering and Dimensionality Reduction Brendan and Yifang April 21 2015

2 Pre-knowledge We define a set $A$, and we find the element of $A$ that minimizes the error. We can think of the empirical risk $\hat{R}_n(C) = \frac{1}{n}\sum_{i=1}^{n}\|X_i - \Pi_C[X_i]\|^2$ as a sample version of $R(C) = \mathbb{E}\,\|X - \Pi_C[X]\|^2$, where $\Pi_C[X]$ is the point in $C$ closest to $X$.

3 Clustering methods K-means clustering; hierarchical clustering (agglomerative clustering, divisive clustering); level set clustering; modal clustering.

4 K-partition clustering In a k-partition problem, our goal is to find k points $C = \{c_1, \ldots, c_k\}$. We define $T_j = \{x : \|x - c_j\| \le \|x - c_s\| \text{ for all } s \ne j\}$. So $C$ are the cluster centers. We partition the space into k sets $T_1, \ldots, T_k$, where $T_j$ contains the points closer to $c_j$ than to any other center.

5 K-partition clustering, cont’d Given the data set $X_1, \ldots, X_n$, our goal is to find $\hat{C} = \arg\min_{C} \hat{R}_n(C)$, where $\hat{R}_n(C) = \frac{1}{n}\sum_{i=1}^{n} \min_{1 \le j \le k} \|X_i - c_j\|^2$.

6 K-means clustering 1. Choose k centers $c_1, \ldots, c_k$ at random from the data. 2. Form the clusters $T_1, \ldots, T_k$, where $T_j$ is the set of points for which $c_j$ is the closest center. 3. Replace each center by the mean of its cluster, $c_j \leftarrow \frac{1}{n_j}\sum_{X_i \in T_j} X_i$, where $n_j$ denotes the number of points in partition $T_j$. 4. Repeat Steps 2 and 3 until convergence.
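
As a concrete companion to Steps 1 through 4, here is a minimal k-means sketch in Python with NumPy. It is an illustration only; the function and variable names are ours, not from the slides.

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: choose k centers at random from the data.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign every point to its closest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: move each center to the mean of its cluster.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        # Step 4: stop once the centers no longer change.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers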

7 Circular data vs. spherical data Question: Why is K-means clustering good for spherical data? (Grace)

8 Question: What is the relationship between K-means and Naïve Bayes? Answer: They have the following in common: 1. Both of them estimate a probability density function. 2. Naïve Bayes assigns the closest category/label to the target point; K-means assigns the closest centroid to the target point. They differ in these aspects: 1. Naïve Bayes is a supervised algorithm; K-means is an unsupervised method. 2. K-means is an iterative optimization task; Naïve Bayes is not. 3. K-means is like multiple runs of Naïve Bayes, where in each run the labels are adaptively adjusted.

9 Question: Why does K-means not work well for Figure 35.6? Why does spectral clustering help with it? (Grace) Answer: Spectral clustering maps data points in R^d to data points in R^k. Circle-shaped groups in R^d become spherical-shaped groups in R^k. But spectral clustering involves a matrix decomposition, which is rather time-consuming.
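
The slides do not include code for this, but a hedged sketch of the comparison, using scikit-learn and two concentric circles as an illustrative dataset of our choosing, might look like this:

import numpy as np
from sklearn.datasets import make_circles
from sklearn.cluster import KMeans, SpectralClustering

# Two concentric rings: a shape K-means cannot separate with straight cuts.
X, _ = make_circles(n_samples=500, factor=0.4, noise=0.05, random_state=0)

km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc_labels = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                               n_neighbors=10, random_state=0).fit_predict(X)
# km_labels splits the plane in half; sc_labels recovers the two rings.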

10 Agglomerative clustering Requires a pairwise distance between clusters. There are three commonly employed distances: single linkage, complete linkage (max linkage), and average linkage. 1. Start with each point in a separate cluster. 2. Merge the two closest clusters. 3. Go back to Step 2.
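
A brief sketch of the three linkage choices in practice, using scikit-learn's AgglomerativeClustering (our implementation choice; the toy data are made up for illustration):

import numpy as np
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

for linkage in ("single", "complete", "average"):
    labels = AgglomerativeClustering(n_clusters=2, linkage=linkage).fit_predict(X)
    print(linkage, np.bincount(labels))  # cluster sizes under each linkage rule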

11 An Example, Single Linkage Question: Is Figure 35.6 meant to illustrate when one type of linkage is better than another? (Brad)

12 An example, Complete Linkage

13 Divisive clustering Start with one large cluster and then recursively divide the larger clusters into smaller clusters, using any feasible clustering algorithm. A divisive algorithm example: 1. Build a minimum spanning tree. 2. Create a new clustering by removing the link corresponding to the largest distance. 3. Go back to Step 2.
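
A small sketch of the MST-based divisive example above, using SciPy utilities (our implementation choice; stopping at a fixed cluster count is an assumption made for illustration):

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components

def mst_divisive(X, n_clusters):
    D = squareform(pdist(X))                       # pairwise distance matrix
    mst = minimum_spanning_tree(csr_matrix(D)).toarray()
    for _ in range(n_clusters - 1):
        i, j = np.unravel_index(np.argmax(mst), mst.shape)
        mst[i, j] = 0.0                            # remove the longest remaining edge
    graph = csr_matrix(mst + mst.T)                # symmetrize for the component search
    _, labels = connected_components(graph, directed=False)
    return labels                                  # connected components are the clusters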

14 Level set clustering For a fixed non-negative number $\lambda$, define the level set $L_\lambda = \{x : p(x) > \lambda\}$. We decompose $L_\lambda$ into a collection of bounded, connected, disjoint sets: $L_\lambda = \bigcup_j C_j$.

15 Level Set Clustering, Cont’d Estimate the density function $p$ using a KDE, giving $\hat{p}_h$. Decide $\lambda$: fix a small number $\lambda > 0$. Decide $\hat{L}_\lambda$: take $\hat{L}_\lambda = \{X_i : \hat{p}_h(X_i) > \lambda\}$, the sample points whose estimated density exceeds $\lambda$.
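
A sketch of these first two steps (estimate the density, then keep the sample points above the threshold), using SciPy's Gaussian KDE; the toy data and the quantile-based choice of lambda are our illustrative assumptions:

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (200, 2)), rng.normal(2, 0.5, (200, 2))])

kde = gaussian_kde(X.T)              # SciPy expects shape (d, n)
density = kde(X.T)                   # estimated density at each sample point
lam = np.quantile(density, 0.10)     # keep roughly the densest 90% of points
level_set_points = X[density > lam]  # estimate of the level set L_lambda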

16 Cuevas-Fraiman Algorithm Set j = 0 and I = {1, ..., n}. 1. Choose a point from the X_i with i in I and call this point X_(1). Find the nearest remaining point to X_(1) and call it X_(2). Let r_1 = ||X_(1) - X_(2)||. 2. If r_1 > 2ε: X_(1) forms a cluster by itself; set j <- j + 1, remove its index from I, and go to Step 1. 3. If r_1 <= 2ε, let X_(3) be the remaining point closest to the set {X_(1), X_(2)}, and let r_2 = min{ ||X_(3) - X_(1)||, ||X_(3) - X_(2)|| }. 4. If r_2 > 2ε, the current cluster is complete; set j <- j + 1 and go back to Step 1. Otherwise add X_(3) to the cluster and keep absorbing the nearest remaining point while its distance to the cluster stays within 2ε.
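
A compact sketch of this ε-linking procedure (our code, written from the step description above; the choice of ε is left to the caller):

import numpy as np

def cuevas_fraiman(points, eps):
    remaining = list(range(len(points)))
    labels = np.full(len(points), -1)
    j = 0
    while remaining:
        cluster = [remaining.pop(0)]              # seed a new cluster
        while remaining:
            # distance from each remaining point to the current cluster
            d = np.array([min(np.linalg.norm(points[i] - points[c])
                              for c in cluster) for i in remaining])
            nearest = int(d.argmin())
            if d[nearest] > 2 * eps:
                break                              # cluster j is complete
            cluster.append(remaining.pop(nearest))
        labels[cluster] = j
        j += 1
    return labels

# e.g. labels = cuevas_fraiman(level_set_points, eps=0.5) on the KDE sketch above.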

17 An example Question: Can you give an example to illustrate level set clustering? (Tavish) [Figure: a toy example with twelve numbered points.]

18 Modal Clustering A point x belongs to T_j if and only if the steepest ascent path beginning at x leads to the mode m_j; in the end, each data point is assigned to its closest mode. However, p may not have a finite number of modes, so a refinement is introduced: p_h, a smoothed-out version of p obtained with a Gaussian kernel.

19 Mean shift algorithm 1. Choose a number of points x_1, ..., x_N. Set t = 0. 2. Let t = t + 1. For j = 1, ..., N set $x_j \leftarrow \frac{\sum_{i=1}^{n} X_i\, K\!\big((x_j - X_i)/h\big)}{\sum_{i=1}^{n} K\!\big((x_j - X_i)/h\big)}$, where K is a kernel with bandwidth h. 3. Repeat until convergence.
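
A minimal mean shift sketch with a Gaussian kernel (our code; the bandwidth h and the convention of starting the x_j at the data points are illustrative assumptions):

import numpy as np

def mean_shift(X, h=0.5, n_iter=100, tol=1e-6):
    points = X.copy()                              # x_1, ..., x_N start at the data
    for _ in range(n_iter):
        sq_dists = ((points[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        weights = np.exp(-sq_dists / (2 * h ** 2))  # Gaussian kernel weights
        new_points = weights @ X / weights.sum(axis=1, keepdims=True)
        if np.abs(new_points - points).max() < tol:
            break                                  # converged: points sit at the modes
        points = new_points
    return points                                  # group near-identical rows to cluster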

20 Question: Can you point out the differences and similarities between the clustering algorithms? (Tavish) Can you compare the pros and cons of the clustering algorithms, and what are suitable situations for each of them? (Sicong) What is the relationship between the clustering algorithms? What assumptions do they make? (Yuankai) Answer: K-means Pros: 1. Simple and very intuitive; applicable to almost any scenario, any dataset. 2. Fast algorithm. Cons: 1. Does not work for density-varying data.

21 Contour matters K-means Cons: 2. Does not work well when data groups present special contours

22 K-means Cons: 3. Does not handle outliers well. 4. Requires K to be specified in advance.

23 Hierarchical clustering Pros: 1. Its result contains clusters at every level of granularity; any number of clusters can be obtained by cutting the dendrogram at the corresponding level. 2. It provides a dendrogram, which can be visualized as a hierarchical tree. 3. Does not require a pre-specified K. Cons: 1. Slower than K-means. 2. Hard to decide where to cut off the dendrogram.
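
To make the dendrogram-cutting point concrete, here is a hedged sketch using SciPy's hierarchy module (our implementation choice; the toy data and cut levels are made up): one linkage run encodes every granularity, and fcluster extracts a flat clustering at whatever level we pick.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.4, (30, 2)) for m in (0, 4, 8)])

Z = linkage(X, method="average")                     # one dendrogram for all levels
for k in (2, 3, 5):
    labels = fcluster(Z, t=k, criterion="maxclust")  # cut so that k clusters remain
    print(k, np.bincount(labels)[1:])                # cluster sizes at each granularity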

24 Level set clustering Pros: 1. Works well when data groups present special contours, e.g., circles. 2. Handles outliers well, because we get a density function. 3. Handles varying density well. Cons: 1. Even slower than hierarchical clustering: KDE is O(n^2), and the Cuevas-Fraiman algorithm is also O(n^2).

25 Question: Does K-means clustering guarantee convergence? (Jiyun) Answer: Yes; its time complexity upper bound is O(n^4). Question: In the Cuevas-Fraiman algorithm, does the choice of the start vertex matter? (Jiyun) Answer: The choice of start vertex does not matter. Question: Does the choice of x_j in the mean shift algorithm matter? Answer: No. The x_j converge to the modes during the iterative process; the initial values do not matter.

26 Dimension Reduction

27 Motivation

28 Question – Dimension Reduction Benefits Dimensionality reduction aims at reducing the number of random variables in the data before processing. However, this seems counterintuitive, as it can reduce distinct features in the data set, leading to poor results in succeeding steps. So, how does it help? - Tavish The implicit assumption is that our data contain more features than are useful or necessary (i.e., some are highly correlated or purely noise). This is common in big data and when data are naively recorded. Reducing the number of dimensions produces a more compact representation and helps with the curse of dimensionality. Some methods (e.g., manifold-based ones) avoid loss.

29 Principal Component Analysis (PCA)

30 Question – Linear Subspace In Principal Component Analysis, the data are projected onto linear subspaces. Could you explain a bit about what a linear subspace is? - Yuankai A linear subspace is a subset of a higher-dimensional vector space that is itself a vector space (it contains the origin and is closed under addition and scalar multiplication), e.g., a line or plane through the origin.
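
As a tiny illustration, here is a sketch of projecting points in R^2 onto the one-dimensional subspace spanned by a unit vector u (the particular u and the points are arbitrary choices of ours):

import numpy as np

u = np.array([1.0, 1.0]) / np.sqrt(2)        # unit basis vector of the subspace (a line)
X = np.array([[2.0, 0.0], [0.5, 1.5], [-1.0, 2.0]])

coords = X @ u                                # 1-D coordinates within the subspace
projection = np.outer(coords, u)              # the projected points, back in R^2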

31 Example – Linear Subspace

32 Example – Subspace projection

33 PCA Objective

34 PCA Algorithm

35 Question – Choosing a Good Dimension

36

37 Example – PCA: d=2, k=1

38 Multidimensional Scaling

39 Example – Multidimensional Scaling

40 Kernel PCA

41 The Kernel Trick

42 Kernel PCA Algorithm

43 Local Linear Embedding (LLE)

44 Question - Manifolds I wanted to know what exactly "manifold" referred to. – Brad “A manifold is a topological space that is locally Euclidean” – Wolfram. For example, the Earth appears flat on a human scale, but we know it’s roughly spherical. Maps are useful because they preserve all the surface features despite being a projection.

45 Example – Manifolds

46 LLE Algorithm

47 LLE Objective

48 Example – LLE Toy Examples

49 Isomap Similar to LLE in its preservation of the original structure Provides a “manifold” representation of the higher dimensional data Assesses object similarity differently (distance, as a metric, is computed using graph path length) Constructs the low-dimensional mapping differently (uses metric multi-dimensional scaling)
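
The slides describe Isomap at this level; a hedged sketch of how it is typically run in practice, using scikit-learn on a swiss-roll dataset (the dataset and parameter values are our illustrative choices):

import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# Neighborhood graph -> shortest-path (geodesic) distances -> metric MDS embedding.
embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)
print(embedding.shape)   # (1000, 2): the unrolled 2-D representation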

50 Isomap Algorithm

51 Laplacian Eigenmaps

52 Estimating the Manifold Dimension

53 Estimating the Manifold Dimension Cont.

54 Principal Curves and Manifolds

55 Principal Curves and Manifolds Cont.

56 Random Projection

57 Question – Making a Random Projection

58 Question – Distance Randomization

