1 Clustering MacKay - Chapter 20

2 Machine Learning
Unsupervised Learning:
No training data is provided
No feedback is provided during learning
E.g. “Find a common underlying factor in a data set.” “Divide the data into 4 meaningful groups.”
Algorithms: k-means clustering, Principal Components Analysis (PCA), Independent Components Analysis (ICA)
Supervised Learning:
Training data is provided
Feedback may be provided during learning
E.g. “Learn what A’s and B’s look like and classify a letter in a new font as either an A or a B.” “Learn to balance a pole – pole tips = negative feedback, pole stays balanced = positive feedback.”
Algorithms: Perceptron classifier learning, multilayer perceptrons, Hopfield networks, reinforcement learning algorithms (e.g. Q-learning), Boltzmann machines

3 Clustering
The problem of assigning data to groups.
[Figure: ungrouped data (left) and the same data grouped into clusters (right).]

4 K-Means Martin, Fowlkes and Malik (2004)

5 K-Means Renninger and Malik (2004)

6 K-Means Murray, Brunet and Michel (2008)

7 K-Means Suppose you have a set of data points that you would like to classify into three different classes, but you don’t have labels for any of the points. What can you do? Well, one thing you can try is competitive learning.

8 K-Means In this case, you start off by randomly picking three means for your data clusters (shown here by black asterisks).

9 K-Means Then, for each mean, you compute the distance from every data point to that mean.

10 K-Means

11 K-Means

12 K-Means Then you assign each data point to the mean it lies closest to. In a sense, the means compete for each data point, but only the closest mean wins that data point.

13 K-Means Then each mean is re-calculated as the mean of the cluster of data points it has won. The process is then repeated until the means stop moving around.

14 K-Means: The Algorithm
Randomly pick K means.
Calculate the distance from each point to each mean.
The mean that is closest to a given point wins that point.
Re-calculate each mean as the mean of the points it has just won.
Iterate until the total distance moved by all of the means is very small (ideally zero).

15 K-Means
% Generate toy data
K = 3;                % Number of means
n = 100;              % Number of data points around each mean
N = n*K;              % Total number of data points
dims = 2;             % Dimensionality of the data set
M = rand(K,dims)*20;  % The true means

% Generate normally distributed random data centered around each mean
X = randn(N,dims) + repmat(M,[n 1]);

% Plot the data
figure('Color','w');
plot(X(:,1),X(:,2),'r.');
axis equal; axis square;

% Choose a colour for each cluster
ColMap = colormap('jet');
Col = ColMap(round(linspace(1,size(ColMap,1),K)),:);

16 K-Means
% K-means algorithm
% Initialize the K means
m = rand(K,dims)*20;

% Initialize distances matrix
Dists = zeros(N,K);

% Initialize the categories matrix
Khat = zeros(N,1);

KeepGoing = true;

17 K-Means
while KeepGoing
    % Calculate the distances from the mean to every data
    % point for each of the k-means.
    for k = 1:K
        Dists(:,k) = 1/2 * sum((repmat(m(k,:),[N, 1]) - X).^2, 2);
    end

18 K-Means
    % For each data point determine which mean lies closest.
    OldKhat = Khat;
    [Val,Khat] = min(Dists,[],2);

    % Hard assignment step - here we're making a
    % boolean-valued matrix with N rows and K columns.
    % Each row corresponds to an individual data point
    % and each column corresponds to one of the k-means.
    % If a point belongs to a given mean then a one will
    % appear under that mean's column in the point's row.
    % Otherwise, the matrix value will be zero.
    r = zeros(N,K);
    r(sub2ind([N K],(1:N)',Khat)) = 1;
    R = sum(r);

19 K-Means
    % Hard update step
    for k = 1:K                  % For each of the k-means
        Ind{k} = find(Khat==k);
        for d = 1:dims           % For each data dimension
            if R(k)
                % Calculate the updated value of the mean
                % as the average of the newly classified
                % points that are closest to that mean.
                m(k,d) = sum(X(:,d).*r(:,k)) ./ R(k);
            end
        end
    end

20 K-Means
    % Plot the data
    plot(m(:,1),m(:,2),'k*',M(:,1),M(:,2),'ko');
    hold on;
    for k = 1:K
        plot(X(Ind{k},1),X(Ind{k},2),'.','Color',Col(k,:));
    end
    hold off;
    drawnow;

    % Stop when the means stop changing
    KeepGoing = any(Khat-OldKhat);
end   % while KeepGoing

21 K-Means: Common Problems
The final state depends a lot on the initial guesses for the means.
[Figure: example final clusterings of the same data from different initial means.]

22 K-Means: Solutions Run the algorithm to convergence several independent times. Take the most common groupings.

23 K-Means: Solutions
On different runs of the algorithm we might end up with different labels for the same groups, even though the means are in roughly the same places. One way to check that the groupings are the same is to adopt the following procedure:
For a data set with n points, make an n x n matrix of zeros.
For each pair of data points (i,j), add a 1 to the matrix at row i, column j if the points are members of the same cluster.
Over repeated runs of the k-means algorithm the matrix will accumulate large numbers where the point pairs tend to be members of the same cluster and small numbers where they don't.
Divide all numbers in the matrix by the total number of times you ran the k-means algorithm, so that the maximum is 1 and the minimum is 0 (call this matrix “MeanM”).
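A minimal MATLAB sketch of this accumulation step is shown below. It assumes a helper run_kmeans(X,K) that runs the loop from slides 16-20 once and returns the N x 1 label vector Khat; that helper name is purely illustrative.

% Sketch only: accumulate the co-assignment matrix MeanM over repeated runs.
% run_kmeans is a hypothetical wrapper around the loop on slides 16-20 that
% returns the N x 1 label vector Khat for one run.
nRuns = 50;
MeanM = zeros(N,N);
for run = 1:nRuns
    Khat = run_kmeans(X,K);
    SameCluster = repmat(Khat,[1 N]) == repmat(Khat',[N 1]);
    MeanM = MeanM + SameCluster;   % count how often each pair shares a cluster
end
MeanM = MeanM / nRuns;             % scale so the values lie between 0 and 1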

24 K-Means: Solutions
Apply a threshold to the numbers in the matrix so that any number at or above a cutoff (e.g. 0.7) is set to 1 and any number below it is set to 0.
Create a new n x n matrix in which you replace the 1's in the previous matrix by incrementally increasing numbers (e.g. 1, 2, 3, etc.).
Make the matrix symmetric: if row i, column j is x, then set row j, column i to x as well.

25 K-Means: Solutions
Sweep through the rows and columns of the matrix repeatedly (up to n x n times), and for each nonzero element set its value to the smallest nonzero number in its row and column. Once this converges, every point in a cluster carries the same number, so the procedure provides a unique set of labels for each cluster.
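Below is a rough MATLAB sketch of this relabelling idea. Rather than the element-by-element min propagation described above, it simply gives one label to every group of points that are connected above the threshold, which produces the same result when the thresholded matrix forms clean blocks; MeanM, N and the 0.7 threshold are taken from the previous slides.

% Sketch only: derive one consistent label per cluster from MeanM.
A = MeanM >= 0.7;            % threshold the co-assignment matrix
A = A | A';                  % enforce symmetry, as described above
labels = zeros(N,1);
nextLabel = 0;
for i = 1:N
    if labels(i) == 0
        nextLabel = nextLabel + 1;
        members = find(A(i,:));      % every point consistently grouped with i
        labels(members) = nextLabel; % the whole group gets one label
    end
end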

26 K-Means: Solutions
Instead of choosing the mean starting locations randomly, you can place them on all possible sets of K data points and apply the procedure just mentioned.
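As a sketch of what trying all possible sets of K data points could look like in code (only feasible for small N and K, since the number of such sets grows combinatorially):

% Sketch only: enumerate every set of K data points as candidate starting means.
combos = nchoosek(1:N,K);       % all possible sets of K point indices
for c = 1:size(combos,1)
    m = X(combos(c,:),:);       % place the K means on these data points
    % ... run the k-means loop from slides 16-20 starting from this m,
    % and accumulate the resulting groupings into MeanM as before ...
end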

27 Estimating K
So far we have been assuming that K (the number of clusters) is known. What if you don't know K? Can it be estimated automatically?
Rationale: If the chosen value of K matches the true value of K, then the values in the MeanM matrix will consistently reflect the same groupings of points (all 1's for the grouped points and 0's everywhere else). If the chosen value of K does not match the true value of K, then the clusterings will vary more from one run to the next, the points classified as grouped together will be more variable, and the MeanM matrix will contain more values that are neither 1 nor 0.
Tibshirani and Walther (2005)

28 Estimating K
Procedure:
Compute MeanM for different values of K.
Histogram the resulting values into ten equally spaced bins from 0 to 1 (i.e. 0, 0.1, 0.2, …, 1).
Compute the sum of the values in the 0 and 1 bins and subtract the sum of the values in the remaining bins.
The K value with the highest resulting score (other than K = 1) may be taken as the best K value for the data set.
[Figure: MeanM matrices for the ground truth and for K = 1 through K = 7.]
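A sketch of the scoring step for one candidate K is given below; MeanM_K stands for the MeanM matrix computed with that value of K (the name is just for illustration).

% Sketch only: score one candidate value of K from its MeanM matrix (MeanM_K).
vals = MeanM_K(:);
counts = histc(vals, 0:0.1:1);           % ten bins from 0 to 1
counts(10) = counts(10) + counts(11);    % histc puts values exactly 1 in an 11th bin
counts(11) = [];
score = counts(1) + counts(10) - sum(counts(2:9));

Repeating this for every candidate K and keeping the K with the highest score (other than K = 1) gives the estimate described above.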

29 Homework
Try implementing the “Soft K-Means” algorithm presented in the book (Algorithm 20.7, p. 289) and run it on some toy data (as in our example).
Try running the K-means algorithm on the data in Clustering_2_3Data.mat. This is a data set of 1600 hand-drawn 2's and 3's, each 16 x 16 pixels in size (256 pixels total). Run the algorithm with 2 clusters and see what you get.
Hint: to view one image use: imagesc(reshape(X(1,:),[16 16])');
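As a rough starting point for the soft K-means exercise, here is a sketch of a softened assignment and update step, reusing X, N, K, dims and m from the earlier slides; compare it carefully against Algorithm 20.7 in the book, since the stiffness beta = 1 and the fixed iteration count here are arbitrary choices.

% Sketch only: soft assignments and updates (check against Algorithm 20.7).
beta = 1;                        % "stiffness"; large beta approaches hard k-means
Dists = zeros(N,K);
for iter = 1:100
    % Soft assignment step: responsibility of each mean for each point
    for k = 1:K
        Dists(:,k) = 1/2 * sum((repmat(m(k,:),[N 1]) - X).^2, 2);
    end
    Dists = Dists - repmat(min(Dists,[],2),[1 K]);  % subtract row minimum for numerical stability
    r = exp(-beta * Dists);
    r = r ./ repmat(sum(r,2),[1 K]);                % each row now sums to 1
    % Soft update step: each mean is a responsibility-weighted average of all points
    for k = 1:K
        m(k,:) = sum(repmat(r(:,k),[1 dims]) .* X, 1) / sum(r(:,k));
    end
end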

