Clustering MacKay - Chapter 20

Machine Learning

Unsupervised Learning:
- No training data is provided.
- No feedback is provided during learning.
- E.g. "Find a common underlying factor in a data set." "Divide the data into 4 meaningful groups."
- Algorithms: k-means clustering, Principal Components Analysis (PCA), Independent Components Analysis (ICA).

Supervised Learning:
- Labeled training data is provided.
- Feedback may be provided during learning.
- E.g. "Learn what A's and B's look like and classify a letter in a new font as either an A or a B." "Learn to balance a pole: pole tips = negative feedback, pole stays balanced = positive feedback."
- Algorithms: perceptron classifier learning, multilayer perceptrons, Hopfield networks, reinforcement learning algorithms (e.g. Q-learning), Boltzmann machines.

Clustering
The problem of assigning data to groups.
[Figure: the same two-dimensional data set shown as ungrouped data and as grouped data.]

Martin, Fowlkes and Malik (2004) K-Means

K-Means Renninger and Malik (2004)

K-Means Murray, Brunet and Michel (2008)

K-Means Suppose you have a set of data points that you would like to classify into three different classes, but you don’t have labels for any of the points. What can you do? Well, one thing you can try is competitive learning.

K-Means In this case, you start off by randomly picking three means for your data clusters (shown here by black asterisks).

K-Means Then, for each mean, you compute the distance from every data point to that mean.
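In symbols, the code later in these slides uses half the squared Euclidean distance (the factor of 1/2 is harmless, since it does not change which mean is closest): $d_k(x) = \tfrac{1}{2}\sum_j (m_{k,j} - x_j)^2$.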

K-Means

K-Means

K-Means Then you assign each data point to the mean it lies closest to. In a sense, the means compete for each data point, but only the closest mean wins that given data point.

K-Means Then re-calculate each mean as the mean of the cluster of data points that it has won. The process is then repeated until the means stop moving around.
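In symbols, with $r_{nk} = 1$ if point $x_n$ was won by mean $k$ and $0$ otherwise (the same indicator matrix r built in the code below), the update is $m_k = \frac{\sum_n r_{nk}\, x_n}{\sum_n r_{nk}}$.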

K-Means: The Algorithm
1. Randomly pick K means.
2. Calculate the distance from each point to each mean.
3. The mean that is closest to a given point wins that point.
4. Re-calculate each mean as the mean of the points it has just won.
5. Iterate until the total distance moved by all of the means is a very small number (like zero).

K-Means
% Generate toy data ----------------------------------------
K = 3;               % Number of means
n = 100;             % Number of data points around each mean
N = n*K;             % Total number of data points
dims = 2;            % Dimensionality of the data set
M = rand(K,dims)*20; % The True Means
% Generate normally distributed random data centered around each mean
X = randn(N,dims)+repmat(M,[n 1]);
% Plot the data
figure('Color','w');
plot(X(:,1),X(:,2),'r.');
axis equal; axis square;
% Choose a colour for each cluster
ColMap = colormap('jet');
Col = ColMap(round(linspace(1,size(ColMap,1),K)),:);
[Figure: scatter plot of the generated toy data.]

K-Means
% K means algorithm ------------------------------------------
% Initialize the K means
m = rand(K,dims)*20;
% Initialize distances matrix
Dists = zeros(N,K);
% Initialize the categories matrix
Khat = zeros(N,1);
KeepGoing = true;

K-Means
while KeepGoing
    % Calculate the distances from the mean to every data
    % point for each of the k-means.
    for k = 1:K
        Dists(:,k) = 1/2 * sum((repmat(m(k,:),[N, 1]) - X).^2, 2);
    end

K-Means
    % For each data point determine which mean lies closest.
    OldKhat = Khat;
    [Val,Khat] = min(Dists,[],2);
    % Hard Assignment step - here we're making a boolean valued
    % matrix with N rows and K columns. Each row corresponds to an
    % individual data point and each column corresponds to one of
    % the k-means. If a point belongs to a given mean then a one
    % will appear under that mean's column in the point's row.
    % Otherwise, the matrix value will be zero.
    r = zeros(N,K);
    r(sub2ind([N K],(1:N)',Khat)) = 1;
    R = sum(r);

K-Means
    % Hard Update step
    for k = 1:K          % For each of the k-means
        Ind{k} = find(Khat==k);
        for d = 1:dims   % For each data dimension
            if R(k)
                % Calculate the updated value of the mean
                % as the average of the newly classified
                % points that are closest to that mean.
                m(k,d) = sum(X(:,d).*r(:,k)) ./ R(k);
            end
        end
    end

K-Means
    % Plot the data
    plot(m(:,1),m(:,2),'k*',M(:,1),M(:,2),'ko');
    hold on;
    for k = 1:K
        plot(X(Ind{k},1),X(Ind{k},2),'.','Color',Col(k,:));
    end
    hold off;
    drawnow;
    % Stop when the means stop changing
    KeepGoing = any(Khat-OldKhat);
end

K-Means: Common Problems
The final state depends a lot on the initial guesses for the means.
[Figure: two clusterings of the same data reached from different initial means.]

K-Means: Solutions Run the algorithm to convergence several independent times. Take the most common groupings.

K-Means: Solutions
On different runs of the algorithm we might end up with different labels for the same groups, even though the means are in roughly the same places. One way to know that the groupings are the same is to adopt the following procedure (code sketches of these steps follow below):
1. For a data set with n points, make a matrix of zeros of size n x n.
2. For each pair of data points (i,j), add a 1 to the matrix at row i, column j if the points are members of the same cluster.
3. Over repeated runs of the k-means algorithm the matrix will have larger numbers where the point pairs tend to consistently be members of the same cluster and low numbers where they don't.
4. Divide all numbers in the matrix by the total number of times you ran the k-means algorithm so that the maximum is 1 and the minimum is 0 (call this matrix "MeanM").
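A minimal sketch of this bookkeeping (my own illustration, written against the N points from the code above; run_kmeans(X,K) is a hypothetical helper that wraps the k-means loop above and returns the N x 1 label vector Khat):

nRuns = 20;                 % number of independent k-means runs
MeanM = zeros(N,N);
for run = 1:nRuns
    Khat = run_kmeans(X,K); % hypothetical helper: one run of the loop above
    % 1 where points i and j ended up in the same cluster on this run
    Same  = double(repmat(Khat,[1 N]) == repmat(Khat',[N 1]));
    MeanM = MeanM + Same;
end
MeanM = MeanM / nRuns;      % entries now lie between 0 and 1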

K-Means: Solutions
5. Apply a threshold to the numbers in the matrix so that any number above e.g. 0.7 is set to 1 and any number below 0.7 is set to 0.
6. Create a new n x n matrix where you replace the 1's in the previous matrix by incrementally increasing numbers (e.g. 1, 2, 3, etc.). If row i, column j is x, then set row j, column i to also be x.
[Slide shows a small worked example: the thresholded 0/1 matrix alongside the same matrix with its 1's replaced by the labels 1-6.]

K-Means: Solutions
7. Go through each row and column of the matrix n x n times and for each element set the value to be the smallest number in its row and column.
This procedure provides a unique set of labels for each cluster. (A code sketch of steps 5-7 follows.)
[Slide shows the labelled matrix before and after this relabelling step.]
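A sketch of steps 5-7 (my own reading of the slides, written against the N x N MeanM matrix built above; the 0.7 threshold is the example value from the slide, and instead of a fixed n x n passes the sketch simply repeats until nothing changes):

% Step 5: keep only pairs that are "reliably together".
B = MeanM > 0.7;
B(1:N+1:end) = false;   % ignore the trivial i==i pairs on the diagonal

% Step 6: give each surviving pair one provisional label, mirrored so
% that T(j,i) equals T(i,j).
T = zeros(N,N);
U = triu(B);
T(U) = 1:nnz(U);
T = T + T';

% Step 7: repeatedly replace every nonzero entry with the smallest
% nonzero label in its row and column until nothing changes. Afterwards
% all pairs belonging to one cluster share a single label.
changed = true;
while changed
    changed = false;
    for i = 1:N
        for j = 1:N
            if T(i,j) > 0
                candidates = [T(i, T(i,:) > 0), T(T(:,j) > 0, j)'];
                newVal = min(candidates);
                if newVal < T(i,j)
                    T(i,j) = newVal;
                    changed = true;
                end
            end
        end
    end
end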

K-Means: Solutions
Instead of choosing the mean starting locations randomly, you can place them on all possible sets of K data points and apply the procedure just mentioned.
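A minimal sketch of that enumeration (my own illustration; run_kmeans_from(X,m0) is a hypothetical wrapper around the loop above that starts from the supplied means m0). Note that the number of candidate initialisations, nchoosek(N,K), grows very quickly with N, so this is only practical for small data sets:

combos = nchoosek(1:N, K);          % every set of K data-point indices
for c = 1:size(combos,1)
    m0   = X(combos(c,:), :);       % use these K points as the starting means
    Khat = run_kmeans_from(X, m0);  % hypothetical helper: k-means started at m0
    % ...accumulate Khat into the co-occurrence matrix as before...
end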

Estimating K
So far we have been assuming that K (the number of clusters) is known. What if you don't know K? Can it be estimated automatically?
Rationale: If the chosen value of K matches the true value of K, then the values in the MeanM matrix will consistently reflect the same groupings of points (all 1's for the grouped points and 0's everywhere else). If the chosen value of K does not match the true value of K, then the clusterings will vary more from one run to the next, the points classified as grouped together will be more variable, and the MeanM matrix will have more values that are neither 0 nor 1.
Tibshirani and Walther (2005)

Estimating K
Procedure:
1. Compute MeanM for different values of K.
2. Histogram the resulting values into ten equally spaced bins from 0 to 1 (i.e. 0, 0.1, 0.2, ..., 1).
3. Compute the sum of the values in the 0 and 1 bins and subtract the sum of the values in the remaining bins.
4. The K value with the highest score (other than K = 1) may be taken as the best K value for the data set.
[Figure: MeanM matrices shown for the ground truth and for K = 1 through K = 7.]
(A sketch of this scoring loop follows.)
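A sketch of that scoring loop (my own illustration; compute_MeanM(X,K,nRuns) is a hypothetical helper that builds the averaged co-occurrence matrix described earlier):

Kvals = 1:7;
score = zeros(size(Kvals));
edges = 0:0.1:1;                                % ten equally spaced bins from 0 to 1
for idx = 1:numel(Kvals)
    MeanM  = compute_MeanM(X, Kvals(idx), 20);  % hypothetical helper
    counts = histc(MeanM(:), edges);            % 11 counts; the last holds values exactly equal to 1
    zeroBin = counts(1);
    oneBin  = counts(10) + counts(11);          % fold the "exactly 1" count into the top bin
    score(idx) = zeroBin + oneBin - sum(counts(2:9));
end
[~, best] = max(score(2:end));                  % ignore K = 1, which is trivially all 1's
bestK = Kvals(best + 1);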

Homework
1. Try implementing the "Soft K-Means" algorithm presented in the book (Algorithm 20.7, p. 289) and run it on some toy data (as in our example).
2. Try running the K-means algorithm on the data in Clustering_2_3Data.mat. This is a data set of 1600 hand-drawn 2's and 3's, each 16 x 16 pixels in size (256 pixels total). Run the algorithm with 2 clusters and see what you get out. Hint: to view one image use imagesc(reshape(X(1,:),[16 16])');