EM Algorithm and its Applications

EM Algorithm and its Applications. Slides adapted from Yi Li, WU.

Outline
Introduction to EM
K-Means → EM
EM Applications
Image Segmentation using EM in Clustering Algorithms

Clustering: How can we separate the data into different classes without supervision?

Example

Clustering
There are K clusters C1, …, CK with means m1, …, mK. The least-squares error is defined as

D = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - m_k \rVert^2 .

Out of all possible partitions into K clusters, choose the one that minimizes D.
Why don't we just do this? If we could, would we get meaningful objects?
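For concreteness, a minimal sketch of this objective in Python with NumPy (not part of the original slides; the function name is mine):

import numpy as np

def least_squares_error(X, labels, means):
    """Compute D = sum_k sum_{x_i in C_k} ||x_i - m_k||^2.

    X:      (n, d) array of feature vectors
    labels: (n,) array of cluster indices in [0, K)
    means:  (K, d) array of cluster means
    """
    diffs = X - means[labels]            # x_i - m_{C(x_i)} for every point
    return float(np.sum(diffs ** 2))     # sum of squared distances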

Color Clustering by the K-means Algorithm
Form K clusters from a set of n-dimensional vectors:
1. Set ic (iteration count) to 1.
2. Choose randomly a set of K means m1(1), …, mK(1).
3. For each vector xi, compute the distance D(xi, mk(ic)) for k = 1, …, K and assign xi to the cluster Cj with the nearest mean.
4. Update the means to get m1(ic+1), …, mK(ic+1); increment ic by 1.
5. Repeat steps 3 and 4 until Ck(ic) = Ck(ic+1) for all k.
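The loop above maps directly onto code. Below is a hedged sketch in Python/NumPy (my own illustrative code, not from the original slides; function and variable names are assumptions):

import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Basic K-means on an (n, d) array X; returns (means, labels)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)].astype(float)  # step 2: random initial means
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each vector to the cluster with the nearest mean
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # step 5: assignments stopped changing
        labels = new_labels
        # Step 4: update each mean to the centroid of its cluster
        for k in range(K):
            if np.any(labels == k):
                means[k] = X[labels == k].mean(axis=0)
    return means, labels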

K-Means Classifier
Input: feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
The classifier (K-Means) produces:
Classification results: x1 → C(x1), x2 → C(x2), …, xi → C(xi)
Cluster parameters: m1 for C1, m2 for C2, …, mK for CK

K-Means Classifier (Cont.)
Input (Known): x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
Output (Unknown): classification results x1 → C(x1), x2 → C(x2), …, xi → C(xi), and cluster parameters m1 for C1, m2 for C2, …, mK for CK

Input (Known): x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
Output (Unknown), refined at each pass:
Initial guess of cluster parameters: m1, m2, …, mK
Classification results (1): C(x1), C(x2), …, C(xi) → cluster parameters (1): m1, m2, …, mK
Classification results (2): C(x1), C(x2), …, C(xi) → cluster parameters (2): m1, m2, …, mK
…
Classification results (ic): C(x1), C(x2), …, C(xi) → cluster parameters (ic): m1, m2, …, mK

K-Means (Cont.)
Boot Step: Initialize K clusters C1, …, CK; each cluster Cj is represented by its mean mj.
Iteration Step:
Estimate the cluster of each data point: xi → C(xi)
Re-estimate the cluster parameters (the means)

K-Means Example

K-Means Example

Image Segmentation by K-Means
Select a value of K.
Select a feature vector for every pixel (color, texture, position, or a combination of these, etc.).
Define a similarity measure between feature vectors (usually Euclidean distance).
Apply the K-Means algorithm.
Apply a connected-components algorithm.
Merge any component whose size is less than some threshold into the adjacent component that is most similar to it.
* From Marc Pollefeys, COMP 256, 2003
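A minimal sketch of this pipeline in Python, using scikit-learn's KMeans and SciPy's connected-components labeling (not from the original slides; the final "merge small components" step is omitted, and the function name is mine):

import numpy as np
from scipy import ndimage
from sklearn.cluster import KMeans

def segment_image(img, K):
    """Segment an (H, W, 3) color image: K-means on pixel colors, then connected components."""
    h, w, _ = img.shape
    X = img.reshape(-1, 3).astype(float)              # one RGB feature vector per pixel
    labels = KMeans(n_clusters=K, n_init=10, random_state=0).fit_predict(X)
    label_img = labels.reshape(h, w)

    segments = np.zeros((h, w), dtype=int)
    next_id = 0
    for k in range(K):
        comps, n = ndimage.label(label_img == k)      # connected components within cluster k
        segments[comps > 0] = comps[comps > 0] + next_id
        next_id += n
    return segments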

Results of K-Means Clustering: I gave each pixel the mean intensity or mean color of its cluster; this is basically just vector-quantizing the image intensities/colors. Notice that there is no requirement that clusters be spatially localized, and they are not.
[Figure panels: original image; clusters on intensity; clusters on color. K-means clustering using intensity alone and color alone.]
* From Marc Pollefeys, COMP 256, 2003

K-means: Challenges
K-means will converge, but not necessarily to the global minimum of the objective function.
Variations: search for an appropriate number of clusters by applying K-means with different K and comparing the results.

K-means Variants
Different ways to initialize the means
Different stopping criteria
Dynamic methods for determining the right number of clusters (K) for a given image
The EM algorithm: a probabilistic formulation of K-means

EM Algorithm
Like K-means, but with soft assignment (vs. hard assignment).
Assign each point partly to all clusters, based on the probability that it belongs to each.
Compute weighted averages (centers and covariances).

K-means vs. EM
Cluster representation: K-means uses the mean; EM uses the mean, variance, and weight.
Cluster initialization: K-means randomly selects K means; EM initializes K Gaussian distributions.
Expectation: K-means assigns each point to the closest mean; EM soft-assigns each point to each distribution.
Maximization: K-means computes the means of the current clusters; EM computes new parameters of each distribution.

Notation: Normal distribution, 1D case
N(μ, σ) is a 1D normal (Gaussian) distribution with mean μ and standard deviation σ (so the variance is σ²).

Normal distribution: Multi-dimensional case
N(μ, Σ) is a multivariate Gaussian distribution with mean μ and covariance matrix Σ.
What is a covariance matrix? For RGB features it is the 3×3 matrix

\Sigma = \begin{pmatrix} \sigma_R^2 & \sigma_{RG} & \sigma_{RB} \\ \sigma_{GR} & \sigma_G^2 & \sigma_{GB} \\ \sigma_{BR} & \sigma_{BG} & \sigma_B^2 \end{pmatrix}

where variance(X): \sigma_X^2 = \frac{1}{N} \sum_i (x_i - \mu)^2 and cov(X, Y) = \frac{1}{N} \sum_i (x_i - \mu_X)(y_i - \mu_Y).
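As a concrete check of these definitions, a small Python/NumPy snippet (not from the slides; the sample data is synthetic) that builds the RGB covariance matrix and compares it to NumPy's built-in, which matches the 1/N form above when bias=True:

import numpy as np

rng = np.random.default_rng(0)
pixels = rng.random((1000, 3))               # 1000 RGB feature vectors in [0, 1)

mu = pixels.mean(axis=0)                     # per-channel means (mu_R, mu_G, mu_B)
centered = pixels - mu
sigma = centered.T @ centered / len(pixels)  # 3x3 covariance matrix, 1/N normalization

assert np.allclose(sigma, np.cov(pixels.T, bias=True))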

K-Means → EM with GMM: The Intuition (1)
Instead of making a "hard" decision about which class a pixel belongs to, we use probability theory and assign pixels to classes probabilistically.
Using Bayes' rule,

P(C_j \mid x) = \frac{P(x \mid C_j)\, P(C_j)}{P(x)}

where P(x | C_j) is the likelihood, P(C_j) is the prior, and P(C_j | x) is the posterior. We want to maximize the posterior.

K-Means → EM: The Intuition (2)
We model the likelihood as a Gaussian/normal distribution.
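The formula on the original slide did not survive the transcript; for reference, the standard multivariate Gaussian likelihood, written in the notation used on later slides, would be:

P(x \mid C_j) = \mathcal{N}(x \mid \mu_j, \Sigma_j)
= \frac{1}{(2\pi)^{d/2} \lvert \Sigma_j \rvert^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j) \right)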

K-Means → EM: The Intuition (3)
Instead of modeling the color distribution with one Gaussian, we model it as a weighted sum of Gaussians.
We want to find the parameters that maximize the probability of the observed data under this mixture model.
We use the general trick of taking the logarithm of the probability function and maximizing that. This is called Maximum Likelihood Estimation (MLE).
We solve the maximization in two steps (E and M).
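The mixture and the objective being maximized (again supplied here because the slide's equations are missing from the transcript) would read, in the notation defined under Symbols later on:

p(x \mid \Theta) = \sum_{j=1}^{K} \alpha_j \, \mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad
\log L(\Theta) = \sum_{i=1}^{n} \log \sum_{j=1}^{K} \alpha_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)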

K-Means → EM with Gaussian Mixture Models (GMM)
Boot Step: Initialize K clusters C1, …, CK, each with parameters (μj, Σj) and P(Cj).
Iteration Step:
Expectation: estimate the (soft) cluster assignment of each data point.
Maximization: re-estimate the cluster parameters (μj, Σj) and P(Cj) for each cluster j.
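A compact EM-for-GMM sketch in Python/NumPy (my own illustrative code, not the original project's implementation; no numerical safeguards beyond a small covariance regularizer):

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, K, n_iter=50, seed=0):
    """Fit a K-component Gaussian mixture to (n, d) data X with EM.

    Returns (weights, means, covs, resp) where resp[i, j] = p(C_j | x_i).
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    means = X[rng.choice(n, size=K, replace=False)]           # boot step: initial means from data
    covs = np.array([np.eye(d)] * K)                          # identity covariances
    weights = np.full(K, 1.0 / K)                             # equal priors P(C_j)

    for _ in range(n_iter):
        # E-step: responsibilities p(C_j | x_i) proportional to P(C_j) N(x_i | mu_j, Sigma_j)
        resp = np.column_stack([
            weights[j] * multivariate_normal.pdf(X, means[j], covs[j])
            for j in range(K)
        ])
        resp /= resp.sum(axis=1, keepdims=True)

        # M-step: re-estimate weights, means, covariances from the responsibilities
        Nk = resp.sum(axis=0)
        weights = Nk / n
        means = (resp.T @ X) / Nk[:, None]
        for j in range(K):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / Nk[j] + 1e-6 * np.eye(d)
    return weights, means, covs, resp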

EM Classifier
Input: feature vectors x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
The classifier (EM) produces:
Classification results: p(Cj|x1), p(Cj|x2), …, p(Cj|xi)
Cluster parameters: (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μK, ΣK), p(CK) for CK

EM Classifier (Cont.)
Input (Known): x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
Output (Unknown): cluster parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μK, ΣK), p(CK) for CK, and classification results p(Cj|x1), p(Cj|x2), …, p(Cj|xi)

Expectation Step
Input (Known): x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
Input (Estimation): cluster parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μK, ΣK), p(CK) for CK
Output: classification results p(Cj|x1), p(Cj|x2), …, p(Cj|xi)

Maximization Step
Input (Known): x1 = {r1, g1, b1}, x2 = {r2, g2, b2}, …, xi = {ri, gi, bi}
Input (Estimation): classification results p(Cj|x1), p(Cj|x2), …, p(Cj|xi)
Output: cluster parameters (μ1, Σ1), p(C1) for C1; (μ2, Σ2), p(C2) for C2; …; (μK, ΣK), p(CK) for CK

EM Algorithm
Boot Step: Initialize K clusters C1, …, CK, with (μj, Σj) and P(Cj) for each cluster j.
Iteration Step:
Expectation Step
Maximization Step

EM Demo Video of Image Classification with EM Online Tutorial

EM Applications Blobworld: Image Segmentation Using Expectation-Maximization and its Application to Image Querying

Image Segmentation using EM
Step 1: Feature Extraction
Step 2: Image Segmentation using EM
In the literature, EM is often used to segment the image into multiple regions; usually each region corresponds to a Gaussian model.
In our project, we segment one foreground region from the background. In training, we crop the foreground pixels and train a Gaussian Mixture Model on those pixels. In testing, we evaluate the conditional probability of each pixel and threshold on that value.
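One way this train/threshold scheme could look in Python, using scikit-learn's GaussianMixture as a stand-in for the project's own EM code (the feature choice, component count, and threshold value are placeholders, not values from the slides):

import numpy as np
from sklearn.mixture import GaussianMixture

def train_foreground_model(fg_pixels, n_components=3):
    """Fit a GMM to cropped foreground pixel features of shape (n, d)."""
    gmm = GaussianMixture(n_components=n_components, covariance_type='full', random_state=0)
    gmm.fit(fg_pixels)
    return gmm

def segment_foreground(gmm, img, log_prob_threshold=-10.0):
    """Return a boolean (H, W) mask: True where a pixel scores above the threshold."""
    h, w, d = img.shape
    X = img.reshape(-1, d).astype(float)
    log_prob = gmm.score_samples(X)            # per-pixel log p(x | foreground model)
    return (log_prob > log_prob_threshold).reshape(h, w)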

Symbols
The feature vector for pixel i is called xi.
There are going to be K Gaussian models; K is given.
The j-th model has a Gaussian distribution with parameters θj = (μj, Σj).
The αj's are the weights (which sum to 1) of the Gaussians.
Θ is the collection of parameters: Θ = (α1, …, αK, θ1, …, θK)

Initialization
Each of the K Gaussians will have parameters θj = (μj, Σj), where
μj is the mean of the j-th Gaussian, and
Σj is the covariance matrix of the j-th Gaussian.
The covariance matrices are initialized to the identity matrix. The means can be initialized by finding the average feature vector in each of K windows in the image; this is a data-driven initialization.
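A plausible reading of this initialization in Python/NumPy (the window layout is my own guess: K roughly equal bands of pixels; the slides do not specify how the windows are placed):

import numpy as np

def init_params(features, K):
    """features: (H, W, d) per-pixel feature image. Returns (means, covs)."""
    h, w, d = features.shape
    bands = np.array_split(features.reshape(-1, d), K)   # crude 'windows': K chunks of pixels
    means = np.stack([b.mean(axis=0) for b in bands])    # data-driven mean per window
    covs = np.stack([np.eye(d) for _ in range(K)])       # identity covariance matrices
    return means, covs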

E-Step (referred to as responsibilities)
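The slide's formula is missing from the transcript; the standard E-step responsibility, in the notation defined under Symbols, is:

p(C_j \mid x_i, \Theta) = \frac{\alpha_j \, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}{\sum_{k=1}^{K} \alpha_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}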

M-Step
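Likewise, the standard M-step updates (supplied here because the slide's equations did not transcribe), with r_{ij} = p(C_j \mid x_i, \Theta):

\alpha_j^{\text{new}} = \frac{1}{n} \sum_{i=1}^{n} r_{ij},
\qquad
\mu_j^{\text{new}} = \frac{\sum_i r_{ij} \, x_i}{\sum_i r_{ij}},
\qquad
\Sigma_j^{\text{new}} = \frac{\sum_i r_{ij} \, (x_i - \mu_j^{\text{new}})(x_i - \mu_j^{\text{new}})^\top}{\sum_i r_{ij}}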

Sample Results for Scene Segmentation

Finding good feature vectors

Fitting Gaussians in RGB space

Fitting Gaussians in other spaces
Experiment with other color spaces.
Experiment with dimensionality reduction (PCA).