EE 290A: Generalized Principal Component Analysis
Lecture 6: Iterative Methods for Mixture-Model Segmentation
Sastry & Yang, © Spring 2011, EE 290A, University of California, Berkeley

Last time
PCA reduces the dimensionality of a data set while retaining as much of the data variation as possible.
- Statistical view: the leading principal components are given by the leading eigenvectors of the covariance matrix.
- Geometric view: fitting a d-dimensional subspace model via the SVD.
Extensions of PCA:
- Probabilistic PCA via maximum-likelihood estimation (MLE).
- Kernel PCA via kernel functions and kernel matrices.
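As a quick reminder of the geometric view, here is a minimal NumPy sketch of PCA via the SVD (function and variable names are illustrative, not from the slides):

```python
import numpy as np

def pca_svd(X, d):
    """Fit a d-dimensional subspace to the columns of X (D x N) via the SVD."""
    mu = X.mean(axis=1, keepdims=True)                     # sample mean
    U, S, Vt = np.linalg.svd(X - mu, full_matrices=False)  # thin SVD of centered data
    Ud = U[:, :d]                                          # leading d principal directions
    Y = Ud.T @ (X - mu)                                    # d x N principal components
    return mu, Ud, Y
```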

This lecture
- Review of basic iterative algorithms (K-means, EM).
- Formulation of the subspace-segmentation problem.

Example 4.1
Euclidean distance-based clustering is not invariant to linear transformations of the data, so the distance metric needs to be adjusted after a linear transformation.
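As a small illustration (not part of the original slide), suppose the data are transformed as y = Ax with A nonsingular. Euclidean distances between transformed samples correspond to a weighted metric on the original samples:

```latex
\|y_i - y_j\|_2^2 \;=\; (x_i - x_j)^\top A^\top A \,(x_i - x_j),
```

so clustering the y_i with the Euclidean metric is equivalent to clustering the x_i with the metric induced by A^T A; the two agree only when A is orthogonal.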

Assume the data are sampled from a mixture of Gaussians. The classical distance metric between a sample and the mean of the j-th cluster is the Mahalanobis distance.
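The formula itself is not preserved in the transcript; the standard Mahalanobis distance of a sample x to cluster j with mean \mu_j and covariance \Sigma_j is

```latex
d_j(x)^2 \;=\; (x - \mu_j)^\top \Sigma_j^{-1} (x - \mu_j).
```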

K-Means
Assume a map function that assigns each i-th sample a cluster label. An optimal clustering minimizes the within-cluster scatter, i.e., the average distance of all samples to their respective cluster means.
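The objective is not preserved in the transcript; with assignments c(i) in {1, ..., K} and cluster means \mu_1, ..., \mu_K, the within-cluster scatter is typically written as

```latex
J\bigl(c, \mu_1, \dots, \mu_K\bigr) \;=\; \frac{1}{n} \sum_{i=1}^{n} \bigl\| x_i - \mu_{c(i)} \bigr\|^2 .
```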

However, K is user-defined; in the degenerate case each point becomes a cluster of its own and K = n. In this chapter, we assume the true K is known.

Algorithm
A chicken-and-egg view: if the cluster means were known, assigning labels would be easy; if the labels were known, re-estimating the means would be easy. K-means alternates between the two.

Two-Step Iteration
Alternate between a segmentation step, which assigns each sample to the nearest cluster mean, and an estimation step, which recomputes each mean from the samples currently assigned to it.
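A minimal NumPy sketch of the two-step iteration (assuming the squared-Euclidean metric and that K and the initial means are given; names are illustrative):

```python
import numpy as np

def kmeans(X, mu, n_iters=100):
    """K-means. X: N x D samples, mu: K x D initial cluster means."""
    for _ in range(n_iters):
        # Segmentation step: assign each sample to its nearest mean.
        dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # N x K
        labels = dists.argmin(axis=1)
        # Estimation step: recompute each mean from its assigned samples.
        new_mu = np.array([X[labels == k].mean(axis=0) if np.any(labels == k) else mu[k]
                           for k in range(mu.shape[0])])
        if np.allclose(new_mu, mu):   # stop when the means no longer change
            break
        mu = new_mu
    return labels, mu
```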

Example
(The slide's worked example is not reproduced in the transcript.)

Characteristics of K-Means
- It is a greedy algorithm and is not guaranteed to converge to the global optimum.
- Given fixed initial clusters / Gaussian models, the iterative process is deterministic.
- The result may be improved by running K-means multiple times with different starting conditions (see the sketch below).
- The segmentation-estimation process can be treated as a generalized expectation-maximization (EM) algorithm.
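A short sketch (not from the slides) of the multiple-restart strategy, reusing the hypothetical kmeans function above and keeping the run with the smallest within-cluster scatter:

```python
import numpy as np

def kmeans_restarts(X, K, n_restarts=10, seed=0):
    """Run kmeans (defined above) from several random initializations."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_restarts):
        mu0 = X[rng.choice(X.shape[0], size=K, replace=False)]  # K random samples as seeds
        labels, mu = kmeans(X, mu0)
        scatter = ((X - mu[labels]) ** 2).sum()                  # within-cluster scatter
        if best is None or scatter < best[0]:
            best = (scatter, labels, mu)
    return best[1], best[2]
```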

EM Algorithm [Dempster-Laird-Rubin 1977]
EM estimates the model parameters and the segmentation in a maximum-likelihood (ML) sense. Assume the samples are drawn independently from a mixture distribution whose component is indicated by a hidden discrete variable z; the conditional distributions can be Gaussian.
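The mixture density itself is not preserved in the transcript; in standard notation, with mixing weights \pi_j = p(z = j), it reads

```latex
p(x \mid \theta) \;=\; \sum_{j=1}^{K} \pi_j \, p(x \mid z = j, \theta_j),
\qquad \text{e.g. } p(x \mid z = j, \theta_j) = \mathcal{N}(x; \mu_j, \Sigma_j).
```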

The Maximum-Likelihood Estimation
The unknown parameters are the mixing weights and the component parameters (e.g., the Gaussian means and covariances). The likelihood function is the product of the mixture density over the samples, and the optimal solution maximizes the log-likelihood.
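The formulas are missing from the transcript; for i.i.d. samples x_1, ..., x_n and Gaussian components they take the standard form

```latex
\theta = \{\pi_j, \mu_j, \Sigma_j\}_{j=1}^{K}, \qquad
L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta), \qquad
\theta^{*} = \arg\max_{\theta} \; \sum_{i=1}^{n} \log \sum_{j=1}^{K} \pi_j \, \mathcal{N}(x_i; \mu_j, \Sigma_j).
```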

E-Step: Compute the Expectation
Directly maximizing the log-likelihood function is a high-dimensional nonlinear optimization problem.

Define a new function of the parameters and of a distribution over the hidden labels. The first term is called the expected complete log-likelihood function; the second term is the conditional entropy.
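The definition is not preserved in the transcript; a standard form, with w_{ij} a distribution over the hidden label of sample i (so that \sum_j w_{ij} = 1), is

```latex
g(w, \theta) \;=\;
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\,\log p(x_i, z_i = j \mid \theta)}_{\text{expected complete log-likelihood}}
\;\;\underbrace{-\;\sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}\,\log w_{ij}}_{\text{conditional entropy}} .
```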

Observation: for any choice of w, the function g(w, θ) is a lower bound on the log-likelihood, with equality when w_{ij} equals the posterior probability p(z_i = j | x_i, θ).

M-Step: Maximization
Regard the (incomplete) log-likelihood as a function of the two variables w and θ, and maximize g iteratively by alternating between them.
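The iteration itself is not shown in the transcript; in the notation above it is the standard alternation

```latex
w^{(t+1)} = \arg\max_{w}\; g\bigl(w, \theta^{(t)}\bigr) \quad \text{(E-step)},
\qquad
\theta^{(t+1)} = \arg\max_{\theta}\; g\bigl(w^{(t+1)}, \theta\bigr) \quad \text{(M-step)}.
```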

The iteration converges to a stationary point of the log-likelihood (in general a local maximum, not necessarily the global one).

Proposition 4.2: Update of the assignment probabilities w
With the parameters θ fixed, g(w, θ) is maximized by setting each w_{ij} to the posterior probability that sample i was drawn from cluster j.
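The formula is not preserved in the transcript; for Gaussian components the standard w-update (the posterior responsibilities) is

```latex
w_{ij} \;=\; p(z_i = j \mid x_i, \theta)
\;=\; \frac{\pi_j\, \mathcal{N}(x_i; \mu_j, \Sigma_j)}{\sum_{l=1}^{K} \pi_l\, \mathcal{N}(x_i; \mu_l, \Sigma_l)} .
```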

Update of the parameters θ
Recall the definition of g. Assume w is fixed; the conditional-entropy term is then constant, so it suffices to maximize the expected complete log-likelihood.
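In the notation above (the slide's own formula is missing), the θ-update solves

```latex
\theta^{(t+1)} = \arg\max_{\theta}\; \sum_{i=1}^{n}\sum_{j=1}^{K} w_{ij}
\Bigl[ \log \pi_j + \log \mathcal{N}(x_i; \mu_j, \Sigma_j) \Bigr]
\qquad \text{subject to } \sum_{j=1}^{K} \pi_j = 1 .
```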

To maximize the expected complete log-likelihood, as an example, assume each cluster is an isotropic normal distribution, and eliminate the constant terms in the objective.
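The resulting closed-form updates are not preserved in the transcript; for isotropic components N(\mu_j, \sigma_j^2 I) in R^D they take the standard form

```latex
\pi_j = \frac{1}{n}\sum_{i=1}^{n} w_{ij}, \qquad
\mu_j = \frac{\sum_{i=1}^{n} w_{ij}\, x_i}{\sum_{i=1}^{n} w_{ij}}, \qquad
\sigma_j^2 = \frac{\sum_{i=1}^{n} w_{ij}\, \|x_i - \mu_j\|^2}{D \sum_{i=1}^{n} w_{ij}} .
```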

Exercise 4.2
Compared to K-means, EM assigns the samples "softly" to each cluster according to a set of probabilities.

EM Algorithm (summary)
Initialize the parameters, then alternate the E-step (compute the assignment probabilities w from the current θ) and the M-step (re-estimate θ from the current w) until convergence; see the sketch below.
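A compact NumPy sketch of EM for a Gaussian mixture with isotropic covariances (an illustrative implementation, not the book's pseudocode):

```python
import numpy as np

def em_gmm_isotropic(X, K, n_iters=100, seed=0):
    """EM for a K-component isotropic Gaussian mixture. X: N x D samples."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                               # mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]           # K x D initial means
    var = np.full(K, X.var())                              # per-component variances
    for _ in range(n_iters):
        # E-step: responsibilities w_ij proportional to pi_j * N(x_i; mu_j, var_j I).
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)          # N x K
        log_w = np.log(pi) - 0.5 * D * np.log(2 * np.pi * var) - 0.5 * sq / var
        log_w -= log_w.max(axis=1, keepdims=True)                          # numerical stability
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: closed-form updates of pi, mu, var.
        Nk = w.sum(axis=0)                                                 # effective counts
        pi = Nk / N
        mu = (w.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (w * sq).sum(axis=0) / (D * Nk)
    return pi, mu, var, w
```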

Example 4.3: A global maximum may not exist.
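The example's details are not in the transcript; a standard illustration of the phenomenon (offered here as an assumption about what the example shows) is the degenerate collapse of one component onto a data point: centering component j at x_1 and shrinking its variance drives the likelihood to infinity,

```latex
\text{with } \mu_j = x_1:\qquad
L(\theta) \;\geq\; \pi_j\, \mathcal{N}(x_1; \mu_j, \sigma_j^2 I)\,\prod_{i=2}^{n} p(x_i \mid \theta)
\;\longrightarrow\; \infty \quad \text{as } \sigma_j \to 0,
```

so the log-likelihood of a Gaussian mixture is unbounded above and only local maxima can be sought.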