Lecture 18 Expectation Maximization

Lecture 18 Expectation Maximization Machine Learning

Last Time: Expectation Maximization and Gaussian Mixture Models.

Term Project: Projects may use existing machine learning software (weka, libsvm, liblinear, mallet, crf++, etc.), but they must experiment with the type of data, feature representations, a variety of training styles (amount of data, classifiers), and evaluation.

Gaussian Mixture Model: Mixture Models. How can we combine many probability density functions to fit a more complicated distribution?
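For concreteness, a mixture model writes the overall density as a weighted sum of K component densities; in the Gaussian case, with mixing weights π_k, means μ_k, and covariances Σ_k (the notation used below):

```latex
p(x \mid \Theta) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```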

Gaussian Mixture Model: Fitting Multimodal Data; Clustering.

Gaussian Mixture Model: Expectation Maximization. E-step: assign points. M-step: re-estimate model parameters.
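As a rough illustration of these two steps, here is a minimal NumPy sketch of EM for a GMM. The function name em_gmm, the random initialization, and the small diagonal regularizer are choices made for this sketch, not part of the lecture:

```python
import numpy as np

def em_gmm(X, K, n_iters=100, seed=0):
    """Fit a K-component Gaussian mixture to X (n x d) with EM.

    Returns mixing weights pi, means mu, covariances Sigma,
    and responsibilities gamma (n x K).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape

    # Initialization: random data points as means, shared data covariance, uniform weights.
    mu = X[rng.choice(n, size=K, replace=False)]
    Sigma = np.stack([np.cov(X.T) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)

    for _ in range(n_iters):
        # E-step: responsibilities gamma[i, k] = p(z_i = k | x_i, current parameters).
        log_p = np.empty((n, K))
        for k in range(K):
            diff = X - mu[k]
            _, logdet = np.linalg.slogdet(Sigma[k])
            maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma[k]), diff)
            log_p[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        gamma = np.exp(log_p - log_norm)

        # M-step: re-estimate weights, means, and covariances from the soft counts.
        Nk = gamma.sum(axis=0)              # effective number of points per component
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mu[k]
            Sigma[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)

    return pi, mu, Sigma, gamma
```

For example, pi, mu, Sigma, gamma = em_gmm(X, K=3) fits three components to data X of shape (n, d); in practice one would also monitor the log-likelihood for convergence rather than run a fixed number of iterations.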

Today: EM proof, clustering sequential data, Jensen's Inequality, and EM over HMMs.

Gaussian Mixture Models

How can we be sure GMM/EM works? We've already seen that there are multiple clustering solutions for the same data; this is a non-convex optimization problem. Can we prove that we're approaching some maximum, even if many exist?

Bound maximization: Since we can't optimize the GMM parameters directly, maybe we can find the maximum of a lower bound. Technically: optimize a concave lower bound of the initial non-convex function.

EM as a bound maximization problem: we need to define a function Q(x,Θ) such that Q(x,Θ) ≤ l(x,Θ) for all x, Θ; Q(x,Θ) = l(x,Θ) at a single point; and Q(x,Θ) is concave.
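The payoff of these three properties is monotone improvement: if Θ^{t+1} maximizes the bound Q_t built at the current parameters Θ^t (writing l(Θ) for l(x,Θ) and suppressing the data argument; this shorthand is assumed here), then the true likelihood cannot decrease:

```latex
\Theta^{t+1} = \arg\max_{\Theta} Q_t(\Theta)
\quad\Longrightarrow\quad
l(\Theta^{t+1}) \;\ge\; Q_t(\Theta^{t+1}) \;\ge\; Q_t(\Theta^{t}) \;=\; l(\Theta^{t})
```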

EM as bound maximization. Claim: for the GMM likelihood l(x,Θ), the function Q(x,Θ) maximized in the GMM MLE step is a concave lower bound.

EM Correctness Proof. Prove that l(x,Θ) ≥ Q(x,Θ): start from the likelihood function, introduce a hidden variable (the mixture assignments in a GMM), fix a value Θ^t of the parameters, and apply Jensen's Inequality (coming soon…).
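One common way to carry out this construction (this particular form of Q is one standard presentation and may differ cosmetically from the slides): introduce any distribution q(z) over the hidden variable and apply Jensen's inequality to the log of the resulting sum:

```latex
l(x, \Theta) \;=\; \log p(x \mid \Theta)
\;=\; \log \sum_{z} q(z)\,\frac{p(x, z \mid \Theta)}{q(z)}
\;\ge\; \sum_{z} q(z) \log \frac{p(x, z \mid \Theta)}{q(z)}
\;=:\; Q(x, \Theta)
```

Choosing q(z) = p(z | x, Θ^t) makes the bound tight at Θ = Θ^t, which gives the "equal at a single point" property from the previous slide.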

EM Correctness Proof: GMM Maximum Likelihood Estimation.
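The resulting GMM updates are the standard ones; γ_ik denotes the responsibility of component k for point x_i and N_k the effective count (this notation is assumed here rather than taken from the slide):

```latex
\text{E-step:}\quad
\gamma_{ik} = \frac{\pi_k \,\mathcal{N}(x_i \mid \mu_k, \Sigma_k)}
                   {\sum_{j=1}^{K} \pi_j \,\mathcal{N}(x_i \mid \mu_j, \Sigma_j)},
\qquad N_k = \sum_{i=1}^{n} \gamma_{ik}
\\[6pt]
\text{M-step:}\quad
\pi_k = \frac{N_k}{n},\qquad
\mu_k = \frac{1}{N_k}\sum_{i=1}^{n}\gamma_{ik}\,x_i,\qquad
\Sigma_k = \frac{1}{N_k}\sum_{i=1}^{n}\gamma_{ik}\,(x_i-\mu_k)(x_i-\mu_k)^{\top}
```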

The missing link: Jensen's Inequality. If f is concave (or convex down), then f(E[x]) ≥ E[f(x)]. This is an incredibly important tool for dealing with mixture models, particularly when f(x) = log(x).
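Stated in the form used here, for weights λ_i ≥ 0 with Σ_i λ_i = 1:

```latex
f\!\Big(\sum_i \lambda_i x_i\Big) \;\ge\; \sum_i \lambda_i\, f(x_i)
\quad\text{for concave } f,
\qquad\text{in particular}\qquad
\log\Big(\sum_i \lambda_i x_i\Big) \;\ge\; \sum_i \lambda_i \log x_i
```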

Generalizing EM from the GMM. Notice that the EM optimization proof never used the exact form of the GMM, only the introduction of a hidden variable z. Thus, we can generalize EM to broader types of latent variable models.

General form of EM. Given a joint distribution p(X, Z | Θ) over observed variables X and latent variables Z, we want to maximize the likelihood p(X | Θ). Initialize the parameters; E-step: evaluate the posterior over the latent variables and the expectation of the complete-data log-likelihood; M-step: re-estimate the parameters by maximizing that expectation; check for convergence of the parameters or the likelihood (see the equations below).
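Written out in the usual textbook notation (X observed, Z latent, Θ^old the current parameters; this is the standard statement rather than a transcription of the slide's equations):

```latex
\text{E-step:}\quad
Q(\Theta, \Theta^{\text{old}})
  \;=\; \sum_{Z} p(Z \mid X, \Theta^{\text{old}})\,\log p(X, Z \mid \Theta)
\qquad
\text{M-step:}\quad
\Theta^{\text{new}} \;=\; \arg\max_{\Theta} Q(\Theta, \Theta^{\text{old}})
```

This Q is the expectation of the complete-data log-likelihood mentioned on the slide; it differs from the lower bound used in the proof only by a term that does not depend on Θ, so maximizing either gives the same M-step.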

Applying EM to Graphical Models. Now we have a general form for learning parameters for latent variables: take a guess; Expectation: evaluate the likelihood; Maximization: re-estimate the parameters; check for convergence.

Clustering over sequential data: HMMs. What if you believe the data is sequential, but you can't observe the state?

Training latent variables in Graphical Models Now consider a general Graphical Model with latent variables.

EM on Latent Variable Models. Guess: easy, just assign random values to the parameters. E-step: evaluate the likelihood; we can use the JTA (junction tree algorithm) to evaluate the likelihood and to marginalize out the expected parameter values. M-step: re-estimate the parameters; based on the form of the model, generate new expected parameters (CPTs or parameters of continuous distributions). Depending on the topology, this can be slow.
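For a discrete node X_i with parents Pa_i, the M-step reduces to normalized expected counts, with the posterior marginals supplied by the JTA run on each observed case e_n (a sketch under the usual multinomial-CPT assumptions; the notation θ_{x|u} and e_n is assumed here, not from the slide):

```latex
\theta^{\text{new}}_{x \mid u}
  \;=\; \frac{\sum_{n} p(X_i = x,\ \mathrm{Pa}_i = u \mid e_n,\ \Theta^{\text{old}})}
             {\sum_{x'} \sum_{n} p(X_i = x',\ \mathrm{Pa}_i = u \mid e_n,\ \Theta^{\text{old}})}
```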

Break