Lecture 19: More EM
Machine Learning, April 15, 2010

Last Time
– Expectation Maximization
– Gaussian Mixture Models

Today
– EM proof: Jensen's Inequality
– Clustering sequential data
  – EM over HMMs
  – EM in any graphical model
– Gibbs sampling

Gaussian Mixture Models
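For reference, a minimal statement of the model this lecture builds on (the standard GMM form; the notation below is assumed rather than taken from the slides):

```latex
p(x \mid \Theta) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad
l(X, \Theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_n \mid \mu_k, \Sigma_k).
```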

How can we be sure GMM/EM works?
– We've already seen that there are multiple clustering solutions for the same data.
  – It is a non-convex optimization problem.
– Can we prove that we're approaching some maximum, even if many exist?

Bound maximization
– Since we can't optimize the GMM parameters directly, maybe we can find the maximum of a lower bound.
– Technically: optimize a concave lower bound of the initial non-concave log-likelihood.

EM as a bound maximization problem
– Need to define a function Q(x, Θ) such that
  – Q(x, Θ) ≤ l(x, Θ) for all x, Θ
  – Q(x, Θ) = l(x, Θ) at a single point
  – Q(x, Θ) is concave

EM as bound maximization
– Claim: for the GMM likelihood, the EM auxiliary function (built from the current parameter estimate) is a concave lower bound.

EM Correctness Proof
– Prove that l(x, Θ) ≥ Q(x, Θ).
– Derivation steps:
  – Start from the likelihood function.
  – Introduce a hidden variable (the mixture component in a GMM).
  – Condition on a fixed value of the current parameters Θ^t.
  – Apply Jensen's inequality (coming soon…).
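A sketch of the lower-bound argument these steps refer to, written out in the standard way (the choice q(z) = p(z | x, Θ^t) is the usual one and is assumed here):

```latex
\begin{align*}
l(x,\Theta) &= \log p(x \mid \Theta)
             = \log \sum_{z} p(x, z \mid \Theta)
  && \text{introduce the hidden variable } z \\
 &= \log \sum_{z} q(z)\,\frac{p(x, z \mid \Theta)}{q(z)},
    \quad q(z) = p(z \mid x, \Theta^{t})
  && \text{a fixed value of } \Theta^{t} \\
 &\ge \sum_{z} q(z)\,\log \frac{p(x, z \mid \Theta)}{q(z)}
  \;=:\; Q(x,\Theta)
  && \text{Jensen's inequality (} \log \text{ is concave)}
\end{align*}
% Equality holds at \Theta = \Theta^{t}, so maximizing Q over \Theta
% cannot decrease l: the bound touches the likelihood at the current estimate.
```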

EM Correctness Proof
– GMM maximum likelihood estimation

The missing link: Jensen's Inequality
– If f is concave (convex down): f(E[x]) ≥ E[f(x)].
– Incredibly important tool for dealing with mixture models.
– In particular, if f(x) = log(x): log(Σ_i λ_i x_i) ≥ Σ_i λ_i log(x_i) for weights λ_i ≥ 0 with Σ_i λ_i = 1.

Generalizing EM from GMMs
– Notice that the EM optimization proof never used the exact form of the GMM, only the introduction of a hidden variable z.
– Thus, we can generalize EM to broader types of latent variable models.

General form of EM
Given a joint distribution p(x, z | Θ) over observed and latent variables, we want to maximize the likelihood p(x | Θ):
1. Initialize the parameters.
2. E-step: evaluate the posterior over the latent variables under the current parameters.
3. M-step: re-estimate the parameters (by maximizing the expectation of the complete-data log likelihood).
4. Check for convergence of the parameters or the likelihood.
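A minimal sketch of this four-step loop for a one-dimensional GMM. The function and variable names are illustrative assumptions, not the lecture's code, and the closed-form M-step is specific to the Gaussian mixture case:

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    # 1. Initialize: random data points as means, overall variance, uniform weights.
    mu = rng.choice(x, size=k, replace=False)
    var = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: responsibilities r[n, j] = p(z_n = j | x_n, current params).
        log_pdf = -0.5 * (np.log(2.0 * np.pi * var) + (x[:, None] - mu) ** 2 / var)
        log_joint = np.log(weights) + log_pdf            # shape (n, k)
        log_lik = np.logaddexp.reduce(log_joint, axis=1) # log p(x_n | params)
        r = np.exp(log_joint - log_lik[:, None])
        # 3. M-step: closed-form maximizers of the expected complete-data log likelihood.
        nk = r.sum(axis=0)
        weights = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-12  # floor for stability
        # 4. Check convergence of the (monotonically non-decreasing) log likelihood.
        if log_lik.sum() - prev_ll < tol:
            break
        prev_ll = log_lik.sum()
    return weights, mu, var
```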

Applying EM to Graphical Models
Now we have a general form for learning parameters of latent variable models:
– Take a guess
– Expectation: evaluate the likelihood
– Maximization: re-estimate the parameters
– Check for convergence

Clustering over sequential data
– Recall HMMs.
– So far we only looked at training supervised HMMs.
– What if you believe the data is sequential, but you can't observe the state?

EM on HMMs (also known as Baum-Welch)
– Recall the HMM parameters: initial state, transition, and emission probabilities.
– Now the training counts are estimated rather than observed.
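For reference, one common HMM parameterization (the symbols π, a, and b below are notational assumptions of this note, not necessarily the lecture's):

```latex
\pi_i = p(q_1 = i), \qquad
a_{ij} = p(q_{t+1} = j \mid q_t = i), \qquad
b_i(x) = p(x_t = x \mid q_t = i).
```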

EM on HMMs
Standard EM algorithm:
– Initialize
– E-step: evaluate the expected likelihood
– M-step: re-estimate parameters from the expected likelihood
– Check for convergence

EM on HMMs
– Guess: initialize the parameters.
– E-step: compute the expected counts (state occupancies and transitions) under the current parameters.

EM on HMMs
– But what are these E{…} quantities?
– They are the expected state-occupancy and state-transition counts under the current parameters.
– These can be efficiently calculated from JTA potentials and separators.
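Concretely, the E{…} quantities are usually written as state-occupancy and transition posteriors; a standard statement (γ/ξ notation assumed here), together with the transition re-estimate they feed:

```latex
\gamma_t(i) = p(q_t = i \mid x_{1:T}, \Theta^{\text{old}}), \qquad
\xi_t(i,j) = p(q_t = i,\, q_{t+1} = j \mid x_{1:T}, \Theta^{\text{old}}),
\qquad
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}.
```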

EM on HMMs

Standard EM algorithm:
– Initialize
– E-step: evaluate the expected likelihood (using the JTA algorithm)
– M-step: re-estimate parameters from the expected likelihood (using expected values from JTA potentials and separators)
– Check for convergence

Training latent variables in Graphical Models
– Now consider a general graphical model with latent variables.

EM on Latent Variable Models
– Guess
  – Easy: just assign random values to the parameters.
– E-step: evaluate the likelihood.
  – We can use the JTA to evaluate the likelihood,
  – and marginalize to get expected parameter values.
– M-step: re-estimate parameters.
  – This can get trickier.

Maximization Step in Latent Variable Models
– Why is this easy in HMMs but difficult in general latent variable models?
– (Figure: a graphical model in which a node has many parents.)

Junction Trees
– In general, we have no guarantee that we can isolate a single variable.
– We need to estimate each marginal separately.
– ("Dense graphs")

M-Step in Latent Variable Models
– M-step: re-estimate parameters.
  – Keep k−1 parameters fixed (to the current estimate).
  – Identify a better guess for the free parameter.
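In symbols, a sketch of this coordinate-wise update (notation assumed): hold every parameter but the k-th at its current value and maximize over the free one:

```latex
\theta_k^{\,\text{new}} \;=\; \arg\max_{\theta_k}\;
Q\!\left(\theta_1^{\,t}, \dots, \theta_{k-1}^{\,t},\; \theta_k,\; \theta_{k+1}^{\,t}, \dots, \theta_K^{\,t}\right).
```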

M-Step in Latent Variable Models
– M-step: re-estimate parameters via Gibbs sampling.
  – This is helpful if it's easier to sample from a conditional than it is to integrate to get the marginal.
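A minimal Gibbs-sampling sketch on a toy target (a bivariate Gaussian chosen for illustration, not the lecture's model), showing how drawing from conditionals sidesteps integrating for a marginal:

```python
# Gibbs sampling for a standard bivariate Gaussian with correlation rho:
# alternately sample each coordinate from its (easy) conditional.
import numpy as np

def gibbs_bivariate_gaussian(rho=0.8, n_samples=5000, burn_in=500, seed=0):
    rng = np.random.default_rng(seed)
    x, y = 0.0, 0.0
    cond_std = np.sqrt(1.0 - rho ** 2)     # std of x | y (and of y | x)
    samples = []
    for t in range(n_samples + burn_in):
        x = rng.normal(rho * y, cond_std)  # draw x ~ p(x | y)
        y = rng.normal(rho * x, cond_std)  # draw y ~ p(y | x)
        if t >= burn_in:
            samples.append((x, y))
    return np.array(samples)

# Usage: the empirical correlation of the draws should be close to rho.
draws = gibbs_bivariate_gaussian()
print(np.corrcoef(draws.T)[0, 1])
```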

EM on Latent Variable Models
– Guess
  – Easy: just assign random values to the parameters.
– E-step: evaluate the likelihood.
  – We can use the JTA to evaluate the likelihood,
  – and marginalize to get expected parameter values.
– M-step: re-estimate parameters.
  – Either from JTA potentials and marginals, or
  – by sampling.

Today
– EM as bound maximization
– EM as a general approach to learning parameters for latent variables
– Sampling

Next Time
– Model adaptation
  – Using labeled and unlabeled data to improve performance.
– Model adaptation application
  – Speaker recognition: UBM-MAP