Expectation Maximization (EM), Northwestern University EECS 395/495 Special Topics in Machine Learning

Presentation transcript:

Expectation Maximization (EM)
Northwestern University EECS 395/495 Special Topics in Machine Learning

Outline: Objective, Simple example, Complex example

Objective
Learning with missing/unobservable data
[Table of example observations over the variables E, B, A, J]
Maximum likelihood

Objective
Learning with missing/unobservable data
[Table of observations over E, B, A, J in which some entries are missing ("?")]
Optimize what?
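The question "Optimize what?" has a standard answer: the likelihood of the observed entries, with the unobserved variables summed out. Reconstructing the missing equation in the notation used later on the Generalization slide (my notation, since the slide's formula image did not survive):

\ell(\theta) \;=\; \sum_i \log p(x_i \mid \theta) \;=\; \sum_i \log \sum_{z_i} p(x_i, z_i \mid \theta)

where x_i are the observed fields of record i and z_i its missing ones. EM is an iterative way of maximizing this objective when the inner sum makes direct maximization awkward.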

Outline: Objective, Simple example, Complex example

Simple example

Maximize likelihood
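The formulas on this slide did not survive extraction. As a generic stand-in (not necessarily the slide's actual example): for a fully observed categorical variable, the maximum-likelihood estimate is just the empirical frequency,

\hat{p}(g) \;=\; \frac{\#\{i : g_i = g\}}{N},

obtained by setting the derivative of the log likelihood \sum_i \log p(g_i) to zero under the constraint \sum_g p(g) = 1. The interesting case, treated next, is when the variable carrying this information is hidden.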

Same Problem with Hidden Information
Grade: hidden; Score: observable
[Diagram relating the grade variable G and the score variable S]

EM for our example

EM Convergence
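The convergence plot on this slide was an image. The property it illustrates, a standard fact about EM, is that each iteration can only increase (never decrease) the observed-data log likelihood:

\ell(\theta^{(t+1)}) \;\ge\; \ell(\theta^{(t)})

so the sequence of likelihood values converges; the fixed point is a stationary point of \ell, in practice usually a local maximum rather than the global one.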

Generalization
X: observable data (score = {h, c, d})
z: missing data (grade = {a, b, c, d})
θ: model parameters to estimate
E-step: given θ, compute the expectation of z
M-step: using the z obtained in the E-step, maximize the likelihood with respect to θ
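A minimal sketch of the loop these two steps define, in Python; e_step and m_step are hypothetical callbacks standing in for the problem-specific formulas (they are not functions named on the slides):

def em(x, theta0, e_step, m_step, n_iters=100, tol=1e-8):
    """Generic EM skeleton: alternate E and M steps until the likelihood stops improving."""
    theta, prev_ll = theta0, float("-inf")
    for _ in range(n_iters):
        expected_z, ll = e_step(theta, x)   # E: expected hidden quantities and current log likelihood
        theta = m_step(expected_z, x)       # M: parameters maximizing the expected complete-data log likelihood
        if ll - prev_ll < tol:              # the likelihood is monotone, so this is a safe stopping rule
            break
        prev_ll = ll
    return theta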

Outline: Objective, Simple example, Complex example

Gaussian Mixtures

Gaussian Mixtures
Know:
– Data (the observed coordinates)
Don't know:
– Data labels (which component generated each point)
Objective:
– Estimate the mixture parameters by maximum likelihood

The GMM assumption
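The assumption itself was shown only in images. In its standard form (the usual statement of the GMM generative model, not copied verbatim from the deck): each data point is generated by first picking a component k with probability \pi_k and then drawing the point from that component's Gaussian, so an observation has density

p(x \mid \theta) \;=\; \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x \mid \mu_k, \Sigma_k),
\qquad \theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{K}, \;\; \sum_k \pi_k = 1.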

The data generated
– Coordinates (the observed x)
– Label (the hidden component that produced each point)
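A small sketch of that generative process in Python; the component weights, means, and covariances below are made-up illustration values, not parameters from the slides:

import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2])                       # mixing proportions pi_k
means   = np.array([[0.0, 0.0], [4.0, 1.0], [1.0, 5.0]])  # component means mu_k
covs    = np.array([np.eye(2), 0.5 * np.eye(2), 2.0 * np.eye(2)])  # covariances Sigma_k

n = 500
labels = rng.choice(len(weights), size=n, p=weights)      # hidden label z_i of each point
coords = np.array([rng.multivariate_normal(means[z], covs[z]) for z in labels])  # observed x_i

# EM sees only `coords`; `labels` is exactly the missing data it has to reason about.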

Computing the likelihood
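Reconstructing the slide's missing formula (this is the standard observed-data log likelihood of a GMM): with the labels summed out, the quantity to maximize is

\log p(X \mid \theta) \;=\; \sum_{i=1}^{N} \log \sum_{k=1}^{K} \pi_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k).

The log of a sum does not split into per-component terms, which is what makes direct maximization awkward and motivates the EM updates on the next slides.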

EM for GMMs
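The update equations were on image-only slides. Below is a compact sketch of the standard EM updates for a GMM (E-step: per-point responsibilities; M-step: responsibility-weighted proportions, means, and covariances). Function and variable names are mine, not the slides':

import numpy as np

def gaussian_pdf(x, mean, cov):
    """Multivariate normal density evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return np.exp(-0.5 * np.einsum("ij,jk,ik->i", diff, inv, diff)) / norm

def em_gmm(x, k, n_iters=100, tol=1e-6, seed=0):
    """EM for a k-component Gaussian mixture on data x of shape (n, d)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    weights = np.full(k, 1.0 / k)
    means = x[rng.choice(n, size=k, replace=False)]             # initialize means at random data points
    covs = np.array([np.cov(x, rowvar=False) + 1e-6 * np.eye(d) for _ in range(k)])
    prev_ll = -np.inf
    for _ in range(n_iters):
        # E-step: responsibility of each component for each point
        dens = np.stack([w * gaussian_pdf(x, m, c)
                         for w, m, c in zip(weights, means, covs)], axis=1)   # shape (n, k)
        ll = np.log(dens.sum(axis=1)).sum()                     # observed-data log likelihood
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted re-estimation of all parameters
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp.T @ x) / nk[:, None]
        covs = np.array([
            ((resp[:, j, None] * (x - means[j])).T @ (x - means[j])) / nk[j] + 1e-6 * np.eye(d)
            for j in range(k)])
        if ll - prev_ll < tol:                                  # likelihood is monotone, so stop on small gains
            break
        prev_ll = ll
    return weights, means, covs, ll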

Generalization
X: observable data
z: unobservable data
θ: model parameters to estimate
E-step: given θ, compute the "expectation" of z
M-step: using the z obtained in the E-step, maximize the likelihood with respect to θ

Exponential family
– Yes: normal, exponential, beta, Bernoulli, binomial, multinomial, Poisson…
– No: Cauchy and uniform
EM using sufficient statistics (for distributions in the exponential family):
– S1: compute the expectation of the sufficient statistics
– S2: set the parameters to their maximum-likelihood values given those expected statistics
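Why these two steps suffice is a standard exponential-family fact (my wording and notation, not the slide's): if the complete-data model can be written as

p(x, z \mid \theta) \;=\; h(x, z)\, \exp\{\eta(\theta)^{\top} T(x, z) - A(\theta)\},

then the expected complete-data log likelihood depends on the data only through \bar{T} = \mathbb{E}[T(x, z) \mid x, \theta^{(t)}]. So the E-step (S1) only has to compute \bar{T}, and the M-step (S2) reduces to the usual maximum-likelihood moment-matching condition \mathbb{E}_{\theta}[T] = \bar{T}.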

What EM really is
Maximize the expected log likelihood.
E-step: determine the expectation (of the complete-data log likelihood, under the current parameters)
M-step: maximize that expectation with respect to θ
X: observable data; z: missing data
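Written out (my reconstruction of the slide's missing formula; this is the standard Q-function statement of EM):

Q(\theta \mid \theta^{(t)}) \;=\; \mathbb{E}_{z \sim p(z \mid X, \theta^{(t)})}\big[\log p(X, z \mid \theta)\big],
\qquad
\theta^{(t+1)} \;=\; \arg\max_{\theta} \, Q(\theta \mid \theta^{(t)}).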

Final comments
– Deals with missing data / latent variables
– Maximizes the expected log likelihood
– Local optima: no guarantee of reaching the global maximum of the likelihood
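A common practical response to the local-optima issue (my addition, not from the deck) is to run EM from several random initializations and keep the run with the highest log likelihood. Using the em_gmm and coords sketches above:

best_ll, best_params = float("-inf"), None
for seed in range(10):                                    # several random restarts
    weights, means, covs, ll = em_gmm(coords, k=3, seed=seed)
    if ll > best_ll:                                      # keep the restart with the best likelihood
        best_ll, best_params = ll, (weights, means, covs)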