Published by Alexis Belcher. Modified about 1 year ago.

1
Expectation Maximization
A “Gentle” Introduction
Scott Morris
Department of Computer Science

2
Basic Premise
Given a set of observed data, X, what is the underlying model that produced X?
– Example distributions: Gaussian, Poisson, Uniform
Assume we know (or can intuit) what type of model produced the data.
The model has m parameters (Θ1..Θm).
– The parameters are unknown; we would like to estimate them.

3
Maximum Likelihood Estimators (MLE)
P(Θ|X) = Probability that a set of given parameters are “correct”??
Instead define the “likelihood” of the parameters given the data, L(Θ|X).
What if the data is continuous?
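The formula on this slide did not survive extraction. A standard way to write the likelihood is sketched below, assuming the data X = {x_1, ..., x_N} are independent and identically distributed; the sample count N and the index i are notation introduced here for illustration. When the data is continuous, p is read as a probability density rather than a probability.

```latex
\mathcal{L}(\Theta \mid X) \;=\; p(X \mid \Theta) \;=\; \prod_{i=1}^{N} p(x_i \mid \Theta)
```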

4
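MLE continued
We are solving an optimization problem: find the Θ that maximizes L(Θ|X).
Often we maximize the log() of the likelihood instead.
– Why is this the same? log() is strictly increasing, so both functions peak at the same Θ.
Any method that maximizes the likelihood function is called a Maximum Likelihood Estimator.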
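A small sketch of the point about log(): the code estimates the mean of a 1-D Gaussian by brute-force search over candidate values, once with the likelihood and once with the log-likelihood. The data values, the fixed standard deviation of 1, and the search grid are all assumptions made up for this illustration.

```python
import math

def gaussian_pdf(x, mu, sigma=1.0):
    """Density of a 1-D normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def likelihood(mu, data):
    """L(mu | data): product of the densities of independent samples."""
    result = 1.0
    for x in data:
        result *= gaussian_pdf(x, mu)
    return result

def log_likelihood(mu, data):
    """log L(mu | data): the product turns into a sum."""
    return sum(math.log(gaussian_pdf(x, mu)) for x in data)

# Hypothetical observations, invented for this example.
data = [1.8, 2.1, 2.4, 1.9, 2.3]

# Candidate values for the unknown mean: 0.00, 0.01, ..., 4.00.
candidates = [i / 100.0 for i in range(401)]

best_by_likelihood = max(candidates, key=lambda mu: likelihood(mu, data))
best_by_log_likelihood = max(candidates, key=lambda mu: log_likelihood(mu, data))
sample_mean = sum(data) / len(data)

print(best_by_likelihood, best_by_log_likelihood, sample_mean)
```

Both searches pick the same candidate, and it sits next to the sample mean, which is the closed-form MLE for a Gaussian mean. In practice the log form is preferred because a product of many small densities underflows quickly, while the sum of logs does not.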

5
Simple Example: Least Squares Fit
Input: N points in R^2
Model: A single line, y = ax+b
– Parameters: a, b
Origin? Maximum Likelihood Estimator
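One way to see the link to MLE: if each point is assumed to satisfy y = ax + b plus independent zero-mean Gaussian noise of fixed variance, then maximizing the likelihood over (a, b) is the same as minimizing the sum of squared residuals. The sketch below uses the closed-form solution for that minimization; the function name and the sample points are hypothetical.

```python
def least_squares_line(points):
    """Return (a, b) minimizing the sum of (y - (a*x + b))**2 over the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Centered sums of squares and cross products.
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if sxx == 0.0:
        raise ValueError("all x values are equal; the slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical input: four points lying exactly on y = 2x + 1.
points = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
a, b = least_squares_line(points)
print(a, b)
```

With noise-free points the fit recovers the generating line; with noisy points it returns the line that is most likely under the Gaussian-noise assumption stated above.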

6
Expectation Maximization
An elaborate technique for maximizing the likelihood function
Often used when observed data is incomplete
– Due to problems in the observation process
– Due to unknown or difficult distribution function(s)
Iterative process
Still a local technique: it can settle on a local rather than the global maximum

7
EM likelihood function
Observed data X, assume missing data Y.
Let Z = (X,Y) be the complete data
– Joint density function
– p(z|Θ) = p(x,y|Θ) = p(y|x,Θ)p(x|Θ)
Define new likelihood function L(Θ|Z) = p(X,Y|Θ)
X,Θ are constants, so L() is a random variable dependent on the random variable Y.

8
“E” Step of EM Algorithm
Since L(Θ|Z) is itself a random variable, we can compute its expected value over Y, given X and the current estimate of Θ.
Can be thought of as computing the expected value of Y given the current estimate of Θ.
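The slide's formula did not survive extraction. In the usual formulation of EM the expectation is taken of the log of the complete-data likelihood, which can be sketched as follows. The name Q and the superscript (i-1) for the current estimate are notation assumed here, not taken from the slide.

```latex
Q\!\left(\Theta, \Theta^{(i-1)}\right)
  \;=\; E\!\left[\, \log p(X, Y \mid \Theta) \;\middle|\; X, \Theta^{(i-1)} \right]
  \;=\; \int \log p(X, y \mid \Theta)\; p\!\left(y \mid X, \Theta^{(i-1)}\right) dy
```

Here X and the current estimate are fixed, Y is integrated out, and Θ is the free variable that the next step adjusts.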

9
“M” step of EM Algorithm
Once we have the expectation computed, choose the Θ that maximizes it.
Convergence: various results proving convergence are cited.
Generalized EM: instead of finding the optimal Θ, choose one that increases the expectation.
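Written out, with Q(Θ, Θ^(i-1)) denoting the expected complete-data log-likelihood under the current estimate Θ^(i-1) (notation assumed here for illustration), the two variants can be sketched as:

```latex
\text{EM:}\quad \Theta^{(i)} \;=\; \arg\max_{\Theta}\; Q\!\left(\Theta, \Theta^{(i-1)}\right)

\text{Generalized EM:}\quad \text{any } \Theta^{(i)} \text{ with }
Q\!\left(\Theta^{(i)}, \Theta^{(i-1)}\right) \;>\; Q\!\left(\Theta^{(i-1)}, \Theta^{(i-1)}\right)
```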

10
Mixture Models
Assume the data come from a “mixture” of probability distributions.
The log-likelihood function is difficult to optimize because it contains the log of a sum, so use a trick:
– Assume unobserved data items Y whose values inform us which distribution generated each item in X.
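The mixture density on this slide was lost in extraction. A common way to write it is sketched below; the component count M, the mixing weights α_j, the per-component parameters θ_j, and the sample count N are notation assumed here.

```latex
p(x \mid \Theta) \;=\; \sum_{j=1}^{M} \alpha_j\, p_j(x \mid \theta_j),
\qquad \sum_{j=1}^{M} \alpha_j = 1

\log \mathcal{L}(\Theta \mid X) \;=\; \sum_{i=1}^{N} \log \sum_{j=1}^{M} \alpha_j\, p_j(x_i \mid \theta_j)

\log \mathcal{L}(\Theta \mid X, Y) \;=\; \sum_{i=1}^{N} \log\!\left( \alpha_{y_i}\, p_{y_i}(x_i \mid \theta_{y_i}) \right)
```

The second line shows the log of a sum that makes direct optimization awkward. The third line shows why the trick helps: if the label y_i of the generating component were known for every x_i, the inner sum would disappear.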

11
Update Equations
After much derivation, we obtain estimates for the new parameters in terms of the old ones.
– Θ = (μ,Σ), where μ is the mean and Σ is the covariance of a d-dimensional normal distribution
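The update equations themselves did not survive extraction. The sketch below shows the standard EM updates for a Gaussian mixture in the simplest setting, d = 1, so each Σ reduces to a scalar variance. It is an illustration under stated assumptions, not the slide's own derivation: the function names, the synthetic data, the starting values, the iteration count, and the small variance floor are all choices made here.

```python
import math
import random

def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def em_gaussian_mixture_1d(data, weights, means, variances, iterations=50):
    """Run a fixed number of EM iterations; returns (weights, means, variances)."""
    weights, means, variances = list(weights), list(means), list(variances)
    n, k = len(data), len(weights)
    for _ in range(iterations):
        # E step: responsibility of each component j for each point x,
        # i.e. p(j | x, current parameters).
        resp = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M step: new parameters as responsibility-weighted averages.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            # Assumed safeguard: keep the variance away from zero.
            variances[j] = max(var, 1e-6)
    return weights, means, variances

# Synthetic data (an assumption for this demo): two clusters around 0 and 5.
random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
data += [random.gauss(5.0, 1.0) for _ in range(200)]

est_weights, est_means, est_variances = em_gaussian_mixture_1d(
    data, weights=[0.5, 0.5], means=[1.0, 4.0], variances=[1.0, 1.0])
print(est_weights, est_means, est_variances)
```

Because EM is a local technique, the result depends on the starting values; starting points far from the true clusters, or with both means equal, can lead to a poorer solution.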
