 # Parameter Estimation: Maximum Likelihood Estimation Chapter 3 (Duda et al.) – Sections 3.1-3.2 CS479/679 Pattern Recognition Dr. George Bebis.


Parameter Estimation

Bayesian decision theory allows us to design an optimal classifier, provided that we know the priors $P(\omega_i)$ and the class-conditional densities $p(x/\omega_i)$.

- Estimating $P(\omega_i)$ is usually not difficult.
- Estimating $p(x/\omega_i)$ is more difficult: the number of samples is often too small, and the dimensionality of the feature space is large.

Parameter Estimation (cont’d)

Assumptions:

- A set of training samples $D = \{x_1, x_2, \ldots, x_n\}$ is available, where the samples were drawn according to $p(x/\omega_j)$.
- $p(x/\omega_j)$ has some known parametric form, e.g., $p(x/\omega_i) \sim N(\mu_i, \Sigma_i)$, also denoted as $p(x/\theta)$ where $\theta = (\mu_i, \Sigma_i)$.

Parameter estimation problem: given $D$, find the best possible $\theta$.

Main Methods in Parameter Estimation

- Maximum Likelihood (ML)
- Bayesian Estimation (BE)

Main Methods in Parameter Estimation

Maximum Likelihood (ML):

- Assumes that the values of the parameters are fixed but unknown.
- The best estimate is obtained by maximizing the probability of obtaining the samples $x_1, x_2, \ldots, x_n$ actually observed (i.e., the training data):

$$\hat{\theta} = \arg\max_{\theta} \; p(x_1, x_2, \ldots, x_n/\theta)$$

Main Methods in Parameter Estimation (cont’d)

Bayesian Estimation (BE):

- Assumes that the parameters $\theta$ are random variables that have some known a-priori distribution $p(\theta)$.
- Estimates a distribution rather than making point estimates like ML:

$$p(x/D) = \int p(x/\theta)\, p(\theta/D)\, d\theta$$

Note: the BE solution might not be of the parametric form assumed!

ML Estimation - Assumptions

Let us assume $c$ classes and that the training data consists of $c$ sets $D_1, D_2, \ldots, D_c$ (i.e., one for each class):

- Samples in $D_j$ have been drawn independently according to $p(x/\omega_j)$.
- $p(x/\omega_j)$ has a known parametric form with parameters $\theta_j$, e.g., $\theta_j = (\mu_j, \Sigma_j)$ for a Gaussian distribution.

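ML Estimation - Problem Formulation and Solution

Problem: given $D_1, D_2, \ldots, D_c$ and a model for each class, estimate $\theta_1, \theta_2, \ldots, \theta_c$.

If the samples in $D_j$ give no information about $\theta_i$ ($i \neq j$), we need to solve $c$ independent problems (i.e., one for each class).

The ML estimate for $D = \{x_1, x_2, \ldots, x_n\}$ is the value $\hat{\theta}$ that maximizes $p(D/\theta)$ (i.e., best supports the training data):

$$\hat{\theta} = \arg\max_{\theta} \; p(D/\theta)$$

$$p(D/\theta) = p(x_1, x_2, \ldots, x_n/\theta) = \prod_{k=1}^{n} p(x_k/\theta) \quad \text{(using the independence assumption)}$$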

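ML Estimation - Problem Definition and Solution (cont’d)

How should we find the maximum of $p(D/\theta)$? Set its gradient with respect to $\theta$ to zero:

$$\nabla_{\theta}\, p(D/\theta) = 0$$

where, for a parameter vector $\theta = (\theta_1, \ldots, \theta_p)^T$,

$$\nabla_{\theta} = \left[\frac{\partial}{\partial \theta_1}, \ldots, \frac{\partial}{\partial \theta_p}\right]^T$$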

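ML Estimation Using Log-Likelihood

Consider the log-likelihood for simplicity:

$$\ln p(D/\theta) = \sum_{k=1}^{n} \ln p(x_k/\theta)$$

Because the logarithm is monotonically increasing, the $\hat{\theta}$ that maximizes $p(D/\theta)$ also maximizes $\ln p(D/\theta)$. The solution therefore satisfies

$$\hat{\theta} = \arg\max_{\theta} \; \ln p(D/\theta), \qquad \nabla_{\theta} \ln p(D/\theta) = \sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k/\theta) = 0$$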
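One practical reason to prefer the log-likelihood, beyond simpler algebra, is numerical: a product of many small densities underflows in floating point, while a sum of logs does not. The following is a minimal sketch of that effect for a univariate Gaussian; the function names and the toy data sets are illustrative assumptions, not part of the slides.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood(samples, mu, sigma):
    """p(D/theta): product of per-sample densities (independence assumption)."""
    p = 1.0
    for x in samples:
        p *= gaussian_pdf(x, mu, sigma)
    return p


def log_likelihood(samples, mu, sigma):
    """ln p(D/theta): sum of per-sample log-densities."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2)
        - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in samples
    )


# Small toy data set: the two forms agree (up to rounding).
small = [0.2, -0.5, 1.1, 0.7]

# Larger toy data set (2000 values in [-1.0, 0.9]): the raw product
# underflows to 0.0, while the log-likelihood stays finite.
large = [0.1 * (k % 20) - 1.0 for k in range(2000)]

if __name__ == "__main__":
    print(math.log(likelihood(small, 0.0, 1.0)), log_likelihood(small, 0.0, 1.0))
    print(likelihood(large, 0.0, 1.0), log_likelihood(large, 0.0, 1.0))
```

With the larger set the raw likelihood carries no usable information about $\theta$, which is why maximization is normally carried out on $\ln p(D/\theta)$.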

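ML Estimation Using Log-Likelihood (cont’d)

[Figure: the likelihood $p(D/\theta)$ and the log-likelihood $\ln p(D/\theta)$ plotted against $\theta = \mu$ for a set of training data with unknown mean and known variance.]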
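The setting of this slide (unknown mean, known variance) can be explored numerically by evaluating the log-likelihood on a grid of candidate means. The sketch below is only an illustration: the true mean, the variance, the sample size, the seed, and the grid are all assumed values chosen for the example.

```python
import math
import random


def log_likelihood(samples, mu, sigma):
    """ln p(D/mu) for a univariate Gaussian with known standard deviation sigma."""
    n = len(samples)
    return (
        -0.5 * n * math.log(2.0 * math.pi * sigma ** 2)
        - sum((x - mu) ** 2 for x in samples) / (2.0 * sigma ** 2)
    )


random.seed(0)                      # fixed seed so the sketch is repeatable
true_mu, sigma = 2.0, 1.5           # assumed values for the illustration
samples = [random.gauss(true_mu, sigma) for _ in range(200)]

# Candidate means from -2.00 to 6.00 in steps of 0.01.
grid = [i * 0.01 for i in range(-200, 601)]

# Grid point with the highest log-likelihood.
best_mu = max(grid, key=lambda mu: log_likelihood(samples, mu, sigma))
sample_mean = sum(samples) / len(samples)

if __name__ == "__main__":
    print("grid maximizer:", best_mu)
    print("sample mean:   ", sample_mean)
```

Since the log-likelihood is a downward parabola in $\mu$ centered at the sample mean, the grid maximizer is the grid point closest to the sample mean, i.e., within half a grid step of it.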

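ML for Multivariate Gaussian Density: Case of Unknown θ=μ

Consider a $d$-dimensional Gaussian with known covariance $\Sigma$:

$$\ln p(x_k/\mu) = -\frac{1}{2} \ln\left[(2\pi)^d |\Sigma|\right] - \frac{1}{2}(x_k - \mu)^T \Sigma^{-1} (x_k - \mu)$$

Computing the gradient, we have

$$\nabla_{\mu} \ln p(x_k/\mu) = \Sigma^{-1}(x_k - \mu)$$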

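ML for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)

Setting $\nabla_{\mu} \ln p(D/\mu) = 0$, we have:

$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \mu) = 0$$

The solution is given by

$$\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k$$

The ML estimate is simply the “sample mean”.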
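As a quick sanity check of this result, the gradient $\sum_k \Sigma^{-1}(x_k - \mu)$ can be evaluated at the sample mean and shown to vanish. The sketch below does this for a hypothetical 2-D data set and an assumed known covariance matrix; the helper names are illustrative only.

```python
def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Matrix-vector product for nested-list matrices."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def sample_mean(samples):
    """ML estimate of the mean: component-wise average of the samples."""
    n, d = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(d)]


def grad_log_likelihood(samples, mu, sigma_inv):
    """Gradient of ln p(D/mu): sum over k of Sigma^{-1} (x_k - mu)."""
    d = len(mu)
    total = [0.0] * d
    for x in samples:
        diff = [x[i] - mu[i] for i in range(d)]
        g = mat_vec(sigma_inv, diff)
        total = [total[i] + g[i] for i in range(d)]
    return total


# Hypothetical 2-D training samples and an assumed known covariance.
samples = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.5], [3.0, 3.0]]
cov = [[2.0, 0.3], [0.3, 1.0]]

mu_hat = sample_mean(samples)
grad_at_mu_hat = grad_log_likelihood(samples, mu_hat, inverse_2x2(cov))
grad_elsewhere = grad_log_likelihood(samples, [0.0, 0.0], inverse_2x2(cov))

if __name__ == "__main__":
    print("mu_hat:", mu_hat)
    print("gradient at mu_hat:", grad_at_mu_hat)
    print("gradient at origin:", grad_elsewhere)
```

Note that the estimate does not depend on the particular $\Sigma$ used: any invertible covariance gives a zero gradient at the sample mean.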

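Special Case of ML: Maximum A-Posteriori Estimator (MAP)

Assume that $\theta$ is a random variable with known $p(\theta)$. Consider Bayes' rule:

$$p(\theta/D) = \frac{p(D/\theta)\, p(\theta)}{p(D)}$$

Since $p(D)$ does not depend on $\theta$, maximizing $p(\theta/D)$ is the same as maximizing $p(D/\theta)\,p(\theta)$ or $\ln\left[p(D/\theta)\,p(\theta)\right]$:

$$\hat{\theta} = \arg\max_{\theta} \left[ \sum_{k=1}^{n} \ln p(x_k/\theta) + \ln p(\theta) \right]$$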

Special Case of ML: Maximum A-Posteriori Estimator (MAP)

What happens when $p(\theta)$ is uniform? The prior term is then constant in $\theta$ and does not affect the maximization, so MAP is equivalent to ML.

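MAP for Multivariate Gaussian Density: Case of Unknown θ=μ

Assume a prior $p(\mu)$ whose parameters are known. MAP maximizes $\ln\left[p(D/\mu)\,p(\mu)\right]$, i.e., maximize

$$\sum_{k=1}^{n} \ln p(x_k/\mu) + \ln p(\mu)$$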

MAP for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)

If [...], then [...]. What happens when [...]?
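The transcript does not preserve the exact distributions used on these two slides, so the sketch below makes its own assumptions: an isotropic Gaussian likelihood $N(\mu, \sigma^2 I)$ with known $\sigma^2$ and an isotropic Gaussian prior $N(\mu_0, \sigma_0^2 I)$ on the mean. Under those assumptions, setting the gradient of $\sum_k \ln p(x_k/\mu) + \ln p(\mu)$ to zero gives $\hat{\mu} = (\sigma_0^2 \sum_k x_k + \sigma^2 \mu_0) / (n \sigma_0^2 + \sigma^2)$. All names and numeric values here are hypothetical.

```python
def map_mean(samples, mu0, sigma2, sigma0_2):
    """MAP estimate of a Gaussian mean under the ASSUMED model:
    likelihood N(mu, sigma2 * I), prior N(mu0, sigma0_2 * I).
    Closed form: (sigma0_2 * sum_k x_k + sigma2 * mu0) / (n * sigma0_2 + sigma2).
    """
    n, d = len(samples), len(mu0)
    sums = [sum(x[i] for x in samples) for i in range(d)]
    return [
        (sigma0_2 * sums[i] + sigma2 * mu0[i]) / (n * sigma0_2 + sigma2)
        for i in range(d)
    ]


def ml_mean(samples):
    """ML estimate of the mean (sample mean), for comparison."""
    n, d = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(d)]


# Hypothetical data and prior parameters.
samples = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.5], [3.0, 3.0]]
mu0 = [0.0, 0.0]
sigma2 = 1.0

balanced = map_mean(samples, mu0, sigma2, 1.0)        # prior and data both matter
vague_prior = map_mean(samples, mu0, sigma2, 1e9)     # very broad prior
sharp_prior = map_mean(samples, mu0, sigma2, 1e-9)    # very narrow prior

if __name__ == "__main__":
    print("ML estimate:      ", ml_mean(samples))
    print("MAP, sigma0^2=1:  ", balanced)
    print("MAP, broad prior: ", vague_prior)
    print("MAP, narrow prior:", sharp_prior)
```

In this assumed model, a very broad prior behaves like a uniform one and the MAP estimate approaches the sample mean, while a very narrow prior pins the estimate to $\mu_0$ regardless of the data.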

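ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²)

Assume $\theta = (\theta_1, \theta_2) = (\mu, \sigma^2)$. Then

$$\ln p(x_k/\theta) = -\frac{1}{2} \ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

$$\nabla_{\theta} \ln p(x_k/\theta) = \begin{bmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2ex] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{bmatrix}$$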

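ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ²) (cont’d)

Setting $\sum_{k=1}^{n} \nabla_{\theta} \ln p(x_k/\theta) = 0$ gives one condition per parameter:

$$\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2}(x_k - \hat{\theta}_1) = 0$$

$$-\sum_{k=1}^{n} \frac{1}{\hat{\theta}_2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\theta}_1)^2}{\hat{\theta}_2^2} = 0$$

The solutions are given by:

$$\hat{\mu} = \hat{\theta}_1 = \frac{1}{n} \sum_{k=1}^{n} x_k \quad \text{(sample mean)}$$

$$\hat{\sigma}^2 = \hat{\theta}_2 = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})^2 \quad \text{(sample variance)}$$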

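ML for Multivariate Gaussian Density: Case of Unknown θ=(μ,Σ)

In the general case (i.e., multivariate Gaussian) the solutions are:

$$\hat{\mu} = \frac{1}{n} \sum_{k=1}^{n} x_k \quad \text{(sample mean)}$$

$$\hat{\Sigma} = \frac{1}{n} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T \quad \text{(sample covariance)}$$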
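These two closed-form estimates translate directly into code. The following is a minimal pure-Python sketch on a hypothetical 2-D data set; the function names are illustrative, and a numerical library would normally be used for larger problems.

```python
def ml_mean(samples):
    """ML estimate of the mean: (1/n) * sum_k x_k."""
    n, d = len(samples), len(samples[0])
    return [sum(x[i] for x in samples) / n for i in range(d)]


def ml_covariance(samples, mu):
    """ML estimate of the covariance: (1/n) * sum_k (x_k - mu)(x_k - mu)^T."""
    n, d = len(samples), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for x in samples:
        diff = [x[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                # Accumulate the outer product, already divided by n.
                cov[i][j] += diff[i] * diff[j] / n
    return cov


# Hypothetical 2-D training samples.
samples = [[1.0, 2.0], [2.0, 0.5], [0.0, 1.5], [3.0, 3.0]]

mu_hat = ml_mean(samples)
sigma_hat = ml_covariance(samples, mu_hat)

if __name__ == "__main__":
    print("mu_hat:   ", mu_hat)
    print("sigma_hat:", sigma_hat)
```

The resulting matrix is symmetric by construction, and its diagonal holds the per-feature ML variances (normalized by $n$, not $n-1$).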

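Biased and Unbiased Estimates

An estimate $\hat{\theta}$ is unbiased when $E[\hat{\theta}] = \theta$, where $\theta$ is the true value.

The ML estimate $\hat{\mu}$ is unbiased, i.e., $E[\hat{\mu}] = \mu$.

The ML estimates $\hat{\sigma}^2$ and $\hat{\Sigma}$ are biased:

$$E[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2, \qquad E[\hat{\Sigma}] = \frac{n-1}{n}\,\Sigma$$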

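Biased and Unbiased Estimates (cont’d)

The following are unbiased estimates of $\sigma^2$ and $\Sigma$:

$$\frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu})^2, \qquad \frac{1}{n-1} \sum_{k=1}^{n} (x_k - \hat{\mu})(x_k - \hat{\mu})^T$$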
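The bias of the ML variance estimate can be observed empirically by averaging both estimators over many small data sets. The sketch below is a Monte Carlo illustration only: the sample size, true variance, number of trials, and seed are assumed values picked for the example.

```python
import random


def ml_variance(xs):
    """ML (biased) variance estimate: divide the squared deviations by n."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def unbiased_variance(xs):
    """Unbiased variance estimate: divide the squared deviations by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


random.seed(1)                        # fixed seed so the sketch is repeatable
n, true_var, trials = 5, 4.0, 20000   # assumed values for the illustration

ml_total = 0.0
unbiased_total = 0.0
for _ in range(trials):
    # Draw a fresh data set of n samples from N(0, true_var).
    xs = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    ml_total += ml_variance(xs)
    unbiased_total += unbiased_variance(xs)

mean_ml = ml_total / trials              # expected near (n-1)/n * true_var
mean_unbiased = unbiased_total / trials  # expected near true_var

if __name__ == "__main__":
    print("average ML estimate:      ", mean_ml)
    print("average unbiased estimate:", mean_unbiased)
    print("(n-1)/n * true variance:  ", (n - 1) / n * true_var)
```

The gap between the two estimators shrinks as $n$ grows, since $(n-1)/n \to 1$; the bias matters most for small training sets.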

Comments about ML

- ML estimation is usually simpler than alternative methods.
- It provides more accurate estimates as the number of training samples increases.
- If the model chosen for $p(x/\theta)$ is correct, and the independence assumptions among samples are true, ML will give very good results.
