1 CS479/679 Pattern Recognition Dr. George Bebis
Parameter Estimation: Maximum Likelihood Estimation. Chapter 3 (Duda et al.)

2 Parameter Estimation
Bayesian Decision Theory allows us to design an optimal classifier, given that we first estimate P(ωi) and p(x|ωi): Estimating P(ωi) is usually not very difficult. Estimating p(x|ωi) can be more difficult: the dimensionality of the feature space is large, and the number of samples is often too small.

3 Parameter Estimation (cont’d)
We will make the following assumptions: A set of training samples D = {x1, x2, ..., xn} is given, where the samples were drawn according to p(x|ωj). p(x|ωj) has some known parametric form, e.g., p(x|ωi) ~ N(μi, Σi), also denoted p(x|θ) where θ = (μi, Σi). Parameter estimation problem: given D, find the best possible θ.

4 Main Methods in Parameter Estimation
Maximum Likelihood (ML)
Bayesian Estimation (BE)

5 Main Methods in Parameter Estimation
Maximum Likelihood (ML): the best estimate is obtained by maximizing the probability of obtaining the samples D = {x1, x2, ..., xn} actually observed. ML assumes that θ is fixed and makes a point estimate: θ̂ = arg max_θ p(D|θ).

6 Main Methods in Parameter Estimation (cont’d)
Bayesian Estimation (BE): assumes that θ is a set of random variables with some known a-priori distribution p(θ). BE estimates a distribution rather than making a point estimate (unlike ML): p(x|D) = ∫ p(x|θ) p(θ|D) dθ. Note: the BE solution p(x|D) might not be of the parametric form assumed for p(x|θ).

7 ML Estimation - Assumptions
Consider c classes and c training data sets D1, D2, ..., Dc (i.e., one for each class). Samples in Dj are drawn independently according to p(x|ωj). Problem: given D1, D2, ..., Dc and a model p(x|ωj) ~ p(x|θj), estimate θ1, θ2, ..., θc.

8 ML Estimation - Problem Formulation
If the samples in Dj provide no information about θi (i ≠ j), we need to solve c independent problems (i.e., one for each class). The ML estimate for D = {x1, x2, ..., xn} is the value θ̂ that maximizes p(D|θ) (i.e., best supports the training data). Using the independence assumption, we can simplify p(D|θ): p(D|θ) = p(x1|θ) p(x2|θ) ⋯ p(xn|θ) = ∏k p(xk|θ).
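As a concrete illustration (not from the slides), here is a minimal numerical sketch of this factorization, assuming a univariate Gaussian model with known variance; the training set, seed, and grid are all hypothetical:

    import numpy as np
    from scipy.stats import norm

    # Hypothetical training set: n = 20 samples drawn from N(5, 1)
    rng = np.random.default_rng(0)
    D = rng.normal(loc=5.0, scale=1.0, size=20)

    def likelihood(theta, D, sigma=1.0):
        # Independence assumption: p(D|theta) = product over k of p(xk|theta)
        return np.prod(norm.pdf(D, loc=theta, scale=sigma))

    # Brute-force search over candidate values of theta (here, the unknown mean)
    thetas = np.linspace(3.0, 7.0, 401)
    theta_ml = thetas[np.argmax([likelihood(t, D) for t in thetas])]
    print("theta maximizing p(D|theta):", theta_ml)  # close to the sample mean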

9 ML Estimation - Solution
How can we find the maximum of p(D|θ)? Set the gradient to zero: ∇θ p(D|θ) = 0, where ∇θ = (∂/∂θ1, ..., ∂/∂θp)ᵀ (the gradient with respect to the components of θ).

10 ML Estimation Using Log-Likelihood
Taking the log for simplicity, the ML estimate maximizes the log-likelihood ln p(D|θ) = ∑k ln p(xk|θ): θ̂ = arg max_θ ln p(D|θ). Since the log is monotonic, maximizing ln p(D|θ) is equivalent to maximizing p(D|θ).
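Beyond algebraic convenience, the log also matters numerically. A quick sketch (assumed setup, not from the slides): for even moderately large n, the raw product of densities underflows to zero in floating point, while the sum of logs stays finite:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    D = rng.normal(5.0, 1.0, size=2000)  # hypothetical sample of size n = 2000

    theta = 5.0
    print(np.prod(norm.pdf(D, theta, 1.0)))    # raw likelihood: underflows to 0.0
    print(np.sum(norm.logpdf(D, theta, 1.0)))  # log-likelihood: a finite number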

11 Example: θ = μ (unknown mean, known variance)
[Figure: plots of the likelihood p(D|θ) and the log-likelihood ln p(D|θ) as functions of θ for the training data; both curves peak at the same value θ̂.]
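A small matplotlib sketch that reproduces a figure of this kind under the same assumptions (synthetic data, unknown mean, known variance); it is an illustration, not the original slide's figure:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.stats import norm

    rng = np.random.default_rng(2)
    D = rng.normal(4.0, 1.0, size=15)  # hypothetical training data

    thetas = np.linspace(2.0, 6.0, 400)
    loglik = np.array([norm.logpdf(D, t, 1.0).sum() for t in thetas])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
    ax1.plot(thetas, np.exp(loglik)); ax1.set_title("p(D|theta)")
    ax2.plot(thetas, loglik); ax2.set_title("ln p(D|theta)")
    plt.show()  # both curves peak at the same theta (the sample mean)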

12 ML for Multivariate Gaussian Density: Case of Unknown θ=μ
Assume p(x|μ) ~ N(μ, Σ) with Σ known, so that ln p(xk|μ) = −(d/2) ln(2π) − (1/2) ln|Σ| − (1/2)(xk − μ)ᵀ Σ⁻¹ (xk − μ). Computing the gradient, we have: ∇μ ln p(D|μ) = ∑k Σ⁻¹ (xk − μ).

13 ML for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)
Setting ∇μ ln p(D|μ) = 0, we have ∑k Σ⁻¹ (xk − μ̂) = 0. The solution is given by μ̂ = (1/n) ∑k xk. The ML estimate is simply the “sample mean”.
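A short check of this result (a sketch with synthetic univariate data and hypothetical values): the closed-form ML estimate is just the average of the data, and it matches a brute-force maximization of the log-likelihood:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    D = rng.normal(3.0, 2.0, size=500)  # hypothetical data, known sigma = 2

    mu_ml = D.mean()  # closed-form ML estimate: the sample mean
    thetas = np.linspace(1.0, 5.0, 2001)
    mu_grid = thetas[np.argmax([norm.logpdf(D, t, 2.0).sum() for t in thetas])]
    print(mu_ml, mu_grid)  # the two estimates agree to grid resolution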

14 Special Case: Maximum A-Posteriori Estimator (MAP)
Assume that θ is a random variable with known prior distribution p(θ). MAP maximizes the posterior p(θ|D), or equivalently p(D|θ)p(θ), or ln [p(D|θ)p(θ)]: θ̂_MAP = arg max_θ p(D|θ)p(θ). Consider Bayes' rule: p(θ|D) = p(D|θ)p(θ) / p(D); since p(D) does not depend on θ, it can be dropped from the maximization.

15 Special Case: Maximum A-Posteriori Estimator (MAP) (cont’d)
What happens when p(θ) is uniform? Then ln p(θ) is a constant, so maximizing ln [p(D|θ)p(θ)] is the same as maximizing ln p(D|θ): MAP is equivalent to ML.
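A tiny sketch of this equivalence (assumed setup, hypothetical data): a uniform prior adds the same constant to the log-posterior at every θ, which cannot move the argmax:

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    D = rng.normal(0.0, 1.0, size=50)
    thetas = np.linspace(-2.0, 2.0, 1001)

    loglik = np.array([norm.logpdf(D, t, 1.0).sum() for t in thetas])
    logpost = loglik + np.log(1.0 / 4.0)  # uniform prior on [-2, 2]: constant log-density

    i_ml, i_map = np.argmax(loglik), np.argmax(logpost)
    print(thetas[i_ml], thetas[i_map])  # identical: the constant cannot move the argmax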

16 MAP for Multivariate Gaussian Density: Case of Unknown θ=μ
Assume p(x|μ) ~ N(μ, σ²I) and a Gaussian prior p(μ) ~ N(μ0, σμ²I), where μ0 and σμ² are both known. Maximize ln p(μ|D) = ln p(D|μ)p(μ) (up to an additive constant).

17 MAP for Multivariate Gaussian Density: Case of Unknown θ=μ (cont’d)
Setting the gradient to zero, we obtain μ̂_MAP = (σμ² ∑k xk + σ² μ0) / (n σμ² + σ²). If σμ² ≫ σ² (a nearly flat prior), then μ̂_MAP ≈ (1/n) ∑k xk, i.e., the ML estimate (sample mean). What happens when n → ∞? The data term dominates, and μ̂_MAP again converges to the sample mean.
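A minimal sketch of the MAP estimate for the univariate case; the prior parameters and data below are hypothetical, and the formula is the precision-weighted form of the expression above:

    import numpy as np

    rng = np.random.default_rng(5)
    sigma = 1.0                 # known likelihood standard deviation
    mu0, sigma_mu = 0.0, 2.0    # assumed Gaussian prior on mu (hypothetical values)
    D = rng.normal(4.0, sigma, size=10)

    n, xbar = len(D), D.mean()
    # Precision-weighted blend of the prior mean and the sample mean
    mu_map = (n * xbar / sigma**2 + mu0 / sigma_mu**2) / (n / sigma**2 + 1 / sigma_mu**2)
    print(xbar, mu_map)  # MAP is pulled from the sample mean toward mu0;
                         # the pull vanishes as n grows or as sigma_mu**2 >> sigma**2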

18 ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ2)
Assume p(x|θ) ~ N(μ, σ²) with θ = (θ1, θ2) = (μ, σ²), both unknown. For a single sample xk: ln p(xk|θ) = −(1/2) ln(2π θ2) − (xk − θ1)² / (2 θ2).

19 ML for Univariate Gaussian Density: Case of Unknown θ=(μ,σ2) (cont’d)
Setting ∂/∂θ1 ∑k ln p(xk|θ) = 0 and ∂/∂θ2 ∑k ln p(xk|θ) = 0, the solutions are given by: μ̂ = (1/n) ∑k xk (sample mean) and σ̂² = (1/n) ∑k (xk − μ̂)² (sample variance).
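These two formulas in code (a sketch with synthetic data; the true parameters are hypothetical); note that the ML variance divides by n, not n − 1:

    import numpy as np

    rng = np.random.default_rng(6)
    D = rng.normal(2.0, 3.0, size=1000)  # hypothetical data from N(2, 9)

    mu_ml = D.mean()                    # sample mean
    var_ml = np.mean((D - mu_ml) ** 2)  # sample variance: ML divides by n
    print(mu_ml, var_ml)                # close to 2 and 9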

20 ML for Multivariate Gaussian Density: Case of Unknown θ=(μ,Σ)
In the general case (i.e., multivariate Gaussian) the solutions are: μ̂ = (1/n) ∑k xk (sample mean) and Σ̂ = (1/n) ∑k (xk − μ̂)(xk − μ̂)ᵀ (sample covariance).
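The multivariate versions as a short sketch (synthetic 2-D Gaussian data; the true mean and covariance below are hypothetical):

    import numpy as np

    rng = np.random.default_rng(7)
    mean_true = np.array([1.0, -2.0])
    cov_true = np.array([[2.0, 0.5],
                         [0.5, 1.0]])
    D = rng.multivariate_normal(mean_true, cov_true, size=500)  # n x d matrix

    mu_ml = D.mean(axis=0)             # sample mean vector
    diff = D - mu_ml
    Sigma_ml = diff.T @ diff / len(D)  # sample covariance: ML divides by n
    print(mu_ml, Sigma_ml, sep="\n")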

21 Biased and Unbiased Estimates
An estimate θ̂ is unbiased when E[θ̂] = θ. The ML estimate μ̂ is unbiased, i.e., E[μ̂] = μ. The ML estimates σ̂² and Σ̂ are biased: E[σ̂²] = ((n − 1)/n) σ² ≠ σ².

22 Biased and Unbiased Estimates (cont’d)
The following are unbiased estimates of σ² and Σ (note the n − 1 divisor): (1/(n − 1)) ∑k (xk − μ̂)² and (1/(n − 1)) ∑k (xk − μ̂)(xk − μ̂)ᵀ.
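A short simulation (assumed setup) that makes the bias visible: for n = 5 samples from N(0, 1), the ML variance averages to about (n − 1)/n = 0.8, while the n − 1 version averages to about 1:

    import numpy as np

    rng = np.random.default_rng(8)
    n, trials = 5, 100_000
    samples = rng.normal(0.0, 1.0, size=(trials, n))

    print(samples.var(axis=1, ddof=0).mean())  # ML (divide by n):  about 0.8
    print(samples.var(axis=1, ddof=1).mean())  # unbiased (n - 1):  about 1.0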

23 Comments
ML estimation is simpler than alternative methods. ML provides more accurate estimates as the number of training samples increases. If the model for p(x|θ) is correct and the independence assumption among samples holds, ML will work well.

