Chapter 3: Maximum-Likelihood Parameter Estimation
- Introduction
- Maximum-Likelihood Estimation
- Multivariate Case: unknown μ, known Σ
- Univariate Case: unknown μ and unknown σ²
- Bias
- Appendix: Maximum-Likelihood Problem Statement
Introduction
- Data availability in a Bayesian framework
  - We could design an optimal classifier if we knew:
    - P(ω_i) (priors)
    - P(x | ω_i) (class-conditional densities)
  - Unfortunately, we rarely have this complete information!
- Design a classifier from a training sample
  - No problem with prior estimation
  - Samples are often too small for class-conditional estimation (large dimension of feature space!)
A priori information about the problem
- Normality of P(x | ω_i): P(x | ω_i) ~ N(μ_i, Σ_i)
  - Characterized by the parameters μ_i and Σ_i
- Estimation techniques
  - Maximum-Likelihood and Bayesian estimation
  - The results are nearly identical, but the approaches differ
  - We will not cover Bayesian estimation details
- In Maximum-Likelihood estimation the parameters are viewed as fixed but unknown!
  - The best parameter values are obtained by maximizing the probability of obtaining the samples actually observed
- Bayesian methods view the parameters as random variables having some known prior distribution
- In either approach, we use P(ω_i | x) for our classification rule!
Maximum-Likelihood Estimation
- Has good convergence properties as the sample size increases
- Simpler than alternative techniques
- General principle
  - Assume we have c classes and P(x | ω_j) ~ N(μ_j, Σ_j)
  - Write P(x | ω_j) ≡ P(x | ω_j, θ_j), where θ_j = (μ_j, Σ_j) collects the parameters of class j
- Use the information provided by the training samples to estimate θ = (θ_1, θ_2, ..., θ_c); each θ_i (i = 1, 2, ..., c) is associated with one category
- Suppose that D contains n samples: x_1, x_2, ..., x_n
- The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ)
  - "It is the value of θ that best agrees with the actually observed training sample"
[Figure: the likelihood P(D | θ) and the log-likelihood l(θ) plotted as functions of θ; the sample D is fixed, θ is unknown]
Optimal estimation
- Let θ = (θ_1, θ_2, ..., θ_p)^t and let ∇_θ = (∂/∂θ_1, ..., ∂/∂θ_p)^t be the gradient operator
- We define l(θ) as the log-likelihood function: l(θ) = ln P(D | θ)
- New problem statement: determine the θ̂ that maximizes the log-likelihood
- Since the n training samples are drawn independently, l(θ) = Σ_{k=1}^n ln P(x_k | θ), where n = number of training samples
- The set of necessary conditions for an optimum is:
  ∇_θ l = Σ_{k=1}^n ∇_θ ln P(x_k | θ) = 0
Multivariate Gaussian: unknown μ, known Σ
- Samples drawn from a multivariate Gaussian population: P(x_k | μ) ~ N(μ, Σ)
- ln P(x_k | μ) = -½ ln[(2π)^d |Σ|] - ½ (x_k - μ)^t Σ^{-1} (x_k - μ)
- therefore: ∇_μ ln P(x_k | μ) = Σ^{-1} (x_k - μ)
- The ML estimate for μ must satisfy: Σ_{k=1}^n Σ^{-1} (x_k - μ̂) = 0
Multiplying by Σ and rearranging, we obtain:
  μ̂ = (1/n) Σ_{k=1}^n x_k
- Just the arithmetic average of the training samples!
- Conclusion: if P(x_k | ω_j) (j = 1, 2, ..., c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ_1, θ_2, ..., θ_c)^t and perform an optimal classification!
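As a quick check, not from the slides: a minimal NumPy sketch showing that the ML estimate of μ is the sample average. The true mean, covariance, and sample size are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.3], [0.3, 1.0]])       # known covariance
samples = rng.multivariate_normal(mu_true, sigma, size=1000)

mu_hat = samples.mean(axis=0)                    # (1/n) * sum_k x_k
print(mu_hat)   # approaches mu_true as n grows
```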
Univariate Gaussian: unknown μ, unknown σ²
- Samples drawn from a univariate Gaussian population: P(x_k | μ, σ²) ~ N(μ, σ²)
- θ = (θ_1, θ_2) = (μ, σ²), so for a single sample
  ln P(x_k | θ) = -½ ln(2π θ_2) - (x_k - θ_1)² / (2 θ_2)
- ∇_θ ln P(x_k | θ) = ( (x_k - θ_1)/θ_2 ,  -1/(2θ_2) + (x_k - θ_1)²/(2θ_2²) )^t
Summation over the n samples gives the necessary conditions:
  (1) Σ_{k=1}^n (x_k - θ̂_1) / θ̂_2 = 0
  (2) -Σ_{k=1}^n 1/(2θ̂_2) + Σ_{k=1}^n (x_k - θ̂_1)² / (2θ̂_2²) = 0
Combining (1) and (2), one obtains:
  μ̂ = (1/n) Σ_{k=1}^n x_k   and   σ̂² = (1/n) Σ_{k=1}^n (x_k - μ̂)²
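A minimal sketch, not from the slides, verifying numerically that conditions (1) and (2) vanish at the closed-form estimates μ̂ and σ̂²; the synthetic data parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=200)

mu_hat = x.mean()                        # (1/n) * sum_k x_k
sigma2_hat = np.mean((x - mu_hat)**2)    # (1/n) * sum_k (x_k - mu_hat)^2

# Both necessary conditions vanish at the ML solution:
cond1 = np.sum((x - mu_hat) / sigma2_hat)
cond2 = np.sum(-1.0 / (2 * sigma2_hat)
               + (x - mu_hat)**2 / (2 * sigma2_hat**2))
print(cond1, cond2)   # both ~0 up to floating-point error
```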
Bias
- The Maximum-Likelihood estimate for σ² is biased:
  E[ (1/n) Σ_{k=1}^n (x_k - x̄)² ] = ((n-1)/n) σ² ≠ σ²
- An elementary unbiased estimator for Σ is the sample covariance matrix:
  C = (1/(n-1)) Σ_{k=1}^n (x_k - μ̂)(x_k - μ̂)^t
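An illustrative sketch of the bias, not from the slides: averaging the ML variance estimate over many synthetic trials should give roughly ((n-1)/n)·σ² rather than σ². The sample size, variance, and trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma2_true, trials = 10, 4.0, 20000

ml_estimates = np.empty(trials)
for t in range(trials):
    x = rng.normal(0.0, np.sqrt(sigma2_true), size=n)
    ml_estimates[t] = np.var(x, ddof=0)          # ML estimate: divides by n

print(ml_estimates.mean())                  # ~ ((n-1)/n) * 4.0 = 3.6, biased
print(ml_estimates.mean() * n / (n - 1))    # ~ 4.0 after the n/(n-1) correction
```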
Appendix: Maximum-Likelihood Problem Statement
- Let D = {x_1, x_2, ..., x_n}, with the samples drawn independently, so that
  P(x_1, ..., x_n | θ) = Π_{k=1}^n P(x_k | θ);  |D| = n
- Our goal is to determine θ̂ (the value of θ that makes this sample the most representative!)
[Figure: a training set D of n samples x_1, ..., x_n partitioned into class subsets D_1, ..., D_k, ..., D_c; the samples in D_j are drawn from the class-conditional density P(x | ω_j) ~ N(μ_j, Σ_j)]
- θ = (θ_1, θ_2, ..., θ_c)
- Problem: find θ̂ such that P(D | θ̂) = max_θ P(D | θ)
Sources of final-system classification error (sec 3.5.1)
- Bayes Error
  - Error due to overlapping densities for different classes (inherent error, can never be eliminated)
- Model Error
  - Error due to having an incorrect model
- Estimation Error
  - Error from estimating parameters from a finite sample
Problems of Dimensionality (sec 3.7)
- Accuracy, Dimension, Training Sample Size
  - Classification accuracy depends upon the dimensionality and the amount of training data
  - Case of two multivariate normal classes with the same covariance Σ and equal priors: the Bayes error is
    P(e) = (1/√(2π)) ∫_{r/2}^∞ e^{-u²/2} du
    where r² = (μ_1 - μ_2)^t Σ^{-1} (μ_1 - μ_2) is the squared Mahalanobis distance between the class means
- If the features are independent, then Σ = diag(σ_1², ..., σ_d²) and
  r² = Σ_{i=1}^d ((μ_{i1} - μ_{i2}) / σ_i)²
- The most useful features are the ones for which the difference between the means is large relative to the standard deviation
- Since P(e) falls as r grows, it appears that adding new features should improve accuracy (see the sketch below)
- Yet it has frequently been observed in practice that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance: we have the wrong model!
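A small sketch of these formulas, not from the slides: it computes r² for independent features and the corresponding two-class Bayes error P(e) via scipy.stats.norm.sf, assuming equal priors as above; the means and standard deviations are made up.

```python
import numpy as np
from scipy.stats import norm

mu1 = np.array([0.0, 0.0, 0.0])
mu2 = np.array([1.0, 0.5, 2.0])
std = np.array([1.0, 1.0, 2.0])          # per-feature standard deviations

r2 = np.sum(((mu1 - mu2) / std) ** 2)    # r^2 = sum_i ((mu_i1 - mu_i2)/sigma_i)^2
bayes_error = norm.sf(np.sqrt(r2) / 2)   # (1/sqrt(2*pi)) * int_{r/2}^inf e^{-u^2/2} du
print(r2, bayes_error)                   # each informative feature raises r^2, lowering P(e)
```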
Computational Complexity
- Maximum-Likelihood Estimation
  - Gaussian priors in d dimensions, with n training samples for each of c classes
  - For each category we have to compute the discriminant function; the dominant cost is the O(d²·n) needed to estimate the covariance matrix and to evaluate the quadratic form over the samples
  - Total per class = O(d²·n); total for c classes = O(c·d²·n), i.e. O(d²·n) for fixed c
  - The cost increases quickly when d and n are large!
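To make the O(d²·n) term concrete, here is a hypothetical sketch (names and data are illustrative, not from the slides) of evaluating the Gaussian discriminant: the quadratic form (x-μ)^t Σ^{-1} (x-μ) costs O(d²) per sample, hence O(d²·n) over n samples, repeated for each of the c classes.

```python
import numpy as np

def discriminant(X, mu, sigma_inv, log_det, log_prior):
    # g(x) = -0.5 (x-mu)^t Sigma^{-1} (x-mu) - 0.5 ln|Sigma| + ln P(omega) + const
    diff = X - mu                                            # n x d
    quad = np.einsum('ni,ij,nj->n', diff, sigma_inv, diff)   # O(d^2) per sample
    return -0.5 * quad - 0.5 * log_det + log_prior

rng = np.random.default_rng(3)
d, n = 5, 100
X = rng.normal(size=(n, d))
sigma = np.eye(d)
g = discriminant(X, np.zeros(d), np.linalg.inv(sigma),
                 np.log(np.linalg.det(sigma)), np.log(0.5))
print(g.shape)   # one discriminant value per sample; repeat for each of c classes
```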
Overfitting
- The number of training samples n can be inadequate for estimating the parameters
- What to do?
  - Simplify the model: reduce the number of parameters (see the pooled-covariance sketch below)
    - Assume all classes share the same covariance matrix
    - Assume statistical independence of the features
  - Reduce the number of features d
    - Principal Component Analysis, etc.
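A minimal sketch of the first remedy above, assuming two synthetic classes: pooling a single covariance estimate across classes cuts the covariance parameters from c·d(d+1)/2 to d(d+1)/2. The pooled_covariance helper is hypothetical, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(4)
X1 = rng.multivariate_normal([0, 0], np.eye(2), size=15)   # small samples:
X2 = rng.multivariate_normal([2, 1], np.eye(2), size=15)   # per-class Sigma is noisy

def pooled_covariance(groups):
    # weight each class's scatter matrix by its sample count, then normalize
    n_total = sum(len(g) for g in groups)
    scatter = sum((len(g) - 1) * np.cov(g, rowvar=False) for g in groups)
    return scatter / (n_total - len(groups))

print(pooled_covariance([X1, X2]))   # one shared estimate, steadier than each class's own
```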