Slide 1: Document Analysis: Parameter Estimation for Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Slide 2: Outline
© Prof. Rolf Ingold
- Introduction
- Parameter estimation
- Non-parametric classifiers: kNN
- Neural networks
- Hidden Markov models
- Other approaches

Slide 3: Introduction
- Bayesian decision theory provides a theoretical framework for statistical pattern recognition
- It supposes the following probabilistic information to be available:
  - n, the number of classes
  - P(ω_i), the a priori probability (prior) of each class ω_i
  - p(x|ω_i), the distribution of the feature vector x, depending on the class ω_i
- How can these values and functions be estimated, especially the class-dependent distribution (or density) functions?

Slide 4: Approaches for statistical pattern recognition
Several approaches try to overcome the difficulty of obtaining the class-dependent feature distributions (or densities):
- Parameter estimation: the form of the distributions is supposed to be known; only some parameters have to be estimated from training samples
- Parzen windows: densities are estimated from training samples by "smoothing" them with a window function
- K-nearest-neighbors (kNN) rule: the decision is associated with the dominant class among the k nearest neighbors taken from the training samples
- Functional discrimination: the decision consists in minimizing an objective function within an augmented feature space
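The kNN rule listed above needs no density model at all: it sorts the training samples by distance to the query and takes a majority vote. A minimal sketch, with hypothetical 1-D toy data and k = 3:

```python
from collections import Counter

def knn_classify(x, training, k=3):
    """Return the dominant class among the k nearest training samples.
    `training` is a list of (feature_value, class_label) pairs."""
    neighbors = sorted(training, key=lambda s: abs(s[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# toy training set (assumed): class 'a' clustered near 1, class 'b' near 3
train = [(1.0, 'a'), (1.2, 'a'), (0.9, 'a'), (3.0, 'b'), (3.3, 'b')]
print(knn_classify(1.1, train))  # -> 'a'
```

The same voting scheme carries over to d-dimensional features by replacing the absolute difference with a Euclidean distance.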

Slide 5: Parameter estimation
By hypothesis, the following information is supposed to be known:
- n, the number of classes
- for each class ω_i:
  - the a priori probability P(ω_i)
  - the functional form of the class-conditional feature densities p(x|ω_i, θ_i), with unknown parameters θ_i
  - a labeled set of training data D_i = {x_i1, x_i2, ..., x_iN_i}, supposed to be drawn randomly from ω_i
In fact, parameter estimation can be performed class by class.

Slide 6: Maximum likelihood criterion
- Maximum likelihood estimation consists in determining the θ_i that maximizes the likelihood of D_i, i.e. (for independently drawn samples)
  θ̂_i = argmax_θ p(D_i|θ) = argmax_θ ∏_k p(x_ik|θ)
- For some distributions, the problem can be solved analytically via the equations
  ∇_θ ln p(D_i|θ) = 0
  (caution: is the solution really a maximum?)
- If the solution cannot be found analytically, it can be computed iteratively by gradient ascent
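The criterion can be visualized numerically: evaluate the log-likelihood of a toy sample under N(μ, 1) on a grid of candidate means and pick the maximizer. A minimal sketch (the data values and the grid are assumptions for illustration); for a Gaussian the grid maximizer coincides with the sample mean, anticipating the closed-form solution given later:

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    """Log-likelihood of the sample under N(mu, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in data)

data = [1.8, 2.1, 2.4, 1.9, 2.3]                  # toy sample (assumed)
grid = [i / 100 for i in range(100, 301)]          # candidate means 1.00 .. 3.00
best = max(grid, key=lambda mu: log_likelihood(data, mu))
print(best, sum(data) / len(data))                 # both equal the sample mean, 2.1
```

A real gradient-ascent implementation would replace the grid search with repeated steps along ∇_θ ln p(D|θ).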

Slide 7: Univariate Gaussian distribution
In one dimension, the normal distribution N(μ, σ²) is defined by the expression
  p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
where
- μ represents the mean
- σ² represents the variance
- the maximum of the curve corresponds to x = μ, where p(μ) = 1 / (√(2π) σ)
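The density above translates directly into code; this small sketch also checks the stated peak location at x = μ:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# the maximum lies at x = mu, with value 1 / (sigma * sqrt(2*pi))
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989
```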

Slide 8: Multivariate Gaussian distribution
In d dimensions, the generalized normal distribution N(μ, Σ) is defined by
  p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))
where
- μ represents the mean vector
- Σ represents the covariance matrix
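A direct transcription of the d-dimensional density, checked at the mean of a standard 2-D normal, where the formula reduces to 1/(2π):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of the d-dimensional normal N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)                       # standard normal in 2-D
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))  # 1/(2*pi) ≈ 0.1592
```

In practice one evaluates the log-density and solves Σ⁻¹(x − μ) via a Cholesky factorization instead of inverting Σ, but the direct form above matches the slide's formula.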

Slide 9: Interpretation of the parameters
- The mean vector μ represents the center of the distribution
- The covariance matrix Σ describes the scatter:
  - it is symmetric: σ_ij = σ_ji
  - it is positive semidefinite (usually positive definite): σ_ii = σ_i² ≥ 0
  - the principal axes of the hyperellipsoids are given by the eigenvectors of Σ
  - the lengths of the axes are given by the eigenvalues
  - if two features x_i and x_j are statistically independent, then σ_ij = σ_ji = 0
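The eigendecomposition that yields the principal axes can be computed directly; a minimal sketch on an assumed toy 2-D covariance matrix:

```python
import numpy as np

# toy covariance matrix (assumed): symmetric and positive definite
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: for symmetric matrices
print(eigvals)    # variances along the principal axes (ascending order)
print(eigvecs)    # columns = directions of the principal axes
```

`eigh` also guarantees real eigenvalues, consistent with the symmetry property stated above; positive definiteness shows up as all eigenvalues being strictly positive.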

Slide 10: Mahalanobis distance
- Regions of constant density are hyperellipsoids centered at μ, characterized by the equation
  (x − μ)ᵀ Σ⁻¹ (x − μ) = C
  where C is a positive constant
- The Mahalanobis distance from x to μ is defined as
  r = ((x − μ)ᵀ Σ⁻¹ (x − μ))^(1/2)
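A short sketch of the distance; with a diagonal Σ (assumed toy values) it reduces to a per-axis standardized distance, which makes the result easy to verify by hand:

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(Sigma) @ diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],        # variance 4 along the first axis
                  [0.0, 1.0]])       # variance 1 along the second
print(mahalanobis(np.array([2.0, 0.0]), mu, Sigma))  # 2/sqrt(4) = 1.0
print(mahalanobis(np.array([0.0, 2.0]), mu, Sigma))  # 2/sqrt(1) = 2.0
```

Both test points are at the same Euclidean distance from μ, but the second is twice as far in Mahalanobis terms because its axis has the smaller variance.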

Slide 11: Estimation of μ and σ for normal distributions
- In the one-dimensional case, the maximum likelihood criterion leads to the equations
  ∂/∂μ ln p(D|μ, σ²) = 0 and ∂/∂σ² ln p(D|μ, σ²) = 0
- In the one-dimensional case the solution is
  μ̂ = (1/n) ∑_k x_k,  σ̂² = (1/n) ∑_k (x_k − μ̂)²
- Generalized to the multi-dimensional case, we obtain
  μ̂ = (1/n) ∑_k x_k,  Σ̂ = (1/n) ∑_k (x_k − μ̂)(x_k − μ̂)ᵀ
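The multi-dimensional estimates are two lines of numpy; note the division by n (not n − 1), which is what the maximum likelihood solution prescribes. The sample matrix is an assumed toy set:

```python
import numpy as np

# toy training samples (assumed), one row per observation
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [2.0, 2.0]])
n = len(X)
mu_hat = X.mean(axis=0)                # ML estimate of the mean vector
diff = X - mu_hat
Sigma_hat = diff.T @ diff / n          # ML estimate of the covariance (divide by n)
print(mu_hat)                          # [2. 3.]
print(Sigma_hat)
```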

Slide 12: Bias problem
- The estimation of σ² (resp. Σ) is biased: its expected value over all sets of size n is different from the true variance, namely
  E[σ̂²] = ((n − 1)/n) σ²
- An unbiased estimator would be
  s² = (1/(n − 1)) ∑_k (x_k − μ̂)²
- Both estimators converge asymptotically
- Which estimator is correct? They are neither right nor wrong: no estimator has all desirable properties; Bayesian learning theory can give an answer
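The bias is easy to observe empirically: average both estimators over many small samples drawn from a known distribution. A sketch with assumed simulation settings (n = 5, 20000 trials, true variance 1); the n-divisor estimator should settle near (n − 1)/n = 0.8, the (n − 1)-divisor one near 1.0:

```python
import random

random.seed(0)                      # reproducible toy experiment
n, trials = 5, 20000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # true variance = 1
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased += ss / n                # ML estimator (divide by n)
    unbiased += ss / (n - 1)        # unbiased estimator (divide by n-1)
print(biased / trials)              # ≈ 0.8, i.e. (n-1)/n of the true variance
print(unbiased / trials)            # ≈ 1.0
```

This also illustrates the "converge asymptotically" remark: the gap between the two averages shrinks as n grows.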

Slide 13: Discriminant functions for normal distributions (1)
- For normal distributions, the following discriminant functions may be stated:
  g_i(x) = ln p(x|ω_i) + ln P(ω_i)
- In the case where all classes share the same covariance matrix Σ, the decision boundaries are linear
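With a shared Σ, expanding g_i(x) and dropping the terms common to all classes leaves a function linear in x. A minimal sketch with assumed toy parameters (two classes, identity covariance, equal priors); the decision simply picks the larger discriminant:

```python
import numpy as np

Sigma = np.eye(2)                    # covariance shared by all classes (assumed)
Sinv = np.linalg.inv(Sigma)

def linear_discriminant(x, mu, prior):
    """g_i(x) = w^T x + w0 after dropping the quadratic term common to all classes."""
    w = Sinv @ mu                                   # weight vector
    w0 = -0.5 * mu @ Sinv @ mu + np.log(prior)      # bias term
    return w @ x + w0

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])   # toy class means
x = np.array([0.9, 0.0])
g1 = linear_discriminant(x, mu1, 0.5)
g2 = linear_discriminant(x, mu2, 0.5)
print(1 if g1 > g2 else 2)   # x lies on class 1's side of the midpoint boundary
```

With equal priors the boundary g_1(x) = g_2(x) is the hyperplane through the midpoint of the two means, which is what "linear decision boundaries" refers to.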

Slide 14: Linear decision boundaries for normal distributions (figure)

Slide 15: Discriminant functions for normal distributions (2)
In the case of arbitrary covariance matrices, the decision boundaries become quadratic.
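When each class keeps its own Σ_i, the quadratic term no longer cancels, so the boundary is quadratic. A sketch with assumed toy parameters: two classes with the same mean but different spreads, where the narrow class wins near the mean and the wide class wins far from it:

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) keeping the class-specific quadratic and log-det terms."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)          # narrow class (assumed)
mu2, S2 = np.array([0.0, 0.0]), 4.0 * np.eye(2)    # wide class (assumed)
decisions = []
for x in (np.array([0.5, 0.0]), np.array([3.0, 0.0])):
    g1 = quadratic_discriminant(x, mu1, S1, 0.5)
    g2 = quadratic_discriminant(x, mu2, S2, 0.5)
    decisions.append(1 if g1 > g2 else 2)
print(decisions)   # [1, 2]: the boundary here is a circle around the common mean
```

No linear boundary could produce this concentric decision region, which is the point of the slide.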

Slide 16: (figure, no text content)

Slide 17: Font recognition: 1D Gaussian estimation (1)
Font style discrimination (roman vs. italic) using the hpd-stdev feature:
- the estimated models fit the real distributions
- the decision boundary is accurate
- the recognition accuracy (96.3%) is confirmed by the experimental confusion matrix

Slide 18: Font recognition: 1D Gaussian estimation (2)
Font boldness discrimination (normal vs. bold) using the hr-mean feature:
- the estimated models do not fit the real distributions
- the decision boundary is nevertheless surprisingly well adapted
- the recognition accuracy (97.6%) is high, as observed from the experimental confusion matrix

Slide 19: Font recognition: 1D Gaussian estimation (3)
- Boldness is generally dependent on the font family
- hr-mean can perfectly discriminate normal and bold fonts if the font family is known (recognition rate > 99.9%)
(figure panels: Times, Courier, Arial, all)

Slide 20: Font recognition: 1D Gaussian estimation (4)
Font family discrimination (Arial, Courier, Times) using the hr-mean feature:
- the estimated models do not fit the real distributions at all
- the decision boundaries are inadequate
- the recognition accuracy is poor (41.9%)

Slide 21: Font recognition: 1D multi-Gaussian estimation
Font family discrimination (Arial, Courier, Times) using hr-mean, supposing the font style to be known for learning:
- the estimated models fit the real distributions
- the decision boundaries are adequate
- the recognition accuracy is nearly optimal for the given feature (89.6%)

Slide 22: Font recognition: 2D Gaussian estimation
Font family discrimination (Arial, Courier, Times) using two features, hr-stdev and vr-mean:
- the models fit approximately for two classes, but not for the third one
- the decision boundary is surprisingly well adapted
- the recognition accuracy (93.5%) is reasonable

Slide 23: Font recognition: general Gaussian estimation
Performance of font family discrimination (Arial, Courier, Times) depends on the feature set used:
- hr-stdev: recognition rate 72.7%
- hr-stdev, vr-mean: recognition rate 93.5%
- hp-mean, hr-mean, vr-mean: recognition rate 98.0%
- hp-mean, hpd-stdev, hr-mean, vr-mean, hr-stdev, vr-stdev: recognition rate 99.7%

Slide 24: Font recognition: classifier for all 12 classes
Discrimination of all fonts using all six features (hp-mean, hpd-stdev, hr-mean, hr-stdev, vr-mean, vr-stdev):
- overall recognition rate of 99.6%
- most errors are due to roman/italic confusion

Slide 25: Error types
In a Bayesian classifier using parameter estimation, several error types occur:
- Indistinguishability errors, due to the overlapping of the distributions: they are inherent to the problem and cannot be reduced
- Modeling errors, due to a bad choice of the parametric density functions (models): they can be avoided by changing the models
- Estimation errors, due to the imprecision of the training data: they can be reduced by increasing the amount of training data

Slide 26: Influence of the size of the training data
Evolution of the error rate as a function of the size of the training sets (experiment with 4 training sets and 2 test sets; the figure shows the individual runs and their average).
