Slide 1: Document Analysis: Parameter Estimation for Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

Slide 2: Outline
© Prof. Rolf Ingold
- Introduction
- Parameter estimation
- Non-parametric classifiers: kNN
- Neural networks
- Hidden Markov models
- Other approaches

Slide 3: Introduction
- Bayesian decision theory provides a theoretical framework for statistical pattern recognition
- It supposes the following probabilistic information to be available:
  - n, the number of classes
  - P(ω_i), the a priori probability (prior) of each class ω_i
  - p(x|ω_i), the distribution of the feature vector x, depending on the class ω_i
- How can these values and functions be estimated, especially the class-dependent distribution (or density) functions?

Slide 4: Approaches for statistical pattern recognition
Several approaches try to overcome the difficulty of obtaining the class-dependent feature distributions (or densities):
- Parameter estimation: the form of the distributions is supposed to be known; only some parameters have to be estimated from training samples
- Parzen windows: densities are estimated from training samples by "smoothing" them with a window function
- K-nearest-neighbors (kNN) rule: the decision is associated with the dominant class among the k nearest neighbors taken from the training samples
- Functional discrimination: the decision consists in minimizing an objective function within an augmented feature space
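The kNN rule listed above needs no density model at all: it sorts the training samples by distance to the query and takes a majority vote. A minimal sketch, with hypothetical 1-D toy data and k = 3:

```python
from collections import Counter

def knn_classify(x, training, k=3):
    """Return the dominant class among the k nearest training samples.
    `training` is a list of (feature_value, class_label) pairs."""
    neighbors = sorted(training, key=lambda s: abs(s[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# toy training set (assumed): class 'a' clustered near 1, class 'b' near 3
train = [(1.0, 'a'), (1.2, 'a'), (0.9, 'a'), (3.0, 'b'), (3.3, 'b')]
print(knn_classify(1.1, train))  # -> 'a'
```

The same voting scheme carries over to d-dimensional features by replacing the absolute difference with a Euclidean distance.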

Slide 5: Parameter estimation
By hypothesis, the following information is supposed to be known:
- n, the number of classes
- for each class ω_i:
  - the a priori probability P(ω_i)
  - the functional form of the class-conditional feature densities p(x|ω_i, θ_i), with unknown parameters θ_i
  - a labeled set of training data D_i = {x_i1, x_i2, ..., x_iN_i}, supposed to be drawn randomly from ω_i
In fact, parameter estimation can be performed class by class.

Slide 6: Maximum likelihood criterion
- Maximum likelihood estimation consists in determining the θ_i that maximizes the likelihood of D_i, i.e. (for independently drawn samples)
  θ̂_i = argmax_θ p(D_i|θ) = argmax_θ ∏_k p(x_ik|θ)
- For some distributions, the problem can be solved analytically via the equations
  ∇_θ ln p(D_i|θ) = 0
  (caution: is the solution really a maximum?)
- If the solution cannot be found analytically, it can be computed iteratively by gradient ascent
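The criterion can be visualized numerically: evaluate the log-likelihood of a toy sample under N(μ, 1) on a grid of candidate means and pick the maximizer. A minimal sketch (the data values and the grid are assumptions for illustration); for a Gaussian the grid maximizer coincides with the sample mean, anticipating the closed-form solution given later:

```python
import math

def log_likelihood(data, mu, sigma=1.0):
    """Log-likelihood of the sample under N(mu, sigma^2)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (x - mu)**2 / (2 * sigma**2) for x in data)

data = [1.8, 2.1, 2.4, 1.9, 2.3]                  # toy sample (assumed)
grid = [i / 100 for i in range(100, 301)]          # candidate means 1.00 .. 3.00
best = max(grid, key=lambda mu: log_likelihood(data, mu))
print(best, sum(data) / len(data))                 # both equal the sample mean, 2.1
```

A real gradient-ascent implementation would replace the grid search with repeated steps along ∇_θ ln p(D|θ).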

Slide 7: Univariate Gaussian distribution
In one dimension, the normal distribution N(μ, σ²) is defined by the expression
  p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²))
where
- μ represents the mean
- σ² represents the variance
- the maximum of the curve corresponds to x = μ, where p(μ) = 1 / (√(2π) σ)
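The density above translates directly into code; this small sketch also checks the stated peak location at x = μ:

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# the maximum lies at x = mu, with value 1 / (sigma * sqrt(2*pi))
peak = normal_pdf(0.0, 0.0, 1.0)
print(round(peak, 4))  # 0.3989
```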

Slide 8: Multivariate Gaussian distribution
In d dimensions, the generalized normal distribution N(μ, Σ) is defined by
  p(x) = (1 / ((2π)^(d/2) |Σ|^(1/2))) exp(−(1/2) (x − μ)ᵀ Σ⁻¹ (x − μ))
where
- μ represents the mean vector
- Σ represents the covariance matrix
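A direct transcription of the d-dimensional density, checked at the mean of a standard 2-D normal, where the formula reduces to 1/(2π):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of the d-dimensional normal N(mu, Sigma) at x."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.eye(2)                       # standard normal in 2-D
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))  # 1/(2*pi) ≈ 0.1592
```

In practice one evaluates the log-density and solves Σ⁻¹(x − μ) via a Cholesky factorization instead of inverting Σ, but the direct form above matches the slide's formula.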

Slide 9: Interpretation of the parameters
- The mean vector μ represents the center of the distribution
- The covariance matrix Σ describes the scatter:
  - it is symmetric: σ_ij = σ_ji
  - it is positive semidefinite (usually positive definite): σ_ii = σ_i² ≥ 0
  - the principal axes of the hyperellipsoids are given by the eigenvectors of Σ
  - the lengths of the axes are given by the eigenvalues
  - if two features x_i and x_j are statistically independent, then σ_ij = σ_ji = 0
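The eigendecomposition that yields the principal axes can be computed directly; a minimal sketch on an assumed toy 2-D covariance matrix:

```python
import numpy as np

# toy covariance matrix (assumed): symmetric and positive definite
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigh: for symmetric matrices
print(eigvals)    # variances along the principal axes (ascending order)
print(eigvecs)    # columns = directions of the principal axes
```

`eigh` also guarantees real eigenvalues, consistent with the symmetry property stated above; positive definiteness shows up as all eigenvalues being strictly positive.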

Slide 10: Mahalanobis distance
- Regions of constant density are hyperellipsoids centered at μ, characterized by the equation
  (x − μ)ᵀ Σ⁻¹ (x − μ) = C
  where C is a positive constant
- The Mahalanobis distance from x to μ is defined as
  r = ((x − μ)ᵀ Σ⁻¹ (x − μ))^(1/2)
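A short sketch of the distance; with a diagonal Σ (assumed toy values) it reduces to a per-axis standardized distance, which makes the result easy to verify by hand:

```python
import numpy as np

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance sqrt((x-mu)^T Sigma^{-1} (x-mu))."""
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(Sigma) @ diff))

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 0.0],        # variance 4 along the first axis
                  [0.0, 1.0]])       # variance 1 along the second
print(mahalanobis(np.array([2.0, 0.0]), mu, Sigma))  # 2/sqrt(4) = 1.0
print(mahalanobis(np.array([0.0, 2.0]), mu, Sigma))  # 2/sqrt(1) = 2.0
```

Both test points are at the same Euclidean distance from μ, but the second is twice as far in Mahalanobis terms because its axis has the smaller variance.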

Slide 11: Estimation of μ and σ for normal distributions
- In the one-dimensional case, the maximum likelihood criterion leads to the equations
  ∂/∂μ ln p(D|μ, σ²) = 0 and ∂/∂σ² ln p(D|μ, σ²) = 0
- In the one-dimensional case the solution is
  μ̂ = (1/n) ∑_k x_k,  σ̂² = (1/n) ∑_k (x_k − μ̂)²
- Generalized to the multi-dimensional case, we obtain
  μ̂ = (1/n) ∑_k x_k,  Σ̂ = (1/n) ∑_k (x_k − μ̂)(x_k − μ̂)ᵀ
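The multi-dimensional estimates are two lines of numpy; note the division by n (not n − 1), which is what the maximum likelihood solution prescribes. The sample matrix is an assumed toy set:

```python
import numpy as np

# toy training samples (assumed), one row per observation
X = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [3.0, 5.0],
              [2.0, 2.0]])
n = len(X)
mu_hat = X.mean(axis=0)                # ML estimate of the mean vector
diff = X - mu_hat
Sigma_hat = diff.T @ diff / n          # ML estimate of the covariance (divide by n)
print(mu_hat)                          # [2. 3.]
print(Sigma_hat)
```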

Slide 12: Bias problem
- The estimation of σ² (resp. Σ) is biased: its expected value over all sets of size n is different from the true variance, namely
  E[σ̂²] = ((n − 1)/n) σ²
- An unbiased estimator would be
  s² = (1/(n − 1)) ∑_k (x_k − μ̂)²
- Both estimators converge asymptotically
- Which estimator is correct? They are neither right nor wrong: no estimator has all desirable properties; Bayesian learning theory can give an answer
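The bias is easy to observe empirically: average both estimators over many small samples drawn from a known distribution. A sketch with assumed simulation settings (n = 5, 20000 trials, true variance 1); the n-divisor estimator should settle near (n − 1)/n = 0.8, the (n − 1)-divisor one near 1.0:

```python
import random

random.seed(0)                      # reproducible toy experiment
n, trials = 5, 20000
biased, unbiased = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]   # true variance = 1
    m = sum(sample) / n
    ss = sum((x - m) ** 2 for x in sample)
    biased += ss / n                # ML estimator (divide by n)
    unbiased += ss / (n - 1)        # unbiased estimator (divide by n-1)
print(biased / trials)              # ≈ 0.8, i.e. (n-1)/n of the true variance
print(unbiased / trials)            # ≈ 1.0
```

This also illustrates the "converge asymptotically" remark: the gap between the two averages shrinks as n grows.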

Slide 13: Discriminant functions for normal distributions (1)
- For normal distributions, the following discriminant functions may be stated:
  g_i(x) = ln p(x|ω_i) + ln P(ω_i)
- In the case where all classes share the same covariance matrix Σ, the decision boundaries are linear
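With a shared Σ, expanding g_i(x) and dropping the terms common to all classes leaves a function linear in x. A minimal sketch with assumed toy parameters (two classes, identity covariance, equal priors); the decision simply picks the larger discriminant:

```python
import numpy as np

Sigma = np.eye(2)                    # covariance shared by all classes (assumed)
Sinv = np.linalg.inv(Sigma)

def linear_discriminant(x, mu, prior):
    """g_i(x) = w^T x + w0 after dropping the quadratic term common to all classes."""
    w = Sinv @ mu                                   # weight vector
    w0 = -0.5 * mu @ Sinv @ mu + np.log(prior)      # bias term
    return w @ x + w0

mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])   # toy class means
x = np.array([0.9, 0.0])
g1 = linear_discriminant(x, mu1, 0.5)
g2 = linear_discriminant(x, mu2, 0.5)
print(1 if g1 > g2 else 2)   # x lies on class 1's side of the midpoint boundary
```

With equal priors the boundary g_1(x) = g_2(x) is the hyperplane through the midpoint of the two means, which is what "linear decision boundaries" refers to.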

Slide 14: Linear decision boundaries for normal distributions (figure)

Slide 15: Discriminant functions for normal distributions (2)
In the case of arbitrary covariance matrices, the decision boundaries become quadratic.
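When each class keeps its own Σ_i, the quadratic term no longer cancels, so the boundary is quadratic. A sketch with assumed toy parameters: two classes with the same mean but different spreads, where the narrow class wins near the mean and the wide class wins far from it:

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) keeping the class-specific quadratic and log-det terms."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

mu1, S1 = np.array([0.0, 0.0]), np.eye(2)          # narrow class (assumed)
mu2, S2 = np.array([0.0, 0.0]), 4.0 * np.eye(2)    # wide class (assumed)
decisions = []
for x in (np.array([0.5, 0.0]), np.array([3.0, 0.0])):
    g1 = quadratic_discriminant(x, mu1, S1, 0.5)
    g2 = quadratic_discriminant(x, mu2, S2, 0.5)
    decisions.append(1 if g1 > g2 else 2)
print(decisions)   # [1, 2]: the boundary here is a circle around the common mean
```

No linear boundary could produce this concentric decision region, which is the point of the slide.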

Slide 16: (figure, no text content)

Slide 17: Font recognition: 1D Gaussian estimation (1)
Font style discrimination (roman vs. italic) using the hpd-stdev feature:
- the estimated models fit the real distributions
- the decision boundary is accurate
- the recognition accuracy (96.3%) is confirmed by the experimental confusion matrix

Slide 18: Font recognition: 1D Gaussian estimation (2)
Font boldness discrimination (normal vs. bold) using the hr-mean feature:
- the estimated models do not fit the real distributions
- the decision boundary is nevertheless surprisingly well adapted
- the recognition accuracy (97.6%) is high, as observed from the experimental confusion matrix

Slide 19: Font recognition: 1D Gaussian estimation (3)
- Boldness is generally dependent on the font family
- hr-mean can perfectly discriminate normal and bold fonts if the font family is known (recognition rate > 99.9%)
(figure panels: Times, Courier, Arial, all)

Slide 20: Font recognition: 1D Gaussian estimation (4)
Font family discrimination (Arial, Courier, Times) using the hr-mean feature:
- the estimated models do not fit the real distributions at all
- the decision boundaries are inadequate
- the recognition accuracy is poor (41.9%)

Slide 21: Font recognition: 1D multi-Gaussian estimation
Font family discrimination (Arial, Courier, Times) using hr-mean, supposing the font style to be known for learning:
- the estimated models fit the real distributions
- the decision boundaries are adequate
- the recognition accuracy is nearly optimal for the given feature (89.6%)

Slide 22: Font recognition: 2D Gaussian estimation
Font family discrimination (Arial, Courier, Times) using two features, hr-stdev and vr-mean:
- the models fit approximately for two classes, but not for the third one
- the decision boundary is surprisingly well adapted
- the recognition accuracy (93.5%) is reasonable

Slide 23: Font recognition: general Gaussian estimation
Performance of font family discrimination (Arial, Courier, Times) depends on the feature set used:
- hr-stdev: recognition rate 72.7%
- hr-stdev, vr-mean: recognition rate 93.5%
- hp-mean, hr-mean, vr-mean: recognition rate 98.0%
- hp-mean, hpd-stdev, hr-mean, vr-mean, hr-stdev, vr-stdev: recognition rate 99.7%

Slide 24: Font recognition: classifier for all 12 classes
Discrimination of all fonts using all six features (hp-mean, hpd-stdev, hr-mean, hr-stdev, vr-mean, vr-stdev):
- overall recognition rate of 99.6%
- most errors are due to roman/italic confusion

Slide 25: Error types
In a Bayesian classifier using parameter estimation, several error types occur:
- Indistinguishability errors, due to the overlapping of the distributions: they are inherent to the problem and cannot be reduced
- Modeling errors, due to a bad choice of the parametric density functions (models): they can be avoided by changing the models
- Estimation errors, due to the imprecision of the training data: they can be reduced by increasing the amount of training data

Slide 26: Influence of the size of the training data
Evolution of the error rate as a function of the size of the training sets (experiment with 4 training sets and 2 test sets; the figure shows the individual runs and their average).
