# Document Analysis: Non-Parametric Methods for Pattern Recognition

Prof. Rolf Ingold, University of Fribourg. Master course, spring semester 2008.


## Outline

© Prof. Rolf Ingold

- Introduction
- Density estimation from training samples
- Two different approaches
- Parzen windows
- k-nearest-neighbor approach
- k-nearest-neighbor rule
- Nearest neighbor rule
- Error bounds
- Distances

## Introduction

- It is often difficult to characterize densities by parametric functions, typically when distributions have multiple, irregular peaks.
- The principle of non-parametric methods consists in estimating density functions directly from training sets.

## Density Estimation (1)

- For a given class, let $P$ be the probability that a randomly selected sample belongs to a region $R$, i.e. $P = \int_R p(x')\,dx'$
- The probability that exactly $k$ samples out of $n$ belong to that region is given by the binomial law $P_k = \binom{n}{k} P^k (1-P)^{n-k}$, from which we get the expectation $E[k] = nP$
- If $p(x)$ is continuous and $R$ is a very small region around $x$, we get $P \approx p(x)\,V$, where $V$ is the volume of region $R$
- This leads to the estimator $p(x) \approx \dfrac{k}{nV}$
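The estimator $p(x) \approx k/(nV)$ can be sketched in NumPy; the snippet below is a minimal 1-D illustration (the function name and interface are assumptions, not from the slides):

```python
import numpy as np

def region_density(x, samples, half_width):
    """Estimate p(x) ~ k / (n * V) by counting samples in a small region.

    The region R is the interval [x - half_width, x + half_width] (1-D case),
    so its volume is V = 2 * half_width.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    k = np.count_nonzero(np.abs(samples - x) <= half_width)
    V = 2.0 * half_width
    return k / (n * V)
```

For samples roughly uniform on an interval, the estimate approaches the true (constant) density as $n$ grows and the region shrinks, in line with the convergence conditions on the next slide.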

## Density Estimation (2)

- When using respectively $1, 2, \ldots, n$ samples, consider a sequence of regions around $x$ denoted $R_1, R_2, \ldots, R_n$
  - let $V_n$ be the volume of $R_n$
  - let $k_n$ be the number of samples falling in $R_n$
  - the corresponding estimates are $p_n(x) = \dfrac{k_n}{n V_n}$
- It can be shown that the sequence $p_1(x), p_2(x), \ldots, p_n(x)$ converges to $p(x)$ if the following conditions are all satisfied:
  - $\lim_{n\to\infty} V_n = 0$
  - $\lim_{n\to\infty} k_n = \infty$
  - $\lim_{n\to\infty} k_n/n = 0$

## Two different approaches

- Two approaches satisfy these conditions:
  - Parzen windows, which define the regions by fixing their volumes
  - the k-nearest-neighbor rule (kNN), which defines the regions by fixing the number of samples falling in them

## Principle of Parzen Windows

- Each sample of the training set contributes to the estimated density through a window function centered at it
- The width of the window must be chosen carefully:
  - if the window width is too large, the decision boundaries have too little resolution (over-smoothing)
  - if the window width is too small, there is a risk of overfitting
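The principle can be sketched with a Gaussian window function (a minimal 1-D sketch; the choice of a Gaussian kernel and the function name are assumptions, not prescribed by the slides):

```python
import numpy as np

def parzen_density(x, samples, h):
    """Estimate p(x) with Gaussian Parzen windows of width h (1-D case).

    Each training sample contributes one window centered at itself;
    the estimate is the average of the n window contributions.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    # Gaussian window: phi(u) = exp(-u^2 / 2) / sqrt(2*pi)
    u = (x - samples) / h
    windows = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # each window is scaled by 1/h so that the estimate integrates to 1
    return windows.sum() / (n * h)
```

Varying `h` reproduces the trade-off above: a large `h` smooths away structure, a small `h` produces a spiky, overfitted estimate.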

## Decision boundaries for different Parzen window widths

(Figure: decision boundaries obtained with different Parzen window widths; not reproduced here.)

- In fact, the window width should be adapted locally

## k-nearest-neighbor approach

- The k-nearest-neighbor approach avoids the window-width problem of Parzen windows:
  - the "window width" is automatically adapted to the local density, i.e. the region around $x$ is grown until it contains the $k$ closest samples

## Density functions for k-nearest-neighbors

- The density functions are continuous, but their derivatives are not!

(Illustration of density functions for $k = 3$ and $k = 5$; not reproduced here.)
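The kNN density estimate applies $p(x) \approx k/(nV)$ with $V$ grown to reach the $k$-th nearest sample (a minimal 1-D sketch, not from the slides; the interface is an assumption):

```python
import numpy as np

def knn_density(x, samples, k):
    """k-NN density estimate p(x) ~ k / (n * V), 1-D case.

    The 'window' is the smallest interval centered at x containing the
    k nearest samples, so its volume is twice the k-th nearest distance.
    """
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    dists = np.sort(np.abs(samples - x))
    V = 2.0 * dists[k - 1]   # length of the interval [x - r_k, x + r_k]
    return k / (n * V)
```

Because $V$ jumps in slope as the identity of the $k$-th neighbor changes with $x$, the resulting estimate is continuous but not differentiable, as the slide notes.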

## Estimation of a posteriori probabilities

- Consider a region centered at $x$, having a volume $V$ and containing exactly $k$ samples from the training set, $k_i$ of which belong to class $\omega_i$
- The joint density of $x$ and $\omega_i$ is estimated as $p(x, \omega_i) \approx \dfrac{k_i}{nV}$
- The estimated a posteriori probabilities are $P(\omega_i \mid x) = \dfrac{p(x, \omega_i)}{\sum_j p(x, \omega_j)} = \dfrac{k_i}{k}$
- This justifies the rule of choosing the class $\omega_i$ with the highest value of $k_i$
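The resulting decision rule, picking the class with the most representatives among the $k$ nearest neighbors, can be sketched as follows (Euclidean distance is assumed here; the function name is hypothetical):

```python
import numpy as np
from collections import Counter

def knn_classify(x, samples, labels, k):
    """Classify x by majority vote among its k nearest training samples.

    Implements P(class_i | x) ~ k_i / k: return the class with the most
    representatives among the k nearest neighbors.
    """
    samples = np.asarray(samples, dtype=float)
    dists = np.linalg.norm(samples - np.asarray(x, dtype=float), axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k closest samples
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

For example, with two well-separated clusters labeled `"a"` and `"b"`, a query point near the first cluster is assigned to `"a"`.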

## Choice of k for the k-nearest-neighbor rule

- The parameter $k$ is chosen as a function of $n$:
  - by choosing $k_n = \sqrt{n}$
  - we get $V_n = \dfrac{k_n}{n\,p_n(x)} = \dfrac{1}{\sqrt{n}\,p_n(x)} = \dfrac{V_0}{\sqrt{n}}$
  - showing that $V_0$ depends on $p_n(x)$

## Nearest Neighbor rule

- The nearest neighbor rule is a suboptimal rule that classifies a sample $x$ into the class of its nearest neighbor in the training set
- It can be shown that, asymptotically, the error probability $P$ of the nearest neighbor rule is bounded by $P^* \le P \le P^*\left(2 - \dfrac{c}{c-1}\,P^*\right)$, where $P^*$ is the Bayes error and $c$ the number of classes
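As a small Monte Carlo sanity check (not from the slides; the synthetic two-Gaussian setup is an assumption chosen because its Bayes error is known), the empirical 1-NN error indeed lands between the Bayes error and roughly twice it:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two 1-D classes with equal priors: class 0 ~ N(0,1), class 1 ~ N(2,1).
# Bayes error P* = Phi(-1) ~ 0.159; for two classes the asymptotic
# nearest-neighbor error is at most 2 * P* * (1 - P*) ~ 0.267.
def sample(n):
    y = rng.integers(0, 2, n)
    x = rng.normal(2.0 * y, 1.0)
    return x, y

x_train, y_train = sample(2000)
x_test, y_test = sample(2000)

# 1-NN rule: each test point takes the label of its closest training point
nearest = np.abs(x_test[:, None] - x_train[None, :]).argmin(axis=1)
err = np.mean(y_train[nearest] != y_test)
```

With this seed the measured error sits comfortably inside the theoretical band, illustrating the bound rather than proving it.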

## Generalization to the kNN rule

- The error rate of the kNN rule for the two-category case is plotted in the graphic below (figure not reproduced here)
- It shows that asymptotically (when $k \to \infty$ while $k/n \to 0$) the error rate converges to the Bayes error

## Distances

- The k-nearest-neighbor rule relies on a distance (or metric)
- Algebraically, a distance must satisfy four properties:
  - non-negativity: $d(a,b) \ge 0$
  - reflexivity: $d(a,b) = 0$ if and only if $a = b$
  - symmetry: $d(a,b) = d(b,a)$
  - triangle inequality: $d(a,b) + d(b,c) \ge d(a,c)$
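The four properties can be spot-checked numerically for a candidate distance, here the Euclidean one (an illustrative sketch, not from the slides; a passing check on random points does not constitute a proof):

```python
import itertools
import math
import random

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# spot-check the four metric axioms on random 2-D points
random.seed(0)
pts = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(15)]
for a, b, c in itertools.product(pts, repeat=3):
    assert euclidean(a, b) >= 0                                  # non-negativity
    assert (euclidean(a, b) == 0) == (a == b)                    # reflexivity
    assert math.isclose(euclidean(a, b), euclidean(b, a))        # symmetry
    assert euclidean(a, b) + euclidean(b, c) >= euclidean(a, c) - 1e-9  # triangle
```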

## Problem with distances

- Scaling the coordinates of a feature space can change the neighborhood relationships induced by the distance
- To avoid arbitrary scaling, it is recommended to perform feature normalization, i.e. to determine the scale according to
  - the min-max interval of each feature, or
  - the standard deviation of each individual feature's distribution
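Both normalization schemes can be sketched as follows (column-wise NumPy versions; function names are assumptions, and no guard against constant features is included):

```python
import numpy as np

def min_max_scale(X):
    """Rescale each feature (column) to the [0, 1] interval."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def standardize(X):
    """Rescale each feature (column) to zero mean and unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)
```

After either transform, a feature measured in hundreds no longer dominates a feature measured in units when distances are computed.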

## Generalized distances

- The Minkowski distance, which generalizes the Euclidean distance, is defined by $L_k(a,b) = \left(\sum_i |a_i - b_i|^k\right)^{1/k}$
- It leads to the following special cases:
  - the Euclidean distance (for $k = 2$)
  - the Manhattan or city-block distance (for $k = 1$)
  - the maximum distance (for $k = \infty$): $L_\infty(a,b) = \max_i |a_i - b_i|$
- Many other distances exist
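The definition and its three special cases fit in one small function (a minimal sketch, not from the slides):

```python
import numpy as np

def minkowski(a, b, k):
    """Minkowski distance L_k(a, b) = (sum_i |a_i - b_i|^k)^(1/k).

    k=1 gives the Manhattan distance, k=2 the Euclidean distance,
    and k=inf the maximum distance (limit of L_k as k grows).
    """
    d = np.abs(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
    if np.isinf(k):
        return d.max()
    return (d ** k).sum() ** (1.0 / k)
```

On the points $(0,0)$ and $(3,4)$ this yields $7$ for $k=1$, $5$ for $k=2$, and $4$ for $k=\infty$.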
