1
Document Analysis: Non-Parametric Methods for Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008

2
© Prof. Rolf Ingold
Outline
Introduction
Density estimation from training samples
Two different approaches
  Parzen windows
  k-nearest-neighbor approach
k-nearest-neighbor rule
Nearest-neighbor rule
Error bounds
Distances

3
Introduction
Densities often cannot be characterized well by parametric functions, typically when the distributions have multiple, irregular peaks.
The principle of non-parametric methods is to estimate the density functions directly from the training sets.

4
Density Estimation (1)
For a given class, let P be the probability that a randomly selected sample belongs to a region R, i.e.
  $P = \int_R p(x')\,dx'$
The probability that k samples out of n fall in this region is given by the binomial law
  $P_k = \binom{n}{k} P^k (1-P)^{n-k}$
from which we get the expectation of k: E[k] = nP.
If p(x) is continuous and R is a very small region around x, we get
  $P = \int_R p(x')\,dx' \approx p(x)\,V$
where V is the volume of region R, which leads to the following estimator:
  $p(x) \approx \frac{k/n}{V}$
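The estimator (k/n)/V can be tried out numerically. The following is only a minimal sketch, assuming one-dimensional data and an interval [x - h, x + h] as region R; the function name `region_density_estimate` and the uniform test data are illustrative choices, not part of the lecture.

```python
import random

def region_density_estimate(samples, x, half_width):
    """Estimate p(x) as (k/n)/V for the 1-D region R = [x - h, x + h]."""
    n = len(samples)
    volume = 2.0 * half_width  # in one dimension the "volume" V is the interval length
    k = sum(1 for s in samples if abs(s - x) <= half_width)  # samples falling in R
    return (k / n) / volume

# Illustrative data: samples drawn uniformly from [0, 1], whose true density is 1.
random.seed(0)
samples = [random.uniform(0.0, 1.0) for _ in range(10_000)]
estimate = region_density_estimate(samples, x=0.5, half_width=0.05)
print(estimate)  # expected to be close to the true density 1.0
```

Shrinking `half_width` makes the region more local but leaves fewer samples in it, which is the trade-off the convergence conditions on the next slide address.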

5
Density Estimation (2)
When using respectively 1, 2, ..., n samples, let us consider a sequence of regions around x denoted R_1, R_2, ..., R_n:
let V_n be the volume of R_n
let k_n be the number of samples falling in R_n
so that the n-th estimate is
  $p_n(x) = \frac{k_n/n}{V_n}$
Then it can be shown that the sequence p_1(x), p_2(x), ..., p_n(x) converges to p(x) if the following conditions are all satisfied:
  $\lim_{n\to\infty} V_n = 0$
  $\lim_{n\to\infty} k_n = \infty$
  $\lim_{n\to\infty} k_n/n = 0$

6
Two different approaches
Two approaches satisfy these conditions:
Parzen windows, which define the regions by their volumes
the k-nearest-neighbor rule (kNN), which defines the regions by the number of samples falling in them

7
Principle of Parzen Windows
Each sample of the training set contributes a window function to the estimated density.
The width of the window must be chosen carefully:
if the window width is too large, the decision boundaries have too little resolution
if the window width is too small, there is a risk of overfitting
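To illustrate the influence of the window width, here is a minimal sketch of a one-dimensional Parzen estimate. It assumes a Gaussian window function, which is one common choice and not necessarily the one used in the lecture; the function name `parzen_density` and the sample values are hypothetical.

```python
import math

def parzen_density(samples, x, width):
    """Parzen estimate of p(x) in 1-D, assuming a Gaussian window of the given width."""
    n = len(samples)
    total = 0.0
    for s in samples:
        u = (x - s) / width  # distance to the sample, measured in window widths
        total += math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    return total / (n * width)  # each sample contributes a window of unit area

# Illustrative training set with two clusters, around 1 and around 3.
samples = [1.0, 1.2, 0.8, 3.0, 3.1]
for width in (0.05, 0.5, 5.0):
    values = [round(parzen_density(samples, x, width), 3) for x in (1.0, 2.0, 3.0)]
    print(width, values)
```

With a very small width the estimate is spiky and nearly zero between samples; with a very large width the two clusters merge into a single flat bump.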

8
Decision boundaries for different Parzen window widths
In fact, the window width should be adapted locally.

9
k-nearest-neighbor approach
The k-nearest-neighbor approach avoids the problem of Parzen windows: the "window width" is automatically adapted to the local density, i.e. to the k closest samples.

10
Density functions for k-nearest-neighbors
The density functions are continuous, but their derivatives are not!
Illustration of density functions for k = 3 and k = 5
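A k-nearest-neighbor density estimate can be sketched as follows. This is an illustrative one-dimensional version, assuming the region is the smallest interval centered at x that contains the k closest samples; the name `knn_density` and the data are hypothetical.

```python
def knn_density(samples, x, k):
    """kNN estimate of p(x) in 1-D: grow the region until it holds k samples."""
    distances = sorted(abs(s - x) for s in samples)
    radius = distances[k - 1]  # distance to the k-th closest sample
    volume = 2.0 * radius      # length of the interval [x - radius, x + radius]
    if volume == 0.0:
        return float("inf")    # x coincides with at least k samples
    return (k / len(samples)) / volume

samples = [0.0, 1.0, 2.0, 4.0]
print(knn_density(samples, x=1.0, k=3))  # (3/4) / 2 = 0.375
```

Because the radius switches from one sample to another as x moves, the resulting curve has kinks, which matches the remark that the density is continuous but its derivative is not.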

11
Estimation of a posteriori probabilities
Let us consider a region centered at x, having a volume V and containing exactly k samples from the training set, k_i of which belong to class i.
The estimated joint probability of x and i is
  $p_n(x, i) = \frac{k_i/n}{V}$
The estimated a posteriori probabilities are
  $P_n(i \mid x) = \frac{p_n(x, i)}{\sum_j p_n(x, j)} = \frac{k_i}{k}$
This justifies the rule of choosing the class i with the highest value of k_i.
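The resulting decision rule is short to write down. The sketch below assumes Euclidean distance and a training set given as (feature vector, label) pairs; the function name `knn_classify` and the toy data are illustrative only.

```python
import math
from collections import Counter

def knn_classify(training, x, k):
    """Return the majority class among the k nearest samples and the estimates k_i/k."""
    # Sort the training samples by Euclidean distance to x and keep the k closest.
    neighbors = sorted(training, key=lambda item: math.dist(item[0], x))[:k]
    counts = Counter(label for _, label in neighbors)  # k_i for each class i
    posteriors = {label: count / k for label, count in counts.items()}
    best = max(counts, key=counts.get)  # class with the highest k_i
    return best, posteriors

training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
print(knn_classify(training, (4, 4), k=3))
```

For the query point (4, 4), two of the three nearest samples belong to class "b", so the estimated posteriors are 2/3 for "b" and 1/3 for "a". Setting k = 1 gives the nearest-neighbor rule.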

12
Choice of k for the k-nearest-neighbor rule
The parameter k is chosen as a function of n. By choosing
  $k_n = \sqrt{n}$
we get
  $V_n = \frac{k_n/n}{p_n(x)} = \frac{1}{\sqrt{n}\,p_n(x)} = \frac{V_0}{\sqrt{n}}$
showing that V_0 = 1/p_n(x) depends on p_n(x).

13
Nearest-neighbor rule
The nearest-neighbor rule is a suboptimal rule that assigns a sample x to the class of its nearest neighbor.
It can be shown that the probability of error P of the nearest-neighbor rule is bounded by
  $P^* \le P \le P^*\left(2 - \frac{c}{c-1}\,P^*\right) \le 2P^*$
where P* represents the Bayes error and c the number of classes.

14
Generalization to the kNN rule
The error rate of the kNN rule is plotted in the graphic below for the two-category case.
It shows that asymptotically (when k→∞) the error rate converges to the Bayes error.

15
Distances
The k-nearest-neighbor rule relies on a distance (or metric).
Algebraically, a distance must satisfy four properties:
non-negativity: d(a,b) ≥ 0
reflexivity: d(a,b) = 0 if and only if a = b
symmetry: d(a,b) = d(b,a)
triangle inequality: d(a,b) + d(b,c) ≥ d(a,c)

16
Problem with distances
Scaling the coordinates of a feature space can change the neighborhood relationships induced by the distance.
To avoid arbitrary scaling, it is recommended to perform feature normalization, i.e. to determine the scale according to either
the min-max interval of each feature
the standard deviation of each individual feature distribution
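Both normalizations can be sketched in a few lines. The functions below operate on the values of a single feature and assume that the feature is not constant (otherwise the divisor would be zero); the names `min_max_scale` and `standardize` are illustrative.

```python
import statistics

def min_max_scale(values):
    """Rescale one feature to [0, 1] using its min-max interval."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]  # assumes hi > lo

def standardize(values):
    """Rescale one feature to zero mean and unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation, assumed non-zero
    return [(v - mean) / std for v in values]

# The same hypothetical lengths in metres and in millimetres give identical normalized values.
metres = [1.5, 1.75, 2.0]
millimetres = [1500.0, 1750.0, 2000.0]
print(min_max_scale(metres), min_max_scale(millimetres))
print(standardize(metres), standardize(millimetres))
```

After either normalization, the choice of unit for a feature no longer affects which samples are nearest neighbors.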

17
Generalized distances
The Minkowski distance, which generalizes the Euclidean distance, is defined by
  $d_k(a, b) = \left( \sum_{i} |a_i - b_i|^k \right)^{1/k}$
It leads to the following special cases:
the Euclidean distance (for k = 2)
the Manhattan distance or city-block distance (for k = 1)
the maximum distance (for k = ∞)
Many other distances exist.
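The three special cases can be checked with a small sketch. The function name `minkowski` is illustrative, and the k = ∞ case is handled explicitly as the maximum coordinate difference.

```python
import math

def minkowski(a, b, k):
    """Minkowski distance of order k between two equal-length vectors."""
    diffs = [abs(ai - bi) for ai, bi in zip(a, b)]
    if math.isinf(k):
        return max(diffs)  # limit case k = infinity: maximum distance
    return sum(d ** k for d in diffs) ** (1.0 / k)

a, b = (0.0, 0.0), (3.0, 4.0)
print(minkowski(a, b, 1))         # Manhattan distance: 7.0
print(minkowski(a, b, 2))         # Euclidean distance: 5.0
print(minkowski(a, b, math.inf))  # maximum distance: 4.0
```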
