Document Analysis: Non-Parametric Methods for Pattern Recognition
Prof. Rolf Ingold, University of Fribourg
Master course, spring semester 2008


Outline
- Introduction
- Density estimation from training samples
- Two different approaches
- Parzen windows
- k-nearest-neighbor approach
- k-nearest-neighbor rule
- Nearest neighbor rule
- Error bounds
- Distances

Introduction
- It is often difficult to characterize the densities by parametric functions, typically when the distributions have multiple, irregular peaks
- The principle consists in estimating the density functions directly from the training sets

Density Estimation (1)
- For a given class, let P be the probability that a randomly selected sample falls in a region R, i.e. P = ∫_R p(x') dx'
- The probability that exactly k samples out of n fall in that region is given by the binomial law P_k = C(n,k) P^k (1-P)^(n-k), from which we get the expectation E[k] = nP
- If p(x) is continuous and R is a very small region of volume V around x, then P ≈ p(x)·V
- This leads to the following estimator: p(x) ≈ (k/n) / V
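As a minimal sketch of this estimator (not from the slides), the following Python code counts the training samples falling in a small 1-D interval around x; the synthetic Gaussian data and the half-width h = 0.2 are illustrative assumptions.

```python
import numpy as np

# Sketch of the basic estimate p(x) ~ (k/n) / V: count the fraction of
# training samples falling in a small region R of volume V centered at x.
rng = np.random.default_rng(0)
samples = rng.normal(loc=0.0, scale=1.0, size=1000)   # training set, n = 1000

def density_estimate(x, samples, h=0.2):
    """Estimate p(x) by counting samples in the interval [x - h, x + h]."""
    n = len(samples)
    k = np.sum(np.abs(samples - x) <= h)   # number of samples falling in R
    volume = 2.0 * h                       # "volume" (length) of R in 1-D
    return (k / n) / volume

print(density_estimate(0.0, samples))  # close to 1/sqrt(2*pi) ~ 0.399 for N(0,1)
```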

Density Estimation (2)
- When using respectively 1, 2, ..., n samples, let us consider a sequence of regions around x denoted R_1, R_2, ..., R_n
- let V_n be the volume of R_n
- let k_n be the number of samples falling in R_n
- Then it can be shown that the sequence of estimates p_1(x), p_2(x), ..., p_n(x) converges to p(x) if the following conditions are all satisfied: lim V_n = 0, lim k_n = ∞, and lim k_n/n = 0 (as n → ∞)

Two different approaches
- Two approaches satisfy these conditions
- Parzen windows, defining the regions by their volumes
- the k-nearest-neighbor rule (kNN), defining the regions by the number of samples falling in them

Principle of Parzen Windows
- Each sample of the training set contributes to the estimated density through a window function
- the width of the window must be chosen carefully
- if the window width is too large, the decision boundaries lose resolution
- if the window width is too small, there is a risk of overfitting
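As a sketch of this idea (not part of the original slides), the following 1-D Parzen estimate uses a Gaussian window of width h; the bimodal toy data and the three trial widths are assumptions chosen to illustrate the under-/over-smoothing trade-off.

```python
import numpy as np

def parzen_estimate(x, samples, h=0.3):
    """1-D Parzen estimate p_n(x) = (1/n) * sum_i (1/h) * phi((x - x_i)/h)."""
    u = (x - samples) / h
    windows = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)  # Gaussian window phi
    return windows.sum() / (len(samples) * h)

rng = np.random.default_rng(1)
train = np.concatenate([rng.normal(-2, 0.5, 500), rng.normal(2, 0.5, 500)])  # bimodal data
for h in (0.05, 0.3, 2.0):   # too narrow / reasonable / too wide
    print(h, parzen_estimate(0.0, train, h))
```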

Decision boundaries for different Parzen window widths
- In fact the window width should be adapted locally

k-nearest-neighbor approach
- The k-nearest-neighbor approach avoids the window-width problem of Parzen windows:
- the "window width" is automatically adapted to the local density, i.e. the region grows until it contains the k closest samples
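A minimal 1-D sketch of this estimate (assumed, not from the slides): the region around x is the smallest interval containing the k nearest training samples, giving p(x) ≈ (k/n) / V_k(x).

```python
import numpy as np

def knn_density(x, samples, k=5):
    """1-D kNN estimate p(x) ~ (k/n) / V_k(x), where V_k(x) is the length
    of the smallest interval around x containing the k closest samples."""
    distances = np.sort(np.abs(samples - x))
    r = distances[k - 1]          # distance to the k-th nearest neighbor
    volume = 2.0 * r              # interval [x - r, x + r]
    return (k / len(samples)) / volume

rng = np.random.default_rng(2)
train = rng.normal(0.0, 1.0, 1000)
print(knn_density(0.0, train, k=3), knn_density(0.0, train, k=5))
```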

Density functions for k-nearest-neighbors
- The density functions are continuous, but their derivatives are not!
Illustration of density functions for k = 3 and k = 5

Estimation of a posteriori probabilities
- Let us consider a region centered at x, having a volume V and containing exactly k samples from the training set,
- k_i of which belong to class ω_i
- The joint probability of x and ω_i is estimated by p(x, ω_i) ≈ (k_i/n) / V
- The estimated a posteriori probabilities are P(ω_i | x) = p(x, ω_i) / Σ_j p(x, ω_j) = k_i / k
- This justifies the rule of choosing the class ω_i with the highest value of k_i
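A minimal k-NN classifier following this rule, sketched below; the Euclidean distance and the toy 2-D Gaussian data are assumptions, not code from the course.

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=5):
    """Assign x to the class with the largest k_i among its k nearest neighbors,
    i.e. the class maximizing the estimated P(w_i | x) = k_i / k."""
    d = np.linalg.norm(X_train - x, axis=1)        # Euclidean distances to all samples
    nearest = np.argsort(d)[:k]                    # indices of the k nearest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]               # class with the highest k_i

rng = np.random.default_rng(3)
X0 = rng.normal([-1.0, -1.0], 0.5, size=(50, 2))   # class 0 samples
X1 = rng.normal([+1.0, +1.0], 0.5, size=(50, 2))   # class 1 samples
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)
print(knn_classify(np.array([0.8, 0.9]), X_train, y_train, k=5))   # expected: 1
```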

Choice of k for the k-nearest neighbor rule
- The parameter k is chosen as a function of n
- by choosing k_n = √n we get V_n ≈ k_n / (n p_n(x)) = 1 / (√n p_n(x)) = V_0 / √n
- showing that V_0 depends on p_n(x)

Nearest Neighbor rule
- The nearest neighbor rule is a suboptimal rule that classifies a sample x into the class of its nearest neighbor
- It can be shown that the error probability P of the nearest neighbor rule is bounded by P* ≤ P ≤ P* (2 - (c/(c-1)) P*), where P* is the Bayes error and c the number of classes
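A quick check of this bound (an illustrative sketch, with the two-class case c = 2 assumed):

```python
def nn_error_upper_bound(p_star, c=2):
    """Upper bound P* * (2 - c/(c-1) * P*) on the nearest-neighbor error rate."""
    return p_star * (2.0 - (c / (c - 1.0)) * p_star)

for p_star in (0.05, 0.10, 0.20):
    print(p_star, nn_error_upper_bound(p_star))   # e.g. P* = 0.10 -> 0.18 for c = 2
```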

Generalization to the kNN rule
- The error rate of the kNN rule is plotted in the graphic below for the two-category case
- it shows that asymptotically (when k → ∞) the error rate converges to the Bayes error

Distances
- The k-nearest neighbor rule relies on a distance (or metric)
- Algebraically, a distance must satisfy four properties:
- non-negativity: d(a,b) ≥ 0
- reflexivity: d(a,b) = 0 if and only if a = b
- symmetry: d(a,b) = d(b,a)
- triangle inequality: d(a,b) + d(b,c) ≥ d(a,c)

Problem with distances
- Scaling the coordinates of a feature space can change the neighborhood relationships induced by the distance
- To avoid arbitrary scaling effects, it is recommended to perform feature normalization, i.e. to determine the scale according to
- the min-max interval of each feature
- the standard deviation of each individual feature distribution
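The two normalizations mentioned above, sketched for a feature matrix X of shape (n_samples, n_features); the toy data (mixed units on purpose) is an assumption.

```python
import numpy as np

def minmax_scale(X):
    """Map each feature to [0, 1] using its min-max interval."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def zscore_scale(X):
    """Center each feature and scale it by its standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

X = np.array([[170.0, 65000.0],    # e.g. height in cm, income in CHF
              [180.0, 48000.0],
              [160.0, 80000.0]])
print(minmax_scale(X))
print(zscore_scale(X))
```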

Generalized distances
- The Minkowski distance, generalizing the Euclidean distance, is defined by d_k(a,b) = (Σ_i |a_i - b_i|^k)^(1/k)
- it leads to the following special cases:
- the Euclidean distance (for k = 2)
- the Manhattan distance, or city block distance (for k = 1)
- the maximum distance (for k = ∞)
- Many other distances exist
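A short sketch of the Minkowski distance and its special cases (the sample vectors are illustrative assumptions):

```python
import numpy as np

def minkowski(a, b, k=2):
    """Minkowski distance of order k; k = inf gives the maximum distance."""
    if np.isinf(k):
        return np.max(np.abs(a - b))
    return np.sum(np.abs(a - b) ** k) ** (1.0 / k)

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(a, b, 1))        # 7.0  Manhattan / city block
print(minkowski(a, b, 2))        # 5.0  Euclidean
print(minkowski(a, b, np.inf))   # 4.0  maximum distance
```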