Non-Parametric Classifiers
Alexandros Potamianos
Dept. of ECE, Tech. Univ. of Crete
Fall
Histograms-Parzen Windows
Main idea: Instead of selecting a parametric distribution (e.g., Gaussian) to describe the properties of the features of a class, compute the empirical distribution directly from the data (class feature histogram)
Feature Histogram Example
[Figure: histogram of feature X, showing the number of samples in each bin]
Normalize the histogram curve to obtain the feature PDF
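As an illustration of turning bin counts into a density estimate, here is a minimal sketch in Python; the feature samples, bin count, and random seed are made-up choices for illustration, not values from the slides.

```python
import numpy as np

# Made-up 1-D feature samples for one class (placeholder data).
rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=500)

# Count the samples falling in each bin ...
counts, edges = np.histogram(samples, bins=20)
bin_width = edges[1] - edges[0]

# ... then normalize so the histogram integrates to 1, giving an
# empirical estimate of the class-conditional feature PDF.
pdf = counts / (counts.sum() * bin_width)
print(pdf.sum() * bin_width)  # sanity check: total probability ~ 1.0
```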
Parzen Windows: Issues
Compared to parametric methods, empirical distributions are:
Better, because no specific form of the PDF is assumed
Worse, because over-fitting can easily occur (histogram bins that are too small)
Parzen proposed rules for adapting the bin size to the number of samples in each bin in order to avoid over-fitting
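A minimal sketch of a Parzen-window estimate in Python. The Gaussian kernel and the window width shrinking as 1/sqrt(n) are standard textbook choices used here for illustration; the data values are made up.

```python
import numpy as np

def parzen_estimate(x, samples, h):
    """Parzen-window PDF estimate at point x: average one Gaussian
    'window' of width h centred on every training sample."""
    kernels = np.exp(-0.5 * ((x - samples) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    return kernels.mean()

samples = np.array([0.0, -1.0, -2.0, 1.0, 1.1, 0.9])  # made-up 1-D training data
h = 1.0 / np.sqrt(len(samples))                       # window shrinks as sample count grows
print(parzen_estimate(0.5, samples, h))
```

Shrinking the window as the sample size grows plays the same role as adapting the histogram bin size: wide enough to smooth the estimate, narrow enough not to blur the distribution.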
Nearest Neighbor Rule
Main idea (1-NNR):
No explicit model (i.e., no training)
For each test sample x, the "nearest" sample x' in the training set is found, i.e., x' = argmin_{x'} d(x, x'), and x is classified to the class to which x' belongs
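A minimal 1-NNR sketch in Python, assuming Euclidean distance; the training points and labels are invented for illustration.

```python
import numpy as np

def nnr_classify(x, train_x, train_y):
    """1-NNR: assign x to the class of its nearest training sample (Euclidean distance)."""
    distances = np.linalg.norm(train_x - x, axis=1)
    return train_y[np.argmin(distances)]

# Made-up 2-D training set with two classes.
train_x = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
train_y = np.array([0, 0, 1, 1])
print(nnr_classify(np.array([0.8, 0.9]), train_x, train_y))  # -> 1
```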
Generalizations
k-NNR: Instead of finding the single nearest neighbor, we find the k nearest neighbors in the training set; the sample x is classified to the class to which most of the k neighbors belong
k-l-NNR: Like k-NNR, but at least l of the k nearest neighbors must belong to the same class for a classification decision to be taken (else no decision)
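Both rules can be written as one function, since plain k-NNR is the special case where a simple majority suffices. The following is a sketch under the assumption of Euclidean distance; the data, the helper name k_l_nnr, and the use of None for "no decision" are illustrative choices, not part of the slides.

```python
import numpy as np
from collections import Counter

def k_l_nnr(x, train_x, train_y, k, l):
    """k-l-NNR: take the k nearest training samples and decide for the majority
    class only if it gathers at least l of the k votes; otherwise abstain.
    Plain k-NNR is the special case l = k // 2 + 1 (simple majority)."""
    distances = np.linalg.norm(train_x - x, axis=1)
    nearest_labels = train_y[np.argsort(distances)[:k]]
    label, votes = Counter(nearest_labels).most_common(1)[0]
    return label if votes >= l else None  # None = "no decision"

# Made-up 2-D training data with two classes.
train_x = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.3],
                    [1.0, 1.0], [1.1, 0.9], [0.9, 1.2]])
train_y = np.array([0, 0, 0, 1, 1, 1])
x = np.array([0.6, 0.6])
print(k_l_nnr(x, train_x, train_y, k=3, l=2))  # 3-NNR (majority of 3) -> 1
print(k_l_nnr(x, train_x, train_y, k=3, l=3))  # 3-3-NNR (all 3 must agree) -> None
```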
Example
Training sets D1 = {0, -1, -2} and D2 = {1, 1, 1}
[Figure: number line showing the NNR decision boundary, the 3-NNR decision boundary, and the 3-3-NNR no-decision region]
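The sketch below scans the real line with the data from this slide, reusing the same (hypothetical) k-l-NNR logic as above, to locate the regions the figure refers to.

```python
import numpy as np
from collections import Counter

train_x = np.array([0.0, -1.0, -2.0, 1.0, 1.0, 1.0])  # D1 (class 1) and D2 (class 2)
train_y = np.array([1, 1, 1, 2, 2, 2])

def decide(x, k, l):
    """k-l-NNR decision at x, or None when fewer than l of the k nearest agree."""
    nearest = train_y[np.argsort(np.abs(train_x - x))[:k]]
    label, votes = Counter(nearest).most_common(1)[0]
    return label if votes >= l else None

for x in np.arange(-1.0, 1.01, 0.25):
    print(f"x={x:+.2f}  1-NNR: {decide(x, 1, 1)}  "
          f"3-NNR: {decide(x, 3, 2)}  3-3-NNR: {decide(x, 3, 3)}")
```

The printout shows the 1-NNR boundary at the midpoint 0.5 between 0 and 1, the 3-NNR boundary near 0 (for any x > 0 the three samples at 1 outvote the single class-1 sample at 0), and a 3-3-NNR no-decision region of roughly (-0.5, 0.5), where the three nearest neighbors come from both classes.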
Computational Efficiency
To speed up NNR classification, the training set size can be reduced using the condensing algorithm: the training samples are classified with the NNR rule against the current condensed set; misclassified samples are added to the new (condensed) training set one by one, until all training samples are correctly classified
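A minimal sketch of such a condensing pass, assuming 1-NNR with Euclidean distance; the data, the seeding of the condensed set with the first sample, and the function name condense are illustrative assumptions.

```python
import numpy as np

def condense(train_x, train_y):
    """Condensing sketch: keep a sample only if the current condensed set
    misclassifies it under 1-NNR; sweep the training data repeatedly until
    a full pass adds nothing new (all samples classified correctly)."""
    keep = [0]                    # seed the condensed set with one sample (arbitrary choice)
    changed = True
    while changed:
        changed = False
        for i in range(len(train_x)):
            dists = np.linalg.norm(train_x[keep] - train_x[i], axis=1)
            predicted = train_y[keep][np.argmin(dists)]
            if predicted != train_y[i]:   # misclassified -> add to condensed set
                keep.append(i)
                changed = True
    return np.array(keep)

# Made-up 2-D training data.
train_x = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.0],
                    [1.0, 1.0], [1.1, 0.8], [0.9, 1.1]])
train_y = np.array([0, 0, 0, 1, 1, 1])
kept = condense(train_x, train_y)
print(kept, train_y[kept])  # a (typically much smaller) subset used for NNR at test time
```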
Conclusions
Non-parametric classification algorithms:
are easy to implement
are computationally efficient (in training)
don't make any assumptions about the form of the distributions
are prone to over-fitting
are hard to adapt (no detailed model)