1 Nearest Neighbor Classifiers
other names:
– instance-based learning
– case-based learning (CBL)
– non-parametric learning
– model-free learning

2 1-NN
save all training data
to classify a test example:
– compute the distance to each training example
– Euclidean distance metric
– report the class of the nearest training example
– for binary attributes, use Hamming distance
– for nominal attributes, use equality (0 if equal, else 1) or VDM (Value-Difference Metric; Stanfill and Waltz, 1986) – the squared difference of conditional probabilities, summed over classes
Result: often surprisingly good accuracy, comparable with decision trees & neural nets
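As a minimal illustration of the 1-NN procedure above (not code from the slides), a sketch in Python/NumPy assuming continuous attributes and a Euclidean metric; the name one_nn_predict is made up for this example:

```python
import numpy as np

def one_nn_predict(X_train, y_train, x_query):
    """Classify x_query with the label of its nearest training example."""
    # Euclidean distance from the query to every stored training example
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # report the class of the nearest training example
    return y_train[int(np.argmin(dists))]
```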

3 k-NN
– sensitivity to noise: take the majority vote over the k closest neighbors
– optimizing k: use a validation set
distance-weighting
– can use all training examples
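A hedged sketch of the k-NN variant described above, with an optional 1/distance vote weighting; knn_predict and its parameters are illustrative names, and the slide does not prescribe this particular weighting scheme:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3, distance_weighted=False):
    """Majority vote over the k closest training examples; with distance
    weighting, closer neighbors get larger (1/d) votes, which also makes
    k = len(X_train) (i.e. using all training examples) workable."""
    y_train = np.asarray(y_train)
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]          # indices of the k closest neighbors
    if not distance_weighted:
        return Counter(y_train[nearest]).most_common(1)[0][0]
    votes = {}
    for i in nearest:
        label = y_train[i]
        votes[label] = votes.get(label, 0.0) + 1.0 / (dists[i] + 1e-12)
    return max(votes, key=votes.get)
```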

4 strengths of k-NN
– simple, accurate
– Theorem: in the limit (large N), the error of 1-NN is at most twice the error of the Bayes-optimal classifier (Cover & Hart, 1967)
weaknesses of k-NN
– memory needed to store examples
– classification speed (indexing can help)
– no comprehensibility
– (noise, curse of dimensionality, lack of adequate training examples)
basis for generalization
– bias: similarity bias

5 NTGrowth (Aha and Kibler)
– during training, save only those examples on which mistakes are made
– also throw out examples that appear noisy
– reduces memory requirements, increases accuracy
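A rough sketch of the growth rule the slide describes (save an example only when the current case base misclassifies it), assuming a 1-NN case base; NTGrowth's noise-filtering step is omitted, and grow_case_base is an illustrative name rather than the published algorithm:

```python
import numpy as np

def grow_case_base(X, y):
    # keep a training example only if the case base built so far
    # would misclassify it under the 1-NN rule (noise filtering omitted)
    kept_X, kept_y = [X[0]], [y[0]]          # seed with the first example
    for x, label in zip(X[1:], y[1:]):
        dists = np.sqrt(((np.array(kept_X) - x) ** 2).sum(axis=1))
        if kept_y[int(np.argmin(dists))] != label:   # mistake made: store it
            kept_X.append(x)
            kept_y.append(label)
    return np.array(kept_X), np.array(kept_y)
```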

6 Scaling of attributes
– for fairness, don't want large values to dominate
– pre-whiten data: for continuous values, replace each value x with its z-score, z = (x − μ) / σ
– binary and nominal attributes are already on a 0–1 scale
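A small sketch of the z-score step, assuming the mean and standard deviation are estimated from the training data and then applied to the test data as well:

```python
import numpy as np

def zscore_scale(X_train, X_test):
    # replace each continuous value x with z = (x - mu) / sigma,
    # using statistics computed on the training data only
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    sigma[sigma == 0] = 1.0                  # guard against constant attributes
    return (X_train - mu) / sigma, (X_test - mu) / sigma
```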

7 Feature Weighting
weighted Euclidean distance metric
– want to weight features by "relevance"
– conditional probability
– negEntropy
– chi-squared
Mahalanobis metric
– inverse of the covariance matrix: d(x, y) = (x − y)^T Σ^{-1} (x − y)
– captures skewing of the data distribution
– con: class-independent
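Sketches of both metrics above, assuming NumPy arrays; the Mahalanobis form follows the slide's (squared) formula, and the relevance weights w are assumed to come from whichever criterion is chosen (conditional probability, negEntropy, chi-squared):

```python
import numpy as np

def weighted_euclidean(x, y, w):
    # w holds one relevance weight per feature
    return float(np.sqrt((w * (x - y) ** 2).sum()))

def mahalanobis(x, y, cov):
    # d(x, y) = (x - y)^T Sigma^{-1} (x - y), with Sigma the covariance
    # matrix of the whole data set (class-independent, as the slide notes)
    diff = x - y
    return float(diff @ np.linalg.inv(cov) @ diff)
```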

8 Feature Selection
curse of dimensionality
– many attributes often lead to lower accuracy
PCA – principal component analysis
– based on manipulation of the covariance matrix
– choose new orthogonal dimensions as linear combinations of the original attributes, in order of most variance explained
filter methods: try to estimate relevance
– negEntropy, RELIEF: hits vs. misses of neighbors
wrapper methods (use accuracy on training data to pick the best features)
– SFS: stepwise-forward selection
– SBE: stepwise-backward elimination
– DIET: optimize the weights of one feature at a time by searching a grid
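A sketch of one of the wrapper methods above, stepwise-forward selection (SFS); score_fn is an assumed callback that returns the accuracy of a k-NN classifier trained with a candidate feature subset (it is not defined on the slides):

```python
def stepwise_forward_selection(n_features, score_fn):
    """Greedy SFS: repeatedly add the single feature that most improves
    the score, stopping when no remaining feature helps."""
    selected, best_score = [], float("-inf")
    while True:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_f = max((score_fn(selected + [f]), f) for f in candidates)
        if top_score <= best_score:          # no candidate improves: stop
            break
        selected.append(top_f)
        best_score = top_score
    return selected
```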

