Lecture 05: K-nearest neighbors

Slides:

Advertisements

Similar presentations

Principles of Density Estimation

Advertisements

Data Mining Classification: Alternative Techniques

Indian Statistical Institute Kolkata

Lazy vs. Eager Learning Lazy vs. eager learning

MACHINE LEARNING 9. Nonparametric Methods. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 

Instance based learning K-Nearest Neighbor Locally weighted regression Radial basis functions.

1-NN Rule: Given an unknown sample X decide if for That is, assign X to category if the closest neighbor of X is from category i.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley.

1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.

Aprendizagem baseada em instâncias (K vizinhos mais próximos)

KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.

Nearest Neighbour Condensing and Editing David Claus February 27, 2004 Computer Vision Reading Group Oxford.

INSTANCE-BASE LEARNING

CS Instance Based Learning1 Instance Based Learning.

Nearest-Neighbor Classifiers Sec minutes of math... Definition: a metric function is a function that obeys the following properties: Identity:

Nearest Neighbor (NN) Rule & k-Nearest Neighbor (k-NN) Rule Non-parametric : Can be used with arbitrary distributions, No need to assume that the form.

Non-Parameter Estimation 主講人：虞台文. Contents Introduction Parzen Windows k n -Nearest-Neighbor Estimation Classification Techiques – The Nearest-Neighbor.

Overview of Supervised Learning Overview of Supervised Learning2 Outline Linear Regression and Nearest Neighbors method Statistical Decision.

David Claus and Christoph F. Eick: Nearest Neighbor Editing and Condensing Techniques Nearest Neighbor Editing and Condensing Techniques 1.Nearest Neighbor.

Classification Problem GivenGiven Predict class label of a given queryPredict class label of a given query

METU Informatics Institute Min720 Pattern Classification with Bio-Medical Applications Part 6: Nearest and k-nearest Neighbor Classification.

KNN & Naïve Bayes Hongning Wang Today’s lecture Instance-based classifiers – k nearest neighbors – Non-parametric learning algorithm Model-based.

Chapter 13 (Prototype Methods and Nearest-Neighbors )

Intro. ANN & Fuzzy Systems Lecture 15. Pattern Classification (I): Statistical Formulation.

CS Machine Learning Instance Based Learning (Adapted from various sources)

Mete Ozay, Fatos T. Yarman Vural —Presented by Tianxiao Jiang

Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.

Debrup Chakraborty Non Parametric Methods Pattern Recognition and Machine Learning.

KNN & Naïve Bayes Hongning Wang

Linear Discriminant Functions Chapter 5 (Duda et al.) CS479/679 Pattern Recognition Dr. George Bebis.

Lecture 15. Pattern Classification (I): Statistical Formulation

k-Nearest neighbors and decision tree

Nonparametric Density Estimation – k-nearest neighbor (kNN) 02/20/17

Non-Parameter Estimation

Lecture 04: Logistic Regression

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

Lecture 07: Soft-margin SVM

INTRODUCTION TO Machine Learning 3rd Edition

Lecture 02: Linear Regression

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

Classification Nearest Neighbor

Lecture 09: Gaussian Processes

Lecture 04: Logistic Regression

Instance Based Learning (Adapted from various sources)

K Nearest Neighbor Classification

Classification Nearest Neighbor

Nearest-Neighbor Classifiers

CS489/698: Intro to ML Lecture 08: Kernels 10/12/17 Yao-Liang Yu.

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

CS489/698: Intro to ML Lecture 08: Kernels 10/12/17 Yao-Liang Yu.

Lecture 07: Soft-margin SVM

Instance Based Learning

CS489/698: Intro to ML Lecture 08: Kernels 10/12/17 Yao-Liang Yu.

COSC 4335: Other Classification Techniques

Lecture 08: Soft-margin SVM

Lecture 07: Soft-margin SVM

Lecture 02: Linear Regression

Lecture 04: Logistic Regression

CS480/680: Intro to ML Lecture 01: Perceptron 9/11/18 Yao-Liang Yu.

Advanced Artificial Intelligence Classification

CS480/680: Intro to ML Lecture 09: Kernels 23/02/2019 Yao-Liang Yu.

Generally Discriminant Analysis

Lecture 10: Gaussian Processes

Nearest Neighbors CSC 576: Data Mining.

Lecture 03: K-nearest neighbors

CSE4334/5334 Data Mining Lecture 7: Classification (4)

Lecture 8. Learning (V): Perceptron Learning

Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John.

MIRA, SVM, k-NN Lirong Xia. MIRA, SVM, k-NN Lirong Xia.

ECE – Pattern Recognition Lecture 10 – Nonparametric Density Estimation – k-nearest-neighbor (kNN) Hairong Qi, Gonzalez Family Professor Electrical.

Presentation transcript:

Lecture 05: K-nearest neighbors CS489/698: Intro to ML Lecture 05: K-nearest neighbors 9/26/17 Yao-Liang Yu

Outline Announcements Algorithm Theory Application 9/26/17 Yao-Liang Yu

Announcements Assignment 1 extension to Thursday? 9/26/17 Yao-Liang Yu

Outline Announcements Algorithm Theory Application 9/26/17 Yao-Liang Yu

Classification revisited ŷ = sign( xTw + b ) Decision boundary: xTw + b = 0 Parametric: finite-dim w Non-parametric: no specific form (or inf-dim w) 9/26/17 Yao-Liang Yu

1-Nearest Neighbour Store training set (X, y) For query (test point) x’ find nearest point x in X predict y’ = y(x) 9/26/17 Yao-Liang Yu

What do you mean “nearest” Need to measure distance or similarity d: X x X  R+ such that symmetry: d(x, x’) = d(x, x’) definite: d(x, x’) = 0 iff x = x’ triangle inequality: d(a, b) <= d(a,c) + d(c,b) Lp distance: dp(x, x’) = || x – x’ ||p p=2: Euclidean distance p=1: Manhattan distance p=inf: Chebyshev distance 9/26/17 Yao-Liang Yu

Complexity of 1-NN Training: 0… but O(nd) space Testing: O(nd) for each query point n: # of training samples d: # of features Can we do better? 9/26/17 Yao-Liang Yu

Voronoi diagram In 2D, can construct in O(n logn) time and O(n) space (Fortune, 1989) In 2D, query costs O(log n) Large d? nO(d) space with O(d log n) query time  approximate NN 9/26/17 Yao-Liang Yu

Normalization Usually, for each feature, subtract the mean and divide by standard deviation Or, equivalently, use a different distance metric 9/26/17 Yao-Liang Yu

Learning the metric Mahalanobias distance Or equivalently let First perform linear transformation Then use the usual Euclidean distance is small iff such that 9/26/17 Yao-Liang Yu

k-NN Store training set (X, y) For query (test point) x’ find k nearest points x1, x2, … , xk in X predict y’ = y(x1, x2, … , xk) usually a majority vote among the labels of x1, x2, … , xk say y1=1, y2=1, y3=-1, y4=1, y5=-1  y’= Test complexity: O(nd) 9/26/17 Yao-Liang Yu

Effect of k 9/26/17 Yao-Liang Yu

How to select k? 9/26/17 Yao-Liang Yu

Does k-NN work? MNIST: 60k train, 10k test 9/26/17 Yao-Liang Yu

Outline Announcements Algorithm Theory Application 9/26/17 Yao-Liang Yu

Bayes rule Bayes error Bayes rule f*(X) = 1 iff η(X) ≥ ½ 9/26/17 Yao-Liang Yu

Derivation 9/26/17 Yao-Liang Yu

Multi-class This is the best we can do even when we know the distribution of (X,Y) How big can P* be? 9/26/17 Yao-Liang Yu

At most twice worse (Cover & Hart’67) limninf P(Y1(n)≠Y) ≤ 2P* - c/(c-1) (P*)2 Asymptotically! # of classes 1 / (max P*) 1-NN error Bayes error If P* close to 0, then 1-NN error close to 2P* How big does n have to be? 9/26/17 Yao-Liang Yu

The picture Both assume we have infinite amount of training data! 9/26/17 Yao-Liang Yu

1-NN vs k-NN error(1NN) = 1/2n error(kNN) for k = 2t+1: 9/26/17 Yao-Liang Yu

Curse of dimensionality Theorem (SSBD, p224). For any c > 1 and any learning algorithm L, there exists a distribution over [0,1]d x {0,1} such that the Bayes error is 0 but for sample size n ≤ (c+1)d/2 , the error of the rule L is greater than ¼. k-NN is effective when have many training samples Dimensionality reduction may be helpful 9/26/17 Yao-Liang Yu

Outline Announcements Algorithm Theory Application 9/26/17 Yao-Liang Yu

Locally linear embedding (Saul & Roweis’00) 9/26/17 Yao-Liang Yu

k-NN for regression Training: store training set (X, y) Query x’ Find k-nns x1, x2, … , xk in X output y’ = y(x1, x2, … , xk) = (y1+y2+…+yk)/k 9/26/17 Yao-Liang Yu

Questions? 9/26/17 Yao-Liang Yu