ECE 5984: Introduction to Machine Learning
Dhruv Batra, Virginia Tech
Topics:
–(Finish) Nearest Neighbour
Readings: Barber 14 (kNN)


Administrativia
HW0
–Graded. Grades available on Scholar.
–Everyone passed; some are on the border. Please come see me.
–Solutions available Wednesday.
HW1
–Out today.
–Due Sunday 02/15, 11:55pm.
–Please, please start early.
–Implement k-NN.
–Kaggle competition: bonus points for the best-performing entries, and for beating the instructor/TA.

Recap from last time

Nearest Neighbours
Image Credit: Wikipedia

Instance/Memory-based Learning
Four things make a memory-based learner:
1. A distance metric
2. How many nearby neighbours to look at?
3. A weighting function (optional)
4. How to fit with the local points?
Slide Credit: Carlos Guestrin

1-Nearest Neighbour
Four things make a memory-based learner:
1. A distance metric: Euclidean (and others)
2. How many nearby neighbours to look at? 1
3. A weighting function (optional): unused
4. How to fit with the local points? Just predict the same output as the nearest neighbour.
Slide Credit: Carlos Guestrin
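For concreteness, here is a minimal 1-NN sketch in NumPy instantiating these four choices (illustrative only; not HW1 starter code, and the toy data is made up):

```python
import numpy as np

def predict_1nn(X_train, y_train, x_query):
    """1-NN: Euclidean metric, k = 1, no weighting; copy the nearest point's output."""
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))  # Euclidean distance to every training point
    return y_train[np.argmin(dists)]                         # output of the single nearest neighbour

# Tiny usage example
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
y = np.array([0, 0, 1])
print(predict_1nn(X, y, np.array([1.8, 1.9])))  # -> 1 (nearest point is [2, 2])
```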

1-NN for Regression
[Figure: a 1-NN regression fit – the prediction at a query x is the y-value of the closest datapoint.]
Figure Credit: Carlos Guestrin

Multivariate distance metrics
Suppose the input vectors $x_1, x_2, \ldots, x_N$ are two-dimensional: $x_1 = (x_{11}, x_{12})$, $x_2 = (x_{21}, x_{22})$, $\ldots$, $x_N = (x_{N1}, x_{N2})$.
One can draw the nearest-neighbour regions in input space.
$\mathrm{Dist}(x_i, x_j) = (x_{i1} - x_{j1})^2 + (x_{i2} - x_{j2})^2$
$\mathrm{Dist}(x_i, x_j) = (x_{i1} - x_{j1})^2 + (3x_{i2} - 3x_{j2})^2$
The relative scalings in the distance metric affect region shapes.
Slide Credit: Carlos Guestrin

Euclidean distance metric
$D^2(x, x') = \sum_i \sigma_i^2 (x_i - x'_i)^2$
Or equivalently,
$D^2(x, x') = (x - x')^\top A (x - x')$, where $A = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_d^2)$.
Slide Credit: Carlos Guestrin

Notable distance metrics (and their level sets)
–Scaled Euclidean ($L_2$)
–Mahalanobis (non-diagonal $A$)
Slide Credit: Carlos Guestrin

Minkowski distance
Image Credit: Waldir (based on File:MinkowskiCircles.svg), CC BY-SA 3.0, via Wikimedia Commons
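For reference (this definition is not on the slide itself, but it is what the unit-circle figure depicts), the Minkowski distance of order $p$ is

$$D_p(x, x') = \Big( \sum_{i=1}^{d} |x_i - x'_i|^p \Big)^{1/p},$$

with $p = 1$ giving the $L_1$ (absolute) norm, $p = 2$ the Euclidean norm, and $p \to \infty$ the max norm – exactly the metrics on the next slide.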

Notable distance metrics (and their level sets)
–$L_1$ norm (absolute)
–$L_\infty$ (max) norm
–Scaled Euclidean ($L_2$)
–Mahalanobis (non-diagonal $A$)
Slide Credit: Carlos Guestrin
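A minimal NumPy sketch of these four metrics (the matrix `A` below is an arbitrary symmetric positive-definite example chosen for illustration, not taken from the slides):

```python
import numpy as np

def l1(u, v):   return np.abs(u - v).sum()             # L1 (absolute) norm
def l2(u, v):   return np.sqrt(((u - v) ** 2).sum())   # Euclidean (L2) norm
def linf(u, v): return np.abs(u - v).max()             # L-infinity (max) norm

def mahalanobis(u, v, A):
    """sqrt((u-v)^T A (u-v)); a diagonal A recovers scaled Euclidean."""
    d = u - v
    return np.sqrt(d @ A @ d)

u, v = np.array([1.0, 2.0]), np.array([3.0, -1.0])
A = np.array([[2.0, 0.5], [0.5, 1.0]])  # symmetric positive-definite
print(l1(u, v), l2(u, v), linf(u, v), mahalanobis(u, v, A))
```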

Plan for today
(Finish) Nearest Neighbour
–Kernel Classification/Regression
–Curse of Dimensionality

Weighted k-NNs
Neighbours are not all the same.

1-NN for Regression
Often bumpy (overfits).
Figure Credit: Andrew Moore

9-NN for Regression
Often bumpy (overfits).
Figure Credit: Andrew Moore

Kernel Regression/Classification
Four things make a memory-based learner:
1. A distance metric: Euclidean (and others)
2. How many nearby neighbours to look at? All of them
3. A weighting function (optional): $w_i = \exp(-d(x_i, \mathrm{query})^2 / \sigma^2)$. Nearby points are weighted strongly, far points weakly. The parameter $\sigma$ is the kernel width – very important.
4. How to fit with the local points? Predict the weighted average of the outputs: $\mathrm{predict} = \sum_i w_i y_i \,/\, \sum_i w_i$
Slide Credit: Carlos Guestrin
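Putting the four choices together, a minimal kernel-regression sketch under the slide's definitions (variable names and the toy data are illustrative):

```python
import numpy as np

def kernel_regress(X_train, y_train, x_query, sigma=1.0):
    """Gaussian-weighted average of all training outputs."""
    d2 = ((X_train - x_query) ** 2).sum(axis=1)   # squared Euclidean distances
    w = np.exp(-d2 / sigma ** 2)                  # w_i = exp(-d^2 / sigma^2)
    return (w * y_train).sum() / w.sum()          # sum(w_i * y_i) / sum(w_i)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 4.0, 9.0])
print(kernel_regress(X, y, np.array([1.5]), sigma=0.5))  # -> ~2.5
```

Classification works the same way, with a weighted vote over class labels in place of the weighted average.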

Weighting/Kernel functions
$w_i = \exp(-d(x_i, \mathrm{query})^2 / \sigma^2)$
(Our examples use the Gaussian kernel)
Slide Credit: Carlos Guestrin

Effect of Kernel Width
What happens as $\sigma \to \infty$? What happens as $\sigma \to 0$?
Image Credit: Ben Taskar
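Answering the slide's questions (standard facts about Gaussian weights; the numbers below are illustrative): as $\sigma \to \infty$, all weights approach 1, so the prediction tends to the global mean of the training outputs (maximal smoothing); as $\sigma \to 0$, the nearest point's weight dominates and kernel regression reduces to 1-NN.

```python
import numpy as np

y = np.array([0.0, 1.0, 4.0, 9.0])        # toy training outputs
d2 = np.array([1.96, 0.16, 0.36, 2.56])   # squared distances from one query point

for sigma in (100.0, 0.1):                # very wide vs. very narrow kernel
    w = np.exp(-d2 / sigma ** 2)
    print(sigma, (w * y).sum() / w.sum()) # 100.0 -> ~3.5 (mean of y); 0.1 -> ~1.0 (1-NN answer)
```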

Problems with Instance-Based Learning
Expensive
–No learning: most real work done during testing.
–For every test sample, must search through the entire dataset – very slow!
–Must use tricks like approximate nearest-neighbour search (see the sketch below).
Doesn't work well with a large number of irrelevant features
–Distances are overwhelmed by noisy features.
Curse of Dimensionality
–Distances become meaningless in high dimensions.
–(See proof in next lecture)
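A sketch of one such trick, assuming scikit-learn is available: a k-d tree gives fast exact search at moderate dimension, and true approximate-NN libraries such as FAISS or Annoy follow the same build-once, query-many pattern.

```python
import numpy as np
from sklearn.neighbors import KDTree

rng = np.random.default_rng(0)
X = rng.standard_normal((10000, 3))   # 10k training points in 3-D
tree = KDTree(X)                      # build the index once

# Each query now avoids a brute-force scan over all 10k points
dist, idx = tree.query(rng.standard_normal((5, 3)), k=1)
print(idx.ravel())                    # indices of each query's nearest neighbour
```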

Curse of Dimensionality
Consider: a sphere of radius 1 in $d$ dimensions.
Consider: an outer $\epsilon$-shell in this sphere.
What is $V_{\mathrm{shell}} / V_{\mathrm{sphere}}$?
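A one-line answer (the volume argument that the proof deferred to next lecture presumably formalizes): the volume of a radius-$r$ ball in $d$ dimensions scales as $r^d$, so

$$\frac{V_{\mathrm{shell}}}{V_{\mathrm{sphere}}} = \frac{1^d - (1-\epsilon)^d}{1^d} = 1 - (1-\epsilon)^d \;\longrightarrow\; 1 \quad \text{as } d \to \infty.$$

For large $d$, almost all of the ball's volume lies in the thin outer shell, so uniformly sampled neighbours all sit at nearly the same distance from the query – which is why distances become meaningless in high dimensions.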

Curse of Dimensionality

What you need to know
k-NN
–Simplest learning algorithm.
–With sufficient data, a very-hard-to-beat "strawman" approach.
–Picking k?
Kernel regression/classification
–Set k to n (the number of data points) and choose the kernel width.
–Smoother than k-NN.
Problems with k-NN
–Curse of dimensionality.
–Irrelevant features are often killers for instance-based approaches.
–Slow NN search: must remember the (very large) dataset for prediction.

What you need to know
Key Concepts (which we will meet again)
–Supervised Learning
–Classification/Regression
–Loss Functions
–Statistical Estimation
–Training vs Testing Data
–Hypothesis Class
–Overfitting, Generalization