KNN Classifier.  Handed an instance you wish to classify  Look around the nearby region to see what other classes are around  Whichever is most common—make.

Slides:



Advertisements
Similar presentations
Pattern Recognition and Machine Learning
Advertisements

1 Classification using instance-based learning. 3 March, 2000Advanced Knowledge Management2 Introduction (lazy vs. eager learning) Notion of similarity.
Machine Learning Instance Based Learning & Case Based Reasoning Exercise Solutions.
Curse of Dimensionality Prof. Navneet Goyal Dept. Of Computer Science & Information Systems BITS - Pilani.
Data Mining Classification: Alternative Techniques
Data Mining Classification: Alternative Techniques
K-means method for Signal Compression: Vector Quantization
1 CS 391L: Machine Learning: Instance Based Learning Raymond J. Mooney University of Texas at Austin.
Model generalization Test error Bias, variance and complexity
Instance Based Learning
1 Machine Learning: Lecture 7 Instance-Based Learning (IBL) (Based on Chapter 8 of Mitchell T.., Machine Learning, 1997)
Lazy vs. Eager Learning Lazy vs. eager learning
1er. Escuela Red ProTIC - Tandil, de Abril, Instance-Based Learning 4.1 Introduction Instance-Based Learning: Local approximation to the.
Navneet Goyal. Instance Based Learning  Rote Classifier  K- nearest neighbors (K-NN)  Case Based Resoning (CBR)
Instance Based Learning
Nearest Neighbor. Predicting Bankruptcy Nearest Neighbor Remember all your data When someone asks a question –Find the nearest old data point –Return.
MACHINE LEARNING 9. Nonparametric Methods. Introduction Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1) 2 
Instance based learning K-Nearest Neighbor Locally weighted regression Radial basis functions.
Lecture Notes for CMPUT 466/551 Nilanjan Ray
Instance Based Learning
These slides are based on Tom Mitchell’s book “Machine Learning” Lazy learning vs. eager learning Processing is delayed until a new instance must be classified.
1 Nearest Neighbor Learning Greg Grudic (Notes borrowed from Thomas G. Dietterich and Tom Mitchell) Intro AI.
Aprendizagem baseada em instâncias (K vizinhos mais próximos)
KNN, LVQ, SOM. Instance Based Learning K-Nearest Neighbor Algorithm (LVQ) Learning Vector Quantization (SOM) Self Organizing Maps.
Instance Based Learning Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán) 1.
INSTANCE-BASE LEARNING
CS Instance Based Learning1 Instance Based Learning.
More Machine Learning Linear Regression Squared Error L1 and L2 Regularization Gradient Descent.
Module 04: Algorithms Topic 07: Instance-Based Learning
PATTERN RECOGNITION AND MACHINE LEARNING
ADVANCED CLASSIFICATION TECHNIQUES David Kauchak CS 159 – Fall 2014.
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines.
COMMON EVALUATION FINAL PROJECT Vira Oleksyuk ECE 8110: Introduction to machine Learning and Pattern Recognition.
1 Data Mining Lecture 5: KNN and Bayes Classifiers.
Data Reduction. 1.Overview 2.The Curse of Dimensionality 3.Data Sampling 4.Binning and Reduction of Cardinality.
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 3: LINEAR MODELS FOR REGRESSION.
Overview of Supervised Learning Overview of Supervised Learning2 Outline Linear Regression and Nearest Neighbors method Statistical Decision.
Ch 4. Linear Models for Classification (1/2) Pattern Recognition and Machine Learning, C. M. Bishop, Summarized and revised by Hee-Woong Lim.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Statistical Inference (By Michael Jordon) l Bayesian perspective –conditional perspective—inferences.
Naïve Bayes Classifier. Red = Yellow = Mass = Volume = Apple Sensors, scales, etc… 8/29/03Bayesian Classifier2.
1 Instance Based Learning Ata Kaban The University of Birmingham.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.7: Instance-Based Learning Rodney Nielsen.
Chapter 6 – Three Simple Classification Methods © Galit Shmueli and Peter Bruce 2008 Data Mining for Business Intelligence Shmueli, Patel & Bruce.
CpSc 881: Machine Learning Instance Based Learning.
CpSc 810: Machine Learning Instance Based Learning.
Chapter1: Introduction Chapter2: Overview of Supervised Learning
Outline K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
Lazy Learners K-Nearest Neighbor algorithm Fuzzy Set theory Classifier Accuracy Measures.
DATA MINING LECTURE 10b Classification k-nearest neighbor classifier
K-Nearest Neighbor Learning.
Eick: kNN kNN: A Non-parametric Classification and Prediction Technique Goals of this set of transparencies: 1.Introduce kNN---a popular non-parameric.
Debrup Chakraborty Non Parametric Methods Pattern Recognition and Machine Learning.
Kansas State University Department of Computing and Information Sciences CIS 890: Special Topics in Intelligent Systems Wednesday, November 15, 2000 Cecil.
Instance-Based Learning Evgueni Smirnov. Overview Instance-Based Learning Comparison of Eager and Instance-Based Learning Instance Distances for Instance-Based.
CS 8751 ML & KDDInstance Based Learning1 k-Nearest Neighbor Locally weighted regression Radial basis functions Case-based reasoning Lazy and eager learning.
1 Instance Based Learning Soongsil University Intelligent Systems Lab.
General-Purpose Learning Machine
Data Science Algorithms: The Basic Methods
Instance Based Learning
Classification Nearest Neighbor
K Nearest Neighbor Classification
Classification Nearest Neighbor
Nearest-Neighbor Classifiers
Instance Based Learning
DATA MINING LECTURE 10 Classification k-nearest neighbor classifier
Chap 8. Instance Based Learning
Chapter 8: Generalization and Function Approximation
Machine Learning: UNIT-4 CHAPTER-1
Machine learning overview
Memory-Based Learning Instance-Based Learning K-Nearest Neighbor
Presentation transcript:

KNN Classifier

- Handed an instance you wish to classify
- Look around the nearby region to see what other classes are around
- Whichever class is most common, make that the prediction

- Assign the most common class among the K nearest neighbors (like a vote)

- Train: load the training data
- Classify:
  - Read in the instance
  - Find the K nearest neighbors in the training data
  - Assign the most common class among the K nearest neighbors (like a vote)
- Euclidean distance, where a ranges over the attributes (dimensions): d(x, y) = sqrt( sum over a of (x_a - y_a)^2 )

- Naïve approach: exhaustive search
- For the instance to be classified:
  - Visit every training sample and calculate its distance
  - Sort by distance
  - Take the first K in the list
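A minimal Python sketch of this exhaustive approach; the data layout (a list of (attribute_vector, label) pairs) and the function names are assumptions for illustration, not from the slides:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Distance over all attributes (dimensions) of two instances.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, train, k=3):
    # train is a list of (attribute_vector, class_label) pairs.
    # Naive approach: visit every training sample, sort by distance, keep the first K.
    neighbors = sorted(train, key=lambda pair: euclidean(query, pair[0]))[:k]
    # Assign the most common class among the K nearest neighbors (a vote).
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Example with three made-up training points in two dimensions.
print(knn_classify((1.0, 1.0),
                   [((0.9, 1.1), "apple"), ((5.0, 5.0), "lemon"), ((1.2, 0.8), "apple")],
                   k=3))
```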

The Work that Must Be Performed
- Visit every training sample and calculate its distance
- Sort by distance
- Lots of floating-point calculations
- The classifier puts off this work until it is time to classify

Where the Work Happens
- This is known as a “lazy” learning method
- A method that does most of its work during the training stage is known as “eager”
- Our next classifier, Naïve Bayes, will be eager: training takes a while, but it can classify quickly
- Which do you think is better?

From Wikipedia: a kd-tree is a space-partitioning data structure for organizing points in a k-dimensional space. kd-trees are a useful data structure for several applications, such as searches involving a multidimensional search key (e.g. range searches and nearest-neighbor searches). kd-trees are a special case of BSP trees.
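As an illustration (not part of the original slides), SciPy's cKDTree can stand in for a hand-built kd-tree; building the tree is the extra up-front “training” work, and each nearest-neighbor query is then fast. The training data here is random and purely hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical training set: 1000 instances with 4 numeric attributes.
rng = np.random.default_rng(0)
train_X = rng.random((1000, 4))
train_y = rng.integers(0, 3, size=1000)

tree = cKDTree(train_X)                      # building the tree is the extra "training" work
dist, idx = tree.query(rng.random(4), k=5)   # fast K-nearest-neighbor query
print(train_y[idx])                          # class labels of the 5 nearest neighbors
```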

- Speeds up classification
- Probably slows down “training”

- Choosing K can be a bit of an art
- What if you could include all data points (K = n)? How might you do such a thing?
- One idea: weight the vote of each training sample by its distance from the point being classified

- Weight each vote by 1 over the distance squared
- Could get less fancy and weight linearly instead
- But then training data that is very far away would still have a strong influence
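A hedged sketch of distance-weighted voting with 1/d² weights; the function name, the eps guard, and the use of math.dist for Euclidean distance are my choices:

```python
import math
from collections import defaultdict

def weighted_knn_classify(query, train, eps=1e-9):
    # Every training sample votes, weighted by 1 / distance^2, so samples that
    # are very far away contribute almost nothing (effectively K = n).
    scores = defaultdict(float)
    for attrs, label in train:
        d = math.dist(query, attrs)            # Euclidean distance
        scores[label] += 1.0 / (d * d + eps)   # eps avoids division by zero on exact matches
    return max(scores, key=scores.get)
```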

Other Radial Basis Functions
- Sometimes known as a kernel function
- One of the more common choices (a sketch follows)
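For illustration, the Gaussian kernel is one of the more common radial basis functions; I am assuming this is the kind of formula the slide showed, since its image is not preserved in the transcript:

```python
import math

def gaussian_weight(distance, sigma=1.0):
    # Gaussian radial basis (kernel) function: the weight falls off smoothly
    # with distance; sigma controls how quickly far points stop mattering.
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))
```

Swapping gaussian_weight(d) in for 1.0 / (d * d + eps) in the previous sketch would give a kernel-weighted vote.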

Other Issues?
- The work is back-loaded, and gets worse the bigger the training data
- Can alleviate this with data structures
- What else? What if only some dimensions contribute to the ability to classify? Differences in the other dimensions would still put distance between a point and the target.

- The book calls this the curse of dimensionality
- More dimensions are not always better
- Two points might be identical in the important dimensions but distant in the others

From Wikipedia: In applied mathematics, the curse of dimensionality (a term coined by Richard E. Bellman), also known as the Hughes effect or Hughes phenomenon (named after Gordon F. Hughes), refers to the problem caused by the exponential increase in volume associated with adding extra dimensions to a mathematical space. For example, 100 evenly spaced sample points suffice to sample a unit interval with no more than 0.01 distance between points; an equivalent sampling of a 10-dimensional unit hypercube with a lattice spacing of 0.01 between adjacent points would require 10^20 sample points: thus, in some sense, the 10-dimensional hypercube can be said to be a factor of 10^18 "larger" than the unit interval. (Adapted from an example by R. E. Bellman.)

- Thousands of genes
- Relatively few patients
- Is there a curse?

patient | g1   | g2   | g3   | … | gn   | disease
p1      | x1,1 | x1,2 | x1,3 | … | x1,n | Y
p2      | x2,1 | x2,2 | x2,3 | … | x2,n | N
…
pm      | xm,1 | xm,2 | xm,3 | … | xm,n | ?

- A Bayesian classifier could handle this: think of discrete data as being pre-binned
- Remember the RNA classification example, where the data in each dimension was A, C, U, or G
- How do we measure distance? A might be closer to G than to C or U (A and G are both purines, while C and U are pyrimidines)
- Dimensional distance becomes domain specific
- Representation becomes all-important: if the data can be arranged appropriately, techniques like Hamming distance can be used
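A small sketch of Hamming distance over RNA strings; the function name and example sequences are mine:

```python
def hamming(seq1, seq2):
    # Number of positions at which two equal-length sequences differ,
    # e.g. RNA strings over the alphabet {A, C, G, U}.
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(seq1, seq2))

print(hamming("ACGUA", "ACGGA"))   # 1: the sequences differ only in the fourth position
```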

Redness | Yellowness | Mass | Volume | Class
…       | …          | …    | …      | apple
…       | …          | …    | …      | lemon
…       | …          | …    | …      | orange
…       | …          | …    | …      | peach
…       | …          | …    | …      | orange
…       | …          | …    | …      | apple

- First few records in the training data
- See any issues? Hint: think of how Euclidean distance is calculated
- Should really normalize the data
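One hedged way to normalize, rescaling every attribute to the same range so a large-valued attribute such as Mass does not swamp the others in the Euclidean distance; the helper name and use of NumPy are my choices:

```python
import numpy as np

def min_max_normalize(X):
    # Rescale each attribute (column) of the training matrix to the range [0, 1].
    mins = X.min(axis=0)
    ranges = X.max(axis=0) - mins
    return (X - mins) / np.where(ranges == 0, 1, ranges)
```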

Function Approximation
- Real-valued prediction: take the average of the nearest K neighbors
- If we do not know the function, or it is too complex to “learn”, we can just plug in a new value: the KNN predictor “learns” the predicted value on the fly by averaging its nearest neighbors
- Why average?
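A minimal sketch of KNN used for real-valued prediction, averaging the targets of the K nearest neighbors; the names and data layout are assumptions:

```python
import math

def knn_regress(query, train, k=3):
    # train is a list of (attribute_vector, real_valued_target) pairs.
    neighbors = sorted(train, key=lambda pair: math.dist(query, pair[0]))[:k]
    # The prediction is the average of the K nearest targets.
    return sum(target for _, target in neighbors) / k
```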

- Choose an m and b that minimize the squared error, i.e. the sum over the training data of (y - (m*x + b))^2
- But again, how do we do that computationally?

- If we want to learn an instantaneous slope
- We can do local regression
- Get the slope of a line that fits just the local data

- For each training datum we know what Y should be
- If we have a randomly generated m and b, these, along with X, give us a predicted Y
- So we know whether the current m and b yield too large or too small a prediction
- We can nudge m and b in an appropriate direction (+ or -)
- Sum these proposed nudges across all the training data
(Figure callouts: “Target Y too low”; “Line represents output or predicted Y”)

- Which way should m go to reduce the error?
(Figure: the fitted line with the labels “y actual”, “Rise”, and “b”)
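A hedged sketch of the nudging procedure as plain gradient descent on the squared error; the learning rate, step count, and names are my choices, not from the slides:

```python
def gradient_descent_line(xs, ys, lr=0.01, steps=1000):
    # Fit y ≈ m*x + b by repeatedly nudging m and b against the gradient
    # of the mean squared error.
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        dm = db = 0.0
        for x, y in zip(xs, ys):
            err = (m * x + b) - y      # positive when the prediction is too high
            dm += 2 * err * x / n      # accumulated nudge for m
            db += 2 * err / n          # accumulated nudge for b
        m -= lr * dm                   # step in the direction that reduces the error
        b -= lr * db
    return m, b

print(gradient_descent_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))  # approaches m=2, b=1
```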

- Locally weighted linear regression
- Would still perform gradient descent
- Becomes a global function approximation
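A sketch of locally weighted linear regression around a query point x0. For brevity it solves the weighted least-squares problem directly with np.linalg.lstsq rather than by gradient descent as the slide suggests; the bandwidth tau and all names are assumptions:

```python
import numpy as np

def locally_weighted_fit(x, y, x0, tau=1.0):
    # Gaussian weights centred on the query point x0; tau is the bandwidth.
    w = np.exp(-((x - x0) ** 2) / (2 * tau ** 2))
    # Weighted least squares for y ≈ m*x + b: scale each row by sqrt(weight).
    X = np.column_stack([x, np.ones_like(x)])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    m, b = coef
    return m, b
```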

- KNN is highly effective for many practical problems, given sufficient training data
- Robust to noisy training data
- The work is back-loaded
- Susceptible to the curse of dimensionality
