COMP 2208 Dr. Long Tran-Thanh University of Southampton K-Nearest Neighbour

Classification (diagram): Environment → Perception (categorize inputs, update belief model) → Decision making (update decision-making policy) → Behaviour → Environment.

Generalisation Training data: (X1, Y1), (X2, Y2), (X3, Y3), …, (Xn, Yn). Unseen data: (X, ?). Our goal: good performance on both the training data and never-seen-before data.

Overfitting Overfitting: high accuracy on the training data, but low quality in prediction. (Figure: a model that fits the training data makes correct predictions there but incorrect predictions on the testing data.)

Training vs. testing How do we know what the generalisation power of the model is? Does it predict effectively? How do we minimise the generalisation error? Idea: use some of the data for training and keep the rest for testing. Advantage: we can measure generalisation power. Issue: data used for testing is "wasted" (it cannot be used for training), which is a big issue when the dataset is small.
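A minimal sketch of the hold-out idea in Python (the toy data, the 80/20 split and the helper name `train_test_split` are illustrative assumptions, not from the lecture):

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Hold out a fraction of the data for testing; train on the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # shuffle so the split is random
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Toy example: 100 two-dimensional points with binary labels
X = np.random.rand(100, 2)
y = (X[:, 0] > 0.5).astype(int)
X_train, y_train, X_test, y_test = train_test_split(X, y)
```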

Training vs. testing Idea 2: why not swap the data sets and rerun the algorithm? Objective: minimise the average generalisation error (plus the training error). K-fold cross validation: split the data points into K disjoint (i.e., non-overlapping) partitions; use K−1 of them for training and the K-th one for testing; repeat this K times, so that each partition is the testing data exactly once.
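A sketch of K-fold cross validation under the same assumptions; the `fit`/`predict` interface is a placeholder for whichever classifier is being evaluated, not part of the lecture material:

```python
import numpy as np

def k_fold_cross_validation(X, y, k, fit, predict):
    """Split the data into k disjoint folds; each fold is the test set once.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns predicted labels -- a placeholder interface for illustration.
    """
    folds = np.array_split(np.random.permutation(len(X)), k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        y_hat = predict(model, X[test_idx])
        errors.append(np.mean(y_hat != y[test_idx]))   # misclassification rate on this fold
    return np.mean(errors)                             # estimated generalisation error
```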

Lazy learning So far (eager learning): we train the system on training data, fix the parameters of the system after training, then test it on other data. Lazy learning: there is no training phase. We evaluate a data point at the time it is queried (similar to online learning), by comparing the new data point with the examples already stored in the system. Typically there are no (global) parameters to be set (though this is not always the case). A lazy learning algorithm: k-nearest neighbour.

The intuition Humans usually categorise and generalise based on how similar a new object is to other known things (e.g., dog, cat, desk, …). But this is not always obvious.

The main challenge We need to be able to quantify the degree of similarity. Idea: define some geometric representation of the data points, so that degree of similarity = (geometric) distance between the points. How do we define the metric?
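One common choice of metric for numeric features is the Euclidean distance; a minimal illustration (the coordinates below are made up):

```python
import numpy as np

def euclidean_distance(a, b):
    """Straight-line distance between two feature vectors (smaller = more similar)."""
    return np.sqrt(np.sum((np.asarray(a, float) - np.asarray(b, float)) ** 2))

print(euclidean_distance([51.0, -1.4], [51.1, -1.3]))  # two nearby (made-up) map coordinates
```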

A geographic example

In many cases, we only have some local information about the point in question (marked "?" on the map). Idea: physically close locations are likely to be similar.

Nearest neighbour We classify based on the nearest known data point. Intuition: the most similar known point should dominate the decision. Hungarian proverb: "Watch her mother before you marry a girl."

Voronoi diagram Partition the space into sub-spaces (cells), one cell per known data point; the partition is based on distance. We use this partitioning to classify: a query point receives the label of the cell it falls into.

Another geographic example The towns of Baarle-Hertog (Belgium) and Baarle-Nassau (The Netherlands) have a very complicated border

Another geographic example

People’s nationality: blue = Dutch, red = Belgian

When nearest neighbour is wrong

K-nearest neighbour We choose the K closest known neighbours and use majority voting to determine what the prediction should be. In our second example: K = 5. Choose the 5 nearest locations; if 3 out of 5 are Belgian, our prediction is also Belgian (and Dutch otherwise).
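A compact sketch of the classifier just described. K = 5 and the Belgian/Dutch labels come from the slides' example; the toy coordinates and the implementation details are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every stored (training) point
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]            # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]          # majority label

# Toy example in the spirit of the Baarle border: label a query by its location
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array(["Belgian", "Belgian", "Belgian", "Dutch", "Dutch", "Dutch"])
print(knn_predict(X_train, y_train, np.array([0.2, 0.3]), k=5))  # -> "Belgian"
```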

K-nearest neighbour with K = 5

KNN: Questions How large should K be? What distance metric should be used? Is there anything other than majority voting?

Setting the value of K Unfortunately, there is no general rule. If K is too small, you over-fit to noise in the data (i.e., the output is noisy). If K is too big, you lose all the structural detail (e.g., if K = N you predict all-Netherlands in our example). In practice you try multiple values. A possible heuristic: run kNN with multiple values of K at the same time, and use cross validation to identify the best value of K (see the sketch below).
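One way to implement that heuristic, sketched here with scikit-learn's off-the-shelf kNN classifier and cross-validation helper (the choice of library, the candidate values of K and the toy data are all assumptions, not prescribed by the lecture):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Toy data: 200 points in 2-D, labelled by which half of the unit square they fall in
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Try several values of K and keep the one with the best cross-validated accuracy
candidate_ks = [1, 3, 5, 7, 11, 15]
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
          for k in candidate_ks}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```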

Distance metrics For geography-based problems: Euclidean distance, where distance = physical distance between two locations. In some cases the (Euclidean) distance is not obvious, because data points have multiple dimensions. Example: a data point is a sales person, described by age and items sold. Age is between 0 and 100 (unit = 1); items sold is between 0 and 1 million (unit = 1000). If we put this directly into Euclidean space, the latter dominates the former.

Distance metrics Idea: we need to normalise the data when we put it into the Euclidean space. Normalisation = rescaling the data so that all the dimensions are comparable, e.g., making every dimension lie between 0 and 1 (see the sketch below). What should we do when the data is categorical (e.g., "Good", "OK", "Terrible")? Practical consensus: we don't use kNN for these cases. Categorical data describes membership of a group: the groups are distinct, and may be represented with a code number, but they cannot be ranked. Examples: country, sex, species, behaviour. kNN is typically good at using continuous data (e.g., location coordinates) to predict categorical outcomes (e.g., country) or continuous outcomes (e.g., home property price).
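A sketch of min-max rescaling to [0, 1] per dimension; the age and items-sold columns echo the slide's example, while the concrete numbers are made up:

```python
import numpy as np

def min_max_normalise(X):
    """Rescale each column (feature) to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    col_min, col_max = X.min(axis=0), X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

# Columns: age (0-100) and items sold (0-1,000,000); without rescaling, the
# second column would dominate any Euclidean distance between sales people.
sales_people = np.array([[25, 120000],
                         [40, 300000],
                         [63,  50000]])
print(min_max_normalise(sales_people))
```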

Other distance metrics? When data points are distributions of real data (e.g., from a data stream), we can use probability distance metrics: Hellinger distance, Kolmogorov distance, etc. There are also other non-Euclidean distances, such as the L1 (taxicab) distance.
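Two of these metrics as minimal sketches: the L1 (taxicab) distance for ordinary feature vectors and the Hellinger distance for discrete probability distributions (the example inputs are made up):

```python
import numpy as np

def l1_distance(a, b):
    """Taxicab (L1) distance: sum of absolute coordinate differences."""
    return np.sum(np.abs(np.asarray(a, float) - np.asarray(b, float)))

def hellinger_distance(p, q):
    """Hellinger distance between two discrete probability distributions."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

print(l1_distance([0, 0], [3, 4]))                    # 7.0 (vs. 5.0 for Euclidean)
print(hellinger_distance([0.5, 0.5], [0.9, 0.1]))     # 0 = identical, 1 = disjoint support
```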

Anything other than majority voting? For continuous outputs: choose the average or the median of the neighbours' values. Weighted majority voting: how do we set the weights? Should we prefer the closer neighbours or the more distant ones? Weight each neighbour by its importance. There are many other, more exotic voting rules (see social choice theory).
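A sketch of one common weighting choice, inverse-distance weighting, where closer neighbours get larger votes; the slides leave the choice of weights open, so this is just one illustrative option with made-up data:

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_query, k=5):
    """Weighted majority vote: each neighbour votes with weight 1/distance."""
    dists = np.sqrt(np.sum((X_train - x_query) ** 2, axis=1))
    nearest = np.argsort(dists)[:k]
    votes = defaultdict(float)
    for i in nearest:
        votes[y_train[i]] += 1.0 / (dists[i] + 1e-9)   # small constant avoids division by zero
    return max(votes, key=votes.get)

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array(["Belgian", "Belgian", "Dutch", "Dutch", "Dutch"])
print(weighted_knn_predict(X_train, y_train, np.array([0.2, 0.3]), k=5))  # -> "Belgian"
```

With plain majority voting among these five neighbours, the three more distant Dutch points would win; the weighted vote favours the two much closer Belgian points.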

Summary: kNN kNN is a simple example of lazy learning (also called instance-based or memory-based learning). kNN doesn't require a training stage: we use the training data itself to classify. Things to consider: a distance metric; how many neighbours we look at; and a method for predicting the output based on the neighbouring local points.