
CpSc 810: Machine Learning Instance Based Learning

2 Copyright Notice Most slides in this presentation are adapted from the textbook's slides and various other sources. The copyright belongs to the original authors. Thanks!

3 Instance Based Learning (IBL) IBL methods learn by simply storing the presented training data. When a new query instance is encountered, a set of similar instances is retrieved from memory and used to classify the query. IBL approaches can construct a different approximation to the target function for each distinct query, building local rather than global approximations. IBL methods can also use complex symbolic representations for instances; this variant is called Case-Based Reasoning (CBR).

4 Advantages and Disadvantages of IBL Methods Advantage: IBL methods are particularly well suited to problems in which the target function is very complex but can still be described by a collection of less complex local approximations. Disadvantage I: the cost of classifying new instances can be high, since most of the computation takes place at classification time. Disadvantage II: many IBL approaches consider all attributes of the instances, so they are very sensitive to the curse of dimensionality.

5 Instance Based Learning Nearest Neighbor: given a query instance x_q, first locate the nearest training example x_n, then estimate f^(x_q) <- f(x_n). k-Nearest Neighbor: given a query instance x_q, take a vote among its k nearest neighbors if the target function is discrete-valued, or take the mean of the f values of the k nearest neighbors if it is real-valued.

6 k-Nearest Neighbor Learning in Euclidean Space Assumption: all instances x correspond to points in the n-dimensional space R^n, described by attribute vectors x = <a_1(x), a_2(x), …, a_n(x)>. Measure used: Euclidean distance, d(x_i, x_j) = sqrt( Σ_{r=1..n} (a_r(x_i) - a_r(x_j))^2 ). Training algorithm: for each training example, add the example to the list training_examples. Classification algorithm: given a query instance x_q to be classified, let x_1 … x_k be the k instances from training_examples that are nearest to x_q, and return f^(x_q) <- argmax_{v ∈ V} Σ_{i=1..k} δ(v, f(x_i)), where δ(a,b) = 1 if a = b and δ(a,b) = 0 otherwise.
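To make the algorithm concrete, here is a minimal Python sketch of plain k-NN classification with Euclidean distance and majority voting. It is illustrative only; the function names, data layout, and toy points are our own, not the course's code.

```python
# Minimal k-NN sketch: Euclidean distance over numeric attributes,
# majority vote among the k nearest training examples.
import math
from collections import Counter

def euclidean(x_i, x_j):
    # d(x_i, x_j) = sqrt(sum_r (a_r(x_i) - a_r(x_j))^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))

def knn_classify(training_examples, x_q, k=3):
    # training_examples is a list of (attribute_vector, label) pairs
    neighbors = sorted(training_examples, key=lambda ex: euclidean(ex[0], x_q))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy usage with made-up 2-D points
data = [((1.0, 1.0), '+'), ((1.2, 0.8), '+'), ((5.0, 5.0), '-'), ((5.5, 4.5), '-')]
print(knn_classify(data, (1.1, 1.0), k=3))   # prints '+'
```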

7 Voronoi Diagram [Figure omitted: a query point x_q is classified as + by its single nearest neighbor (1-NN) but as - by its 5 nearest neighbors (5-NN); the Voronoi diagram shows the decision surface induced by 1-NN over the training points.]

8 Behavior in the Limit Let p(x) denote the probability that instance x is labeled 1 (positive) rather than 0 (negative). Nearest neighbor: as the number of training examples -> ∞, its error approaches that of the Gibbs algorithm (Gibbs: with probability p(x) predict 1, else 0). k-nearest neighbor: as the number of training examples -> ∞ and k grows large, its error approaches that of the Bayes optimal classifier (Bayes optimal: if p(x) > 0.5 predict 1, else 0). Note that Gibbs has at most twice the expected error of Bayes optimal.

9 Distance-Weighted Nearest Neighbors k-NN can be refined by weighting the contribution of the k neighbors according to their distance to the query point x_q, giving greater weight to closer neighbors. To do so, replace the last line of the algorithm with f^(x_q) <- argmax_{v ∈ V} Σ_{i=1..k} w_i δ(v, f(x_i)), where w_i = 1 / d(x_q, x_i)^2.
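A minimal sketch of the distance-weighted vote, reusing the euclidean helper from the earlier k-NN sketch; again the names are illustrative rather than the course's code.

```python
# Distance-weighted k-NN vote: each of the k nearest neighbors contributes
# w_i = 1 / d(x_q, x_i)^2 to the score of its class.
from collections import defaultdict

def weighted_knn_classify(training_examples, x_q, k=5):
    neighbors = sorted(training_examples, key=lambda ex: euclidean(ex[0], x_q))[:k]
    scores = defaultdict(float)
    for x_i, label in neighbors:
        d = euclidean(x_i, x_q)
        if d == 0.0:
            return label               # query coincides with a training point
        scores[label] += 1.0 / d ** 2  # w_i = 1 / d(x_q, x_i)^2
    return max(scores, key=scores.get)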

10 Remarks on kNN Advantages: the NN algorithm can estimate complex target concepts locally and differently for each new instance to be classified; it provides good generalisation accuracy on many domains; it learns very quickly; it is robust to noisy training data; and it is intuitive and easy to understand, which facilitates implementation and modification. Disadvantages: the NN algorithm has large storage requirements because it has to store all the data; it is slow during instance classification because all the training instances have to be visited; its accuracy degrades as noise in the training data increases; and its accuracy degrades as the number of irrelevant attributes increases. Efficient memory indexing of the training instances has been proposed to speed up instance classification; the most popular indexing technique is based on multidimensional trees (e.g., k-d trees).

11 Curse of Dimensionality Inductive bias of k-nearest neighbor: the assumption that the classification of an instance x_q will be most similar to the classification of other instances that are nearby in Euclidean distance. Curse of dimensionality: nearest neighbor is easily misled when the instance space X is high-dimensional, because the distance is calculated over all attributes of the instance. Imagine instances described by 20 attributes, of which only two are relevant to the target function. Solutions: weight the attributes differently (use cross-validation to determine the weights), or eliminate the least relevant attributes (again, use cross-validation to decide which attributes to eliminate); a rough sketch of the latter follows.
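The sketch below illustrates the second remedy under simplifying assumptions: it drops the single attribute whose removal hurts leave-one-out 1-NN accuracy the least (or helps the most), reusing knn_classify from the earlier sketch. It is a toy procedure of our own, not one prescribed by the slides.

```python
# Backward attribute elimination guided by leave-one-out cross-validation
# of 1-NN accuracy; repeat the step to drop further attributes if desired.
def loo_accuracy(examples, attrs):
    correct = 0
    for i, (x, y) in enumerate(examples):
        rest = [([ex[0][a] for a in attrs], ex[1])
                for j, ex in enumerate(examples) if j != i]
        pred = knn_classify(rest, [x[a] for a in attrs], k=1)
        correct += (pred == y)
    return correct / len(examples)

def drop_least_relevant(examples, n_attrs):
    attrs = list(range(n_attrs))
    # Score each candidate deletion by the accuracy obtained without that attribute.
    scores = {a: loo_accuracy(examples, [b for b in attrs if b != a]) for a in attrs}
    least_relevant = max(scores, key=scores.get)
    return [a for a in attrs if a != least_relevant]
```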

12 Locally Weighted Regression kNN forms a local approximation to f for each query point x_q; why not form an explicit approximation f^(x) for the region surrounding x_q? Locally weighted regression generalizes nearest-neighbor approaches by constructing an explicit approximation to f over a local region surrounding x_q. In such approaches, the contribution of each training example is weighted by its distance from the query point.

13 An Example: Locally Weighted Linear Regression f is approximated by f^(x) = w_0 + w_1 a_1(x) + … + w_n a_n(x). Gradient descent can be used to find the coefficients w_0, w_1, …, w_n that minimize some error function. The error function, however, should be different from the global one used in neural network training, since we want a local solution. Different possibilities: (1) minimize the squared error over just the k nearest neighbors; (2) minimize the squared error over the entire training set, but weight the contribution of each example by some decreasing function K of its distance from x_q; (3) combine (1) and (2).
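The following sketch illustrates possibility (2): a locally weighted linear fit around a query point, with each example weighted by a Gaussian kernel of its distance from x_q. For brevity it solves the weighted least-squares problem in closed form with NumPy instead of gradient descent; the function name and the bandwidth parameter tau are our own choices, not from the slides.

```python
# Locally weighted linear regression: fit f^(x) = w_0 + w_1*a_1(x) + ... + w_n*a_n(x)
# around the query x_q, weighting each example by a Gaussian kernel of its distance.
import numpy as np

def lwlr_predict(X, y, x_q, tau=1.0):
    # X: (m, n) attribute matrix, y: (m,) targets, x_q: (n,) query point
    d2 = np.sum((X - x_q) ** 2, axis=1)            # squared distances to x_q
    kernel = np.exp(-d2 / (2 * tau ** 2))          # weight per training example
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column for w_0
    sw = np.sqrt(kernel)
    # Weighted least squares: scale rows by the square root of the kernel weights
    w, *_ = np.linalg.lstsq(Xb * sw[:, None], y * sw, rcond=None)
    return np.concatenate(([1.0], x_q)) @ w        # f^(x_q)
```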

14 Radial Basis Function (RBF) Approximating function: f^(x) = w_0 + Σ_{u=1..k} w_u K_u(d(x_u, x)). K_u(d(x_u, x)) is a kernel function that decreases as the distance d(x_u, x) increases (e.g., a Gaussian), and k is a user-defined constant that specifies the number of kernel functions to include. Although f^(x) is a global approximation to f(x), the contribution of each kernel function is localized. RBF can be implemented in a neural network and trained with a very efficient two-step algorithm: (1) find the parameters of the kernel functions (e.g., using the EM algorithm); (2) learn the linear weights of the kernel functions.
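A minimal sketch of the two-step procedure, assuming Gaussian kernels, centers simply sampled from the training points (rather than fit with EM), and linear weights obtained by least squares. The names and defaults (fit_rbf, sigma, k) are illustrative.

```python
# Two-step RBF fit: (1) choose k kernel centers (here sampled from the
# training data), (2) solve for the linear weights w_0..w_k by least squares.
import numpy as np

def fit_rbf(X, y, k=10, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]          # step 1
    def design(Xq):
        d2 = ((Xq[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (m, k) squared distances
        phi = np.exp(-d2 / (2 * sigma ** 2))                        # K_u(d(x_u, x))
        return np.hstack([np.ones((len(Xq), 1)), phi])              # column of ones for w_0
    w, *_ = np.linalg.lstsq(design(X), y, rcond=None)               # step 2
    return lambda Xq: design(Xq) @ w                                # f^(x) = w_0 + sum_u w_u K_u
```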

15 Case-Based Reasoning (CBR) CBR is similar to k-NN methods in that both are lazy learning methods that defer generalization until a query arrives, and both classify new query instances by analyzing similar instances while ignoring instances that are very different from the query. However, CBR differs from k-NN in that it does not represent instances as points in a real-valued space; instead it uses a rich symbolic representation. CBR can thus be applied to complex conceptual problems such as the design of mechanical devices or legal reasoning. Applications of CBR: design (landscape, building, mechanical, conceptual design of aircraft sub-systems), planning (repair schedules), diagnosis (medical), and adversarial reasoning (legal).

16 Case-Based Reasoning (CBR) Methodology: instances are represented by rich symbolic descriptions (e.g., function graphs); the search for similar cases may combine multiple retrieved cases; and there is tight coupling between case retrieval, knowledge-based reasoning, and problem solving. Challenges: finding a good similarity metric, and indexing based on a syntactic similarity measure with backtracking and adaptation to additional cases when retrieval fails.

17 CBR Process [Diagram: a new case is matched against the case base to retrieve the closest matched cases; if adaptation is needed, knowledge and adaptation rules are applied to the closest case to suggest a solution (reuse/revise); the solved case is then retained (learned) in the case base.]

18 CBR Example: Property Pricing [Table omitted in this transcript: the stored property cases (cases 1-4) and the test instance (case 5), with attributes such as location, number of reception rooms, house type, and price.]

19 How Rules Are Generated There is no unique way of doing it. Here is one possibility: examine cases and look for ones that are almost identical. From case 1 and case 2: R1, if recep-rooms changes from 2 to 1 then reduce the price by £5,000. From case 3 and case 4: R2, if Type changes from semi to terraced then reduce the price by £7,000.

20 Matching Comparing the test instance (case 5) against each stored case by counting matching attributes: matches(5,1) = 3, matches(5,2) = 3, matches(5,3) = 2, matches(5,4) = 1. The estimated price of case 5 is £25,000.

21 Adapting Reverse rule 2: if Type changes from terraced to semi then increase the price by £7,000. Applying the reversed rule 2, the new estimate of the price of property 5 is £32,000.

22 Learning So far we have a new case and an estimated price; nothing has been added to the case base yet. If we later find that the house sold for £35,000, the case would be added, and we could add a new rule: if location changes from 8 to 7, increase the price by £3,000.
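Below is a toy Python sketch of the retrieve / adapt / retain cycle in this pricing example. The attribute values and prices in the case base are hypothetical stand-ins (the slide's original table is not reproduced here); only the mechanism (counting matching attributes, adapting with a reversed rule, retaining the solved case) mirrors the example.

```python
# Hypothetical case base; attribute values and prices are stand-ins, not the slide's table.
case_base = [
    {'location': 8, 'recep_rooms': 1, 'type': 'semi',     'price': 32000},
    {'location': 6, 'recep_rooms': 1, 'type': 'semi',     'price': 27000},
    {'location': 7, 'recep_rooms': 2, 'type': 'terraced', 'price': 25000},
    {'location': 5, 'recep_rooms': 1, 'type': 'terraced', 'price': 20000},
]

def matches(case, query):
    return sum(case[a] == query[a] for a in query)            # count matching attributes

def retrieve(query):
    return max(case_base, key=lambda c: matches(c, query))    # closest stored case

def adapt(closest, query):
    price = closest['price']
    if closest['type'] == 'terraced' and query['type'] == 'semi':
        price += 7000                                         # reversed rule R2
    return price

def retain(query, actual_price):
    case_base.append({**query, 'price': actual_price})        # learn the solved case

query = {'location': 7, 'recep_rooms': 2, 'type': 'semi'}
estimate = adapt(retrieve(query), query)
print(estimate)          # 32000 with these stand-in values
retain(query, 35000)     # later, once the true sale price is known
```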

23 About CBR Problems with CBR: how should cases be represented? How should cases be indexed for fast retrieval? How can good adaptation heuristics be developed? When should old cases be removed? Advantages: a local approximation is found for each test case; knowledge is in a form understandable to human beings; and it is fast to train.

24 Lazy vs. Eager Learning Eager learning: learning = acquiring an explicit description of the target concepts from the whole training set; classification = an instance is classified using that explicit description. Instance-based (lazy) learning: learning = storing all training instances; classification = an instance is classified according to the classifications of its nearest stored instances. Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function, whereas an eager method must commit to a single hypothesis that covers the entire instance space.