k-Nearest Neighbors and Decision Tree

k-Nearest Neighbors and Decision Tree
Nonparametric Supervised Learning

Outline
Context of these algorithms
K-nearest neighbors (k-NN)
1-nearest neighbor (1-NN)
Extension to k-nearest neighbors
Decision Tree
Summary

Context of these Algorithms
Supervised learning: labeled training samples are available
Nonparametric: a mathematical representation of the underlying probability distribution is hard to obtain, so none is assumed
Figure: various approaches in statistical pattern recognition

K-Nearest Neighbors
Implicitly constructs decision boundaries
Used for both classification and regression
Figure: various approaches in statistical pattern recognition

Decision Tree
Explicitly constructs decision boundaries
Used for classification
Figure: various approaches in statistical pattern recognition

Outline
Context of these algorithms
K-nearest neighbors (k-NN)
1-nearest neighbor (1-NN)
Extension to k-nearest neighbors
Decision Tree
Summary

K-Nearest Neighbors
Goal: classify an unknown sample into one of C classes (can also be used for regression)
Idea: to determine the label of an unknown sample x, look at x's k nearest neighbors
Image from MIT OpenCourseWare

Notation
Training samples: $(x_i, y_i)$, $i = 1, \dots, n$, where $x_i \in \mathbb{R}^d$ is a feature vector with d features and $y_i$ is a class label in $\{1, 2, \dots, C\}$
Goal: determine the label $\hat{y}$ for an unknown sample $x$

Decision Boundaries
Implicitly found
Shown using a Voronoi diagram

1-Nearest Neighbor (1-NN)
Consider k = 1
1-NN algorithm:
Step 1: find the training sample $x_j$ closest to $x$ with respect to Euclidean distance, i.e., find the $j$ that minimizes $\|x_j - x\|$
Step 2: choose $\hat{y}$ to be $y_j$
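As a minimal sketch (not from the original slides), the 1-NN rule above can be written in a few lines of NumPy; the function and variable names are illustrative:

```python
import numpy as np

def one_nn_predict(X_train, y_train, x):
    """Predict the label of x as the label of its single nearest training sample."""
    # Step 1: Euclidean distance from x to every training sample; take the closest
    dists = np.linalg.norm(X_train - x, axis=1)
    j = np.argmin(dists)
    # Step 2: the prediction is that neighbor's label
    return y_train[j]
```

Scanning all n training samples for every query is what makes the method O(nd), as noted on the pros-and-cons slide below.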

Extensions of 1-Nearest Neighbor
How many neighbors to consider? 1 for 1-NN vs. k for k-NN
What distance to use? Euclidean, L1 norm, etc.
How to combine neighbors' labels? Majority vote vs. weighted majority vote

How many neighbors to consider?
k too small: noisy decision boundaries
k too large: over-smoothed boundaries
Figure: decision boundaries for k = 1 vs. k = 7

What distance to use?
Euclidean distance treats every feature as equally important
Distances need to be meaningful (1 foot and 12 inches are the same length but differ numerically)
Some features could be insignificant
Scaled Euclidean distance weights each feature to address these issues

Distance Metrics
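The metric formulas on this slide did not survive the transcript; as a plausible reconstruction covering the metrics named on the previous slide, for feature vectors $x, x' \in \mathbb{R}^d$ and hypothetical per-feature weights $w_m$:

```latex
\[
\begin{aligned}
\text{Euclidean:} \quad & D(x, x') = \sqrt{\sum_{m=1}^{d} (x_m - x'_m)^2} \\
\text{L1 (Manhattan):} \quad & D(x, x') = \sum_{m=1}^{d} \lvert x_m - x'_m \rvert \\
\text{Scaled Euclidean:} \quad & D(x, x') = \sqrt{\sum_{m=1}^{d} w_m (x_m - x'_m)^2}
\end{aligned}
\]
```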

How to combine neighbors' labels?
Majority vote: each of the k neighbors' votes is weighted equally
Weighted majority vote: closer neighbors' votes get more weight
Example weight: $w_i = 1 / \text{distance}_i^2$ (if distance = 0, that sample gets 100% of the vote)
Note: for regression, predict the (weighted) average of the neighbors' values
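A short sketch (an assumed implementation, not from the slides) of k-NN classification with either plain or distance-weighted majority voting, using the $w_i = 1/\text{distance}^2$ weighting above:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5, weighted=True):
    """Classify x by a (weighted) majority vote over its k nearest neighbors."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to all training samples
    nn = np.argsort(dists)[:k]                    # indices of the k closest samples
    if not weighted:
        # Plain majority vote: each neighbor counts equally
        return Counter(y_train[nn]).most_common(1)[0][0]
    votes = {}
    for j in nn:
        if dists[j] == 0:
            return y_train[j]                     # zero distance: that sample gets the whole vote
        votes[y_train[j]] = votes.get(y_train[j], 0.0) + 1.0 / dists[j] ** 2
    return max(votes, key=votes.get)
```

For regression, the same weights would be used to average the neighbors' target values instead of voting.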

Pros and Cons of k-NN
Pros: simple; gives good results; easy to add new training examples
Cons: computationally expensive; to determine the nearest neighbors, every training sample must be visited, costing O(nd) per query, where n = number of training samples and d = number of dimensions

Outline
Context of these algorithms
K-nearest neighbors (k-NN)
1-nearest neighbor (1-NN)
Extension to k-nearest neighbors
Decision Tree
Summary

Decision Tree
Goal: classify an unknown sample into one of C classes
Idea: apply thresholds to a sequence of features to make a classification decision

Definitions
Decision node: an if-then decision based on features of the test sample
Root node: the first decision node
Leaf node: has a class label
Figure: an example of a simple decision tree
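The referenced figure is not in the transcript; purely as an illustration, a small tree over made-up features can be read as nested if-then decisions, where each test is a decision node and each return is a leaf:

```python
def toy_tree_classify(sample):
    """Hypothetical decision tree over invented features 'height' and 'weight'."""
    if sample["height"] > 1.8:        # root node (first decision node)
        return "class A"              # leaf node
    elif sample["weight"] > 80:       # second decision node
        return "class B"              # leaf node
    else:
        return "class C"              # leaf node
```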

Decision Boundaries
Explicitly defined

Creating Optimal Decision Trees
Classification and Regression Trees (CART), by Breiman et al., is one method for producing decision trees
Creates binary decision trees, i.e., trees where each decision node has exactly two branches
Recursively splits the feature space into a set of non-overlapping regions

Creating Optimal Decision Trees
Need to choose decisions that best partition the feature space
Choose the split $s$ at node $t$ that maximizes the goodness of split
$\Phi(s, t) = 2 P_L P_R \sum_{j=1}^{C} \lvert P(j \mid t_L) - P(j \mid t_R) \rvert$
Terms: $t_L$ ($t_R$) = left (right) child of node $t$; $P_L$ = # records at $t_L$ / # records in training set (similarly $P_R$); $P(j \mid t_L)$ = # records of class $j$ at $t_L$ / # records at $t$ (similarly $P(j \mid t_R)$)
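A small sketch of how the goodness-of-split measure could be evaluated for one candidate binary split, following the slide's definitions literally; the function name and the use of NumPy are my own choices:

```python
import numpy as np

def goodness_of_split(y_left, y_right, n_train, classes):
    """Phi(s, t) = 2 * P_L * P_R * sum_j |P(j|t_L) - P(j|t_R)|, per the definitions above.

    y_left, y_right: class labels of the records routed to the left / right child of node t
    n_train: number of records in the whole training set
    classes: all class labels 1..C
    """
    y_left, y_right = np.asarray(y_left), np.asarray(y_right)
    n_t = len(y_left) + len(y_right)            # records at node t
    p_left = len(y_left) / n_train              # P_L
    p_right = len(y_right) / n_train            # P_R
    diff_sum = 0.0
    for j in classes:
        pj_left = np.sum(y_left == j) / n_t     # P(j | t_L)
        pj_right = np.sum(y_right == j) / n_t   # P(j | t_R)
        diff_sum += abs(pj_left - pj_right)
    return 2.0 * p_left * p_right * diff_sum
```

CART would evaluate this quantity for every candidate split at node t, keep the split with the largest value, and then recurse on the two children.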

Creating Optimal Decision Trees
C4.5, by Quinlan, is another algorithm for creating trees
Creates trees based on optimal splits
Trees are not required to be binary

Creating Optimal Decision Trees
C4.5 chooses splits based on entropy
Suppose variable X has k possible values, and let $p_i = n_i / n$ be the estimated probability that X takes value i
Entropy: $H(X) = -\sum_{i=1}^{k} p_i \log_2 p_i$
A candidate split S partitions the training set T into subsets $T_1, T_2, \dots, T_k$
The entropy after the split is the weighted sum of the entropies of the subsets: $H_S(T) = \sum_{i=1}^{k} \frac{\lvert T_i \rvert}{\lvert T \rvert} H(T_i)$

Creating Optimal Decision Trees
Information gain: $\mathrm{gain}(S) = H(T) - H_S(T)$
C4.5 selects the candidate split S that maximizes the information gain
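A brief sketch (an assumed implementation) of the entropy and information-gain calculations that C4.5 uses to rank candidate splits:

```python
import numpy as np

def entropy(labels):
    """H(X) = -sum_i p_i * log2(p_i), with p_i estimated from label frequencies."""
    _, counts = np.unique(np.asarray(labels), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(parent_labels, subsets):
    """gain(S) = H(T) - H_S(T), where H_S(T) weights each subset's entropy by its size."""
    n = len(parent_labels)
    h_after = sum(len(t) / n * entropy(t) for t in subsets)
    return entropy(parent_labels) - h_after
```

C4.5 would compute information_gain for every candidate split S of the training set T and keep the split with the largest value.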

Pros and Cons of Decision Trees
Pros: simple to understand; little data preparation required; results are easy to follow; robust; handles large datasets well
Cons: practical algorithms are based on heuristics that may not give globally optimal trees; trees require pruning to avoid over-fitting the data

Outline
Context of these algorithms
K-nearest neighbors (k-NN)
1-nearest neighbor (1-NN)
Extension to k-nearest neighbors
Decision Tree
Summary

Summary
K-Nearest Neighbor: compares a new data point to similar labeled data points; implicitly defines the decision boundaries; easy, but computationally expensive
Decision Tree: uses thresholds on feature values to determine the classification; explicitly defines the decision boundaries; simple, but hard to find globally optimal trees

Sources on k-NN
Kesavaraj, G.; Sukumaran, S. "A study on classification techniques in data mining." Published in ICCCNT.
MIT OpenCourseWare, 15.097, Spring 2012. Credit: Seyda Ertekin. http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/lecture-notes/MIT15_097S12_lec06.pdf
Oregon State University, machine learning course by Xiaoli Fern. http://classes.engr.oregonstate.edu/eecs/spring2012/cs534/notes/knn.pdf
Machine learning course by Rita Osadchy. http://www.cs.haifa.ac.il/~rita/ml_course/lectures/KNN.pdf
University of Wisconsin, machine learning course by Xiaojin Zhu. http://www.cs.sun.ac.za/~kroon/courses/machine_learning/lecture2/kNN-intro_to_ML.pdf
UC Irvine, introduction to artificial intelligence course by Richard Lathrop. http://www.ics.uci.edu/~rickl/courses/cs-171/2014-wq-cs171/2014-wq-cs171-lecture-slides/2014wq171-19-LearnClassifiers.pdf

Sources on Decision Tree
University of Wisconsin, machine learning course by Jerry Zhu. http://pages.cs.wisc.edu/~jerryzhu/cs540/handouts/dt.pdf
Larose, Daniel T. (2005). Discovering Knowledge in Data: An Introduction to Data Mining.
Jain, Anil K.; Duin, Robert P. W.; Mao, Jianchang (2000). "Statistical pattern recognition: a review." IEEE Transactions on Pattern Analysis and Machine Intelligence 22 (1): 4-37.

Thank you! Any questions?