1 Lazy Learning – Nearest Neighbor Lantz Ch 3 Wk 2, Part 1

2 What’s Nearest Neighbor? A simple way to classify examples: assign each new example to the class of the most similar labeled examples. (Slide illustration caption: "I say this one’s also brown!")

3 The kNN algorithm (kNN = “k-Nearest Neighbor”)
Strengths:
– Simple and effective
– Makes no assumptions about the underlying data distribution
– Fast training phase
Weaknesses:
– Does not produce a model, which limits the ability to find novel insights in relationships among features
– Slow classification phase
– Requires a large amount of memory
– Nominal features and missing data require additional processing

4 kNN Training Begin with examples classified into several categories, labeled by a nominal variable. You need a training dataset and a test dataset. For each test example, kNN identifies the k training records that are “nearest” in similarity. – You specify k in advance. – Each test example is assigned to the class of the majority of its k nearest neighbors.
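
Not Lantz’s code, just a rough sketch of the idea for a single test example. The names knn_predict_one, train, train_labels, test_point, and k are placeholders; train is assumed to be a numeric matrix of features and train_labels a factor of class labels.

knn_predict_one <- function(train, train_labels, test_point, k) {
  # Euclidean distance from the test point to every training example
  dists <- sqrt(rowSums(sweep(train, 2, test_point)^2))
  # Indices of the k nearest training examples
  nearest <- order(dists)[1:k]
  # Majority vote among their class labels
  votes <- table(train_labels[nearest])
  names(votes)[which.max(votes)]
}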

5 Lantz’s example

6 Need to calculate “distance” – Euclidean: take the square root of the sum of the squared differences between feature values. – Manhattan: add the absolute values of the differences.
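
For example, with two made-up three-feature examples (the numbers are arbitrary):

p <- c(3, 4, 1)
q <- c(1, 1, 2)
euclidean <- sqrt(sum((p - q)^2))   # sqrt(4 + 9 + 1) = sqrt(14), about 3.74
manhattan <- sum(abs(p - q))        # 2 + 3 + 1 = 6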

7 Choosing k Our first choice is a “shot in the dark” guess! Experience and experimentation improve it. Larger k values reduce the effects of noise. – But they also bias the “learner” toward ignoring small patterns. – Typically k is set between 3 and 10. – Lantz uses the square root of the number of training examples! Larger training datasets help!
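
As a hedged starting point (assuming the wbcd_train data frame defined on a later slide):

k <- floor(sqrt(nrow(wbcd_train)))   # sqrt(469) is about 21.7, so k = 21
if (k %% 2 == 0) k <- k + 1          # an odd k avoids ties between two classes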

8 Need to “scale” the data We usually assume features should contribute equally to the distance. – Therefore they should be put on equivalent scales, e.g., min-max normalization to the range [0, 1]. – Rescaling with the mean and standard deviation (z-score standardization) may be even better!
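
A sketch of both rescaling options, assuming a data frame of numeric features named df (a placeholder name, not one from the text):

# Min-max normalization: rescales each feature to the [0, 1] range
normalize <- function(x) (x - min(x)) / (max(x) - min(x))
df_n <- as.data.frame(lapply(df, normalize))

# z-score standardization: subtract the mean, divide by the standard deviation
df_z <- as.data.frame(scale(df))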

9 How to scale “nominal” data Bananas are green -> yellow -> brown -> black. – Could be coded 1 -> 2 -> 3 -> 4. But not every nominal variable is easy to convert to some kind of scale. – E.g., what color a car is (no natural order), versus owner satisfaction (ordered). “Dummy coding” requires some thought.
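
A small made-up illustration of the two cases; model.matrix() is one standard base-R way to build 0/1 dummy columns:

# Ordered categories can reasonably be coded as integers
banana <- c("green", "yellow", "brown", "black")
ripeness <- match(banana, c("green", "yellow", "brown", "black"))   # 1, 2, 3, 4

# An unordered nominal feature gets one 0/1 indicator per category (minus a baseline)
car_color <- factor(c("red", "blue", "red", "white"))
dummies <- model.matrix(~ car_color)[, -1]   # drop the intercept column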

10 Why kNN is “lazy” It doesn’t build an abstraction as its “learning” step. – “Generalization” happens on a case-by-case basis, at prediction time. – It’s not exactly “learning” anything. – It just uses the training data, verbatim. So, training is fast, but predictions are slow. – Storing the data takes O(n) space, and each prediction scans all n stored examples. It’s like “rote learning.” It’s “non-parametric.”

11 A brief bit more math Officially, the nearest neighbor algorithm gives us a “Voronoi tessellation.” – The distance function divides the plane into convex regions, each characterized by the “seed” (stored example) it contains.

12 High dimensions are a problem If you have lots of features, the space tends to be extremely sparse. – Every point is far away from every other point. – Thus, the pairwise distances are uninformative. But the effective dimensionality may be lower. – Some of the features may be irrelevant.

13 Lantz’s example – Breast cancer screening
1. Collect the data
2. Explore and prepare the data
 – Transformation – normalize numeric data
 – Data preparation – creating training and test datasets
3. Training a model on the data
4. Evaluating model performance
5. Improving model performance
 – Transformation – z-score standardization

14 Data preparation It is common to divide the data into training and test sets. Lantz used the first 469 examples for training and the last 100 for testing:
> wbcd_train <- wbcd_n[1:469, ]
> wbcd_test <- wbcd_n[470:569, ]
> wbcd_train_labels <- wbcd[1:469, 1]
> wbcd_test_labels <- wbcd[470:569, 1]
This works because the data examples were already in random order.
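
If the rows had not already been shuffled, a random split could be drawn first. This is a generic sketch (the seed value is arbitrary), not code from the text:

> set.seed(123)
> train_idx <- sample(nrow(wbcd_n), size = 469)
> wbcd_train <- wbcd_n[train_idx, ]
> wbcd_test  <- wbcd_n[-train_idx, ]
> wbcd_train_labels <- wbcd[train_idx, 1]
> wbcd_test_labels  <- wbcd[-train_idx, 1]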

15 Training and testing all at once The knn() function in the standard “class” package does both:
> wbcd_test_pred <- knn(train = wbcd_train, test = wbcd_test, cl = wbcd_train_labels, k = 21)
The resulting vector, wbcd_test_pred, can then be compared with the real test labels, wbcd_test_labels. Lantz uses the CrossTable() function from the gmodels package.
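
That comparison step looks roughly like this, assuming the objects created above (prop.chisq = FALSE just suppresses the chi-square contribution printed in each cell):

> library(gmodels)
> CrossTable(x = wbcd_test_labels, y = wbcd_test_pred, prop.chisq = FALSE)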

16 And the results are…

                 | wbcd_test_pred
wbcd_test_labels |    Benign | Malignant | Row Total
-----------------|-----------|-----------|----------
          Benign |   61 (TN) |    0 (FP) |        61
       Malignant |    2 (FN) |   37 (TP) |        39
-----------------|-----------|-----------|----------
    Column Total |        63 |        37 |       100
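
Reading off the table: 61 + 37 = 98 of the 100 test cases were classified correctly (98% accuracy), with 2 false negatives and no false positives. The same figure can be checked directly (assuming the vectors from the previous slides):

> mean(wbcd_test_pred == wbcd_test_labels)   # 0.98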