kNN & Naïve Bayes Hongning Wang

Today’s lecture
Instance-based classifiers
– k nearest neighbors
– Non-parametric learning algorithm
Model-based classifiers
– Naïve Bayes classifier: a generative model
– Parametric learning algorithm

How to classify this document?
[Figure: documents in a vector space representation, grouped into Sports, Politics, and Finance, with one unlabeled query document marked "?"]

Let’s check the nearest neighbor
[Figure: the query document and its single nearest neighbor in the vector space]
Are you confident about this?

Let’s check more nearest neighbors
Ask the k nearest neighbors
– Let them vote
[Figure: the query document and its k nearest neighbors in the vector space]

Probabilistic interpretation of kNN
Let the k nearest neighbors of x occupy a region of volume V, and let k_1 of them come from class 1. With N total instances and N_1 instances in class 1, counting the nearest neighbors from class 1 gives
$$p(x \mid y=1) \approx \frac{k_1}{N_1 V}, \qquad p(y=1) \approx \frac{N_1}{N}, \qquad p(x) \approx \frac{k}{N V}$$
With Bayes rule:
$$p(y=1 \mid x) = \frac{p(x \mid y=1)\,p(y=1)}{p(x)} \approx \frac{\frac{k_1}{N_1 V} \cdot \frac{N_1}{N}}{\frac{k}{N V}} = \frac{k_1}{k}$$

kNN is close to optimal
Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes error rate
– kNN is a non-parametric estimation of the posterior distribution
Decision boundary
– 1NN: Voronoi tessellation of the training instances

Components in kNN
A distance metric
– Euclidean distance / cosine similarity
How many nearby neighbors to look at
– the choice of k
Instance look-up
– Efficiently search for nearby points
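To make these components concrete, here is a minimal brute-force sketch (illustrative Python, not from the original slides; knn_predict and the toy data are invented for this example):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote among its k nearest neighbors.

    Uses cosine similarity as the distance metric; brute-force look-up.
    """
    # Cosine similarity between the query and every training instance
    sims = (X_train @ x_query) / (
        np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_query) + 1e-12)
    # Indices of the k most similar training instances
    nearest = np.argsort(-sims)[:k]
    # Let the neighbors vote
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: three labeled documents in a 4-dimensional vector space
X_train = np.array([[1.0, 0.0, 0.0, 1.0],
                    [0.9, 0.1, 0.0, 0.8],
                    [0.0, 1.0, 1.0, 0.0]])
y_train = np.array(["sports", "sports", "finance"])
print(knn_predict(X_train, y_train, np.array([0.8, 0.0, 0.1, 0.9]), k=2))
```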

Effect of k
Choice of k influences the “smoothness” of the resulting classifier
[Figures: decision boundaries of the same data for k=1 and k=5]

Effect of k
Large k -> smoother decision boundary
Small k -> more complicated decision boundary
[Figure: error vs. model complexity; error on the training set keeps decreasing as model complexity grows (smaller k), while error on the testing set first decreases and then increases (larger k toward the left, smaller k toward the right)]
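This effect can be observed empirically; a hedged sketch using scikit-learn's KNeighborsClassifier on a synthetic dataset (the dataset and the particular k values are assumptions for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class data: small k overfits the noise, large k smooths it out
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 5, 25, 100):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr)
    print(f"k={k:3d}  train acc={clf.score(X_tr, y_tr):.2f}"
          f"  test acc={clf.score(X_te, y_te):.2f}")
```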

Efficient instance look-up
Brute-force search compares every testing instance against every training instance over all features: O(N x M x V) time, where N is the training corpus size, M is the testing corpus size, and V is the feature size

Efficient instance look-up
Exact solutions
– Build an inverted index for documents
  Special mapping: word -> document list
  Speed-up is limited when average document length is large
[Figure: a dictionary of terms (“information”, “retrieval”, “retrieved”, “is”, “helpful”) mapped to postings lists over Doc1 and Doc2]
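A minimal inverted-index sketch (illustrative Python; build_inverted_index and the toy documents are invented here, but the word -> document-list mapping is exactly the structure the slide describes):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each word to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}

docs = {1: "information retrieval is helpful",
        2: "we retrieved helpful information"}
index = build_inverted_index(docs)
print(index["information"])   # [1, 2]
print(index["retrieval"])     # [1]

# Candidate neighbors of a query: the union of postings of its words,
# so only documents sharing at least one word need to be scored.
query = "helpful information"
candidates = set().union(*(index.get(w, []) for w in query.split()))
print(sorted(candidates))     # [1, 2]
```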

Efficient instance look-up
Exact solutions
– Build an inverted index for documents (as above)
– Parallelize the computation with Map-Reduce
  Map training/testing data onto different reducers
  Merge the nearest k neighbors from the reducers

Efficient instance look-up
Approximate solutions
– Locality sensitive hashing
  Similar documents -> (likely) the same hash values
[Figure: a hash function h(x) mapping nearby points into the same bucket]

Efficient instance look-up
Approximate solutions
– Locality sensitive hashing
  Similar documents -> (likely) the same hash values
  Construct the hash function such that similar items map to the same “buckets” with high probability
  – Learning-based: learn the hash function from annotated examples, e.g., must-link / cannot-link constraints
  – Random projection

Random projection
Draw a random vector $r$ and hash each document by which side of the hyperplane it falls on:
$$h_r(x) = \mathrm{sign}(r^\top x)$$
– Repeat with several random vectors to obtain a short binary signature
– Documents with high cosine similarity receive the same signature with high probability
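A minimal sign-random-projection sketch under the assumptions above (random_projection_hash, the 8-bit signature length, and the toy data are all illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(42)

def random_projection_hash(X, n_bits=16, planes=None):
    """Hash rows of X into n_bits-bit signatures via random hyperplanes."""
    if planes is None:
        planes = rng.standard_normal((X.shape[1], n_bits))
    # Each bit records which side of a random hyperplane the point falls on
    bits = (X @ planes) >= 0
    # Pack the bit vector into a single integer bucket id
    return bits @ (1 << np.arange(planes.shape[1])), planes

# Documents are hashed once; a query is hashed with the same planes and
# then compared only against documents in the same bucket.
X = rng.standard_normal((1000, 300))
codes, planes = random_projection_hash(X, n_bits=8)
# A query similar to document 0 likely lands in the same bucket
q = X[0] + 0.05 * rng.standard_normal(300)
q_code, _ = random_projection_hash(q[None, :], planes=planes)
candidates = np.where(codes == q_code[0])[0]
print(f"scored {len(candidates)} candidates instead of {len(X)} documents")
```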

Efficient instance look-up
Effectiveness of random projection
– 1.2M images, … dimensions
– …x speed-up

Weight the nearby instances
When the data distribution is highly skewed, frequent classes might dominate the majority vote
– They occur more often in the k nearest neighbors just because they have large volume
[Figure: a query document whose k nearest neighbors are dominated by the frequent class]

Weight the nearby instances
Weight each neighbor’s vote by its closeness to the query, e.g., $w_i = \exp\!\left(-\|x - x_i\|^2 / 2\sigma^2\right)$, so that closer neighbors count more than distant ones
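A hedged sketch of one common weighting scheme, a Gaussian kernel over Euclidean distance (the original slide's exact formula is not recoverable, so this choice is an assumption):

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_query, k=5, sigma=1.0):
    """Vote among the k nearest neighbors, weighting by a Gaussian kernel."""
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]
    scores = defaultdict(float)
    for i in nearest:
        # Closer neighbors contribute exponentially more to their class
        scores[y_train[i]] += np.exp(-dists[i] ** 2 / (2 * sigma ** 2))
    return max(scores, key=scores.get)
```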

Summary of kNN
Instance-based learning
– No training phase
– Assign a label to a testing case by its nearest neighbors
– Non-parametric
– Approximates the Bayes decision boundary in a local region
Efficient computation
– Locality sensitive hashing
  Random projection

Recall the optimal Bayes decision boundary
[Figure: two class-conditional densities with the optimal Bayes decision boundary; the shaded regions on either side are the false positives and false negatives]

Estimating the optimal classifier
$$f(X) = \arg\max_y p(y \mid X) = \arg\max_y p(X \mid y)\,p(y)$$
where $p(X \mid y)$ is the class-conditional density and $p(y)$ is the class prior.
[Table: training documents $D_1, D_2, D_3$ represented by binary features for the words text, information, identify, mining, mined, is, useful, to, from, apple, delicious, plus a class label $Y$]
#parameters: with $V$ binary features, $p(X \mid y)$ requires $2^V - 1$ values per class
Requirement: far more training instances than parameters, which is infeasible for realistic vocabulary sizes

We need to simplify this
Assume the features are conditionally independent given the class:
$$p(X \mid y) = p(x_1, \ldots, x_V \mid y) = \prod_{i=1}^{V} p(x_i \mid y)$$
This does not mean ‘white house’ is independent of ‘obama’! It only means they are independent once the class label is known.

Conditional vs. marginal independence
Conditional independence, $p(x_1, x_2 \mid y) = p(x_1 \mid y)\,p(x_2 \mid y)$, does not imply marginal independence, $p(x_1, x_2) = p(x_1)\,p(x_2)$: two words can be strongly correlated overall yet independent within each class.
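A small numeric check of this distinction, with made-up probabilities:

```python
import numpy as np

# Two binary words x1, x2 and a class y with p(y=1) = 0.5.
# Within each class the words are independent, but each word is much
# more likely in class 1, so marginally they co-occur: dependence.
p_y = np.array([0.5, 0.5])
p_x1_given_y = np.array([0.1, 0.9])   # p(x1=1 | y)
p_x2_given_y = np.array([0.1, 0.9])   # p(x2=1 | y)

# Marginal joint p(x1=1, x2=1) = sum_y p(y) p(x1=1|y) p(x2=1|y)
p_joint = (p_y * p_x1_given_y * p_x2_given_y).sum()   # 0.41
p_x1 = (p_y * p_x1_given_y).sum()                     # 0.5
p_x2 = (p_y * p_x2_given_y).sum()                     # 0.5
print(p_joint, p_x1 * p_x2)   # 0.41 != 0.25 -> marginally dependent
```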

Naïve Bayes classifier
$$f(X) = \arg\max_y p(y) \prod_{i=1}^{V} p(x_i \mid y)$$
with class prior $p(y)$ and per-word class-conditional densities $p(x_i \mid y)$
#parameters: $k(V+1) - 1$ vs. $k(2^V - 1)$ for the full joint model: computationally feasible

Naïve Bayes classifier
[Figure: graphical model in which the class $y$ is the parent of every feature $x_1, x_2, x_3, \ldots, x_V$]
$$p(y \mid X) \propto p(X \mid y)\,p(y) \qquad \text{by Bayes rule}$$
$$p(X \mid y)\,p(y) = p(y) \prod_{i=1}^{V} p(x_i \mid y) \qquad \text{by the independence assumption}$$
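In practice the product is evaluated in log space to avoid floating-point underflow; a minimal prediction sketch, assuming binary word features (nb_predict and its parameterization are illustrative):

```python
import numpy as np

def nb_predict(x, log_prior, log_cond):
    """Score each class y as log p(y) + sum_i log p(x_i | y).

    x:         binary feature vector, shape (V,)
    log_prior: log p(y), shape (K,)
    log_cond:  log p(x_i = 1 | y), shape (K, V)
    """
    # log p(x_i | y) is log_cond when x_i = 1, log(1 - p) when x_i = 0
    log_off = np.log1p(-np.exp(log_cond))
    scores = log_prior + x @ log_cond.T + (1 - x) @ log_off.T
    return np.argmax(scores)
```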

Estimating parameters
Maximum likelihood estimation: count from the training data
$$\hat p(y) = \frac{\#\text{documents in class } y}{\#\text{documents}}, \qquad \hat p(x_i \mid y) = \frac{\#\text{documents in class } y \text{ containing word } i}{\#\text{documents in class } y}$$
[Table: the same binary document-word table as before, from which these counts are read off]
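A sketch of these counts, pairing with the nb_predict sketch above (illustrative; assumes a binary document-term matrix):

```python
import numpy as np

def nb_fit(X, y, n_classes):
    """MLE for Bernoulli Naive Bayes from a binary doc-term matrix X."""
    log_prior = np.empty(n_classes)
    log_cond = np.empty((n_classes, X.shape[1]))
    for c in range(n_classes):
        docs = X[y == c]
        log_prior[c] = np.log(len(docs) / len(X))
        # Fraction of class-c documents containing each word
        log_cond[c] = np.log(docs.mean(axis=0))
    return log_prior, log_cond
```

Note that a word never observed in class c produces log(0) here, which is exactly why the smoothing discussed below is necessary.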

Enhancing Naïve Bayes for text classification I
Taking logarithms turns the decision rule into a linear function of the features:
$$f(X) = \underbrace{\log\frac{p(y=1)}{p(y=0)}}_{\text{class bias}} + \sum_{i=1}^{V} \underbrace{x_i}_{\text{feature vector}}\, \underbrace{\log\frac{p(x_i \mid y=1)}{p(x_i \mid y=0)}}_{\text{model parameter}}$$

Enhancing Naïve Bayes for text classification
$$f(X) = w_0 + \sum_{i=1}^{V} x_i w_i$$
a linear model with a vector space representation! We will come back to this topic later.

Enhancing Naïve Bayes for text classification II

Enhancing Naïve Bayes for text classification III

Maximum a Posteriori estimator
Smooth the maximum likelihood counts with pseudo instances, e.g.,
$$\hat p(x_i \mid y) = \frac{c(x_i, y) + m\,q(x_i)}{c(y) + m}$$
where $m$ is the number of pseudo instances and the prior $q(x_i)$ can be estimated from a related corpus or manually tuned
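A smoothed variant of the nb_fit sketch above, using a uniform prior q, i.e., Laplace smoothing (the choice of uniform prior is an assumption):

```python
import numpy as np

def nb_fit_smoothed(X, y, n_classes, m=1.0):
    """Bernoulli Naive Bayes with m pseudo instances per feature value."""
    log_prior = np.empty(n_classes)
    log_cond = np.empty((n_classes, X.shape[1]))
    for c in range(n_classes):
        docs = X[y == c]
        log_prior[c] = np.log(len(docs) / len(X))
        # Pseudo counts keep unseen words from zeroing out the product
        log_cond[c] = np.log((docs.sum(axis=0) + m) / (len(docs) + 2 * m))
    return log_prior, log_cond
```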

Summary of Naïve Bayes
Optimal Bayes classifier
– Naïve Bayes with independence assumptions
Parameter estimation in Naïve Bayes
– Maximum likelihood estimator
– Smoothing is necessary

Today’s reading
Introduction to Information Retrieval
– Chapter 13: Text classification and Naive Bayes
  13.2 Naive Bayes text classification
  13.4 Properties of Naive Bayes
– Chapter 14: Vector space classification
  14.3 k nearest neighbor
  14.4 Linear versus nonlinear classifiers