
1 kNN & Naïve Bayes Hongning Wang CS@UVa

2 Today’s lecture
Instance-based classifiers
– k nearest neighbors
– Non-parametric learning algorithm
Model-based classifiers
– Naïve Bayes classifier: a generative model
– Parametric learning algorithm

3 How to classify this document?
[Figure: documents in a vector space representation, labeled Sports, Politics, and Finance, with an unlabeled document marked "?"]

4 Let’s check the nearest neighbor
[Figure: the unlabeled document "?" with its single nearest neighbor among the Sports, Politics, and Finance documents]
Are you confident about this?

5 Let’s check more nearest neighbors
Ask the k nearest neighbors
– Let them vote
[Figure: the unlabeled document "?" with its k nearest neighbors among the Sports, Politics, and Finance documents]

6 Probabilistic interpretation of kNN
In a small region around x that contains its k nearest neighbors:
– p(y=1) = N_1/N, where N is the total number of instances and N_1 the total number of instances in class 1
– p(x|y=1) ≈ k_1/(N_1·V), where k_1 is the number of nearest neighbors from class 1 and V is the volume of the region
– p(x) ≈ k/(N·V)
With Bayes rule: p(y=1|x) = p(x|y=1)·p(y=1)/p(x) ≈ k_1/k, i.e., counting the nearest neighbors from class 1

7 kNN is close to optimal
Asymptotically, the error rate of 1-nearest-neighbor classification is less than twice the Bayes error rate
Decision boundary
– 1NN: Voronoi tessellation
A non-parametric estimation of the posterior distribution

8 Components in kNN
A distance metric
– Euclidean distance / cosine similarity
How many nearby neighbors to look at
– k
Instance look-up
– Efficiently search nearby points
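Below is a minimal brute-force kNN sketch in Python that combines these three components (cosine similarity as the metric, k votes, exhaustive look-up). The names knn_predict, X_train, etc. are illustrative, not from the lecture.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=5):
    """Classify x_query by majority vote of its k most similar training documents.

    X_train: (N, V) matrix of document vectors, y_train: length-N labels,
    x_query: length-V vector. Cosine similarity plays the role of the distance metric.
    """
    # Cosine similarity between the query and every training document
    norms = np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_query) + 1e-12
    sims = X_train @ x_query / norms
    # Indices of the k most similar training documents (brute-force look-up)
    top_k = np.argsort(-sims)[:k]
    # Majority vote among their labels
    votes = Counter(y_train[i] for i in top_k)
    return votes.most_common(1)[0][0]
```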

9 Effect of k
Choice of k influences the “smoothness” of the resulting classifier

10 Effect of k
Choice of k influences the “smoothness” of the resulting classifier
[Figure: decision boundary with k=1]

11 Effect of k
Choice of k influences the “smoothness” of the resulting classifier
[Figure: decision boundary with k=5]

12 Effect of k
Large k -> smoother decision boundary
Small k -> more complicated decision boundary
[Figure: error vs. model complexity, showing error on the training set and error on the testing set; larger k at the low-complexity end, smaller k at the high-complexity end]

13 Efficient instance look-up
Brute-force kNN compares every testing document against every training document, so its cost scales with training corpus size × testing corpus size × feature size

14 Efficient instance look-up
Exact solutions
– Build an inverted index for documents
  Special mapping: word -> document list
  Speed-up is limited when the average document length is large

Dictionary     Postings
information    Doc1, Doc2
retrieval      Doc1
retrieved      Doc2
is             Doc1, Doc2
helpful        Doc1, Doc2
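A small sketch of the inverted-index look-up idea, under the assumption that documents are represented as term sets: only documents sharing at least one query term need to be scored, which is where the speed-up comes from (and why it shrinks when documents are long). Function and variable names are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """docs: dict doc_id -> set of terms. Returns term -> list of doc_ids (postings)."""
    index = defaultdict(list)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].append(doc_id)
    return index

def candidate_neighbors(index, query_terms):
    """Union of the postings lists for the query's terms: only these docs need scoring."""
    candidates = set()
    for term in query_terms:
        candidates.update(index.get(term, []))
    return candidates

docs = {"Doc1": {"information", "retrieval", "is", "helpful"},
        "Doc2": {"information", "retrieved", "is", "helpful"}}
index = build_inverted_index(docs)
print(candidate_neighbors(index, {"information", "mining"}))  # {'Doc1', 'Doc2'}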

15 Efficient instance look-up
Exact solutions
– Build an inverted index for documents
  Special mapping: word -> document list
  Speed-up is limited when the average document length is large
– Parallelize the computation with Map-Reduce
  Map training/testing data onto different reducers
  Merge the nearest k neighbors from the reducers

16 Efficient instance look-up
Approximate solutions
– Locality sensitive hashing h(x)
  Similar documents -> (likely) same hash values

17 Efficient instance look-up
Approximate solutions
– Locality sensitive hashing
  Similar documents -> (likely) same hash values
  Construct the hash function such that similar items map to the same “buckets” with high probability
  – Learning-based: learn the hash function with annotated examples, e.g., must-link, cannot-link
  – Random projection

18 Random projection
[Figure: documents hashed to binary signatures such as 110 and 101 by random projections]

19 Random projection
[Figure: two nearby documents both hashed to the signature 101]

20 Random projection
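A sketch of signed random projection, one common way to build such a hash function (assumed here; the slide images were not part of the transcript): each random hyperplane contributes one bit recording which side of it the document falls on, so similar documents tend to share signatures like 110 or 101.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection_hash(X, n_bits=8):
    """Hash each row of X (n_docs, V) to an n_bits binary signature.

    Each bit is the sign of the projection onto one random hyperplane, so
    documents with high cosine similarity tend to agree on most bits.
    """
    V = X.shape[1]
    planes = rng.normal(size=(n_bits, V))   # one random hyperplane per bit
    bits = (X @ planes.T) >= 0              # which side of each hyperplane
    return ["".join("1" if b else "0" for b in row) for row in bits]

X = rng.normal(size=(3, 100))
print(random_projection_hash(X, n_bits=3))  # e.g. ['110', '101', '011']
```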

21 Efficient instance look-up
Effectiveness of random projection
– 1.2M images, 1000 dimensions
– 1000x speed-up

22 Weight the nearby instances
When the data distribution is highly skewed, frequent classes might dominate the majority vote
– They occur more often among the k nearest neighbors simply because they have larger volume
[Figure: the unlabeled document "?" surrounded mostly by documents from the dominant class among Sports, Politics, and Finance]

23 Weight the nearby instances
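The weighting formula itself is on the slide image; one common choice, assumed here rather than taken from the slide, is to weight each neighbor's vote by its cosine similarity to the query so that close neighbors count more than far ones.

```python
import numpy as np
from collections import defaultdict

def weighted_knn_predict(X_train, y_train, x_query, k=5):
    """Distance-weighted kNN: each of the k nearest neighbors votes with a
    weight equal to its cosine similarity to the query."""
    norms = np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_query) + 1e-12
    sims = X_train @ x_query / norms
    top_k = np.argsort(-sims)[:k]
    scores = defaultdict(float)
    for i in top_k:
        scores[y_train[i]] += sims[i]   # similarity-weighted vote
    return max(scores, key=scores.get)
```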

24 Summary of kNN
Instance-based learning
– No training phase
– Assign a label to a testing case by its nearest neighbors
– Non-parametric
– Approximates the Bayes decision boundary in a local region
Efficient computation
– Locality sensitive hashing
  Random projection

25 Recall the optimal Bayes decision boundary
[Figure: two class-conditional densities with the optimal Bayes decision boundary marked; the regions on the wrong side of the boundary correspond to false positives and false negatives]

26 Estimating the optimal classifier
The optimal classifier needs the class prior P(y) and the class-conditional density P(X|y)

     text information identify mining mined is useful to from apple delicious Y
D1    1        1          1       1     0    1    1    1   0     0       0    1
D2    1        1          0       0     1    1    1    0   1     0       0    1
D3    0        0          0       0     0    1    0    0   0     1       1    0

With V binary features, a full class-conditional density has on the order of 2^V parameters per class
Requirement: enough training data to estimate all of these parameters, which is infeasible for any realistic vocabulary size

27 We need to simplify this
Assume the features are conditionally independent given the class:
P(x_1, ..., x_V | y) = Π_i P(x_i | y)
This does not mean ‘white house’ is independent of ‘obama’!

28 Conditional vs. marginal independence
Conditional independence, P(x_i, x_j | y) = P(x_i | y) P(x_j | y), does not imply marginal independence, P(x_i, x_j) = P(x_i) P(x_j): two words can be correlated overall yet independent once the class is known

29 Naïve Bayes classifier
Class prior P(y) and per-feature class-conditional densities P(x_i | y)
#parameters: on the order of V per class, vs. on the order of 2^V for the full joint
Computationally feasible

30 Naïve Bayes classifier
[Graphical model: the class variable y is the parent of each feature x_1, x_2, x_3, ..., x_V]
By Bayes rule: P(y|x) ∝ P(y) P(x_1, ..., x_V | y)
By the independence assumption: P(y|x) ∝ P(y) Π_i P(x_i | y)
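A minimal sketch of this decision rule for binary word features, assuming the class prior and per-word conditionals have already been estimated; scores are accumulated in log space so the product of many small probabilities does not underflow. Names are illustrative.

```python
import numpy as np

def nb_predict(x, log_prior, log_p_word, log_p_not_word):
    """x: length-V binary feature vector.
    log_prior[c]      = log P(y=c)
    log_p_word[c, i]  = log P(x_i=1 | y=c)
    log_p_not_word[c, i] = log P(x_i=0 | y=c)
    Returns argmax_c  log P(y=c) + sum_i log P(x_i | y=c).
    """
    x = np.asarray(x)
    scores = log_prior + log_p_word @ x + log_p_not_word @ (1 - x)
    return int(np.argmax(scores))
```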

31 Estimating parameters
Maximum likelihood estimates from counts in the training data:
– P(y=c) = (number of documents in class c) / (total number of documents)
– P(x_i=1 | y=c) = (number of class-c documents containing word i) / (number of class-c documents)

     text information identify mining mined is useful to from apple delicious Y
D1    1        1          1       1     0    1    1    1   0     0       0    1
D2    1        1          0       0     1    1    1    0   1     0       0    1
D3    0        0          0       0     0    1    0    0   0     1       1    0
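A sketch of the maximum likelihood estimates computed from the small document-term table above (pure counting); the helper name mle_estimates is illustrative.

```python
import numpy as np

# Rows = documents D1, D2, D3; columns = the binary word features from the table
X = np.array([[1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0],   # D1
              [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0],   # D2
              [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1]])  # D3
y = np.array([1, 1, 0])                             # class labels Y

def mle_estimates(X, y):
    """MLE: P(y=c) = N_c / N and P(x_i=1 | y=c) = count(x_i=1, y=c) / N_c."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    cond = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, prior, cond

classes, prior, cond = mle_estimates(X, y)
print(prior)    # class priors, here [1/3, 2/3]
print(cond[1])  # P(word present | y=1) for each word
```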

32 Enhancing Naïve Bayes for text classification I
Take the log-odds of the two classes:
f(x) = log[P(y=1|x)/P(y=0|x)] = w_0 + Σ_i w_i x_i
where w_0 (the class bias) collects the log prior ratio, w (the model parameters) collects the per-word log likelihood ratios, and x is the feature vector

33 Enhancing Naïve Bayes for text classification
f(x) = w_0 + Σ_i w_i x_i, i.e., a linear model with the vector space representation? We will come back to this topic later.
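As a sketch of why this is a linear model: for binary features the Naïve Bayes log-odds can be folded into a bias w_0 and a weight vector w, assuming smoothed probabilities strictly between 0 and 1. The helper below is illustrative, not part of the lecture.

```python
import numpy as np

def nb_to_linear(prior, cond):
    """Convert binary Naïve Bayes parameters into a linear classifier.

    prior[c] = P(y=c), cond[c, i] = P(x_i=1 | y=c) for classes c in {0, 1};
    probabilities must lie strictly in (0, 1), e.g. after smoothing.
    Log-odds: log P(y=1|x)/P(y=0|x) = w_0 + w·x, with
      w_i = log[P(x_i=1|1)/P(x_i=1|0)] - log[P(x_i=0|1)/P(x_i=0|0)]
      w_0 = log[P(y=1)/P(y=0)] + sum_i log[P(x_i=0|1)/P(x_i=0|0)]
    """
    log_ratio_present = np.log(cond[1]) - np.log(cond[0])
    log_ratio_absent = np.log(1 - cond[1]) - np.log(1 - cond[0])
    w = log_ratio_present - log_ratio_absent
    w0 = np.log(prior[1] / prior[0]) + log_ratio_absent.sum()
    return w0, w   # predict class 1 iff w0 + w @ x > 0
```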

34 Enhancing Naïve Bayes for text classification II

35 Enhancing Naïve Bayes for text classification III

36 Maximum a Posteriori estimator
Add pseudo instances (pseudo counts) to the observed counts before normalizing
– The pseudo counts can be estimated from a related corpus or manually tuned
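One concrete instance of this MAP estimator, assuming additive (Laplace-style) smoothing with delta pseudo instances per feature value; the exact prior used in the lecture may differ.

```python
import numpy as np

def smoothed_conditionals(X, y, delta=1.0):
    """MAP-style estimate with delta pseudo instances per feature value:

        P(x_i=1 | y=c) = (count(x_i=1, y=c) + delta) / (N_c + 2*delta)

    delta > 0 keeps words unseen in a class from forcing a zero posterior.
    """
    classes = np.unique(y)
    cond = []
    for c in classes:
        Xc = X[y == c]
        cond.append((Xc.sum(axis=0) + delta) / (Xc.shape[0] + 2 * delta))
    return classes, np.array(cond)
```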

37 Summary of Naïve Bayes
Optimal Bayes classifier
– Naïve Bayes with independence assumptions
Parameter estimation in Naïve Bayes
– Maximum likelihood estimator
– Smoothing is necessary

38 Today’s reading
Introduction to Information Retrieval
– Chapter 13: Text classification and Naive Bayes
  13.2 Naive Bayes text classification
  13.4 Properties of Naive Bayes
– Chapter 14: Vector space classification
  14.3 k nearest neighbor
  14.4 Linear versus nonlinear classifiers

