Naive Bayes Model. COMP221 Tutorial 4 (Assignment 1). TA: Zhang Kai.


Outline
– Bayes probability model
– Naive Bayes classifier
– Text classification
– Digit classification
– Assignment specifications

Naive Bayes classifier
A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions; more specifically, it assumes an independent feature model.

Naive Bayes probability model: graphical illustration
– a class node C at the root; we want P(C | F1, …, Fn)
– evidence nodes F1, …, Fn: the observed features, as leaves
– conditional independence between all evidence nodes given the class
[Figure: tree-structured graphical model with root C and leaves F1, F2, …, Fn]

Naive Bayes probability model
The classifier is the conditional model p(C \mid F_1, \dots, F_n).
Following Bayes' rule strictly, we have

    p(C \mid F_1, \dots, F_n) = \frac{p(C)\, p(F_1, \dots, F_n \mid C)}{p(F_1, \dots, F_n)}.

Simplify the likelihood through conditional independence:

    p(F_1, \dots, F_n \mid C) = \prod_{i=1}^{n} p(F_i \mid C).

So the conditional distribution over the class C is

    p(C \mid F_1, \dots, F_n) = \frac{1}{Z}\, p(C) \prod_{i=1}^{n} p(F_i \mid C),

where Z = p(F_1, \dots, F_n) is constant given the features.

Naive Bayes classifier
The naive Bayes classifier combines the naive Bayes probability model with a decision rule, such as the maximum a posteriori (MAP) decision rule.
If there are k classes and a model for p(F_i \mid C) can be expressed with r parameters, then the naive Bayes model has (k − 1) + n r k parameters.
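For reference, the MAP decision rule combined with the model above selects, for observed feature values f_1, …, f_n,

    \hat{c} = \arg\max_{c \in \{1,\dots,k\}} \; p(C = c) \prod_{i=1}^{n} p(F_i = f_i \mid C = c).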

Text Classification
Task: classify text documents into one of several pre-defined classes, such as sports, recreation, politics, war, economy, etc.
Given:
– K groups of training texts
– each group has a label and contains a number of text documents

Procedure: computing the prior class probabilities
– count the number of text documents in each directory/class, n_i
– count the total number of training text documents, n
– the prior probability is P(C_i) = n_i / n

Computing the class-conditional word likelihoods
– suppose we have chosen m key words, denoted w_1, w_2, …, w_m
– count the number of times c_ji that word w_j occurs in text class C_i
– count the total number of words n_i in class C_i
– the class-conditional probability is P(w_j | C_i) = c_ji / n_i
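A minimal Matlab sketch of these two counting steps (priors and word likelihoods), assuming the raw counts have already been collected into a matrix counts and a vector ndocs; all variable names here are hypothetical, not part of the assignment:

    % counts: m-by-K matrix, counts(j,i) = c_ji, occurrences of word w_j in class C_i
    % ndocs:  1-by-K vector, ndocs(i) = number of training documents in class C_i
    prior  = ndocs / sum(ndocs);                              % P(C_i) = n_i / n
    nwords = sum(counts, 1);                                  % n_i = total word count in class C_i
    condprob = counts ./ repmat(nwords, size(counts,1), 1);   % P(w_j | C_i) = c_ji / n_i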

Classifying a new message d
– compute the features of d, i.e., the number of times each word w_j occurs in d
– P(C_i | d) = P(C_i | w_1, w_2, …, w_d) ∝ P(C_i) P(w_1 | C_i) P(w_2 | C_i) … P(w_d | C_i)
– assign d to the class i that has the maximum posterior probability

Points to note
Preprocessing:
– eliminate punctuation
– eliminate numerals
– convert all characters to lowercase
– eliminate all words with fewer than 4 letters
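One possible Matlab sketch of this preprocessing for a single document stored in the string txt (variable names are illustrative only):

    txt = lower(txt);                               % convert all characters to lowercase
    txt = regexprep(txt, '[^a-z]', ' ');            % drop punctuation, numerals and other symbols
    words = strread(txt, '%s');                     % split on whitespace into a cell array of words
    words = words(cellfun(@length, words) >= 4);    % keep only words with at least 4 letters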

You need to build a large vocabulary and separately count how often each word is encountered. The vocabulary can be built using a hash table.
How to choose the key words w_i?
– for each class, pick out the first k words that occur most frequently
– for all the training data, pick out the first k words that appear most frequently
– take the union of all these words as the key words/features
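One way to realize the hash table in Matlab is containers.Map; this is only a sketch of the counting step, and the key-word selection itself (top k per class, then the union) is left out:

    % words: cell array of preprocessed words from one class of documents (hypothetical input)
    vocab = containers.Map('KeyType', 'char', 'ValueType', 'double');
    for t = 1:numel(words)
        w = words{t};
        if isKey(vocab, w)
            vocab(w) = vocab(w) + 1;    % word seen before: increment its count
        else
            vocab(w) = 1;               % first occurrence of this word
        end
    end
    % keys(vocab) and values(vocab) can then be sorted to pick the k most frequent words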

Zero probabilities must be avoided (why?)
– a zero occurs when a word has been encountered in one class but not in the others
– in that case the class-conditional probability is zero, which zeroes out the whole product
– to prevent this, re-estimate the conditional probability as P(w_j | C_i) = ε / n_i, with ε a small, tunable number
Convert all probabilities to log-probabilities (log-likelihoods) to avoid exceeding the dynamic range of the computer's representation of real numbers.
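Putting the two remedies together, a hedged Matlab sketch of classifying one new document in log space, reusing prior, condprob and nwords from the earlier sketch; x and eps_smooth are hypothetical names:

    eps_smooth = 1e-4;                               % small, tunable epsilon (value is arbitrary here)
    for i = 1:numel(nwords)                          % re-estimate zero entries as eps / n_i
        zero_j = (condprob(:, i) == 0);
        condprob(zero_j, i) = eps_smooth / nwords(i);
    end
    logpost = log(prior) + x' * log(condprob);       % x(j): count of word w_j in the new document d
    [dummy, best] = max(logpost);                    % class with the maximum (log) posterior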

Digit Classification (assignment 1)
– The USPS data set contains normalized handwritten digits, scanned by the U.S. Postal Service.
– 16 x 16 grayscale images
– 7291 training and 2007 test observations
– Format: each line consists of the digit id (0-9) followed by the 256 grayscale values.
– The test set is notoriously "difficult".
– Download it from here.
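If you prefer to load the raw file yourself, and assuming it really is plain whitespace-separated numbers with one line per digit as described, a Matlab sketch like this would work (the file name is illustrative):

    raw   = load('usps_train.txt');       % one row per digit: id followed by 256 gray values
    label = raw(:, 1);                    % digit id, 0-9
    pix   = raw(:, 2:257);                % 256 grayscale values per image
    img1  = reshape(pix(1, :), 16, 16)';  % first image as 16 x 16; transpose depends on storage order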

USPS digits

Setting
Classes: 0-9
Features: each pixel is used as a feature, so there are 16 by 16, i.e., 256 features
– rather than raw pixel gray values, we can use more informative features, such as (detected) corners, crosses, stroke slope, center of gravity, etc.
– think about how to quantize the real-valued features
Task: classify new digits into one of the classes
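For the quantization question, one simple (hypothetical) choice is to threshold each pixel into a binary feature, or to bin it into a few gray levels; a Matlab sketch, with digit the 16-by-16-by-n array returned by the provided reader described below:

    bw = digit > 0;                              % binary features: pixel on/off (threshold 0 is arbitrary)
    nlevels = 4;                                 % alternatively, quantize into a few gray levels
    g = digit - min(digit(:));
    g = g / max(g(:));                           % rescale gray values to [0, 1]
    q = min(floor(g * nlevels) + 1, nlevels);    % integer levels 1..nlevels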

Specifications (preliminary; assignment 1 will be released on Friday)
You can use either Matlab or C++ for programming.
– if you use C++, you should create the class and its members/functions as required
– if you use Matlab, you should write the functions as required
– the input and output formats will also be fixed in the assignment

Files
A Matlab file is provided to read the USPS data:
– >> [n, digit, label] = read_usps(path, file);
– path = 'c:\...'; file = 'usps_train.txt';
– n: the number of digits/images obtained
– digit: a 16 by 16 by n matrix
– label: the label of each image
You may want to use it to read the USPS data.

A Matlab file is provided to output a series of files:
– >> output(str, i1, i2);
– str: the common string part of the file names
– i1 and i2: the starting and ending integers
You may want to use it to write the digits into separate files with whatever naming scheme you like.