Naive Bayes Classifier

Slides:



Advertisements
Similar presentations
Modeling of Data. Basic Bayes theorem Bayes theorem relates the conditional probabilities of two events A, and B: A might be a hypothesis and B might.
Advertisements

Naïve Bayes. Bayesian Reasoning Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that the quantities of.
Naïve-Bayes Classifiers Business Intelligence for Managers.
Classification. Introduction A discriminant is a function that separates the examples of different classes. For example – IF (income > Q1 and saving >Q2)
Bayesian Learning Provides practical learning algorithms
Probability: Review The state of the world is described using random variables Probabilities are defined over events –Sets of world states characterized.
Lecture 5 Bayesian Learning
Naïve Bayes Classifier
Ai in game programming it university of copenhagen Statistical Learning Methods Marco Loog.
Overview Full Bayesian Learning MAP learning
1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks.
Introduction to Bayesian Learning Bob Durrant School of Computer Science University of Birmingham (Slides: Dr Ata Kabán)
1er. Escuela Red ProTIC - Tandil, de Abril, Bayesian Learning 5.1 Introduction –Bayesian learning algorithms calculate explicit probabilities.
Introduction to Bayesian Learning Ata Kaban School of Computer Science University of Birmingham.
Bayes Classification.
Text Categorization Moshe Koppel Lecture 2: Naïve Bayes Slides based on Manning, Raghavan and Schutze.
CS Bayesian Learning1 Bayesian Learning. CS Bayesian Learning2 States, causes, hypotheses. Observations, effect, data. We need to reconcile.
Does Naïve Bayes always work?
Jeff Howbert Introduction to Machine Learning Winter Classification Bayesian Classifiers.
Crash Course on Machine Learning
NAÏVE BAYES CLASSIFIER 1 ACM Student Chapter, Heritage Institute of Technology 10 th February, 2012 SIGKDD Presentation by Anirban Ghose Parami Roy Sourav.
Midterm Review Rao Vemuri 16 Oct Posing a Machine Learning Problem Experience Table – Each row is an instance – Each column is an attribute/feature.
Text Classification, Active/Interactive learning.
Naive Bayes Classifier
Kansas State University Department of Computing and Information Sciences CIS 730: Introduction to Artificial Intelligence Lecture 25 Wednesday, 20 October.
Data Mining Practical Machine Learning Tools and Techniques Chapter 4: Algorithms: The Basic Methods Section 4.6: Linear Models Rodney Nielsen Many of.
ECE 8443 – Pattern Recognition LECTURE 07: MAXIMUM LIKELIHOOD AND BAYESIAN ESTIMATION Objectives: Class-Conditional Density The Multivariate Case General.
CS 782 – Machine Learning Lecture 4 Linear Models for Classification  Probabilistic generative models  Probabilistic discriminative models.
CS464 Introduction to Machine Learning1 Bayesian Learning Features of Bayesian learning methods: Each observed training example can incrementally decrease.
Computing & Information Sciences Kansas State University Wednesday, 22 Oct 2008CIS 530 / 730: Artificial Intelligence Lecture 22 of 42 Wednesday, 22 October.
Kansas State University Department of Computing and Information Sciences CIS 730: Introduction to Artificial Intelligence Lecture 25 of 41 Monday, 25 October.
Bayesian Classification. Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities.
Classification Techniques: Bayesian Classification
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks.
Chapter 6 Bayesian Learning
Bayes Theorem. Prior Probabilities On way to party, you ask “Has Karl already had too many beers?” Your prior probabilities are 20% yes, 80% no.
Bayesian Learning Provides practical learning algorithms
Mehdi Ghayoumi MSB rm 132 Ofc hr: Thur, a Machine Learning.
1 Machine Learning: Lecture 6 Bayesian Learning (Based on Chapter 6 of Mitchell T.., Machine Learning, 1997)
Bayesian Learning Bayes Theorem MAP, ML hypotheses MAP learners
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
Naïve Bayes Classifier April 25 th, Classification Methods (1) Manual classification Used by Yahoo!, Looksmart, about.com, ODP Very accurate when.
BAYESIAN LEARNING. 2 Bayesian Classifiers Bayesian classifiers are statistical classifiers, and are based on Bayes theorem They can calculate the probability.
CS Ensembles and Bayes1 Ensembles, Model Combination and Bayesian Combination.
Bayesian Learning Evgueni Smirnov Overview Bayesian Theorem Maximum A Posteriori Hypothesis Naïve Bayes Classifier Learning Text Classifiers.
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 15: Text Classification & Naive Bayes 1.
Naive Bayes Classifier. REVIEW: Bayesian Methods Our focus this lecture: – Learning and classification methods based on probability theory. Bayes theorem.
Bayesian Learning. Uncertainty & Probability Baye's rule Choosing Hypotheses- Maximum a posteriori Maximum Likelihood - Baye's concept learning Maximum.
Bayesian Classification 1. 2 Bayesian Classification: Why? A statistical classifier: performs probabilistic prediction, i.e., predicts class membership.
Bayesian Learning Reading: Tom Mitchell, “Generative and discriminative classifiers: Naive Bayes and logistic regression”, Sections 1-2. (Linked from.
Classification: Naïve Bayes Classifier
Lecture 1.31 Criteria for optimal reception of radio signals.
Does Naïve Bayes always work?
Data Science Algorithms: The Basic Methods
Bayesian Rule & Gaussian Mixture Models
Computer Science Department
Bayes Net Learning: Bayesian Approaches
Lecture 15: Text Classification & Naive Bayes
Data Mining Lecture 11.
Classification Techniques: Bayesian Classification
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Statistical NLP: Lecture 4
Data Mining: Concepts and Techniques (3rd ed.) — Chapter 8 —
Machine Learning: Lecture 6
Machine Learning: UNIT-3 CHAPTER-1
Naive Bayes Classifier
NAÏVE BAYES CLASSIFICATION
Naïve Bayes Classifier
Presentation transcript:

Naive Bayes Classifier

REVIEW: Bayesian Methods Our focus this lecture: Learning and classification methods based on probability theory. Bayes theorem plays a critical role in probabilistic learning and classification. Bayes theorem uses prior probability of each category given no information about an item. Categorization produces a posterior probability distribution over the possible categories given a description of an item. Example – tennis playing last lecture

In these reasonings we use the Basic Probability Formulas: Product rule Sum rule Bayes theorem Theorem of total probability, if event Ai is mutually exclusive and probability sum to 1

REVIEW AND TERMINOLOGY: Bayes Theorem Given a hypothesis h and data D which bears on the hypothesis: P(h): independent probability of h: prior probability P(D): independent probability of D P(D|h): conditional probability of D given h: likelihood P(h|D): conditional probability of h given D: posterior probability

Possible problem to solve using this method: Does patient have cancer or not? A patient takes a lab test and the result comes back positive. It is known that the test returns a correct positive result in only 99% of the cases and a correct negative result in only 95% of the cases. Furthermore, only 0.03 of the entire population has this disease. 1. What is the probability that this patient has cancer? 2. What is the probability that he does not have cancer? 3. What is the diagnosis? The same applies to the “door open, door closed” example of a robot on corridor few lectures earlier. The same is true in any conditional probability reasoning.

We gave an example of MAP in last set of slides in tennis example Maximum A Posterior Based on Bayes Theorem, we can compute the Maximum A Posterior (MAP) hypothesis for the data We are interested in the best hypothesis for some space H given observed training data D. H: set of all hypothesis. Note that we can drop P(D) as the probability of the data is constant (and independent of the hypothesis). We gave an example of MAP in last set of slides in tennis example

Maximum Likelihood Hypothesis Now assume that all hypotheses are equally probable a priori, i.e., P(hi ) = P(hj ) for all hi, hj belong to H. This is called assuming a uniform prior. It simplifies computing the posterior: This hypothesis is called the maximum likelihood hypothesis.

Desirable Properties of Bayes Classifier Incrementality: with each training example, the prior and the likelihood can be updated dynamically: flexible and robust to errors. Combines prior knowledge and observed data: prior probability of a hypothesis multiplied with probability of the hypothesis given the training data Probabilistic hypothesis: outputs not only a classification, but a probability distribution over all classes Again refer to tennis example

Characteristics of Bayes Classifiers Assumption: training set consists of instances of different classes described cj as conjunctions of attributes values Task: Classify a new instance d based on a tuple of attribute values into one of the classes cj  C Key idea: assign the most probable class using Bayes Theorem.

Parameters estimation in Naïve Bayes P(cj) Can be estimated from the frequency of classes in the training examples. P(x1,x2,…,xn|cj) O(|X|n•|C|) is the number of parameters This probability could only be estimated if a very, very large number of training examples was available. Independence Assumption: attribute values are conditionally independent given the target value: naïve Bayes.

Properties of Naïve Bayes Estimating instead of greatly reduces the number of parameters (and the data sparseness). The learning step in Naïve Bayes consists of estimating and based on the frequencies in the training data An unseen instance is classified by computing the class that maximizes the posterior When conditioned independence is satisfied, Naïve Bayes corresponds to MAP classification.

Reminder: Example. ‘Play Tennis’ data Question: For the day <sunny, cool, high, strong>, what’s the play prediction?

Naive Bayes solution Classify any new datum instance x=(a1,…aT) as: To do this based on training examples, we need to estimate the parameters from the training examples: For each target value (hypothesis) h For each attribute value at of each datum instance

Based on the examples in the table, classify the following datum x: x=(Outl=Sunny, Temp=Cool, Hum=High, Wind=strong) That means: Play tennis or not? Working:

Underflow Prevention Multiplying lots of probabilities, which are between 0 and 1 by definition, can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to perform all computations by summing logs of probabilities rather than multiplying probabilities. Class with highest final un-normalized log probability score is still the most probable.