MLEs, Bayesian Classifiers and Naïve Bayes
Machine Learning 10-601, Tom M. Mitchell
Machine Learning Department, Carnegie Mellon University
January 30, 2008
Required reading: Mitchell draft chapter, sections 1 and 2 (available on class website)

Naïve Bayes in a Nutshell

Bayes rule:

P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k)\, P(X_1, \dots, X_n \mid Y = y_k)}{\sum_j P(Y = y_j)\, P(X_1, \dots, X_n \mid Y = y_j)}

Assuming conditional independence among the X_i's:

P(Y = y_k \mid X_1, \dots, X_n) = \frac{P(Y = y_k) \prod_i P(X_i \mid Y = y_k)}{\sum_j P(Y = y_j) \prod_i P(X_i \mid Y = y_j)}

So the classification rule for X^{new} = \langle X_1, \dots, X_n \rangle is:

Y^{new} \leftarrow \arg\max_{y_k} P(Y = y_k) \prod_i P(X_i^{new} \mid Y = y_k)

Naïve Bayes Algorithm – discrete X_i

Train_Naïve_Bayes(examples):
  for each value* y_k: estimate \pi_k \equiv P(Y = y_k)
  for each value* x_{ij} of each attribute X_i: estimate \theta_{ijk} \equiv P(X_i = x_{ij} \mid Y = y_k)

Classify(X^{new}):

Y^{new} \leftarrow \arg\max_{y_k} \hat{P}(Y = y_k) \prod_i \hat{P}(X_i^{new} \mid Y = y_k)

* probabilities must sum to 1, so for each such distribution we need estimate only n−1 parameters
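As a concrete illustration of the procedure above, here is a minimal Python sketch (not from the lecture; the function names train_nb and classify_nb and the data layout are invented for this example) that trains and applies a discrete Naïve Bayes classifier using raw MLE counts:

```python
from collections import Counter, defaultdict

def train_nb(examples):
    """examples: list of (x, y) pairs, where x is a tuple of discrete attribute values.
    Returns MLE estimates of P(Y = y_k) and P(X_i = x_ij | Y = y_k)."""
    n = len(examples)
    class_counts = Counter(y for _, y in examples)
    prior = {y: c / n for y, c in class_counts.items()}    # pi_k = #D{Y=y_k} / |D|
    cond_counts = defaultdict(Counter)                     # (i, y) -> counts of values of X_i
    for x, y in examples:
        for i, xi in enumerate(x):
            cond_counts[(i, y)][xi] += 1
    cond = {key: {v: c / class_counts[key[1]] for v, c in ctr.items()}
            for key, ctr in cond_counts.items()}           # theta_ijk
    return prior, cond

def classify_nb(prior, cond, x_new):
    """Return argmax_{y_k} P(Y=y_k) * prod_i P(X_i = x_i^new | Y=y_k)."""
    def score(y):
        p = prior[y]
        for i, xi in enumerate(x_new):
            p *= cond.get((i, y), {}).get(xi, 0.0)  # raw MLE: unseen value -> 0
        return p
    return max(prior, key=score)
```

Note the 0.0 fallback for attribute values never seen with a class: that is exactly the MLE zero-probability problem raised as Subtlety #1 below.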

Estimating Parameters: Y, X_i discrete-valued

Maximum likelihood estimates (MLEs):

\hat{\pi}_k = \hat{P}(Y = y_k) = \frac{\#D\{Y = y_k\}}{|D|}

\hat{\theta}_{ijk} = \hat{P}(X_i = x_{ij} \mid Y = y_k) = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}

where \#D\{c\} is the number of items in set D for which condition c holds.
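For instance (hypothetical counts, not from the lecture): if |D| = 100 and 30 of the examples have Y = 1, then \hat{P}(Y = 1) = 30/100 = 0.3; if 12 of those 30 also have X_5 = 1, then \hat{P}(X_5 = 1 \mid Y = 1) = 12/30 = 0.4.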

Example: Live in Sq Hill? P(S | G, D, M)
– S = 1 iff live in Squirrel Hill
– G = 1 iff shop at Giant Eagle
– D = 1 iff drive to CMU
– M = 1 iff Dave Matthews fan
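Under the conditional-independence assumption, this posterior factors as (a worked restatement of the classification rule above, not an additional slide):

P(S = 1 \mid G, D, M) \propto P(S = 1)\, P(G \mid S = 1)\, P(D \mid S = 1)\, P(M \mid S = 1)

so Naïve Bayes needs only 1 + 2 · 3 = 7 parameters here, versus 2(2^3 − 1) + 1 = 15 for the full joint model P(S) P(G, D, M | S).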

Naïve Bayes: Subtlety #1

If unlucky, our MLE estimate for P(X_i | Y) may be zero (e.g., X_373 = Birthday_Is_January30).

Why worry about just one parameter out of many? Because the classification rule multiplies all of the P(X_i | Y) together, so a single zero factor drives the whole product to zero, regardless of what the other attributes say.

What can be done to avoid this?

Estimating Parameters: Y, X_i discrete-valued

Maximum likelihood estimates:

\hat{\pi}_k = \frac{\#D\{Y = y_k\}}{|D|}, \qquad \hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\}}{\#D\{Y = y_k\}}

MAP estimates (Dirichlet priors), adding \ell "imaginary" examples of each value:

\hat{\pi}_k = \frac{\#D\{Y = y_k\} + \ell}{|D| + \ell K}, \qquad \hat{\theta}_{ijk} = \frac{\#D\{X_i = x_{ij} \wedge Y = y_k\} + \ell}{\#D\{Y = y_k\} + \ell J_i}

where K is the number of classes and J_i the number of values of X_i. Only difference: the "imaginary" examples, which keep every estimate strictly positive (\ell = 1 gives Laplace smoothing).
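A hedged sketch of how the MAP estimate changes the counting step of the earlier training code (map_estimate and ell are illustrative names, not from the lecture):

```python
def map_estimate(count, total, num_values, ell=1.0):
    """MAP / Dirichlet-smoothed estimate of a discrete probability.
    count      : #D{value AND class}
    total      : #D{class}
    num_values : number of distinct values the attribute can take
    ell        : "imaginary" examples per value (ell=1 -> Laplace smoothing)"""
    return (count + ell) / (total + ell * num_values)

# e.g. a binary attribute value never observed with this class:
print(map_estimate(0, 30, 2))   # 1/32 ~= 0.031, not 0
```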

Naïve Bayes: Subtlety #2

Often the X_i are not really conditionally independent. We use Naïve Bayes in many cases anyway, and it often works pretty well – often giving the right classification even when not the right probability (see [Domingos & Pazzani, 1996]).

What is the effect on the estimated P(Y | X)?
– Special case: what if we add two copies of an attribute, X_i = X_k?
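In that special case the duplicated evidence is counted twice: the product over attributes picks up the factor P(X_i \mid Y) squared,

P(Y) \prod_j P(X_j \mid Y) \;\longrightarrow\; P(Y)\, P(X_i \mid Y)^2 \prod_{j \neq i,\, k} P(X_j \mid Y)

so the estimated posterior P(Y | X) is pushed toward 0 or 1 – overconfident – even though the argmax classification often remains the same.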

Learning to classify text documents
– Classify which emails are spam
– Classify which emails are meeting invites
– Classify which web pages are student home pages

How shall we represent text documents for Naïve Bayes?

Baseline: Bag of Words Approach

  aardvark   0
  about      2
  all        2
  Africa     1
  apple      0
  anxious    0
  ...
  gas        1
  ...
  oil        1
  ...
  Zaire      0
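A bag-of-words vector is just a per-document count for each word in a fixed vocabulary, as in this minimal Python sketch (the tiny vocabulary and document are invented for illustration):

```python
from collections import Counter

vocab = ["aardvark", "about", "all", "africa", "apple", "gas", "oil", "zaire"]

def bag_of_words(text, vocab):
    """Map a document to a vector of word counts, one entry per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

doc = "All about oil gas prices are all anyone talks about"
print(bag_of_words(doc, vocab))   # [0, 2, 2, 0, 0, 1, 1, 0]
```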

For code and data, see …; click on “Software and Data”.

What you should know:
– Training and using classifiers based on Bayes rule
– Conditional independence
  – what it is
  – why it's important
– Naïve Bayes
  – what it is
  – why we use it so much
  – training using MLE and MAP estimates
  – discrete variables (Bernoulli) and continuous variables (Gaussian)

Questions:
– Can you use Naïve Bayes for a combination of discrete and real-valued X_i?
– How can we easily model just 2 of the n attributes as dependent?
– What does the decision surface of a Naïve Bayes classifier look like?

What is the form of the decision surface for a Naïve Bayes classifier?
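One standard way to see the answer, following the analysis in the assigned Mitchell chapter (the equal-variance assumption below is what makes the result come out linear): for Gaussian Naïve Bayes with class-independent variances \sigma_i^2, the log-odds is

\ln \frac{P(Y = 1 \mid X)}{P(Y = 0 \mid X)} = \ln \frac{P(Y = 1)}{P(Y = 0)} + \sum_i \left( \frac{\mu_{i1} - \mu_{i0}}{\sigma_i^2}\, X_i + \frac{\mu_{i0}^2 - \mu_{i1}^2}{2\sigma_i^2} \right)

which is linear in the X_i, so the decision surface \{x : \text{log-odds} = 0\} is a hyperplane – the same functional form as logistic regression. If the variances are allowed to differ by class, quadratic terms in X_i remain and the surface becomes quadratic.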