
1 CHAPTER 6 Naive Bayes Models for Classification

2 QUESTION????

3 Bayes’ Rule in Bayes Nets
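For reference, in the notation used on the later slides (C a cause/class, E the evidence), Bayes' rule reads P(C|E) = P(E|C) P(C) / P(E), where P(E) = Σ_c P(E|c) P(c) is just the normalizing constant, so the posterior is proportional to P(E|C) P(C).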

4 Combining Evidence

5 General Naïve Bayes
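In symbols (a minimal statement, with variable names assumed here rather than taken from the slide): one cause/class variable C and evidence variables E_1, ..., E_n that are conditionally independent given C, so the joint distribution factors as P(C, E_1, ..., E_n) = P(C) × Π_i P(E_i|C).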

6 Modeling with Naïve Bayes

7 What to do with Naïve Bayes?
1. Create a Naïve Bayes model:
– We need local probability estimates
– Could elicit them from a human
– Better: estimate them from observations! This is called parameter estimation, or more generally learning
2. Use a Naïve Bayes model to estimate the probability of causes given observations of effects:
– This is a specific kind of probabilistic inference
– Requires just a simple computation (next slide; a rough sketch also follows below)
– From this we can also get the most likely cause, which is called prediction, or classification
These are the basic tasks of machine learning!
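As a rough illustration of task 2, here is a small Python sketch of that "simple computation": multiply the prior by each evidence likelihood (in log space), normalize, and take the most likely cause. The dictionaries and names below are illustrative, not from the slides, and all probabilities are assumed to be positive.

    import math

    def classify(prior, likelihoods, observed):
        # prior:       {cause: P(cause)}
        # likelihoods: {cause: {effect: P(effect | cause)}}
        # observed:    list of observed effect values
        log_score = {}
        for cause, p_cause in prior.items():
            s = math.log(p_cause)
            for effect in observed:
                s += math.log(likelihoods[cause][effect])
            log_score[cause] = s
        # Normalize to get the posterior; the argmax alone gives the prediction.
        total = sum(math.exp(s) for s in log_score.values())
        posterior = {c: math.exp(s) / total for c, s in log_score.items()}
        return posterior, max(posterior, key=posterior.get)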

8 Building Naïve Bayes Models
What do we need to specify a Bayesian network?
– A directed acyclic graph (DAG)
– Conditional probability tables (CPTs)
How do we build a Naïve Bayes model?
– We know the graph structure already (why?)
– We still need estimates of the local conditional probability tables (CPTs):
– P(C), the prior over causes
– P(E|C) for each evidence variable
– These typically come from observed data (see the sketch below)
These probabilities are collectively called the parameters of the model and denoted by θ.
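A rough sketch of computing those estimates as relative frequencies, assuming the observed data comes as (evidence values, cause) pairs and treating all evidence values as draws from one shared P(E|C), as in the bag-of-words text model that appears later. Names are illustrative, not from the slides.

    from collections import Counter, defaultdict

    def estimate_parameters(examples):
        # examples: iterable of (evidence_values, cause) pairs
        cause_counts = Counter()
        effect_counts = defaultdict(Counter)
        for evidence_values, cause in examples:
            cause_counts[cause] += 1
            for e in evidence_values:
                effect_counts[cause][e] += 1
        total = sum(cause_counts.values())
        prior = {c: n / total for c, n in cause_counts.items()}  # P(C)
        likelihoods = {c: {e: n / sum(cnts.values()) for e, n in cnts.items()}
                       for c, cnts in effect_counts.items()}     # P(E|C)
        return prior, likelihoods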

9 Review: Parameter Estimation
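As used on the later slides, the basic relative-frequency (maximum-likelihood) estimate of a probability is P_ML(x) = count(x) / (total number of samples), i.e. the fraction of observations that took value x.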

10 A Spam Filter

11 Baselines
First task: get a baseline
– Baselines are very simple "straw man" procedures
– Help determine how hard the task is
– Help us know what a "good" accuracy is
Weak baseline: the most-frequent-label classifier (see the sketch below)
– Gives all test instances whatever label was most common in the training set
– E.g. for spam filtering, it might label everything as ham
– Its accuracy can be very high if the label distribution is skewed
For real research, previous work is usually used as a (strong) baseline
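A tiny sketch of that weak baseline (illustrative names, not from the slides):

    from collections import Counter

    def most_frequent_label_baseline(train_labels, num_test_instances):
        # Predict whatever label was most common in the training set for every test instance.
        majority_label = Counter(train_labels).most_common(1)[0][0]
        return [majority_label] * num_test_instances

For a spam corpus that is, say, 70% ham, this predicts "ham" everywhere and already scores 70% accuracy, which is why a raw accuracy number means little without a baseline.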

12 Naïve Bayes for Text
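A standard way to set this up (notation assumed here): treat the class C (e.g. spam vs. ham) as the cause and the words W_1, ..., W_n at the positions of the message as the effects, all sharing one conditional distribution, so P(C, W_1, ..., W_n) = P(C) × Π_i P(W_i|C). This is the bag-of-words model referred to on the later slides.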

13 Example: Spam Filtering
The raw magnitudes of the word probabilities don't determine the posteriors; their relative sizes across the classes (odds ratios) do.
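A tiny worked illustration with made-up numbers: suppose a word w has P(w|spam) = 0.002 and P(w|ham) = 0.001, and the priors are equal. The posterior for spam given w is 0.002 / (0.002 + 0.001) = 2/3. If both likelihoods were ten times larger (0.02 and 0.01), the posterior would still be 2/3: only the 2:1 odds ratio between the classes matters.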

14 Generalization and Overfitting
Relative-frequency parameters will overfit the training data!
– It's unlikely that every occurrence of "minute" is 100% spam
– It's unlikely that every occurrence of "seriously" is 100% ham
– What about words that don't occur in the training set at all?
– In general, we can't go around giving unseen events zero probability
As an extreme case, imagine using the entire email as the only feature
– It would fit the training data perfectly (if the labeling is deterministic)
– It wouldn't generalize at all
Just making the bag-of-words assumption gives us some generalization, but it isn't enough
To generalize better, we need to smooth or regularize the estimates
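To make the unseen-word problem concrete (made-up scenario): if "minute" never appears in any training ham message, the relative-frequency estimate gives P("minute"|ham) = 0, so any new email containing "minute" gets posterior probability exactly zero for ham, no matter how ham-like every other word in it is. A single unseen event vetoes the whole product.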

15 Estimation: Smoothing
Problems with maximum likelihood estimates:
– If I flip a coin once and it's heads, what's the estimate for P(heads)?
– What if I flip it 50 times and get 27 heads?
– What if I flip it 10M times and get 8M heads?
Basic idea:
– We have some prior expectation about the parameters (here, the probability of heads)
– Given little evidence, we should skew towards the prior
– Given a lot of evidence, we should listen to the data
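One way to see the idea with the coin numbers above, taking add-1 (Laplace) smoothing, covered a couple of slides later, as an example of a prior: with 1 flip and 1 head, the smoothed estimate is (1+1)/(1+2) = 2/3 instead of the maximum-likelihood 1; with 27 heads in 50 flips it is 28/52 ≈ 0.54, almost the same as 27/50; and with 8M heads in 10M flips it is indistinguishable from 0.8. Little data stays near the prior, lots of data follows the observed frequency.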

16 Estimation: Smoothing

17 Estimation: Laplace Smoothing
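For reference, the usual form (notation assumed here): pretend every outcome was observed once more than it actually was, P_LAP(x) = (count(x) + 1) / (N + |X|), where N is the number of observations and |X| the number of possible outcomes. More generally, add-k smoothing uses P_LAP,k(x) = (count(x) + k) / (N + k|X|), where larger k means a stronger pull towards the uniform prior.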

18

19 Estimation: Linear Interpolation
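The standard version for conditionals (notation assumed here): mix the conditional estimate with the unconditional one, P_LIN(x|y) = α · P_ML(x|y) + (1 − α) · P_ML(x), with 0 ≤ α ≤ 1 chosen, for example, by performance on held-out data.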

20 Real NB: Smoothing

21 Spam Example

