Presentation on theme: "Probabilistic inference"— Presentation transcript:

1 Probabilistic inference
Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e (a partially observable, stochastic, episodic environment).
Examples:
- X = {spam, not spam}, e = message
- X = {zebra, giraffe, hippo}, e = image features
Bayes decision theory:
- The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise.
- The estimate of X that minimizes expected loss is the one with the greatest posterior probability P(X = x | e).
- This is the Maximum a Posteriori (MAP) decision.
- Expected loss of a decision: P(decision is correct) * 0 + P(decision is wrong) * 1 = 1 − P(X = x | e), so picking the most probable x minimizes the expected loss.
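To see why the MAP choice minimizes the expected 0–1 loss, the expected loss of deciding X = x given e can be expanded directly:

```latex
\[
\mathbb{E}[\text{loss} \mid \text{decide } x,\, e]
  = 0 \cdot P(X = x \mid e) + 1 \cdot P(X \neq x \mid e)
  = 1 - P(X = x \mid e),
\]
\[
\arg\min_x \bigl(1 - P(X = x \mid e)\bigr) = \arg\max_x P(X = x \mid e) = x_{\text{MAP}}.
\]
```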

2–4 MAP decision
Value of x that has the highest posterior probability given the evidence e:
x_MAP = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e)
Since P(e) does not depend on x, this is equivalent to x_MAP = argmax_x P(e | x) P(x)   (posterior ∝ likelihood × prior).
Maximum likelihood (ML) decision: x_ML = argmax_x P(e | x), i.e. the prior is ignored (or assumed uniform).

5–6 Example: Naïve Bayes model
Suppose we have many different types of observations (symptoms, features) F1, …, Fn that we want to use to obtain evidence about an underlying hypothesis H.
MAP decision involves estimating P(H | F1, …, Fn) ∝ P(F1, …, Fn | H) P(H).
If each feature can take on k values, how many entries are in the joint probability table P(F1, …, Fn | H)?
We can make the simplifying assumption that the different features are conditionally independent given the hypothesis:
P(F1, …, Fn | H) = P(F1 | H) × … × P(Fn | H)
If each feature can take on k values, what is the complexity of storing the resulting distributions? (See the counting sketch below.)
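To answer the two counting questions, assume H can take m possible values (m is introduced here only for the count):

```latex
\[
\underbrace{P(F_1, \dots, F_n \mid H)}_{\text{full table: } k^n \text{ entries per hypothesis, } m\,k^n \text{ total}}
\qquad\text{vs.}\qquad
\underbrace{\prod_{i=1}^{n} P(F_i \mid H)}_{n \text{ tables of } k \text{ entries per hypothesis: } O(nk) \text{ storage per hypothesis}}
\]
```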

7–8 Naïve Bayes Spam Filter
MAP decision: to minimize the probability of error, we should classify a message as spam if P(spam | message) > P(¬spam | message).
By Bayes' rule, P(spam | message) ∝ P(message | spam) P(spam) and P(¬spam | message) ∝ P(message | ¬spam) P(¬spam).
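The proportionality comes from Bayes' rule: both posteriors share the same denominator P(message), so it can be dropped when comparing them:

```latex
\[
P(\text{spam} \mid \text{message}) = \frac{P(\text{message} \mid \text{spam})\,P(\text{spam})}{P(\text{message})},
\qquad
P(\neg\text{spam} \mid \text{message}) = \frac{P(\text{message} \mid \neg\text{spam})\,P(\neg\text{spam})}{P(\text{message})},
\]
so the MAP test reduces to comparing
\[
P(\text{message} \mid \text{spam})\,P(\text{spam}) \quad\text{vs.}\quad P(\text{message} \mid \neg\text{spam})\,P(\neg\text{spam}).
\]
```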

9–10 Naïve Bayes Spam Filter
We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam).
The message is a sequence of words (w1, …, wn).
Bag of words representation:
- The order of the words in the message is not important.
- Each word is conditionally independent of the others given the message class (spam or not spam).
Under these assumptions, P(message | spam) = P(w1, …, wn | spam) = P(w1 | spam) · … · P(wn | spam), and likewise for ¬spam.
Our filter will classify the message as spam if
P(spam) P(w1 | spam) · … · P(wn | spam) > P(¬spam) P(w1 | ¬spam) · … · P(wn | ¬spam).
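A minimal Python sketch of this decision rule. The function name and the dictionary layout of the parameters (prior: class → P(class); likelihood: class → {word: P(word | class)}) are assumptions made for the example, and the whitespace tokenization and log probabilities (to avoid numerical underflow on long messages) are implementation choices for the sketch, not part of the model itself:

```python
import math

def classify(message, prior, likelihood, vocab):
    """Classify a whitespace-tokenized message, returning the most probable class.

    prior:      dict mapping class label -> P(class)
    likelihood: dict mapping class label -> {word: P(word | class)}
    vocab:      set of words the model was trained on (unknown words are ignored)
    """
    words = [w for w in message.lower().split() if w in vocab]
    scores = {}
    for c in prior:
        # log P(c) + sum_i log P(w_i | c): the MAP score, up to the shared P(message) term
        scores[c] = math.log(prior[c]) + sum(math.log(likelihood[c][w]) for w in words)
    return max(scores, key=scores.get)
```

With parameters estimated as in the parameter-estimation slides below, calling classify("win money now", prior, likelihood, vocab) returns whichever class has the larger log score, which is exactly the comparison stated above.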

11–13 Bag of words illustration
US Presidential Speeches Tag Cloud (figures: word-frequency tag clouds illustrating the bag-of-words representation)

14 Naïve Bayes Spam Filter
P(spam | w1, …, wn) ∝ P(spam) P(w1 | spam) · … · P(wn | spam)
(posterior ∝ prior × likelihood)

15 Parameter estimation
In order to classify a message, we need to know the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam).
These are the parameters of the probabilistic model.
How do we obtain the values of these parameters?
[Figure: example parameter tables — prior (spam: 0.33, ¬spam: 0.67) and per-word likelihood tables P(word | spam), P(word | ¬spam)]

16–17 Parameter estimation
How do we obtain the prior P(spam) and the likelihoods P(word | spam) and P(word | ¬spam)?
Empirically: use training data. This gives the maximum likelihood (ML) estimate, the estimate that maximizes the likelihood of the training data:
P(word | spam) = (# of occurrences of the word in spam messages) / (total # of words in spam messages)
where the counts are taken over the whole training set (d: index of a training document, i: index of a word within a document).
Parameter smoothing: dealing with words that were never seen or seen too few times.
Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did:
P(word | spam) = (# of occurrences of the word in spam messages + 1) / (total # of words in spam messages + V), where V is the vocabulary size.
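A sketch of these counting-based estimates in Python, with Laplacian smoothing folded in (alpha = 1 reproduces "one extra occurrence of every vocabulary word"). The function name and the training-data format, a list of (word list, label) pairs, are assumptions made for the example; the output matches the classify() sketch above:

```python
from collections import Counter

def estimate_parameters(training_data, alpha=1.0):
    """ML / Laplace-smoothed estimates from [(list_of_words, label), ...] pairs.

    Returns (prior, likelihood, vocab) in the format used by classify() above.
    """
    vocab = {w for words, _ in training_data for w in words}
    doc_counts = Counter(label for _, label in training_data)
    word_counts = {label: Counter() for label in doc_counts}
    for words, label in training_data:
        word_counts[label].update(words)

    # Prior: fraction of training messages carrying each label
    total_docs = sum(doc_counts.values())
    prior = {c: doc_counts[c] / total_docs for c in doc_counts}

    # Likelihood: (count of word in class + alpha) / (total words in class + alpha * |vocab|)
    likelihood = {}
    for c in doc_counts:
        total_words = sum(word_counts[c].values())
        denom = total_words + alpha * len(vocab)
        likelihood[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return prior, likelihood, vocab
```

Setting alpha = 0 recovers the unsmoothed ML estimate, which assigns zero probability (and hence an undefined log score) to any word never seen in a class; that is exactly the problem smoothing is meant to fix.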

18 Summary of model and parameters
Naïve Bayes model: P(class | w1, …, wn) ∝ P(class) P(w1 | class) · … · P(wn | class), for class ∈ {spam, ¬spam}
Model parameters:
- prior: P(spam), P(¬spam)
- likelihood of spam: P(w1 | spam), P(w2 | spam), …, P(wn | spam)
- likelihood of ¬spam: P(w1 | ¬spam), P(w2 | ¬spam), …, P(wn | ¬spam)

19–23 Bag-of-word models for images
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)
- Extract image features
- Learn "visual vocabulary"
- Map image features to visual words
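A rough Python sketch of this pipeline, using k-means (via scikit-learn) for the "visual vocabulary" step. The feature-extraction step is left open since the cited papers use a variety of local descriptors, so arrays of descriptors are assumed as input; the function names and the vocabulary size are illustrative choices:

```python
import numpy as np
from sklearn.cluster import KMeans

def learn_vocabulary(descriptor_list, num_words=200):
    """Cluster local descriptors from many training images into 'visual words'."""
    all_descriptors = np.vstack(descriptor_list)   # shape: (total_descriptors, descriptor_dim)
    return KMeans(n_clusters=num_words, n_init=10).fit(all_descriptors)

def bag_of_visual_words(descriptors, vocabulary):
    """Map one image's descriptors to a normalized histogram over the visual vocabulary."""
    words = vocabulary.predict(descriptors)        # nearest cluster center for each descriptor
    hist = np.bincount(words, minlength=vocabulary.n_clusters)
    return hist / hist.sum()
```

Each image is then represented by its histogram of visual-word counts, which plays the same role as the word counts in the spam filter and can be fed to a classifier such as the naïve Bayes model above.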

24 Bayesian decision making: Summary
Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E.
Inference problem: given some evidence E = e, what is P(X | e)?
Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}.

