1 Bayes Rule
How is this rule derived?
Using Bayes rule for probabilistic inference:
– P(Cause | Evidence): diagnostic probability
– P(Evidence | Cause): causal probability
Rev. Thomas Bayes (1702-1761)
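The derivation the slide asks about follows directly from the definition of conditional probability; the following LaTeX reconstruction is a sketch of that step (the original slide presumably showed it as an image):

```latex
% Bayes rule follows from the two ways of factoring the joint probability:
%   P(Cause, Evidence) = P(Cause | Evidence) P(Evidence)
%                      = P(Evidence | Cause) P(Cause)
\[
P(\mathrm{Cause} \mid \mathrm{Evidence})
  = \frac{P(\mathrm{Evidence} \mid \mathrm{Cause})\, P(\mathrm{Cause})}{P(\mathrm{Evidence})}
\]
```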

2 Bayesian decision theory
Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
– Partially observable, stochastic, episodic environment
– Examples: X = {spam, not spam}, e = email message; X = {zebra, giraffe, hippo}, e = image features
– The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
– What is the agent's optimal estimate of the value of X?
Maximum a posteriori (MAP) decision: the value of X that minimizes expected loss is the one that has the greatest posterior probability P(X = x | e)
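Why the MAP choice minimizes expected loss under the 0–1 loss function is a one-line argument; this short LaTeX sketch fills in that step (it is not part of the original slide text):

```latex
% Expected 0-1 loss of guessing x, given evidence e:
%   E[L(x)] = sum_{x'} L(x, x') P(x' | e) = sum_{x' != x} P(x' | e) = 1 - P(x | e)
% Minimizing 1 - P(x | e) over x is the same as maximizing the posterior P(x | e).
\[
\hat{x} = \arg\min_{x} \sum_{x'} L(x, x')\, P(x' \mid e)
        = \arg\min_{x} \bigl(1 - P(x \mid e)\bigr)
        = \arg\max_{x} P(x \mid e)
\]
```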

3 MAP decision
X = x: value of query variable
E = e: evidence
MAP decision: x_MAP = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)
where P(e | x) is the likelihood, P(x) is the prior, and P(x | e) is the posterior
Maximum likelihood (ML) decision: x_ML = argmax_x P(e | x)
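A tiny worked example (hypothetical numbers, not from the slides) shows how the prior can make the MAP and ML decisions disagree:

```latex
% Hypothetical: P(spam) = 0.2, P(not spam) = 0.8,
%               P(e | spam) = 0.5, P(e | not spam) = 0.2
% ML picks spam, since 0.5 > 0.2.
% MAP compares P(e | x) P(x): 0.5 * 0.2 = 0.10 vs 0.2 * 0.8 = 0.16,
% so the MAP decision is "not spam".
\[
x_{\mathrm{ML}} = \text{spam}, \qquad
x_{\mathrm{MAP}} = \arg\max_x P(e \mid x)\, P(x) = \neg\text{spam}
\]
```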

4 Example: Spam Filter
We have X = {spam, ¬spam}, E = email message. What should be our decision criterion?
– Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability

5 Example: Spam Filter
We have X = {spam, ¬spam}, E = email message. What should be our decision criterion?
– Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives the higher posterior probability
P(spam | message) ∝ P(message | spam) P(spam)
P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)

6 Example: Spam Filter
We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
How do we represent the message?
– Bag of words model:
  The order of the words is not important
  Each word is conditionally independent of the others given the message class
If the message consists of words (w1, …, wn), how do we compute P(w1, …, wn | spam)?
– Naïve Bayes assumption: each word is conditionally independent of the others given the message class, so
  P(w1, …, wn | spam) = P(w1 | spam) P(w2 | spam) … P(wn | spam)
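A minimal Python sketch of the bag-of-words representation; the trivial whitespace tokenizer is an assumption for illustration, not something the slides specify:

```python
from collections import Counter

def bag_of_words(message):
    """Represent a message as unordered word counts (word order is discarded)."""
    words = message.lower().split()   # trivial tokenizer, for illustration only
    return Counter(words)

# Two messages with the same words in a different order
# get exactly the same representation.
print(bag_of_words("buy cheap meds now"))
print(bag_of_words("now buy meds cheap"))
```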

7 Example: Spam Filter
Our filter will classify the message as spam if
P(spam) P(w1 | spam) … P(wn | spam) > P(¬spam) P(w1 | ¬spam) … P(wn | ¬spam)
In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:
log P(spam) + Σi log P(wi | spam) > log P(¬spam) + Σi log P(wi | ¬spam)
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(wi | spam), P(wi | ¬spam)
These parameters need to be learned from a training set (a representative sample of email messages marked with their classes)
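A hedged Python sketch of this decision rule in log space; `log_priors`, `log_likelihoods`, and `vocab` are assumed to be already-estimated parameters (hypothetical names, not from the slides):

```python
from collections import Counter

def classify(message, log_priors, log_likelihoods, vocab):
    """Return the class (e.g. 'spam' / 'not_spam') with the highest
    log-posterior score under the naive Bayes model."""
    counts = Counter(w for w in message.lower().split() if w in vocab)
    scores = {}
    for c in log_priors:
        # score(c) = log P(c) + sum_i n_i * log P(w_i | c)
        scores[c] = log_priors[c] + sum(
            n * log_likelihoods[c][w] for w, n in counts.items()
        )
    return max(scores, key=scores.get)
```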

8 Parameter estimation
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(wi | spam), P(wi | ¬spam)
Estimation by empirical word frequencies in the training set:
P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages)
– This happens to be the parameter estimate that maximizes the likelihood of the training data, Πd Πi P(wi,d | class of document d)
  (d: index of training document, i: index of a word)

9 Parameter estimation
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(wi | spam), P(wi | ¬spam)
Estimation by empirical word frequencies in the training set:
P(wi | spam) = (# of occurrences of wi in spam messages) / (total # of words in spam messages)
Parameter smoothing: dealing with words that were never seen or seen too few times
– Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
  P(wi | spam) = (# of occurrences of wi in spam messages + 1) / (total # of words in spam messages + V), where V is the vocabulary size
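A short Python sketch of this estimation step with Laplacian (add-one) smoothing; the training-set format, a list of (word list, label) pairs, is an assumption made for illustration:

```python
from collections import Counter, defaultdict

def estimate_parameters(training_set, vocab):
    """training_set: iterable of (words, label) pairs, e.g. (['buy', 'now'], 'spam').
    Returns priors P(class) and Laplace-smoothed likelihoods P(word | class)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for words, label in training_set:
        class_counts[label] += 1
        word_counts[label].update(w for w in words if w in vocab)

    n_docs = sum(class_counts.values())
    priors = {c: class_counts[c] / n_docs for c in class_counts}

    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # Laplacian smoothing: add 1 to every vocabulary word's count
        likelihoods[c] = {w: (word_counts[c][w] + 1) / (total + len(vocab))
                          for w in vocab}
    return priors, likelihoods
```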

10 Bayesian decision making: Summary
Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
Inference problem: given some evidence E = e, what is P(X | e)?
Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x1, e1), …, (xn, en)}

11 Bag-of-word models for images
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

12 Bag-of-word models for images
1. Extract image features

13 Bag-of-word models for images
1. Extract image features

14 Bag-of-word models for images
2. Learn “visual vocabulary”

15 Bag-of-word models for images
1. Extract image features
2. Learn “visual vocabulary”
3. Map image features to visual words
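The three steps above map naturally onto a clustering pipeline; the following Python sketch uses scikit-learn's KMeans for the vocabulary step, which is a common choice in the cited work. The `extract_descriptors` function here is a toy placeholder (non-overlapping grayscale patches); the slides do not fix a particular local descriptor, and real systems typically use SIFT-like features.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_descriptors(image, patch=8):
    """Toy descriptor: non-overlapping grayscale patches flattened to vectors.
    Purely illustrative; real systems use local descriptors such as SIFT."""
    h, w = image.shape[:2]
    patches = [image[i:i + patch, j:j + patch].reshape(-1)
               for i in range(0, h - patch + 1, patch)
               for j in range(0, w - patch + 1, patch)]
    return np.asarray(patches, dtype=float)

def learn_vocabulary(training_images, n_words=500):
    # Step 1: extract features from all training images
    descriptors = np.vstack([extract_descriptors(im) for im in training_images])
    # Step 2: learn the "visual vocabulary" by clustering descriptors;
    # the cluster centers play the role of visual words
    return KMeans(n_clusters=n_words, n_init=10).fit(descriptors)

def bag_of_visual_words(image, vocabulary):
    # Step 3: map each feature to its nearest visual word and histogram them
    words = vocabulary.predict(extract_descriptors(image))
    counts = np.bincount(words, minlength=vocabulary.n_clusters)
    return counts / counts.sum()  # normalized visual-word histogram
```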

