Bayes Rule
How is this rule derived?
Using Bayes rule for probabilistic inference:
– P(Cause | Evidence): diagnostic probability
– P(Evidence | Cause): causal probability
Rev. Thomas Bayes
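A short sketch of the derivation the slide asks about, using the product rule (standard reasoning, not reproduced verbatim from the slides): the product rule factors the joint probability two ways,
P(Cause, Evidence) = P(Cause | Evidence) P(Evidence) = P(Evidence | Cause) P(Cause).
Dividing the last two expressions by P(Evidence) (assumed nonzero) gives Bayes rule,
P(Cause | Evidence) = P(Evidence | Cause) P(Cause) / P(Evidence),
which lets us compute the diagnostic probability from the causal probability and the prior P(Cause).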

Bayesian decision theory
Suppose the agent has to make a decision about the value of an unobserved query variable X given some observed evidence E = e
– Partially observable, stochastic, episodic environment
– Examples: X = {spam, not spam}, e = message; X = {zebra, giraffe, hippo}, e = image features
– The agent has a loss function, which is 0 if the value of X is guessed correctly and 1 otherwise
– What is the agent's optimal estimate of the value of X?
Maximum a posteriori (MAP) decision: the value of X that minimizes expected loss is the one that has the greatest posterior probability P(X = x | e)
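A brief justification of the MAP rule under this 0-1 loss (standard reasoning, filling in the step the slide states without proof): guessing X = x incurs loss 1 exactly when the true value is some x' ≠ x, so the expected loss of guessing x is
Σ_{x' ≠ x} P(x' | e) = 1 − P(x | e),
which is smallest for the x with the largest posterior probability P(x | e).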

MAP decision
X = x: value of query variable; E = e: evidence
x_MAP = argmax_x P(x | e) = argmax_x P(e | x) P(x) / P(e) = argmax_x P(e | x) P(x)   (P(e) does not depend on x)
Here P(e | x) is the likelihood, P(x) the prior, and P(x | e) the posterior.
Maximum likelihood (ML) decision: x_ML = argmax_x P(e | x)

Example: Spam Filter
We have X = {spam, ¬spam}, E = message. What should be our decision criterion?
– Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives higher posterior probability

Example: Spam Filter
We have X = {spam, ¬spam}, E = message. What should be our decision criterion?
– Compute P(spam | message) and P(¬spam | message), and assign the message to the class that gives higher posterior probability
P(spam | message) ∝ P(message | spam) P(spam)
P(¬spam | message) ∝ P(message | ¬spam) P(¬spam)

Example: Spam Filter
We need to find P(message | spam) P(spam) and P(message | ¬spam) P(¬spam)
How do we represent the message?
– Bag of words model: the order of the words is not important; each word is conditionally independent of the others given message class
If the message consists of words (w_1, …, w_n), how do we compute P(w_1, …, w_n | spam)?
– Naïve Bayes assumption: each word is conditionally independent of the others given message class
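Under this assumption the likelihood of the whole message factors into per-word terms, which is the formula the slide's question points to:
P(w_1, …, w_n | spam) = P(w_1 | spam) P(w_2 | spam) ⋯ P(w_n | spam) = ∏_i P(w_i | spam),
and likewise for ¬spam.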

Example: Spam Filter
Our filter will classify the message as spam if
P(spam) ∏_i P(w_i | spam) > P(¬spam) ∏_i P(w_i | ¬spam)
In practice, likelihoods are pretty small numbers, so we need to take logs to avoid underflow:
log P(spam) + Σ_i log P(w_i | spam) > log P(¬spam) + Σ_i log P(w_i | ¬spam)
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(w_i | spam), P(w_i | ¬spam)
These parameters need to be learned from a training set (a representative sample of messages marked with their classes)
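A minimal sketch of this log-space decision rule in Python (an illustration, not from the slides; the dictionary layout for the parameters is an assumption, and estimating those parameters is covered on the next slides):

def classify(message_words, log_prior, log_likelihood):
    """Return the class with the highest (unnormalized) log-posterior score.

    log_prior:      {class: log P(class)}
    log_likelihood: {class: {word: log P(word | class)}}
    Words outside the vocabulary are simply skipped here; smoothing (see the
    later parameter-estimation slide) is what keeps rare words from zeroing out a class.
    """
    scores = {}
    for c in log_prior:
        score = log_prior[c]                    # log P(class)
        for w in message_words:
            if w in log_likelihood[c]:
                score += log_likelihood[c][w]   # + log P(word | class)
        scores[c] = score
    return max(scores, key=scores.get)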

Parameter estimation
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(w_i | spam), P(w_i | ¬spam)
Estimation by empirical word frequencies in the training set:
P(w_i | spam) = (# of occurrences of w_i in spam messages) / (total # of words in spam messages)
– This happens to be the parameter estimate that maximizes the likelihood of the training data, ∏_d ∏_i P(w_{d,i} | class_d), where d indexes training documents and i indexes the words in document d

Parameter estimation
Model parameters:
– Priors P(spam), P(¬spam)
– Likelihoods P(w_i | spam), P(w_i | ¬spam)
Estimation by empirical word frequencies in the training set:
P(w_i | spam) = (# of occurrences of w_i in spam messages) / (total # of words in spam messages)
Parameter smoothing: dealing with words that were never seen or seen too few times
– Laplacian smoothing: pretend you have seen every vocabulary word one more time than you actually did
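A minimal sketch of this estimation step in Python, with Laplacian (add-one) smoothing; the function name, inputs, and data layout are illustrative assumptions rather than anything prescribed by the slides. The smoothed estimate is (count of w_i in spam messages + 1) / (total # of words in spam messages + vocabulary size):

from collections import Counter

def estimate_likelihoods(documents, vocabulary):
    """Estimate P(word | class) by smoothed empirical frequencies.

    documents:  list of (word_list, class_label) pairs (the training set)
    vocabulary: set of all words the model considers
    """
    word_counts = {}   # class -> Counter of word occurrences
    total_words = {}   # class -> total number of word tokens in that class
    for words, label in documents:
        word_counts.setdefault(label, Counter()).update(words)
        total_words[label] = total_words.get(label, 0) + len(words)

    V = len(vocabulary)  # add-one smoothing adds V to every denominator
    return {label: {w: (counts[w] + 1) / (total_words[label] + V) for w in vocabulary}
            for label, counts in word_counts.items()}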

Bayesian decision making: Summary
Suppose the agent has to make decisions about the value of an unobserved query variable X based on the values of an observed evidence variable E
Inference problem: given some evidence E = e, what is P(X | e)?
Learning problem: estimate the parameters of the probabilistic model P(X | E) given a training sample {(x_1, e_1), …, (x_n, e_n)}
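A toy end-to-end example tying the learning and inference problems together; it reuses the classify and estimate_likelihoods sketches from the earlier slides, and the three training messages are made up:

import math

train = [(["cheap", "pills", "now"], "spam"),
         (["meeting", "notes", "attached"], "not_spam"),
         (["cheap", "meeting", "now"], "spam")]
vocab = {w for words, _ in train for w in words}

# Learning problem: estimate priors and smoothed likelihoods from the training sample
likelihoods = estimate_likelihoods(train, vocab)
log_likelihood = {c: {w: math.log(p) for w, p in probs.items()}
                  for c, probs in likelihoods.items()}
log_prior = {"spam": math.log(2 / 3), "not_spam": math.log(1 / 3)}

# Inference problem: classify a new message
print(classify(["cheap", "pills"], log_prior, log_likelihood))   # prints: spam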

Bag-of-word models for images
Csurka et al. (2004), Willamowski et al. (2005), Grauman & Darrell (2005), Sivic et al. (2003, 2005)

Bag-of-word models for images
1. Extract image features

Bag-of-word models for images
2. Learn "visual vocabulary"

Bag-of-word models for images
1. Extract image features
2. Learn "visual vocabulary"
3. Map image features to visual words
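A rough sketch of steps 2 and 3 in Python using k-means clustering from scikit-learn; the library choice, vocabulary size, and function names are assumptions, since the slides do not prescribe a particular implementation. The cluster centers play the role of visual words, and each image becomes a normalized histogram of visual-word counts:

import numpy as np
from sklearn.cluster import KMeans

def build_vocabulary(all_descriptors, k=500):
    """Step 2: cluster pooled local descriptors; each cluster center is one 'visual word'."""
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_descriptors)

def bag_of_words_histogram(vocabulary, image_descriptors):
    """Step 3: assign each descriptor to its nearest visual word, then count occurrences."""
    words = vocabulary.predict(image_descriptors)
    hist = np.bincount(words, minlength=vocabulary.n_clusters).astype(float)
    return hist / hist.sum()   # normalize so images with different numbers of features are comparable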