Chapter 12: Probabilistic Reasoning and Bayesian Belief Networks

1 Chapter 12 Probabilistic Reasoning and Bayesian Belief Networks

2 Chapter 12 Contents
- Probabilistic Reasoning
- Joint Probability Distributions
- Bayes' Theorem
- Simple Bayesian Concept Learning
- Bayesian Belief Networks
- The Noisy-∨ Function
- Bayes' Optimal Classifier
- The Naïve Bayes Classifier
- Collaborative Filtering

3 Probabilistic Reasoning
- Probabilities are expressed in a notation similar to that of predicates in FOPC:
  - P(S) = 0.5
  - P(T) = 1
  - P(¬(A ∧ B) ∨ C) = 0.2
- 1 = certain; 0 = certainly not

4 Conditional Probability
- Conditional probability refers to the probability of one thing given that we already know another to be true:
  P(B|A) = P(A ∧ B) / P(A)
- This states the probability of B, given A.

5 Conditional Probability
- Note that P(A|B) ≠ P(B|A)
- Example: P(R ∧ S) = 0.01, P(S) = 0.1, P(R) = 0.7
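
A quick check of this asymmetry in Python, using the three probabilities given on the slide:

```python
# Conditional probability from a joint probability: P(X|Y) = P(X ∧ Y) / P(Y).
p_r_and_s = 0.01   # P(R ∧ S), as given on the slide
p_s = 0.1          # P(S)
p_r = 0.7          # P(R)

p_r_given_s = p_r_and_s / p_s   # = 0.1
p_s_given_r = p_r_and_s / p_r   # ≈ 0.014

print(f"P(R|S) = {p_r_given_s:.3f}")
print(f"P(S|R) = {p_s_given_r:.3f}")  # different, so P(A|B) ≠ P(B|A) in general
```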

6 Conditional Probability
- Conditional probability refers to the probability of one thing given that we already know another to be true.
- P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
- P(A ∧ B) = P(A) × P(B) if A and B are independent events.

7 Joint Probability Distributions
- A joint probability distribution represents the combined probabilities of two or more variables.
- This table shows, for example, that
  P(A ∧ B) = 0.11
  P(¬A ∧ B) = 0.09
- Using this, we can calculate P(A):
  P(A) = P(A ∧ B) + P(A ∧ ¬B) = 0.11 + 0.63 = 0.74
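
A small sketch of the same calculation in Python. The entries 0.11 and 0.09 are from the slide; the other two table entries (0.63 and 0.17) are not stated explicitly but follow from P(A) = 0.74 and from the four entries summing to 1:

```python
# Joint distribution over Boolean variables A and B, keyed by (a, b).
joint = {
    (True, True): 0.11,    # P(A ∧ B), from the slide
    (True, False): 0.63,   # implied by P(A) = 0.74
    (False, True): 0.09,   # P(¬A ∧ B), from the slide
    (False, False): 0.17,  # implied by the entries summing to 1
}

# Marginalise out B: P(A) = P(A ∧ B) + P(A ∧ ¬B)
p_a = sum(p for (a, _), p in joint.items() if a)
print(f"P(A) = {p_a:.2f}")  # P(A) = 0.74
```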

8 Bayes' Theorem
- Bayes' theorem lets us calculate a conditional probability:
  P(B|A) = P(A|B) P(B) / P(A)
- P(B) is the prior probability of B.
- P(B|A) is the posterior probability of B.

9 Bayes' Theorem
- P(A ∧ B) = P(A|B) P(B)   (holds in general, including for dependent events)
- P(A ∧ B) = P(B ∧ A) = P(B|A) P(A)
- Therefore P(A|B) P(B) = P(B|A) P(A)
- Rearranging: P(B|A) = P(A|B) P(B) / P(A)

10 Simple Bayesian Concept Learning (1)
- P(H|E) is used to represent the probability that some hypothesis, H, is true, given evidence E.
- Let us suppose we have a set of hypotheses H1 … Hn.
- For each Hi:
  P(Hi|E) = P(E|Hi) P(Hi) / P(E)
- Hence, given a piece of evidence, a learner can determine which is the most likely explanation by finding the hypothesis that has the highest posterior probability.

11 Simple Bayesian Concept Learning (2)
- In fact, this can be simplified. Since P(E) is independent of Hi, it will have the same value for each hypothesis.
- Hence, it can be ignored, and we can find the hypothesis with the highest value of:
  P(E|Hi) P(Hi)
- We can simplify this further if all the hypotheses are equally likely, in which case we simply seek the hypothesis with the highest value of P(E|Hi). This is the likelihood of E given Hi.
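
A minimal sketch of this selection in Python; the hypotheses, priors and likelihoods below are made up for illustration and do not come from the slides:

```python
# MAP hypothesis selection: argmax over Hi of P(E|Hi) * P(Hi).
priors = {"H1": 0.7, "H2": 0.2, "H3": 0.1}        # P(Hi), illustrative
likelihoods = {"H1": 0.2, "H2": 0.5, "H3": 0.9}   # P(E|Hi), illustrative

scores = {h: likelihoods[h] * priors[h] for h in priors}
print(max(scores, key=scores.get))            # H1 (the MAP hypothesis)

# If all hypotheses were equally likely, we would instead pick the hypothesis
# with the largest likelihood P(E|Hi):
print(max(likelihoods, key=likelihoods.get))  # H3 (the maximum likelihood hypothesis)
```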

12 Example
- If you have a cold (B), there is an 80% chance you have a high temperature (A): P(A|B) = 0.8
- Suppose 1 in 10,000 people have a cold: P(B) = 0.0001
- Suppose 1 in 1,000 people have a high temperature: P(A) = 0.001
- P(B|A) = P(A|B) P(B) / P(A) = (0.8 × 0.0001) / 0.001 = 0.08
- So there are 80 chances in 1,000 that you have a cold when you have a high temperature.
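
The same calculation in Python, using the values given above:

```python
# Bayes' theorem applied to the example above: P(B|A) = P(A|B) * P(B) / P(A).
p_a_given_b = 0.8        # P(A|B): high temperature given a cold
p_b = 1 / 10_000         # P(B): prior probability of a cold
p_a = 1 / 1_000          # P(A): prior probability of a high temperature

p_b_given_a = p_a_given_b * p_b / p_a
print(f"P(cold | high temp) = {p_b_given_a:.2f}")  # 0.08, i.e. 80 chances in 1,000
```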

13 Bayesian Belief Networks (1)
- A belief network shows the dependencies between a group of variables.
- Two variables A and B are independent if the likelihood that A will occur has nothing to do with whether B occurs.
- C and D are dependent on A; D and E are dependent on B. The Bayesian belief network has probabilities associated with each link, e.g., P(C|A) = 0.2, P(C|¬A) = 0.4.

14 Bayesian Belief Networks (2)
- A complete set of probabilities for this belief network might be:
  - P(A) = 0.1
  - P(B) = 0.7
  - P(C|A) = 0.2
  - P(C|¬A) = 0.4
  - P(D|A ∧ B) = 0.5
  - P(D|A ∧ ¬B) = 0.4
  - P(D|¬A ∧ B) = 0.2
  - P(D|¬A ∧ ¬B) =
  - P(E|B) = 0.2
  - P(E|¬B) = 0.1

15 Bayesian Belief Networks (3)
- We can now calculate joint probabilities:
  P(A,B,C,D,E) = P(E|A,B,C,D) × P(A,B,C,D)
- In fact, we can simplify this, since there are no dependencies between certain pairs of variables – between E and A, for example. Hence:
  P(A,B,C,D,E) = P(A) × P(B) × P(C|A) × P(D|A ∧ B) × P(E|B)
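
As a sketch, the factorisation can be evaluated directly from the probabilities on the previous slide; the example below uses the assignment in which all five variables are true (so the table entry left blank above is not needed):

```python
# Evaluate P(A,B,C,D,E) = P(A) * P(B) * P(C|A) * P(D|A,B) * P(E|B)
# for the assignment A=T, B=T, C=T, D=T, E=T.
p_a = 0.1
p_b = 0.7
p_c_given_a = {True: 0.2, False: 0.4}          # P(C | A)
p_d_given_ab = {(True, True): 0.5,
                (True, False): 0.4,
                (False, True): 0.2}            # P(D | A, B); the (¬A, ¬B) entry is not given
p_e_given_b = {True: 0.2, False: 0.1}          # P(E | B)

p = p_a * p_b * p_c_given_a[True] * p_d_given_ab[(True, True)] * p_e_given_b[True]
print(f"{p:.4f}")  # 0.1 * 0.7 * 0.2 * 0.5 * 0.2 = 0.0014
```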

16 Example
- P(C) = 0.2 (go to college)
- P(S) = 0.8 if C, 0.2 if not C (study)
- P(P) = 0.6 if C, 0.5 if not C (party)
- P(F) = 0.9 if P, 0.7 if not P (fun)
(Network diagram: C → S, C → P; S and P → E; P → F)

17 Example 2
- Conditional probability table for E (exam success), given S and P:

  S      P      P(E)
  true   true   0.6
  true   false  0.9
  false  true   0.1
  false  false  0.2

18 Example 3
P(C, S, ¬P, E, ¬F) = P(C) × P(S|C) × P(¬P|C) × P(E|S ∧ ¬P) × P(¬F|¬P)
                   = 0.2 × 0.8 × 0.4 × 0.9 × 0.3
                   = 0.01728
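
The same calculation carried out in Python, reading each factor from the tables on the two previous slides:

```python
# P(C, S, ¬P, E, ¬F) for the college/study/party/exam network.
p_c = 0.2                                      # P(C)
p_s_given_c = {True: 0.8, False: 0.2}          # P(S | C)
p_p_given_c = {True: 0.6, False: 0.5}          # P(P | C)
p_e_given_sp = {(True, True): 0.6, (True, False): 0.9,
                (False, True): 0.1, (False, False): 0.2}   # P(E | S, P)
p_f_given_p = {True: 0.9, False: 0.7}          # P(F | P)

p = (p_c
     * p_s_given_c[True]             # P(S | C)      = 0.8
     * (1 - p_p_given_c[True])       # P(¬P | C)     = 0.4
     * p_e_given_sp[(True, False)]   # P(E | S, ¬P)  = 0.9
     * (1 - p_f_given_p[False]))     # P(¬F | ¬P)    = 0.3
print(f"{p:.5f}")  # 0.01728
```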

19 Bayes' Optimal Classifier
- A system that uses Bayes' theory to classify data.
- We have a piece of data y, and are seeking the correct hypothesis from H1 … H5, each of which assigns a classification to y.
- The probability that y should be classified as cj is:
  P(cj | x1, …, xn) = Σ (i = 1 to m) P(cj | Hi) P(Hi | x1, …, xn)
- x1 to xn are the training data, and m is the number of hypotheses.
- This method provides the best possible classification for a piece of data.
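
A small sketch of the weighted vote this describes. The hypothesis posteriors and per-hypothesis class predictions below are made up for illustration; the point is that the combined answer can differ from the single most probable hypothesis:

```python
# Bayes optimal classification: P(cj | data) = sum over i of P(cj | Hi) * P(Hi | data).
posterior_h = {"H1": 0.4, "H2": 0.3, "H3": 0.3}   # P(Hi | x1..xn), illustrative

# Probability each hypothesis assigns to each class for the new item y (illustrative).
class_given_h = {
    "H1": {"pos": 1.0, "neg": 0.0},
    "H2": {"pos": 0.0, "neg": 1.0},
    "H3": {"pos": 0.0, "neg": 1.0},
}

classes = ["pos", "neg"]
p_class = {c: sum(class_given_h[h][c] * posterior_h[h] for h in posterior_h)
           for c in classes}
print(p_class)                        # {'pos': 0.4, 'neg': 0.6}
print(max(p_class, key=p_class.get))  # 'neg', even though the single MAP hypothesis H1 says 'pos'
```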

20 The Naïve Bayes Classifier (1)
- A vector of data is classified as a single classification: P(ci | d1, …, dn)
- The classification with the highest posterior probability is chosen.
- The hypothesis which has the highest posterior probability is the maximum a posteriori, or MAP, hypothesis.
- In this case, we are looking for the MAP classification.
- Bayes' theorem is used to find the posterior probability:
  P(ci | d1, …, dn) = P(d1, …, dn | ci) P(ci) / P(d1, …, dn)

21 The Naïve Bayes Classifier (2)
- Since P(d1, …, dn) is a constant, independent of ci, we can eliminate it, and simply aim to find the classification ci for which the following is maximised:
  P(d1, …, dn | ci) P(ci)
- We now assume that all the attributes d1, …, dn are independent, so P(d1, …, dn | ci) can be rewritten as:
  P(d1 | ci) × P(d2 | ci) × … × P(dn | ci)
- The classification for which this is highest is chosen to classify the data.
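
A minimal Naïve Bayes sketch over categorical attributes. The tiny training set, attribute values and class labels below are made up for illustration (and no smoothing is applied, for brevity):

```python
from collections import Counter, defaultdict

# Illustrative training set: each row is ((d1, d2), class).
train = [(("sunny", "hot"), "no"), (("sunny", "mild"), "yes"),
         (("rain", "mild"), "yes"), (("rain", "hot"), "no"),
         (("sunny", "mild"), "yes")]

class_counts = Counter(c for _, c in train)
# attr_counts[class][attribute index][value] = count
attr_counts = defaultdict(lambda: defaultdict(Counter))
for attrs, c in train:
    for i, v in enumerate(attrs):
        attr_counts[c][i][v] += 1

def classify(attrs):
    """Return the class ci maximising P(ci) * product over j of P(dj | ci)."""
    scores = {}
    for c, n_c in class_counts.items():
        p = n_c / len(train)                    # prior P(ci)
        for i, v in enumerate(attrs):
            p *= attr_counts[c][i][v] / n_c     # conditional P(dj | ci)
        scores[c] = p
    return max(scores, key=scores.get)

print(classify(("sunny", "mild")))  # 'yes'
```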

22 Collaborative Filtering
- A method that uses Bayesian reasoning to suggest items that a person might be interested in, based on their known interests.
- If we know that Anne and Bob both like A, B and C, and that Anne likes D, then we guess that Bob would also like D.
- Can be calculated using decision trees.