Computer Science CPSC 322 Lecture 27 Conditioning (Ch 6.1.3) Slide 1

Lecture Overview
- Recap of Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 2

Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world
- Belief in a proposition f can be measured by a number between 0 and 1: the probability of f
  E.g., P("roll of a fair die came out as a 6") = 1/6 ≈ 16.7%
- P(f) = 0 means that f is believed to be definitely false
- P(f) = 1 means that f is believed to be definitely true
- Using probabilities between 0 and 1 is purely a convention
Slide 3

Probability Theory and Random Variables
- Probability theory: a system of logical axioms and formal operations for sound reasoning under uncertainty
- Basic element: random variable X
  X is a variable like the ones we have seen in CSP/Planning/Logic, but the agent can be uncertain about the value of X
- We will focus on Boolean and categorical variables
  Boolean: e.g., Cancer (does the patient have cancer or not?)
  Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas}
Slide 4

Possible Worlds
- A possible world specifies an assignment to each random variable
- Example: weather in Vancouver, represented by random variables
  Temperature: {hot, mild, cold}   Weather: {sunny, cloudy}
- There are 6 possible worlds:

  World  Weather  Temperature
  w1     sunny    hot
  w2     sunny    mild
  w3     sunny    cold
  w4     cloudy   hot
  w5     cloudy   mild
  w6     cloudy   cold

- w ⊨ f means that proposition f is true in world w
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that µ(w) sums to 1 over all possible worlds w
  (Because for sure we are in one of these worlds!)
Slide 5

Possible Worlds (cont.)
- w ⊨ f means that proposition f is true in world w
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that µ(w) sums to 1 over all possible worlds w
  (Because for sure we are in one of these worlds!)
- The probability of proposition f is defined by: P(f) = Σ_{w ⊨ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true

  World  Weather  Temperature  µ(w)
  w1     sunny    hot          0.10
  w2     sunny    mild         0.20
  w3     sunny    cold         0.10
  w4     cloudy   hot          0.05
  w5     cloudy   mild         0.35
  w6     cloudy   cold         0.20

Slide 6

Example
- What's the probability of it being cloudy or cold?
  P(Weather = cloudy or Temperature = cold) = µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.10 + 0.05 + 0.35 + 0.20 = 0.70

  World  Weather  Temperature  µ(w)
  w1     sunny    hot          0.10
  w2     sunny    mild         0.20
  w3     sunny    cold         0.10
  w4     cloudy   hot          0.05
  w5     cloudy   mild         0.35
  w6     cloudy   cold         0.20

- Remember: the probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), the sum of the probabilities of the worlds w in which f is true
Slide 7
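The possible-worlds semantics is easy to mirror in code. The following Python sketch is illustrative only; the dictionary representation and the function name `prob` are my own, not from the slides. It stores µ as a map from worlds to probabilities and computes P(f) by summing over the worlds where f holds.

```python
# Each possible world is a (Weather, Temperature) pair with its measure mu(w).
mu = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

def prob(proposition):
    """P(f) = sum of mu(w) over all worlds w in which proposition f is true."""
    return sum(p for world, p in mu.items() if proposition(world))

# P(Weather = cloudy or Temperature = cold) = 0.10 + 0.05 + 0.35 + 0.20
print(prob(lambda w: w[0] == "cloudy" or w[1] == "cold"))  # ≈ 0.70
```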

Joint Probability Distribution
- Joint distribution over random variables X1, …, Xn: a probability distribution over the joint random variable with domain dom(X1) × … × dom(Xn) (the Cartesian product)
- Think of a joint distribution over n variables as an n-dimensional table of the corresponding possible worlds
- Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn)
- E.g., the {Weather, Temperature} example:

  Weather  Temperature  µ(w)
  sunny    hot          0.10
  sunny    mild         0.20
  sunny    cold         0.10
  cloudy   hot          0.05
  cloudy   mild         0.35
  cloudy   cold         0.20

Definition (probability distribution): A probability distribution P on a random variable X is a function dom(X) → [0,1] such that x ↦ P(X = x)
Slide 8

Marginalization
- Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:
  P(X = x, Y = y) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X = x, Y = y, Z1 = z1, …, Zn = zn)
- Marginalization over Temperature and Wind:

  Wind  Weather  Temperature  µ(w)
  yes   sunny    hot          0.04
  yes   sunny    mild         0.09
  yes   sunny    cold         0.07
  yes   cloudy   hot          0.01
  yes   cloudy   mild         0.10
  yes   cloudy   cold         0.12
  no    sunny    hot          0.06
  no    sunny    mild         0.11
  no    sunny    cold         0.03
  no    cloudy   hot          0.04
  no    cloudy   mild         0.25
  no    cloudy   cold         0.08

  Weather  µ(w)
  sunny    0.40
  cloudy   0.60

Slide 9
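As a rough illustration, marginalization is just summing out the unwanted variables. The sketch below is not from the slides; the representation and the helper name `marginalize` are assumptions. It recovers P(Weather) from the (Wind, Weather, Temperature) joint above.

```python
from collections import defaultdict

# Joint distribution P(Wind, Weather, Temperature), copied from the table above.
joint = {
    ("yes", "sunny", "hot"): 0.04, ("yes", "sunny", "mild"): 0.09, ("yes", "sunny", "cold"): 0.07,
    ("yes", "cloudy", "hot"): 0.01, ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
    ("no", "sunny", "hot"): 0.06, ("no", "sunny", "mild"): 0.11, ("no", "sunny", "cold"): 0.03,
    ("no", "cloudy", "hot"): 0.04, ("no", "cloudy", "mild"): 0.25, ("no", "cloudy", "cold"): 0.08,
}

def marginalize(joint, keep):
    """Sum out every variable position not listed in `keep`."""
    out = defaultdict(float)
    for world, p in joint.items():
        out[tuple(world[i] for i in keep)] += p
    return dict(out)

# P(Weather): sum out Wind (index 0) and Temperature (index 2).
print(marginalize(joint, keep=[1]))  # {('sunny',): 0.40, ('cloudy',): 0.60}
```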

Lecture Overview
- Recap of Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 10

Conditioning
- Are we done with reasoning under uncertainty? What can happen? Remember from last class:
- I roll a fair die. What is 'the' (my) probability that the result is a '6'? It is 1/6 ≈ 16.7%.
- I now look at the die. What is 'the' (my) probability now?
  My probability is now either 1 or 0, depending on what I observed.
  Your probability hasn't changed: 1/6 ≈ 16.7%.
- What if I tell some of you that the result is even?
  Their probability increases to 1/3 ≈ 33.3%, if they believe me.
- Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.
Slide 11

Conditioning
- Conditioning: revise beliefs based on new observations
- Build a probabilistic model (the joint probability distribution, JPD)
  It takes into account all background information and is called the prior probability distribution
  Denote the prior probability for hypothesis h as P(h)
- Observe new information about the world
  Call all information we receive subsequently the evidence e
- Integrate the two sources of information to compute the conditional probability P(h|e)
  This is also called the posterior probability of h given e
- Example:
  Prior probability for having a disease (typically small)
  Evidence: a test for the disease comes out positive, but diagnostic tests have false positives
  Posterior probability: integrate prior and evidence
Slide 12

Example for conditioning
- You have a prior for the joint distribution of weather and temperature:

  World  Weather  Temperature  µ(w)
  w1     sunny    hot          0.10
  w2     sunny    mild         0.20
  w3     sunny    cold         0.10
  w4     cloudy   hot          0.05
  w5     cloudy   mild         0.35
  w6     cloudy   cold         0.20

- Now, you look outside and see that it's sunny
  You are now certain that you're in one of worlds w1, w2, or w3
- To get the conditional probability P(T | W = sunny), renormalize µ(w1), µ(w2), µ(w3) to sum to 1
  µ(w1) + µ(w2) + µ(w3) = 0.10 + 0.20 + 0.10 = 0.40

  T     P(T | W = sunny)
  hot   0.10 / 0.40 = 0.25
  mild  0.20 / 0.40 = 0.50
  cold  0.10 / 0.40 = 0.25

Slide 13
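Conditioning can be sketched the same way: drop the worlds inconsistent with the observation and renormalize the rest. The code below is a minimal sketch; the names and representation are illustrative, not from the slides.

```python
# Prior over (Weather, Temperature) worlds, as in the table above.
mu = {
    ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
    ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
}

def condition(mu, evidence):
    """Zero out worlds inconsistent with the evidence, then renormalize."""
    consistent = {w: p for w, p in mu.items() if evidence(w)}
    total = sum(consistent.values())            # this is P(e)
    return {w: p / total for w, p in consistent.items()}

mu_sunny = condition(mu, lambda w: w[0] == "sunny")
print(mu_sunny)
# {('sunny','hot'): ≈0.25, ('sunny','mild'): ≈0.50, ('sunny','cold'): ≈0.25}
```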

How can we compute P(h|e)?
- What happens in terms of possible worlds if we know the value of a random variable (or a set of random variables)?
- e = (cavity = T): some worlds are ruled out (they become impossible); the others become more likely (their probabilities are renormalized).

  cavity  toothache  catch  µ(w)   µ_e(w)
  T       T          T      .108
  T       T          F      .012
  T       F          T      .072
  T       F          F      .008
  F       T          T      .016
  F       T          F      .064
  F       F          T      .144
  F       F          F      .576

Slide 14

How can we compute P(h|e)?
- Condition on e = (cavity = T): worlds with cavity = F get probability 0; the probabilities of the remaining worlds are renormalized by P(cavity = T) = 0.2.

  cavity  toothache  catch  µ(w)   µ_cavity=T(w)
  T       T          T      .108   .54
  T       T          F      .012   .06
  T       F          T      .072   .36
  T       F          F      .008   .04
  F       T          T      .016   0
  F       T          F      .064   0
  F       F          T      .144   0
  F       F          F      .576   0

Slide 15

Semantics of Conditional Probability
- Conditioning on evidence e defines a new probability measure over possible worlds:
  µ_e(w) = µ(w) / P(e)  if w ⊨ e
  µ_e(w) = 0            otherwise
- The conditional probability of formula h given evidence e is
  P(h | e) = Σ_{w ⊨ h} µ_e(w) = P(h ∧ e) / P(e)
Slide 16

Semantics of Conditional Prob.: Example
- e = (cavity = T)

  cavity  toothache  catch  µ(w)   µ_e(w)
  T       T          T      .108   .54
  T       T          F      .012   .06
  T       F          T      .072   .36
  T       F          F      .008   .04
  F       T          T      .016   0
  F       T          F      .064   0
  F       F          T      .144   0
  F       F          F      .576   0

- P(h | e) = P(toothache = T | cavity = T) = .54 + .06 = 0.6
Slide 17
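A quick sketch of the same computation, applying P(h | e) = P(h ∧ e) / P(e) directly to the table above. The dictionary representation is mine, not the textbook's; world order is (cavity, toothache, catch).

```python
# Joint measure over the eight (cavity, toothache, catch) worlds from the slide.
mu = {
    (True, True, True): 0.108,   (True, True, False): 0.012,
    (True, False, True): 0.072,  (True, False, False): 0.008,
    (False, True, True): 0.016,  (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

p_e = sum(p for (cav, _, _), p in mu.items() if cav)                     # P(cavity=T) = 0.2
p_h_and_e = sum(p for (cav, tooth, _), p in mu.items() if cav and tooth)  # P(toothache ∧ cavity) = 0.12
print(p_h_and_e / p_e)  # P(toothache=T | cavity=T) ≈ 0.6
```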

Lecture Overview
- Recap of Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 18

Inference by Enumeration
- Great, we can compute arbitrary probabilities now!
- Given:
  a prior joint probability distribution (JPD) on a set of variables X
  specific values e for the evidence variables E (a subset of X)
- We want to compute: the posterior joint distribution of the query variables Y (a subset of X) given evidence e
- Step 1: Condition to get the distribution P(X|e)
- Step 2: Marginalize to get the distribution P(Y|e)
Slide 19

Inference by Enumeration: example
- Given P(X) as the JPD below, and evidence e: "Wind = yes"
- What is the probability that it is hot? I.e., P(Temperature = hot | Wind = yes)
- Step 1: condition to get the distribution P(X|e)

  Windy W  Cloudy C  Temperature T  P(W, C, T)
  yes      no        hot            0.04
  yes      no        mild           0.09
  yes      no        cold           0.07
  yes      yes       hot            0.01
  yes      yes       mild           0.10
  yes      yes       cold           0.12
  no       no        hot            0.06
  no       no        mild           0.11
  no       no        cold           0.03
  no       yes       hot            0.04
  no       yes       mild           0.25
  no       yes       cold           0.08

Slide 20

Inference by Enumeration: example (cont.)
- Step 1: condition to get the distribution P(X|e)
- Keep only the rows of the joint distribution that are consistent with the evidence Wind = yes; their total probability is P(Wind = yes) = 0.04 + 0.09 + 0.07 + 0.01 + 0.10 + 0.12 = 0.43
- Renormalize them to obtain P(C, T | W = yes):

  Cloudy C  Temperature T  P(C, T | W = yes)
  no        hot            0.04 / 0.43
  no        mild           0.09 / 0.43
  no        cold           0.07 / 0.43
  yes       hot            0.01 / 0.43
  yes       mild           0.10 / 0.43
  yes       cold           0.12 / 0.43

Slide 21

Inference by Enumeration: example (cont.)
- Step 1 result: the conditional distribution P(C, T | W = yes)

  Cloudy C  Temperature T  P(C, T | W = yes)
  no        hot            0.04 / 0.43 ≈ 0.10
  no        mild           0.09 / 0.43 ≈ 0.21
  no        cold           0.07 / 0.43 ≈ 0.16
  yes       hot            0.01 / 0.43 ≈ 0.02
  yes       mild           0.10 / 0.43 ≈ 0.23
  yes       cold           0.12 / 0.43 ≈ 0.28

Slide 22

Inference by Enumeration: example (cont.)
- Step 2: marginalize to get the distribution P(Y|e), i.e. sum P(C, T | W = yes) over Cloudy:

  Cloudy C  Temperature T  P(C, T | W = yes)
  no        hot            0.10
  no        mild           0.21
  no        cold           0.16
  yes       hot            0.02
  yes       mild           0.23
  yes       cold           0.28

  Temperature T  P(T | W = yes)
  hot            0.10 + 0.02 = 0.12
  mild           0.21 + 0.23 = 0.44
  cold           0.16 + 0.28 = 0.44

Slide 23
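Putting the two steps together gives a small (if exponential-space) inference-by-enumeration routine. This sketch hard-codes the example's joint; the names and representation are illustrative, not from the slides.

```python
from collections import defaultdict

# P(Windy, Cloudy, Temperature) from the example above (Cloudy: "no" means sunny).
joint = {
    ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
    ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
    ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
}

# Step 1: condition on Wind = yes (select consistent rows, renormalize by P(e)).
consistent = {w: p for w, p in joint.items() if w[0] == "yes"}
p_e = sum(consistent.values())                        # 0.43
posterior = {w: p / p_e for w, p in consistent.items()}

# Step 2: marginalize out Cloudy to get P(Temperature | Wind = yes).
answer = defaultdict(float)
for (_, _, temp), p in posterior.items():
    answer[temp] += p
print(dict(answer))  # hot ≈ 0.12, mild ≈ 0.44, cold ≈ 0.44
```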

Problems of Inference by Enumeration
- If we have n variables, and d is the size of the largest domain, what is the space complexity to store the joint distribution?
  A. O(d^n)   B. O(n^d)   C. O(nd)   D. O(n+d)
Slide 24

Problems of Inference by Enumeration
- If we have n variables, and d is the size of the largest domain, what is the space complexity to store the joint distribution?
  We need to store the probability for each possible world. There are O(d^n) possible worlds, so the space complexity is O(d^n).
- How do we find the numbers for the O(d^n) entries?
- Time complexity is also O(d^n): in the worst case, we need to sum over all entries in the JPD
- We have some of our basic tools, but to gain computational efficiency we need to do more
  We will exploit (conditional) independence between variables
  But first, let's look at a neat application of conditioning
Slide 25

Lecture Overview
- Recap of Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 26

Using conditional probability
- Often you have causal knowledge (forward, from cause to evidence), for example:
  P(symptom | disease)
  P(light is off | status of switches and switch positions)
  P(alarm | fire)
  In general: P(evidence e | hypothesis h)
- ... and you want to do evidential reasoning (backwards, from evidence to cause), for example:
  P(disease | symptom)
  P(status of switches | light is off and switch positions)
  P(fire | alarm)
  In general: P(hypothesis h | evidence e)
Slide 27

Bayes Rule
- By definition, we know that:
  P(h | e) = P(h ∧ e) / P(e)   and   P(e | h) = P(e ∧ h) / P(h)
- We can rearrange terms to write:
  P(h ∧ e) = P(h | e) × P(e)   (1)
  P(e ∧ h) = P(e | h) × P(h)   (2)
- But P(h ∧ e) = P(e ∧ h)   (3)
- From (1), (2) and (3) we can derive Bayes Rule:
  P(h | e) = P(e | h) × P(h) / P(e)
Slide 28
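A worked sketch of Bayes Rule on the disease/test example from the earlier conditioning slide. The specific numbers (prior, sensitivity, false-positive rate) are made up for illustration and are not from the slides.

```python
# P(disease | positive test) = P(pos | disease) * P(disease) / P(pos)
p_disease = 0.01            # prior P(h): disease is rare
p_pos_given_disease = 0.95  # P(e | h): test sensitivity
p_pos_given_healthy = 0.05  # false-positive rate

# P(e): marginalize over the two hypotheses (disease / no disease).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ≈ 0.161: still small despite a positive test
```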

Learning Goals For Today's Class
- Define, use and prove the formula to compute conditional probability P(h|e)
- Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
- Define Bayes Rule
Slide 29

Lecture Overview
- Recap of Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 30

Product Rule
- By definition, we know that:
  P(f2 | f1) = P(f2 ∧ f1) / P(f1)
- We can rewrite this to:
  P(f2 ∧ f1) = P(f2 | f1) × P(f1)
- In general:
  P(fn ∧ … ∧ f1) = P(fn | fn-1 ∧ … ∧ f1) × P(fn-1 ∧ … ∧ f1)
Slide 31

Chain Rule
- Applying the product rule repeatedly:
  P(fn ∧ fn-1 ∧ … ∧ f1)
    = P(fn | fn-1 ∧ … ∧ f1) × P(fn-1 ∧ … ∧ f1)
    = P(fn | fn-1 ∧ … ∧ f1) × P(fn-1 | fn-2 ∧ … ∧ f1) × P(fn-2 ∧ … ∧ f1)
    = …
    = P(fn | fn-1 ∧ … ∧ f1) × P(fn-1 | fn-2 ∧ … ∧ f1) × … × P(f2 | f1) × P(f1)
    = Π_{i=1}^{n} P(fi | fi-1 ∧ … ∧ f1)
Slides 32-36
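As a sanity check, the chain rule can be verified numerically on the joint distribution used earlier: factoring P(W, C, T) as P(W) × P(C | W) × P(T | W, C) reproduces the table entry. The code below is an illustrative sketch; names and representation are assumptions, not from the slides.

```python
# P(Windy, Cloudy, Temperature) from the inference-by-enumeration example.
joint = {
    ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
    ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
    ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
    ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
}

def p(pred):
    """Probability of an event, summed directly from the joint table."""
    return sum(q for world, q in joint.items() if pred(world))

w, c, t = "yes", "yes", "mild"
p_w = p(lambda x: x[0] == w)                                     # P(W = yes) = 0.43
p_c_given_w = p(lambda x: x[0] == w and x[1] == c) / p_w         # P(C = yes | W = yes)
p_t_given_wc = joint[(w, c, t)] / p(lambda x: x[0] == w and x[1] == c)  # P(T = mild | W, C)

print(p_w * p_c_given_w * p_t_given_wc, joint[(w, c, t)])  # both ≈ 0.10
```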

Why does the chain rule help us?
- We will see how, under specific circumstances (independence between variables), this rule helps gain compactness
- We can represent the JPD as a product of marginal distributions
- We can simplify some terms when the variables involved are independent or conditionally independent
Slide 37