
1 Computer Science CPSC 322 Lecture 27 Conditioning Ch 6.1.3 Slide 1

2 Lecture Overview
- Recap Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 2

3 Probability as a measure of uncertainty/ignorance
- Probability measures an agent's degree of belief in the truth of propositions about states of the world
- Belief in a proposition f can be measured as a number between 0 and 1: this is the probability of f
  E.g., P("roll of a fair die came out as a 6") = 1/6 ≈ 16.7% = 0.167
- P(f) = 0 means that f is believed to be definitely false
- P(f) = 1 means that f is believed to be definitely true
- Using probabilities between 0 and 1 is purely a convention
Slide 3

4 Probability Theory and Random Variables
- Probability Theory: a system of logical axioms and formal operations for sound reasoning under uncertainty
- Basic element: random variable X
  X is a variable like the ones we have seen in CSP/Planning/Logic, but the agent can be uncertain about the value of X
- We will focus on Boolean and categorical variables
  Boolean: e.g., Cancer (does the patient have cancer or not?)
  Categorical: e.g., CancerType could be one of {breastCancer, lungCancer, skinMelanomas}
Slide 4

5 Possible Worlds
- A possible world specifies an assignment to each random variable
- Example: weather in Vancouver, represented by random variables
  - Temperature: {hot, mild, cold}
  - Weather: {sunny, cloudy}
- There are 6 possible worlds:
        Weather  Temperature
    w1  sunny    hot
    w2  sunny    mild
    w3  sunny    cold
    w4  cloudy   hot
    w5  cloudy   mild
    w6  cloudy   cold
- w ⊨ f means that proposition f is true in world w
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that
  - µ(w) sums to 1 over all possible worlds w
    (because for sure we are in one of these worlds!)
Slide 5

6 Possible Worlds
- A possible world specifies an assignment to each random variable
- Example: weather in Vancouver, represented by random variables
  - Temperature: {hot, mild, cold}
  - Weather: {sunny, cloudy}
- There are 6 possible worlds:
        Weather  Temperature  µ(w)
    w1  sunny    hot          0.10
    w2  sunny    mild         0.20
    w3  sunny    cold         0.10
    w4  cloudy   hot          0.05
    w5  cloudy   mild         0.35
    w6  cloudy   cold         0.20
- w ⊨ f means that proposition f is true in world w
- A probability measure µ(w) over possible worlds w is a nonnegative real number such that
  - µ(w) sums to 1 over all possible worlds w
    (because for sure we are in one of these worlds!)
  - The probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), i.e. the sum of the probabilities of the worlds w in which f is true
Slide 6

7 Example
What's the probability of it being cloudy or cold?
        Weather  Temperature  µ(w)
    w1  sunny    hot          0.10
    w2  sunny    mild         0.20
    w3  sunny    cold         0.10
    w4  cloudy   hot          0.05
    w5  cloudy   mild         0.35
    w6  cloudy   cold         0.20
The worlds in which "cloudy or cold" is true are w3, w4, w5 and w6, so
P(cloudy or cold) = µ(w3) + µ(w4) + µ(w5) + µ(w6) = 0.10 + 0.05 + 0.35 + 0.20 = 0.7
Remember: the probability of proposition f is defined by P(f) = Σ_{w ⊨ f} µ(w), the sum of the probabilities of the worlds w in which f is true
Slide 7
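
To make the definition concrete, here is a minimal Python sketch of the "sum µ(w) over the worlds where f holds" computation, using the table above; the dictionary layout and the helper name prob are illustrative choices, not from the slides:

    # Each possible world is a (weather, temperature) pair mapped to its measure µ(w).
    mu = {
        ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
        ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
    }

    def prob(f):
        """P(f) = sum of mu(w) over the worlds w in which proposition f is true."""
        return sum(p for world, p in mu.items() if f(world))

    # The proposition "cloudy or cold" expressed as a predicate over worlds.
    cloudy_or_cold = lambda w: w[0] == "cloudy" or w[1] == "cold"
    print(prob(cloudy_or_cold))  # 0.7 (up to float rounding)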

8 Joint Probability Distribution
- Joint distribution over random variables X1, …, Xn: a probability distribution over the joint random variable with domain dom(X1) × … × dom(Xn) (the Cartesian product)
- Think of a joint distribution over n variables as the n-dimensional table of the corresponding possible worlds
- Each row corresponds to an assignment X1 = x1, …, Xn = xn and its probability P(X1 = x1, …, Xn = xn)
- E.g., the {Weather, Temperature} example:
    Weather  Temperature  µ(w)
    sunny    hot          0.10
    sunny    mild         0.20
    sunny    cold         0.10
    cloudy   hot          0.05
    cloudy   mild         0.35
    cloudy   cold         0.20
Definition (probability distribution): A probability distribution P on a random variable X is a function dom(X) → [0,1] such that Σ_{x ∈ dom(X)} P(X=x) = 1
Slide 8

9 Marginalization
Given the joint distribution, we can compute distributions over subsets of the variables through marginalization:
P(X=x, Y=y) = Σ_{z1 ∈ dom(Z1)} … Σ_{zn ∈ dom(Zn)} P(X=x, Y=y, Z1=z1, …, Zn=zn)
Example: marginalization over Temperature and Wind
    Wind  Weather  Temperature  µ(w)
    yes   sunny    hot          0.04
    yes   sunny    mild         0.09
    yes   sunny    cold         0.07
    yes   cloudy   hot          0.01
    yes   cloudy   mild         0.10
    yes   cloudy   cold         0.12
    no    sunny    hot          0.06
    no    sunny    mild         0.11
    no    sunny    cold         0.03
    no    cloudy   hot          0.04
    no    cloudy   mild         0.25
    no    cloudy   cold         0.08

    Weather  µ(w)
    sunny    0.40
    cloudy   ?
Slide 9
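
A short Python sketch of this marginalization, assuming the joint from the table above is stored as a dictionary (the representation and the function name marginalize are mine, not from the slides):

    from collections import defaultdict

    # Joint distribution P(Wind, Weather, Temperature) from the table above.
    joint = {
        ("yes", "sunny", "hot"): 0.04, ("yes", "sunny", "mild"): 0.09, ("yes", "sunny", "cold"): 0.07,
        ("yes", "cloudy", "hot"): 0.01, ("yes", "cloudy", "mild"): 0.10, ("yes", "cloudy", "cold"): 0.12,
        ("no", "sunny", "hot"): 0.06, ("no", "sunny", "mild"): 0.11, ("no", "sunny", "cold"): 0.03,
        ("no", "cloudy", "hot"): 0.04, ("no", "cloudy", "mild"): 0.25, ("no", "cloudy", "cold"): 0.08,
    }

    def marginalize(joint, keep_index):
        """Sum out every variable except the one at position keep_index."""
        marginal = defaultdict(float)
        for world, p in joint.items():
            marginal[world[keep_index]] += p
        return dict(marginal)

    # Marginal over Weather (index 1): roughly {'sunny': 0.40, 'cloudy': 0.60},
    # up to floating-point rounding.
    print(marginalize(joint, keep_index=1))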

10 Lecture Overview
- Recap Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 10

11 Conditioning
Are we done with reasoning under uncertainty? What can happen? Remember from last class:
- I roll a fair die. What is 'the' (my) probability that the result is a '6'? It is 1/6 ≈ 16.7%.
- I now look at the die. What is 'the' (my) probability now?
  My probability is now either 1 or 0, depending on what I observed.
  Your probability hasn't changed: 1/6 ≈ 16.7%.
- What if I tell some of you the result is even?
  Their probability increases to 1/3 ≈ 33.3%, if they believe me.
Different agents can have different degrees of belief in (probabilities for) a proposition, based on the evidence they have.
Slide 11

12 Conditioning
Conditioning: revise beliefs based on new observations
- Build a probabilistic model (the joint probability distribution, JPD)
  - Takes into account all background information
  - Called the prior probability distribution
  - Denote the prior probability for hypothesis h as P(h)
- Observe new information about the world
  - Call all information we received subsequently the evidence e
- Integrate the two sources of information to compute the conditional probability P(h|e)
  - This is also called the posterior probability of h given e
Example
- Prior probability for having a disease (typically small)
- Evidence: a test for the disease comes out positive, but diagnostic tests have false positives
- Posterior probability: integrate prior and evidence
Slide 12

13 Example for conditioning
You have a prior for the joint distribution of weather and temperature:
    Possible world  Weather  Temperature  µ(w)
    w1              sunny    hot          0.10
    w2              sunny    mild         0.20
    w3              sunny    cold         0.10
    w4              cloudy   hot          0.05
    w5              cloudy   mild         0.35
    w6              cloudy   cold         0.20
Now, you look outside and see that it's sunny
- You are now certain that you're in one of worlds w1, w2, or w3
- To get the conditional probability P(T|W=sunny), renormalize µ(w1), µ(w2), µ(w3) to sum to 1
  µ(w1) + µ(w2) + µ(w3) = 0.10 + 0.20 + 0.10 = 0.40
    T     P(T|W=sunny)
    hot   0.10/0.40 = 0.25
    mild  ??  (clicker question: A. 0.20  B. 0.50  C. 0.40  D. 0.80)
    cold  ??
Slide 13
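
A small Python sketch of the "keep only the consistent worlds and renormalize" step, using the prior above; the dictionary layout and the function name are illustrative, not from the slides:

    # Prior µ(w) over (Weather, Temperature) worlds.
    mu = {
        ("sunny", "hot"): 0.10, ("sunny", "mild"): 0.20, ("sunny", "cold"): 0.10,
        ("cloudy", "hot"): 0.05, ("cloudy", "mild"): 0.35, ("cloudy", "cold"): 0.20,
    }

    def condition_on_sunny(mu):
        """P(T | W=sunny): keep worlds with Weather=sunny, then renormalize to sum to 1."""
        consistent = {t: p for (w, t), p in mu.items() if w == "sunny"}
        total = sum(consistent.values())          # 0.40 = P(W=sunny)
        return {t: p / total for t, p in consistent.items()}

    # Roughly {'hot': 0.25, 'mild': 0.5, 'cold': 0.25}, up to float rounding.
    print(condition_on_sunny(mu))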

14 How can we compute P(h|e)?
What happens in terms of possible worlds if we know the value of a random variable (or a set of random variables)?
Example evidence: e = (cavity = T)
    cavity  toothache  catch  µ(w)   µ_e(w)
    T       T          T      .108
    T       T          F      .012
    T       F          T      .072
    T       F          F      .008
    F       T          T      .016
    F       T          F      .064
    F       F          T      .144
    F       F          F      .576
Some worlds are ruled out by the evidence. The others become more likely (their measures are renormalized).
Slide 14

15 How can we compute P(h|e)?
    cavity  toothache  catch  µ(w)   µ_{cavity=T}(w)
    T       T          T      .108
    T       T          F      .012
    T       F          T      .072
    T       F          F      .008
    F       T          T      .016
    F       T          F      .064
    F       F          T      .144
    F       F          F      .576
Slide 15

16 Semantics of Conditional Probability
The conditional probability of formula h given evidence e is defined through the conditioned measure
    µ_e(w) = µ(w) / P(e)   if w ⊨ e
    µ_e(w) = 0             if w ⊭ e
so that
    P(h|e) = Σ_{w ⊨ h} µ_e(w) = P(h ∧ e) / P(e)
Slide 16

17 Semantics of Conditional Prob.: Example
Evidence: e = (cavity = T), so P(e) = .108 + .012 + .072 + .008 = 0.2
    cavity  toothache  catch  µ(w)   µ_e(w)
    T       T          T      .108   .54
    T       T          F      .012   .06
    T       F          T      .072   .36
    T       F          F      .008   .04
    F       T          T      .016   0
    F       T          F      .064   0
    F       F          T      .144   0
    F       F          F      .576   0
P(h|e) = P(toothache = T | cavity = T) = .54 + .06 = 0.6
Slide 17
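
A brief Python sketch of this semantics, computing P(toothache=T | cavity=T) as P(h ∧ e) / P(e) from the joint above; the tuple ordering and helper names are assumptions made for illustration:

    # Joint µ(w) over (cavity, toothache, catch), each True/False.
    mu = {
        (True, True, True): .108, (True, True, False): .012,
        (True, False, True): .072, (True, False, False): .008,
        (False, True, True): .016, (False, True, False): .064,
        (False, False, True): .144, (False, False, False): .576,
    }

    def prob(event):
        """P(event) = sum of µ(w) over worlds w where the predicate `event` holds."""
        return sum(p for w, p in mu.items() if event(w))

    h = lambda w: w[1]            # toothache = T
    e = lambda w: w[0]            # cavity = T
    h_and_e = lambda w: h(w) and e(w)

    print(prob(h_and_e) / prob(e))  # 0.12 / 0.2 = 0.6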

18 Lecture Overview
- Recap Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 18

19 Inference by Enumeration
Great, we can compute arbitrary probabilities now!
Given:
- the prior joint probability distribution (JPD) on a set of variables X
- specific values e for the evidence variables E (a subset of X)
We want to compute: the posterior joint distribution of the query variables Y (a subset of X) given evidence e
- Step 1: Condition to get the distribution P(X|e)
- Step 2: Marginalize to get the distribution P(Y|e)
Slide 19

20 Inference by Enumeration: example
Given P(X) as the JPD below, and evidence e: "Wind = yes"
What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
Step 1: condition to get the distribution P(X|e)
    Windy W  Cloudy C  Temperature T  P(W, C, T)
    yes      no        hot            0.04
    yes      no        mild           0.09
    yes      no        cold           0.07
    yes      yes       hot            0.01
    yes      yes       mild           0.10
    yes      yes       cold           0.12
    no       no        hot            0.06
    no       no        mild           0.11
    no       no        cold           0.03
    no       yes       hot            0.04
    no       yes       mild           0.25
    no       yes       cold           0.08
Slide 20

21 Inference by Enumeration: example
Given P(X) as the JPD on the previous slide, and evidence e: "Wind = yes"
What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
Step 1: condition to get the distribution P(X|e)
    Cloudy C  Temperature T  P(C, T | W=yes)
    no        hot            ?
    no        mild           ?
    no        cold           ?
    yes       hot            ?
    yes       mild           ?
    yes       cold           ?
Slide 21

22 Inference by Enumeration: example
Given P(X) as the JPD on slide 20, and evidence e: "Wind = yes"
What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
Step 1: condition to get the distribution P(X|e). The entries with Wind=yes sum to P(W=yes) = 0.43, so divide each of them by 0.43:
    Cloudy C  Temperature T  P(C, T | W=yes)
    no        hot            0.04/0.43 ≈ 0.10
    no        mild           0.09/0.43 ≈ 0.21
    no        cold           0.07/0.43 ≈ 0.16
    yes       hot            0.01/0.43 ≈ 0.02
    yes       mild           0.10/0.43 ≈ 0.23
    yes       cold           0.12/0.43 ≈ 0.28
Slide 22

23 Inference by Enumeration: example
Given P(X) as the JPD on slide 20, and evidence e: "Wind = yes"
What is the probability that it is hot? I.e., P(Temperature=hot | Wind=yes)
Step 2: marginalize to get the distribution P(Y|e), summing Cloudy out of P(C, T | W=yes):
    Cloudy C  Temperature T  P(C, T | W=yes)
    no        hot            0.10
    no        mild           0.21
    no        cold           0.16
    yes       hot            0.02
    yes       mild           0.23
    yes       cold           0.28

    Temperature T  P(T | W=yes)
    hot            0.10 + 0.02 = 0.12
    mild           0.21 + 0.23 = 0.44
    cold           0.16 + 0.28 = 0.44
Slide 23
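
A compact Python sketch of the full two-step procedure (condition, then marginalize) on the joint from slide 20; the function and parameter names are mine, not from the course code:

    from collections import defaultdict

    # Joint P(Wind, Cloudy, Temperature) from slide 20.
    joint = {
        ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
        ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
        ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
        ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
    }

    def query(joint, query_index, evidence_index, evidence_value):
        """P(query variable | evidence variable = evidence_value) by enumeration."""
        # Step 1: condition -- keep worlds consistent with the evidence, renormalize.
        consistent = {w: p for w, p in joint.items() if w[evidence_index] == evidence_value}
        norm = sum(consistent.values())  # P(evidence)
        # Step 2: marginalize -- sum out everything except the query variable.
        posterior = defaultdict(float)
        for w, p in consistent.items():
            posterior[w[query_index]] += p / norm
        return dict(posterior)

    # P(Temperature | Wind=yes): roughly {'hot': 0.12, 'mild': 0.44, 'cold': 0.44}
    print(query(joint, query_index=2, evidence_index=0, evidence_value="yes"))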

24 Problems of Inference by Enumeration
If we have n variables, and d is the size of the largest domain, what is the space complexity to store the joint distribution?
    A. O(d^n)   B. O(n^d)   C. O(nd)   D. O(n+d)
Slide 24

25 Problems of Inference by Enumeration
If we have n variables, and d is the size of the largest domain, what is the space complexity to store the joint distribution?
- We need to store the probability for each possible world
- There are O(d^n) possible worlds, so the space complexity is O(d^n)
How do we find the numbers for O(d^n) entries?
- Time complexity is also O(d^n): in the worst case, we need to sum over all entries in the JPD
We have some of our basic tools, but to gain computational efficiency we need to do more
- We will exploit (conditional) independence between variables
- But first, let's look at a neat application of conditioning
Slide 25

26 Lecture Overview
- Recap Lecture 26
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 26

27 Using conditional probability
Often you have causal knowledge (forward, from cause to evidence), for example:
- P(symptom | disease)
- P(light is off | status of switches and switch positions)
- P(alarm | fire)
- In general: P(evidence e | hypothesis h)
... and you want to do evidential reasoning (backwards, from evidence to cause), for example:
- P(disease | symptom)
- P(status of switches | light is off and switch positions)
- P(fire | alarm)
- In general: P(hypothesis h | evidence e)
Slide 27

28 Bayes Rule
By definition, we know that:
    (1) P(h|e) = P(h ∧ e) / P(e)        (2) P(e|h) = P(e ∧ h) / P(h)
We can rearrange terms to write
    P(h ∧ e) = P(h|e) × P(e)        P(e ∧ h) = P(e|h) × P(h)
But
    (3) P(h ∧ e) = P(e ∧ h)
From (1), (2) and (3) we can derive Bayes Rule:
    P(h|e) = P(e|h) × P(h) / P(e)
Slide 28
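
A small Python sketch of evidential reasoning with Bayes Rule on the disease-test example from slide 12; the specific numbers (1% prior, 90% sensitivity, 5% false-positive rate) are made up for illustration and are not from the slides:

    # Causal / prior knowledge (assumed numbers).
    p_disease = 0.01                  # prior P(h): having the disease
    p_pos_given_disease = 0.90        # P(e|h): test positive given disease
    p_pos_given_no_disease = 0.05     # P(e|not h): false-positive rate

    # P(e): total probability of a positive test, summing over disease / no disease.
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_no_disease * (1 - p_disease))

    # Bayes Rule: P(h|e) = P(e|h) * P(h) / P(e).
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_disease_given_pos, 3))  # about 0.154

Even with a fairly accurate test, the posterior stays modest because the prior is small, which is exactly the point of the example on slide 12.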

29 Learning Goals For Today's Class
- Define, use and prove the formula to compute conditional probability P(h|e)
- Use inference by enumeration to compute joint posterior probability distributions over any subset of variables given evidence
- Define Bayes Rule
Slide 29

30 Lecture Overview
- Recap Lecture 27
- Conditioning
- Inference by Enumeration
- Bayes Rule
- Chain Rule (time permitting)
Slide 30

31 Product Rule
By definition, we know that:
    P(f1 | f2) = P(f1 ∧ f2) / P(f2)
We can rewrite this to
    P(f1 ∧ f2) = P(f1 | f2) × P(f2)
In general:
    P(f1 ∧ … ∧ fn) = P(fn | f1 ∧ … ∧ fn-1) × P(f1 ∧ … ∧ fn-1)
Slide 31

32 Chain Rule
Apply the product rule once to the full joint:
    P(X1, …, Xn) = P(Xn | X1, …, Xn-1) × P(X1, …, Xn-1)
Slide 32

33 Chain Rule
Apply the product rule again to the remaining joint:
    P(X1, …, Xn) = P(Xn | X1, …, Xn-1) × P(Xn-1 | X1, …, Xn-2) × P(X1, …, Xn-2)
Slide 33

34 Chain Rule
Keep applying the product rule to the remaining joint:
    P(X1, …, Xn) = P(Xn | X1, …, Xn-1) × P(Xn-1 | X1, …, Xn-2) × … × P(X1, X2)
Slide 34

35 Chain Rule
    P(X1, …, Xn) = P(Xn | X1, …, Xn-1) × P(Xn-1 | X1, …, Xn-2) × … × P(X2 | X1) × P(X1)
Slide 35

36 Chain Rule
In general:
    P(X1, …, Xn) = ∏_{i=1}^{n} P(Xi | X1, …, Xi-1)
Slide 36
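
A short Python sanity check of the chain rule on the joint from slide 20: every entry P(W, C, T) should equal P(W) × P(C | W) × P(T | W, C) when the factors are computed from the table itself. The helper p and the variable labels are my own illustrative choices, not course code:

    # Joint P(Wind, Cloudy, Temperature) from slide 20.
    joint = {
        ("yes", "no", "hot"): 0.04, ("yes", "no", "mild"): 0.09, ("yes", "no", "cold"): 0.07,
        ("yes", "yes", "hot"): 0.01, ("yes", "yes", "mild"): 0.10, ("yes", "yes", "cold"): 0.12,
        ("no", "no", "hot"): 0.06, ("no", "no", "mild"): 0.11, ("no", "no", "cold"): 0.03,
        ("no", "yes", "hot"): 0.04, ("no", "yes", "mild"): 0.25, ("no", "yes", "cold"): 0.08,
    }

    def p(**fixed):
        """Probability of a partial assignment, e.g. p(W="yes", C="no")."""
        idx = {"W": 0, "C": 1, "T": 2}
        return sum(pr for w, pr in joint.items()
                   if all(w[idx[var]] == val for var, val in fixed.items()))

    for (w, c, t), pr in joint.items():
        # Chain rule: P(W, C, T) = P(W) * P(C | W) * P(T | W, C)
        chain = (p(W=w)
                 * (p(W=w, C=c) / p(W=w))
                 * (p(W=w, C=c, T=t) / p(W=w, C=c)))
        assert abs(chain - pr) < 1e-9
    print("chain rule factorization matches the joint")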

37 Why does the chain rule help us?
We will see how, under specific circumstances (variable independence), this rule helps us gain compactness
- We can represent the JPD as a product of marginal distributions
- We can simplify some terms when the variables involved are independent or conditionally independent
Slide 37

