CPSC 7373: Artificial Intelligence Lecture 4: Uncertainty


1 CPSC 7373: Artificial Intelligence Lecture 4: Uncertainty
Jiang Bian, Fall 2012 University of Arkansas at Little Rock

2 Chapter 13: Uncertainty
Outline: Uncertainty; Probability; Syntax and Semantics; Inference; Independence and Bayes' Rule

3 Uncertainty
Let action At = leave for airport t minutes before flight. Will At get me there on time?
Problems: partial observability (road state, other drivers' plans, etc.); noisy sensors (traffic reports); uncertainty in action outcomes (flat tire, etc.); immense complexity of modeling and predicting traffic.
Hence a purely logical approach either risks falsehood: "A25 will get me there on time", or leads to conclusions that are too weak for decision making: "A25 will get me there on time, if there's no accident on the bridge and it doesn't rain and my tires remain intact, etc." (A1440 might reasonably be said to get me there on time, but I'd have to stay overnight in the airport.)

4 Methods for handling uncertainty
Default or nonmonotonic logic: assume my car does not have a flat tire; assume A25 works unless contradicted by evidence. Issues: What assumptions are reasonable? How to handle contradiction?
Rules with fudge factors: A25 |→0.3 get there on time; Sprinkler |→0.99 WetGrass; WetGrass |→0.7 Rain. Issues: problems with combination, e.g., does Sprinkler cause Rain?
Probability: A25 will get me there on time with probability 0.04

5 Probability
Probabilistic assertions summarize effects of ignorance: lack of relevant facts, initial conditions, etc.
Subjective probability: probabilities relate propositions to the agent's own state of knowledge, e.g., P(A25 | no reported accidents) = 0.06. These are not assertions about the world.
Probabilities of propositions change with new evidence, e.g., P(A25 | no reported accidents, 5 a.m.) = 0.15

6 Bayes Network: Example
[Diagram nodes: BATTERY AGE, ALTERNATOR BROKEN, FAN BELT BROKEN, BATTERY DEAD, NOT CHARGING, BATTERY METER, BATTERY FLAT, NO OIL, NO GAS, FUEL LINE BLOCKED, STARTER BROKEN, LIGHTS, OIL LIGHT, GAS GAUGE, DIP STICK, CAR WON'T START]
Let me give you a flavor of a Bayes network using an example. Suppose you find in the morning that your car won't start. There are many causes why your car might not start. One is that your battery is flat, and even for a flat battery there are multiple causes: one, it's just plain dead, and one is that the battery is okay but it's not charging. The reason why a battery might not charge is that the alternator might be broken or the fan belt might be broken. If you look at this influence diagram, also called a Bayes network, you'll find there are many different ways to explain why the car won't start. A natural question you might have is, "Can we diagnose the problem?" One diagnostic tool is a battery meter, which may increase or decrease your belief that the battery caused your car failure. You might also know your battery age; older batteries tend to go dead more often. And there are many other ways to look at reasons why the car might not start. You might inspect the lights, the oil light, the gas gauge. You might even dip into the engine to see what the oil level is with a dipstick. All of those relate to alternative reasons why the car might not be starting, like no oil, no gas, the fuel line might be blocked, or the starter may be broken. All of these can influence your measurements, like the oil light or the gas gauge, in different ways. For example, a flat battery would have an effect on the lights. It might have an effect on the oil light and on the gas gauge, but it won't really affect the oil you measure with the dipstick. That is affected by the actual oil level, which also affects the oil light. Gas will affect the gas gauge, and of course without gas the car doesn't start. So this is a complicated structure that really describes one way to understand why a car doesn't start.

7 Bayes Network: Example
[Same diagram as the previous slide.]
A car is a complex system. It has lots of variables you can't really measure immediately, and it has sensors which allow you to understand a little bit about the state of the car. What the Bayes network does is assist you in reasoning from observable variables, like the car won't start and the value of the dipstick, to hidden causes, like whether the fan belt is broken or the battery is dead. What you have here is a Bayes network. A Bayes network is composed of nodes. These nodes correspond to events that you might or might not know, typically called random variables. These nodes are linked by arcs, and the arcs suggest that the child of an arc is influenced by its parent, but not in a deterministic way. It might be influenced in a probabilistic way, which means an older battery, for example, has a higher chance of causing the battery to be dead, but it's not clear that every old battery is dead. There is a total of 16 variables in this Bayes network. What the graph structure and associated probabilities specify is a huge probability distribution over the space of all of these 16 variables. If they are all binary, which we'll assume throughout this unit, they can take 2^16 different values, which is a lot. The Bayes network, as we will find out, is a compact representation of this very, very large joint probability distribution over all of these variables. Further, once we specify the Bayes network, we can observe, for example, that the car won't start. We can observe things like the oil light and the lights and the battery meter, and then compute probabilities of hypotheses, like the alternator is broken, or the fan belt is broken, or the battery is dead. So in this class we're going to talk about how to construct this Bayes network, what the semantics are, and how to reason in this Bayes network to find out about variables we can't observe, like whether the fan belt is broken or not. That's an overview.

8 Probabilities: Coin Flip
Suppose the probability for heads is 0.5. What's the probability for it coming up tails? P(H) = 1/2 P(T) = __?__ Probabilities are the cornerstone of artificial intelligence. So I'm going to start with some very basic questions, and we're going to work our way up from there. Here is a coin. The coin can come up heads or tails, and my question is the following:

9 Probabilities: Coin Flip
Suppose the probability for heads is 1/4. What's the probability for it coming up tails? P(H) = 1/4 P(T) = __?__

10 Probabilities: Coin Flip
Suppose the probability for heads is 1/2 and each coin flip is independent. What's the probability of it coming up heads three times in a row? P(H) = 1/2; P(H, H, H) = 1/2 × 1/2 × 1/2 = 1/8

11 Probabilities: Coin Flip
Xi = result of the i-th coin flip; Xi ∈ {H, T}; and Pi(H) = 1/2 ∀i. P(X1=X2=X3=X4) = __?__

12 Probabilities: Coin Flip
Xi = result of the i-th coin flip; Xi ∈ {H, T}; and Pi(H) = 1/2 ∀i. P(X1=X2=X3=X4) = P(X1=X2=X3=X4=H) + P(X1=X2=X3=X4=T) = 1/16 + 1/16 = 1/8

13 Probabilities: Coin Flip
Xi = result of the i-th coin flip; Xi ∈ {H, T}; and Pi(H) = 1/2 ∀i. P({X1,X2,X3,X4} contains at least 3 H) = __?__

14 Probabilities: Coin Flip
Xi = result of the i-th coin flip; Xi ∈ {H, T}; and Pi(H) = 1/2 ∀i. P({X1,X2,X3,X4} contains at least 3 H) = P(HHHH) + P(HHHT) + P(HHTH) + P(HTHH) + P(THHH) = 5 × 1/16 = 5/16
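A quick way to sanity-check these counting arguments is to enumerate all 2^4 equally likely outcomes. A minimal Python sketch (the variable names are just illustrative):

```python
from itertools import product

# Enumerate all 2**4 equally likely outcomes of four fair coin flips.
outcomes = list(product("HT", repeat=4))   # 16 tuples such as ('H', 'H', 'T', 'H')

p_all_equal   = sum(1 for o in outcomes if len(set(o)) == 1) / len(outcomes)
p_at_least_3h = sum(1 for o in outcomes if o.count("H") >= 3) / len(outcomes)

print(p_all_equal)    # 0.125  -> 1/8
print(p_at_least_3h)  # 0.3125 -> 5/16
```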

15 Probabilities: Summary
Complementary probability: if P(A) = p, then P(¬A) = 1 − p.
Independence: if X⊥Y, then P(X,Y) = P(X)P(Y).
If an event has a certain probability p, the complementary event has probability 1 − p. If two random variables X and Y are independent, the joint probability of any pair of values they can assume is the product of the marginals.

16 Dependence Given: P(X1=H) = 1/2; P(X2=H|X1=H) = 0.9; P(X2=T|X1=T) = 0.8.
P(X2=H) = __?__

17 Dependence Given: P(X1=H) = 1/2; P(X2=H|X1=H) = 0.9; P(X2=T|X1=T) = 0.8.
P(X2=H) = P(X2=H|X1=H)·P(X1=H) + P(X2=H|X1=T)·P(X1=T) = 0.9 × 1/2 + (1 − 0.8) × 1/2 = 0.55
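The same total-probability sum is easy to verify numerically; a small sketch using the numbers from this slide:

```python
# Total probability for the dependent coin flips on this slide.
p_x1_h = 0.5                          # P(X1 = H)
p_x2_h_given_h = 0.9                  # P(X2 = H | X1 = H)
p_x2_t_given_t = 0.8                  # P(X2 = T | X1 = T)
p_x2_h_given_t = 1 - p_x2_t_given_t   # negate the event, not the condition

p_x2_h = p_x2_h_given_h * p_x1_h + p_x2_h_given_t * (1 - p_x1_h)
print(round(p_x2_h, 2))  # 0.55
```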

18 What have we learned? Total probability: P(Y) = Σi P(Y | X=i) P(X=i). Negation of probabilities: P(¬X | Y) = 1 − P(X | Y).
What about P(X | ¬Y)? Now, you might be tempted to say, "What about the probability of X given not Y? Is this the same as 1 minus the probability of X given Y?" And the answer is absolutely no. That's not the case. If you condition on something that has a certain probability value, you can take the event you're looking at and negate it, but you can never negate your conditioning variable and assume these values add up to 1.

19 What have we learned? Negation of probabilities: P(¬X | Y) = 1 − P(X | Y), but in general P(X | ¬Y) ≠ 1 − P(X | Y).
You can negate the event (X), but you can never negate the conditioning variable (Y).

20 Example: Weather Given: P(D1=Sunny) = 0.9; P(D2=Sunny|D1=Sunny) = 0.8.
P(D2=Rainy|D1=Sunny) = ??

21 Example: Weather Given: P(D1=Sunny) = 0.9; P(D2=Sunny|D1=Sunny) = 0.8; P(D2=Sunny|D1=Rainy) = 0.6.
P(D2=Rainy|D1=Sunny) = 1 − P(D2=Sunny|D1=Sunny) = 0.2
P(D2=Rainy|D1=Rainy) = 1 − P(D2=Sunny|D1=Rainy) = 0.4
Assume the transition probabilities from D2 to D3 are the same: P(D2=Sunny) = ?? and P(D3=Sunny) = ??

22 Example: Weather Given: P(D1=Sunny) = 0.9; P(D2=Sunny|D1=Sunny) = 0.8; P(D2=Sunny|D1=Rainy) = 0.6. Assume the transition probabilities from D2 to D3 are the same.
P(D2=Sunny) = P(D2=Sunny|D1=Sunny)·P(D1=Sunny) + P(D2=Sunny|D1=Rainy)·P(D1=Rainy) = 0.8 × 0.9 + 0.6 × 0.1 = 0.78
P(D3=Sunny) = P(D3=Sunny|D2=Sunny)·P(D2=Sunny) + P(D3=Sunny|D2=Rainy)·P(D2=Rainy) = 0.8 × 0.78 + 0.6 × 0.22 = 0.756
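The same update can be rolled forward day by day. A small sketch, under the slide's assumption that the transition probabilities stay the same for every day:

```python
# Propagate P(Sunny) forward through the weather chain.
p_sunny = 0.9            # P(D1 = Sunny)
p_s_given_s = 0.8        # P(Sunny tomorrow | Sunny today)
p_s_given_r = 0.6        # P(Sunny tomorrow | Rainy today)

for day in (2, 3):
    p_sunny = p_s_given_s * p_sunny + p_s_given_r * (1 - p_sunny)
    print(f"P(D{day}=Sunny) = {p_sunny:.4f}")
# P(D2=Sunny) = 0.7800
# P(D3=Sunny) = 0.7560
```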

23 Example: Cancer There exists a type of cancer that 1% of the population carries. P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99. There exists a test for the cancer: P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8.
P(C|+) = ??
Joint probabilities: P(+, C) = ??; P(-, C) = ??; P(+, ¬C) = ??; P(-, ¬C) = ??

24 Example: Cancer There exists a type of cancer that 1% of the population carries. P(C) = 0.01; P(¬C) = 1 − 0.01 = 0.99. There exists a test for the cancer: P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8.
P(C|+) = ??
Joint probabilities: P(+, C) = 0.009; P(-, C) = 0.001; P(+, ¬C) = 0.198; P(-, ¬C) = 0.792

25 Example: Cancer There exists a type of cancer that 1% of the population carries. P(C) = 0.01; P(¬C) = 0.99. There exists a test for the cancer: P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8.
P(C|+) = P(+, C) / (P(+, C) + P(+, ¬C)) = 0.009 / 0.207 ≈ 0.043
Joint probabilities: P(+, C) = 0.009; P(-, C) = 0.001; P(+, ¬C) = 0.198; P(-, ¬C) = 0.792
Now, the chance of having a positive test and having cancer is 0.009. When I receive a positive test, I might have cancer or not have cancer, so we just normalize by these two possible causes for the positive test, 0.009 and 0.198. Together these sum to 0.207, and 0.009/0.207 is approximately 0.043. Now, the interesting thing in this equation is that the chance of having seen a positive test result in the absence of cancer is still much, much higher than the chance of seeing a positive result in the presence of cancer, and that's because our prior for cancer is so small in the population that it's just very unlikely to have cancer. So, the additional information of a positive test only raised my posterior probability to 0.043.
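These joint probabilities and the posterior can be reproduced in a few lines; a minimal sketch with the numbers above:

```python
# Cancer test: posterior from the joint probabilities.
p_c = 0.01
p_pos_given_c = 0.9
p_pos_given_not_c = 0.2

joint_pos_c = p_pos_given_c * p_c                # P(+, C)  = 0.009
joint_pos_not_c = p_pos_given_not_c * (1 - p_c)  # P(+, ¬C) = 0.198

p_c_given_pos = joint_pos_c / (joint_pos_c + joint_pos_not_c)
print(round(p_c_given_pos, 4))  # 0.0435  (about 0.043)
```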

26 Bayes Rule
P(A|B) = P(B|A) P(A) / P(B), where P(A|B) is the posterior, P(B|A) the likelihood, P(A) the prior, and P(B) the marginal likelihood.
The interesting thing here is the way the probabilities are reworded. Say we have evidence B. We know about B, but we really care about the variable A. So, for example, B is a test result. We don't care about the test result as much as we care about whether we have cancer or not. This diagnostic reasoning, which is from evidence to its causes, is turned upside down by Bayes Rule into causal reasoning: hypothetically, if we knew the cause, what would be the probability of the evidence we just observed? But to correct for this inversion, we have to multiply by the prior of the cause being the case in the first place, in this case, having cancer or not, and divide by the probability of the evidence, P(B), which is often expanded using the theorem of total probability: P(B) = Σa P(B | A=a) P(A=a).

27 Bayes Rule: Cancer Example
P(C|+) = P(+|C) P(C) / P(+) = 0.9 × 0.01 / (0.9 × 0.01 + 0.2 × 0.99) = 0.009 / 0.207 ≈ 0.043
So, let's apply this to the cancer case and say we really care about whether you have cancer, which is our cause, conditioned on the evidence that is the result of this hidden cause, in this case a positive test result. Our likelihood is the probability of seeing a positive test result given that you have cancer, multiplied by the prior probability of having cancer, divided by the probability of the positive test result. According to the tables we looked at before, that is 0.9 times a prior of 0.01, and we expand the denominator according to total probability: 0.9 × 0.01 is the probability of + given that we do have cancer, and the probability of + given that we don't have cancer is 0.2, with a prior of 0.99. So, if we plug in the numbers we know, we get 0.009 over 0.207, which is approximately 0.043, the number we saw before.

28 Bayes Network Graphically: A → B, where A (e.g., whether we have cancer) is not observable and B (the cancer test) is observable.
Diagnostic reasoning: P(A|B) or P(A|¬B). How many parameters? Three: the prior P(A), plus P(B|A) and P(B|¬A).
We know that A causes B (whether we have cancer causes the test result to be positive or not), with some randomness involved. We are interested in diagnostic reasoning, which is the inversion of the causal reasoning.

29 Two test cancer example
Network: C → T1, C → T2. P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8.
P(C|T1=+,T2=+) = P(C|++) = ??

30 Two test cancer example
Network: C → T1, C → T2. P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8.
P(C|T1=+,T2=+) = P(C|++) ≈ 0.1698

31 Bayes Rule: Compute. (The following slides apply this by computing the unnormalized product P' = likelihood × prior for each hypothesis and then normalizing so the values sum to 1.)

32 Two test cancer example
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. Network: C → T1, C → T2.
P(C|++) = ??

        Prior   P(+|·)   P' = Prior × P(+|·)²
C       0.01    0.9      0.0081
¬C      0.99    0.2      0.0396

Normalizing: P(C|++) = 0.0081 / (0.0081 + 0.0396) ≈ 0.1698

33 Two test cancer example
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. Network: C → T1, C → T2.
P(C|+-) = ??

34 Two test cancer example
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. Network: C → T1, C → T2.
P(C|+-) ≈ 0.0056

        Prior   P(+|·)   P(-|·)   P' = Prior × P(+|·) × P(-|·)   P (normalized)
C       0.01    0.9      0.1      0.0009                         0.0056
¬C      0.99    0.2      0.8      0.1584                         0.9943
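Both tables follow the same recipe: multiply the prior by the likelihood of each test result, then normalize. A small sketch of that recipe (the function name is just illustrative):

```python
def posterior_cancer(tests, p_c=0.01, p_pos_c=0.9, p_pos_not_c=0.2):
    """Unnormalized P' for C and ¬C given a list of test results, then normalize."""
    p_prime_c, p_prime_not_c = p_c, 1 - p_c
    for t in tests:
        p_prime_c *= p_pos_c if t == "+" else 1 - p_pos_c
        p_prime_not_c *= p_pos_not_c if t == "+" else 1 - p_pos_not_c
    return p_prime_c / (p_prime_c + p_prime_not_c)

print(round(posterior_cancer(["+", "+"]), 4))  # 0.1698
print(round(posterior_cancer(["+", "-"]), 4))  # 0.0056
```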

35 Conditional Independence
2-Test Cancer Example: we not only assume that T1 and T2 are identically distributed, but also that they are conditionally independent given C: P(T2|C,T1) = P(T2|C).
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. If you look at the diagram, C separately causes T1 and T2.

36 Conditional Independence
Given B ⊥ C | A (conditional independence), does it follow that B ⊥ C?
Diagram: A → B, A → C. If you look at the diagram, A separately causes B and C.

37 Conditional Independence
Given B ⊥ C | A, it does not follow that B ⊥ C.
Intuitively, getting a positive test result about cancer gives us information about whether you have cancer or not. So if you get a positive test result, you're going to raise the probability of having cancer relative to the prior probability. With that increased probability, we will predict that another test will give us a positive response with higher likelihood than if we hadn't taken the previous test.

38 Two test cancer example
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. Network: C → T1, C → T2.
P(T2=+|T1=+) = ??

39 Conditional independence: cancer example
2-Test Cancer Example. Conditional independence: given that I know C, knowledge of the first test gives me no more information about the second test. It only gives me information if C is unknown.
P(C) = 0.01; P(¬C) = 0.99; P(+|C) = 0.9; P(-|C) = 0.1; P(+|¬C) = 0.2; P(-|¬C) = 0.8. Network: C → T1, C → T2.
P(T2=+|T1=+) = P(+2|+1,C)P(C|+1) + P(+2|+1,¬C)P(¬C|+1) = P(+2|C)P(C|+1) + P(+2|¬C)P(¬C|+1) = 0.9 × 0.043 + 0.2 × 0.957 ≈ 0.2301
It's the same total probability, but all conditioned on +1, using P(+2|+1,C) = P(+2|C). If my first test comes in positive, I expect my second test to be positive with probability about 0.23. That's an increase over the default probability: the probability of any single test coming in positive, which we calculated before as the normalizer of Bayes rule, was 0.207. So, my first test has about a 20% chance of coming in positive. My second test, after seeing a positive first test, now has an increased probability of about 23% of coming in positive.
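A short sketch of this conditioning step, reusing the posterior from the first test (numbers as above; the slide's 0.2301 comes from using the rounded values 0.043 and 0.957):

```python
# P(T2=+ | T1=+) via total probability, conditioned throughout on the first test.
p_c, p_pos_c, p_pos_not_c = 0.01, 0.9, 0.2

# Posterior after the first positive test.
p_c_given_pos = (p_pos_c * p_c) / (p_pos_c * p_c + p_pos_not_c * (1 - p_c))  # ~0.0435

# Conditional independence: P(T2=+ | T1=+, C) = P(T2=+ | C).
p_t2_pos = p_pos_c * p_c_given_pos + p_pos_not_c * (1 - p_c_given_pos)
print(round(p_t2_pos, 4))  # 0.2304 (about 0.23, up from the prior 0.207)
```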

40 Absolute and Conditional
Does A ⊥ B imply A ⊥ B | C? Does A ⊥ B | C imply A ⊥ B?

41 Absolute and Conditional
Does A ⊥ B imply A ⊥ B | C? Does A ⊥ B | C imply A ⊥ B?
We already saw that conditional independence doesn't give us absolute independence. For example, take test #1 and test #2: you might or might not have cancer, and our first test gives us information about whether you have cancer or not. As a result, we've changed our probability for the second test to come in positive. That means that conditional independence does not imply absolute independence.

42 Confounding Cause Network: S → H ← R (two independent causes, sunny weather S and a raise R, of happiness H). P(S) = 0.7; P(R) = 0.01; P(H|S, R) = 1.
P(R|S) = ??

43 Confounding Cause Network: S → H ← R. P(S) = 0.7; P(R) = 0.01; P(H|S, R) = 1.
Since we don't know anything about my happiness, S and R are independent, so P(R|S) = P(R) = 0.01.

44 Explaining Away Network: S → H ← R. P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(S) = 0.7; P(R) = 0.01.
P(R|H,S) = ??
Explaining away means that if we know that we are happy, then sunny weather can explain away the cause of happiness: if I know that it's sunny, it becomes less likely that I received a raise. If we see a certain effect that could be caused by multiple causes, seeing one of those causes can explain away any other potential cause of this effect. For example, suppose you ask me why I am happy today. Is it sunny, or did I get a raise? If you then look outside and see that it is sunny, you might explain it to yourself: "Well, Jiang is happy because it is sunny," which makes it effectively less likely that I got a raise, because my happiness is already explained by it being sunny.

45 Explaining Away Network: S → H ← R. P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(S) = 0.7; P(R) = 0.01.
P(R|H,S) = P(H|R,S)P(R|S) / P(H|S) = P(H|R,S)P(R) / [P(H|R,S)P(R) + P(H|¬R,S)P(¬R)] = 1 × 0.01 / (1 × 0.01 + 0.7 × 0.99) ≈ 0.0142
P(R|H) = ??

46 Explaining Away Network: S → H ← R. P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(S) = 0.7; P(R) = 0.01.
P(H) = 0.5245; P(R|H,S) ≈ 0.0142
P(R|H) = P(H|R)P(R) / P(H) = [P(H|R,S)P(S) + P(H|R,¬S)P(¬S)] P(R) / P(H) = 0.97 × 0.01 / 0.5245 ≈ 0.0185
P(R|H,¬S) = ??
My happiness is well explained by the sunny day. If you don't know anything about the weather, there is a higher chance that I got a raise.

47 Explaining Away Network: S → H ← R. P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(S) = 0.7; P(R) = 0.01.
P(H) = 0.5245; P(R|H,S) ≈ 0.0142; P(R|H) ≈ 0.0185
P(R|H,¬S) = P(H|R,¬S)P(R|¬S) / P(H|¬S) = P(H|R,¬S)P(R) / [P(H|R,¬S)P(R) + P(H|¬R,¬S)P(¬R)] = 0.9 × 0.01 / (0.9 × 0.01 + 0.1 × 0.99) ≈ 0.0833

48 Conditional Dependence
Network: S → H ← R. P(H|S, R) = 1; P(H|¬S, R) = 0.9; P(H|S, ¬R) = 0.7; P(H|¬S, ¬R) = 0.1; P(S) = 0.7; P(R) = 0.01.
P(R|H,S) = 1.42% ≠ P(R|H) = 1.849%; P(R|S) = 0.01 = P(R) since R⊥S; P(R|H,¬S) = 8.33%
In both cases, we observe that I am happy and ask the same question: the probability of me getting a raise. The difference is that in one case you observe that it's sunny, and in the other case it is not. The sunniness explains away my happiness and reduces the probability of me getting a raise (from 8.33% to 1.42%). H adds a dependence between the two independent events S and R: R⊥S, but not R⊥S | H. Independence does not imply conditional independence!
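All of the numbers on this slide follow from the same small model; a sketch that recomputes them by brute-force enumeration over S, R, and H (the helper names are just illustrative):

```python
# Happiness network: S and R are independent causes of H.
p_s, p_r = 0.7, 0.01
p_h = {  # P(H=1 | S, R)
    (1, 1): 1.0, (0, 1): 0.9, (1, 0): 0.7, (0, 0): 0.1,
}

def joint(s, r, h):
    ps = p_s if s else 1 - p_s
    pr = p_r if r else 1 - p_r
    ph = p_h[(s, r)] if h else 1 - p_h[(s, r)]
    return ps * pr * ph

def cond_r(evidence):
    """P(R=1 | evidence), where evidence is a dict over {'S', 'H'}."""
    def ok(s, h):
        return all((v == s if k == "S" else v == h) for k, v in evidence.items())
    num = sum(joint(s, 1, h) for s in (0, 1) for h in (0, 1) if ok(s, h))
    den = sum(joint(s, r, h) for s in (0, 1) for r in (0, 1) for h in (0, 1) if ok(s, h))
    return num / den

print(round(cond_r({"H": 1, "S": 1}), 4))  # 0.0142
print(round(cond_r({"H": 1}), 4))          # 0.0185
print(round(cond_r({"H": 1, "S": 0}), 4))  # 0.0833
```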

49 Absolute and Conditional
Does A ⊥ B | C imply A ⊥ B? Does A ⊥ B imply A ⊥ B | C? As the two examples show, the answer to both is no.

50 Bayes Networks Network: A → C ← B; C → D; C → E.
Bayes networks define probability distributions over graphs of random variables. Instead of enumerating all combinations of values of the random variables, the Bayes network is defined by probability distributions that are inherent to each individual node. The joint probability represented by a Bayes network is the product of probabilities defined over the individual nodes, where each node's probability is conditioned only on its incoming arcs: P(A,B,C,D,E) = P(A)·P(B)·P(C|A,B)·P(D|C)·P(E|C).
This requires 10 probability values: P(A) and P(B) take 1 each, P(C|A,B) takes 4, and P(D|C) and P(E|C) take 2 each. The unstructured joint distribution over the same five binary variables would need 2^5 − 1 = 31 probability values.
So, let's define Bayes Networks in a more general way. The compactness of the Bayes network leads to a representation that scales significantly better to large networks than the combinatorial approach that goes through all combinations of variable values.
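A minimal sketch of that factorization for this five-node network; the CPT values below are made up purely to show the shape of the computation:

```python
from itertools import product

# Joint for the network A, B -> C;  C -> D, E, factored into per-node CPTs.
p_a, p_b = 0.5, 0.5                                          # placeholder numbers
p_c = {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.6, (0, 0): 0.1}   # P(C=1 | A, B)
p_d = {1: 0.8, 0: 0.2}                                       # P(D=1 | C)
p_e = {1: 0.7, 0: 0.3}                                       # P(E=1 | C)

def bern(p, x):
    return p if x else 1 - p

def joint(a, b, c, d, e):
    return (bern(p_a, a) * bern(p_b, b) * bern(p_c[(a, b)], c)
            * bern(p_d[c], d) * bern(p_e[c], e))

# Sanity check: the factored joint still sums to 1 over all 2**5 assignments.
print(sum(joint(*x) for x in product((0, 1), repeat=5)))  # 1.0 (up to float rounding)

# Parameters needed: 1 + 1 + 4 + 2 + 2 = 10, versus 2**5 - 1 = 31 for the raw joint.
```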

51 Bayes Network: Quiz 1 How many probability values are required to specify this Bayes network? (Nodes A, B, C, D, E, F; structure shown on the next slide.)

52 Bayes Network: Quiz 1 How many probability values are required to specify this Bayes network? 13.
P(A) = 1; P(B|A), P(C|A), P(D|A) = 2 each; P(E|B) = 2; P(F|C,D) = 4; total 1 + 6 + 2 + 4 = 13.

53 Bayes Network: Quiz 2 How many probability values are required to specify this Bayes network? (Nodes A, B, C, D, E, F, G; structure shown on the next slide.)

54 Bayes Network: Quiz 2 Answer: 19.
P(A), P(B), P(C) = 3; P(D|A,B,C) = 8; P(E|D) = 2, P(F|D) = 2, P(G|D,C) = 4, for another 8; total 3 + 8 + 8 = 19.
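The counting rule behind both quizzes is that each binary node needs 2^(number of parents) values; a small sketch with the two quiz graphs written as parent lists:

```python
def num_parameters(parents):
    """Parameters for a Bayes net of binary variables: 2**len(parents) per node."""
    return sum(2 ** len(p) for p in parents.values())

quiz1 = {"A": [], "B": ["A"], "C": ["A"], "D": ["A"], "E": ["B"], "F": ["C", "D"]}
quiz2 = {"A": [], "B": [], "C": [], "D": ["A", "B", "C"],
         "E": ["D"], "F": ["D"], "G": ["D", "C"]}

print(num_parameters(quiz1))  # 13
print(num_parameters(quiz2))  # 19
```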

55 Bayes Network: Example
[Car diagnosis network from slides 6 and 7, with 16 binary variables.] An unstructured joint distribution over all 16 variables would require 2^16 − 1 = 65,535 probability values.

56 Bayes Network: Example
[Car diagnosis network, with each node annotated by the number of probability values in its conditional probability table: 1, 1, 1, 2, 4, 4, 1, 1, 1, 1, 2, 2, 4, 4, 2, 16.] The Bayes network needs only 47 probability values in total, versus 2^16 − 1 = 65,535 for the unstructured joint distribution.

57 D-Separation Quiz (network with nodes A, B, C, D, E): for each statement, yes or no? C⊥A; C⊥A|B; C⊥D; C⊥D|A; E⊥C|D

58 D-Separation Answers: C⊥A: no; C⊥A|B: yes; C⊥D: no; C⊥D|A: yes; E⊥C|D: yes.
A influences C by virtue of B, so C and A are not independent, but they are independent given B: if you know B, knowledge of A won't tell you anything more about C. C and D are not independent, but C and D are independent given A. E and C are independent given D. In general, any two variables are independent if they are not linked by just unknown variables. So, for example, if we know B, everything downstream of B becomes independent of anything upstream of B: E is now independent of C, conditioned on B. However, knowledge of B does not render A and E independent.

59 D-Separation Quiz (network A → C ← B; C → D; C → E): for each statement, yes or no? A⊥E; A⊥E|B; A⊥E|C; A⊥B; A⊥B|C

60 D-Separation Answers: A⊥E: no; A⊥E|B: no; A⊥E|C: yes; A⊥B: yes; A⊥B|C: no.
The last one is the explaining-away effect: once C is known, the knowledge of A will discredit the information given by B about its influence on C, so A and B become dependent.
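One way to convince yourself of these answers without the d-separation rules is to build the full joint distribution for this network and test each independence numerically. A sketch with arbitrary, non-degenerate CPT values (the specific numbers are assumptions; only the graph structure matters):

```python
from itertools import product

# Network: A, B -> C;  C -> D, E.  Arbitrary non-degenerate CPTs.
p_a, p_b = 0.3, 0.6
p_c = {(1, 1): 0.95, (1, 0): 0.7, (0, 1): 0.8, (0, 0): 0.05}   # P(C=1 | A, B)
p_d = {1: 0.9, 0: 0.2}                                         # P(D=1 | C)
p_e = {1: 0.75, 0: 0.1}                                        # P(E=1 | C)

def bern(p, x):
    return p if x else 1 - p

def joint(a, b, c, d, e):
    return (bern(p_a, a) * bern(p_b, b) * bern(p_c[(a, b)], c)
            * bern(p_d[c], d) * bern(p_e[c], e))

names = "ABCDE"
table = {x: joint(*x) for x in product((0, 1), repeat=5)}

def prob(fixed):
    """Probability of a partial assignment, marginalizing out the other variables."""
    return sum(p for x, p in table.items()
               if all(x[names.index(k)] == v for k, v in fixed.items()))

def independent(x, y, given=None, tol=1e-9):
    """Check P(x, y | given) == P(x | given) P(y | given) for all value pairs."""
    given = given or {}
    pg = prob(given)
    for vx, vy in product((0, 1), repeat=2):
        pxy = prob({x: vx, y: vy, **given}) / pg
        px = prob({x: vx, **given}) / pg
        py = prob({y: vy, **given}) / pg
        if abs(pxy - px * py) > tol:
            return False
    return True

print(independent("A", "E"))                                 # False
print(all(independent("A", "E", {"C": c}) for c in (0, 1)))  # True
print(independent("A", "B"))                                 # True
print(all(independent("A", "B", {"C": c}) for c in (0, 1)))  # False (explaining away)
```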

61 D-Separation: Reachability
Active triplets render variables dependent; inactive triplets render variables independent. In an inactive triplet, the path is cut off by a known variable in the middle, which separates, or d-separates, the left variable from the right variable, so they become independent.

62 D-Separation: Reachability
[Diagrams of active triplets and inactive triplets.]


64 D-Separation: Quiz (network with nodes A, B, C, D, E, F, G, H): for each statement, yes or no? F⊥A; F⊥A|D; F⊥A|G; F⊥A|H

65 D-Separation: Quiz Answers are marked on the slide's diagram for: F⊥A; F⊥A|D; F⊥A|G; F⊥A|H (nodes A, B, C, D, E, F, G, H).

66 Bayes Network: Summary
Graph structure; compact representation; conditional independence. Next: applications.

