
1 Reasoning with Uncertainty; Probabilistic Reasoning Foundations of Artificial Intelligence

2 Reasoning with Uncertainty
- Non-monotonic Reasoning
  - dealing with contradictory knowledge
  - non-monotonic extensions
  - using non-monotonic reasoning for diagnosis problems
- Rules with Uncertainty Factors
  - MYCIN example
- Probabilistic Reasoning
  - basic probabilistic inferences
  - Bayes' rule and Bayesian updating of beliefs
  - basic Bayesian networks
- Rational Agents and Uncertainty

3 Non-Monotonic Reasoning
- Need for defeasible reasoning in AI
  - People often draw conclusions that can be defeated by subsequent information.
  - We want to say that birds fly, as long as they are not abnormal birds. But what if we find that tweety is an ostrich?
  - How do we prove ¬abnormal(tweety)? We can assume tweety is normal in the absence of information to the contrary (a default assumption).
  - Now we can state rules that say what is considered abnormal, e.g., ostrich(x) ⇒ abnormal(x).
  - So, we can't conclude that an ostrich fred flies.
  - But what if fred is an abnormal ostrich with a ticket on a United Airlines flight?
  - Also, how can we list all situations where birds are considered abnormal (the qualification problem)?

4 Non-Monotonic Reasoning
- Why is it called non-monotonic?
  - Assumptions can be overturned by new information. Normally, if you have some set of beliefs S and S |= p, then given a larger set T with S ⊆ T, you will still have T |= p. The set of conclusions grows monotonically with the premises.
  - In non-monotonic reasoning, we may have to retract some previous conclusions, e.g., the conclusion that Tweety can fly if you learn he is an ostrich.
- How do you decide what to believe?
  - Basic idea: you have some group of assumptions, and you want to believe as many of them as possible.
  - Start with a set T (base theory; things you believe for sure) and a set A of assumptions. An extension is a maximal subset of A that is consistent with T.
  - Intuitively, an extension represents one possible way things could be, given T.
- Example: diagnosis problems
  - Given: a complete description of a device
  - Assumptions: all components are in working order
  - Now, if a failure occurs, we must retract some of the assumptions; the assumptions that no longer hold tell us something about which components failed and why.

5 Non-Monotonic Reasoning
- An example: consider the base theory T shown on the slide (the bird/abnormality rules from the previous slides, plus the facts about tweety and fred).
- The assumptions we would like to make are that both fred and tweety are normal, so A contains ¬abnormal(tweety) and ¬abnormal(fred); this allows us to conclude (for now) that both fred and tweety can fly.
- But suppose we find out that fred is an ostrich (and so cannot fly).
  - Then the sentence ¬abnormal(fred) cannot appear in any extension of our non-monotonic theory.
  - Since ¬abnormal(tweety) is still consistent with T, there is a unique extension: {¬abnormal(tweety)}.
  - This extension, together with our base theory T, is sufficient to entail that tweety can fly (unless we later find out that tweety is also abnormal somehow).

6 N-M Reasoning & Diagnosis Problems
- Consider a base theory T (shown on the slide) in which the predicates injured and ostrich represent possible reasons why a bird may not be able to fly (symptoms).
- The set of assumptions A contains all assumptions we can make that are consistent with our knowledge base:
  A = {¬injured(fred), ¬injured(tweety), ¬ostrich(fred), ¬ostrich(tweety)}
- Suppose now we find out that fred can no longer fly. There are now two maximal extensions:
  E1 = {¬injured(tweety), ¬ostrich(fred), ¬ostrich(tweety)}
  E2 = {¬injured(fred), ¬injured(tweety), ¬ostrich(tweety)}
- The complement of each of these extensions (with respect to A) allows us to diagnose the potential reasons for the problem (fred not being able to fly).
- What if we find out that tweety can no longer fly either?

7 Reasoning with Uncertainty
- In many domains, agents require the means of reasoning with uncertain, incomplete, or vague information.
- Can standard logic-based approaches help?
  - Classical logic can only represent knowledge that is known to be true or false; we can think of statements as labeled by two truth values, true and false.
  - Non-monotonic logics provide the means of retracting some of the conclusions we believed at an earlier stage.
    - e.g., default logic gives us the means of labeling statements as true, false, true-by-default, and false-by-default; upon arriving at new evidence, we may retract things that were true-by-default.
  - But these logics do not provide the means of reasoning with degrees of uncertainty.

8 Reasoning with Uncertainty
- A matter of degrees!
  - To represent uncertainty, we need the ability to express the degree to which we believe a statement to be true or false.
  - One method for expressing the degree of belief is to label statements with probabilities (or with certainty factors).
  - Note that the underlying truth or falsity of a statement remains unchanged (it is either true or false); the certainty factors only say something about the degree of confidence the system has in its conclusion about the truth of a statement.
- Main questions:
  - how to label statements numerically
  - how to combine evidence from multiple sources

9 Certainty Factors: The Case of MYCIN
- MYCIN
  - an expert system for diagnosis and treatment of bacterial infections
  - sentences are labeled not with probabilities, but with certainty factors ranging from -1 to 1:
    -1 ==> sentence is known to be false
     0 ==> no belief about the sentence
     1 ==> sentence is known to be true
- Reasoning in MYCIN
  - uses a special form of Modus Ponens: if p has certainty factor c and the rule p ==> q has certainty factor d, then q gets certainty factor
    cf = c * d, if c > 0
    cf = 0, otherwise
  - note that the certainty factor for q is 0 if c is negative; this is because we don't have enough confidence in the premise p to draw the conclusion, so the conclusion should be unaffected

10 MYCIN (Cont.)
- Conjunctive forms
  - What does MYCIN do with a rule whose premise is a conjunction, p1 ∧ p2 ∧ … ∧ pn ==> q?
  - In this case MYCIN first labels the conjunction with the minimum certainty factor of all the pi's.
  - The justification: we can't be more certain of the conjunction as a whole than we are of any of the conjuncts.
  - Now we can apply Modus Ponens as described before.
- Combining evidence
  - We may have multiple sources of evidence for a statement p, with different labels.
  - MYCIN's approach: given two independent certainty factors x and y for p, the new combined certainty factor for p is:
    CF(x, y) = x + y - xy,  if x, y > 0
    CF(x, y) = (x + y) / (1 - min(|x|, |y|)),  if x and y have opposite signs
    CF(x, y) = x + y + xy,  if x, y < 0

11 MYCIN - An Example
- Using Modus Ponens
  - suppose we know (with certainty factor 0.8) that Tom is asleep
  - we also know (with cf = 0.7) that if Tom is asleep, then he snores
  - we can conclude that Tom is snoring with a certainty of 0.8 * 0.7 = 0.56
- Combining evidence
  - suppose that in addition we know (with cf = 0.3) that if Tom snores, then his wife will complain the next morning
  - now, if Tom's wife doesn't complain, this provides us with evidence that Tom wasn't snoring (it contributes a cf of -0.3 to the statement that Tom was snoring)
  - so we have two sources of evidence for the statement:
    snores(Tom): cf = 0.56
    snores(Tom): cf = -0.3
  - using the opposite-signs clause of the combination rule, we obtain a new certainty factor CF(0.56, -0.3) = (0.56 - 0.3) / (1 - 0.3) ≈ 0.37
  - so the fact that Tom's wife didn't complain somewhat weakens our certainty in the conclusion that Tom was snoring
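A minimal Python sketch of these certainty-factor rules (the function names are my own, not MYCIN's); it reproduces the 0.56 and 0.37 values from the Tom example above:

```python
def cf_modus_ponens(cf_premise, cf_rule):
    """CF of the conclusion: premise CF times rule CF, but only
    if we have positive belief in the premise."""
    return cf_premise * cf_rule if cf_premise > 0 else 0.0

def cf_combine(x, y):
    """Combine two independent CF estimates for the same statement."""
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return (x + y) / (1 - min(abs(x), abs(y)))   # opposite signs

# the Tom example from the slide
cf_asleep = 0.8
cf_snores = cf_modus_ponens(cf_asleep, 0.7)          # 0.56
cf_no_complaint = -0.3                               # wife didn't complain
print(cf_combine(cf_snores, cf_no_complaint))        # ~0.37
```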

12 Ad Hoc Methods vs. Probability Theory
- The choices for labels in MYCIN are not based on probability theory, although there are good theoretical justifications for these choices.
- Still, MYCIN has been shown to perform as well as the best medical experts in the field; perhaps this suggests that the full power of probability theory is not needed.
- Experiment
  - an experiment was conducted with MYCIN in which the range of certainty values was restricted to a discrete set: {-1, -0.3, 0, +0.3, +1}
  - MYCIN's performance in this case did not suffer substantially
  - this may suggest that (continuous) numeric labels aren't necessary, and we can just stick to declarative labels for statements
- Probabilities
  - however, there seem to be many applications where having full probabilistic labels is helpful (particularly in knowledge representation)
  - also, probability theory gives us a consistent framework for building agents that can deal with uncertainty

13 Probabilistic Reasoning
- Rationale
  - The world is not divided between "normal" and "abnormal", nor is it adversarial. Possible situations have various likelihoods (probabilities).
  - The agent has probabilistic beliefs (pieces of knowledge with associated probabilities, or strengths) and chooses its actions to maximize the expected value of some utility function.
- Basic questions
  - How can the agent make probabilistic inferences?
  - How does the agent's knowledge get updated in the face of new evidence?
  - How can the agent decide which actions to take?

14 Probabilistic Reasoning - The Basics
- The probability of a proposition A is a real number P(A) between 0 and 1.
- Basic axioms and rules of probability
  - P(True) = 1 and P(False) = 0
  - P(A ∧ B) = P(A) · P(B | A)  (product rule)
  - P(¬A) = 1 - P(A)
  - if A ⇔ B, then P(A) = P(B)
  - P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
- Conditional probabilities
  - P(A | B) = the conditional (posterior) probability of A given B
  - P(A | B) = P(A, B) / P(B)
  - P(A ∧ B) = P(A, B) = P(A | B) · P(B)
  - P(A ∧ B) = P(A, B) = P(A) · P(B), if A and B are independent
  - we say that A is independent of B if P(A | B) = P(A)
  - A and B are independent given C if P(A | B, C) = P(A | C), or equivalently P(A ∧ B | C) = P(A | C) · P(B | C)

15 Probabilistic Belief
- Consider a world where a dentist agent D meets with a new patient P.
- D is interested only in whether P has a cavity, so a state is described with a single proposition, Cavity.
- Before observing P, D does not know whether P has a cavity, but from years of practice, he believes Cavity with some probability p and ¬Cavity with probability 1 - p.
- The proposition is now a random variable, and (Cavity, p) is a probabilistic belief.

16 Probabilistic Belief State
- The world has only two possible states, described by Cavity and ¬Cavity.
- The probabilistic belief state of an agent is a probability distribution over all the states that the agent thinks possible.
- In the dentist example, D's belief state is:
  Cavity: p    ¬Cavity: 1 - p

17 Probability Distributions
- Random variables
  - A proposition that takes the value True with probability p and False with probability 1 - p is a random variable with distribution (p, 1 - p).
  - If a bag contains balls of 3 possible colors (red, yellow, and blue), the color of a ball picked at random from the bag is a random variable with 3 possible values.
  - The (probability) distribution of a random variable X with n values x1, x2, …, xn is (p1, p2, …, pn), with P(X = xi) = pi and Σ_{i=1..n} pi = 1.

18 Joint Probability Distributions
- A joint distribution assigns probabilities to all possible combinations of events; it can be used in conjunction with the basic axioms to make inferences.
- The joint is an n-dimensional table where each cell contains the probability of one combination of the occurrence or non-occurrence of n events (i.e., with n binary random variables there are 2^n entries), which is not practical for realistic problems.
- For example, the joint of two events P and Q may be represented as follows (the entries shown are those implied by the calculations below):

            Q      ¬Q
  P        0.2    0.3
  ¬P       0.1    0.4

- Inferring other probabilities:
  - Pr(Q) = 0.2 + 0.1 = 0.3
  - Pr(P ∨ Q) = Pr(P) + Pr(Q) - Pr(P ∧ Q) = (0.2 + 0.3) + (0.2 + 0.1) - 0.2 = 0.6
  - Pr(P | Q) = Pr(P ∧ Q) / Pr(Q) = 0.2 / 0.3 ≈ 0.67
- Note that the sum of the cells in a row or column gives the probability of an individual event (or its negation), and all four cells together sum to 1.
- In practice, we must rely on other methods to make inferences about conditional probabilities, e.g., Bayes' rule.
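These inferences can be checked with a few lines of Python over the reconstructed 2x2 joint table (the dictionary encoding is just one convenient choice):

```python
# joint[(P, Q)] = probability of that combination of truth values
joint = {(True, True): 0.2, (True, False): 0.3,
         (False, True): 0.1, (False, False): 0.4}

pr_Q = sum(p for (pv, qv), p in joint.items() if qv)     # 0.3
pr_P = sum(p for (pv, qv), p in joint.items() if pv)     # 0.5
pr_P_and_Q = joint[(True, True)]                         # 0.2
pr_P_or_Q = pr_P + pr_Q - pr_P_and_Q                     # 0.6
pr_P_given_Q = pr_P_and_Q / pr_Q                         # ~0.67
print(pr_Q, pr_P_or_Q, round(pr_P_given_Q, 2))
```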

19 Joint Distribution - Example
- k random variables X1, …, Xk.
- The joint distribution of these variables is a table in which each entry gives the probability of one combination of values of X1, …, Xk.
- Example: P(Cavity ∧ Toothache) and P(¬Cavity ∧ Toothache) can be read directly from the table:

            Toothache   ¬Toothache
  Cavity       0.04        0.06
  ¬Cavity      0.01        0.89

20 Joint Distribution - Example
- Using the same table:
  P(Toothache) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity))
               = P(Toothache ∧ Cavity) + P(Toothache ∧ ¬Cavity)
               = 0.04 + 0.01 = 0.05

21 Joint Distribution - Example
- Using the same table:
  P(Toothache ∨ Cavity) = P((Toothache ∧ Cavity) ∨ (Toothache ∧ ¬Cavity) ∨ (¬Toothache ∧ Cavity))
                        = 0.04 + 0.01 + 0.06 = 0.11

22 More Complex Inferences
- Let's now represent the world of the dentist D using three propositions: Cavity, Toothache, and PCatch.
- D's belief state consists of 2^3 = 8 states, each with some probability:
  {Cavity ∧ Toothache ∧ PCatch, ¬Cavity ∧ Toothache ∧ PCatch, Cavity ∧ ¬Toothache ∧ PCatch, …}

23 The belief state is defined by the full joint probability of the propositions:

                    Toothache              ¬Toothache
               PCatch    ¬PCatch       PCatch    ¬PCatch
  Cavity        0.108     0.012         0.072     0.008
  ¬Cavity       0.016     0.064         0.144     0.576

24 Probabilistic Inference
- Using the joint table above:
  P(Cavity ∨ Toothache) = 0.108 + 0.012 + 0.072 + 0.008 + 0.016 + 0.064 = 0.28

25 Probabilistic Inference
- Using the joint table above:
  P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2

26 Probabilistic Inference
- Marginalization: P(c) = Σ_t Σ_pc P(c ∧ t ∧ pc)
  using the conventions that c is Cavity or ¬Cavity, Σ_t is the sum over t ∈ {Toothache, ¬Toothache}, and Σ_pc is the sum over pc ∈ {PCatch, ¬PCatch}

27 Probabilistic Inference
- Using the joint table above:
  P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache)
                        = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
- Interpretation: after observing Toothache, the patient is no longer an "average" one, and the prior probability of Cavity is no longer valid.
- P(Cavity | Toothache) is calculated by keeping the ratios of the probabilities of the four relevant cases unchanged and normalizing their sum to 1.

28 Probabilistic Inference
- P(Cavity | Toothache) = P(Cavity ∧ Toothache) / P(Toothache) = (0.108 + 0.012) / (0.108 + 0.012 + 0.016 + 0.064) = 0.6
- P(¬Cavity | Toothache) = P(¬Cavity ∧ Toothache) / P(Toothache) = (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
- More compactly, with α the normalization constant:
  P(c | Toothache) = α P(c ∧ Toothache) = α Σ_pc P(c ∧ Toothache ∧ pc)
                   = α [(0.108, 0.016) + (0.012, 0.064)] = α (0.12, 0.08) = (0.6, 0.4)
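A short Python sketch of marginalization and conditioning-by-normalization over the full joint table above (the tuple ordering Cavity, Toothache, PCatch and the helper name are my own choices):

```python
# full joint P(Cavity, Toothache, PCatch); True means the proposition holds
joint = {
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def marginal(var_index, value):
    """Marginalization: sum out all the other variables."""
    return sum(p for world, p in joint.items() if world[var_index] == value)

print(marginal(0, True))    # P(Cavity)    = 0.2
print(marginal(1, True))    # P(Toothache) = 0.2

# P(Cavity | Toothache) and P(~Cavity | Toothache) by normalization
unnorm = [sum(p for w, p in joint.items() if w[0] == c and w[1]) for c in (True, False)]
alpha = 1 / sum(unnorm)
print([alpha * u for u in unnorm])   # [0.6, 0.4]
```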

29 Basics: Bayes' Rule
- Bayes' rule
  - suppose we have two sentences, H (the hypothesis) and E (the evidence); from
    Pr(E ∧ H) = Pr(E | H) · Pr(H) = Pr(H | E) · Pr(E)
    we conclude:
    Pr(H | E) = Pr(E | H) · Pr(H) / Pr(E)
- Corollaries of Bayes' rule
  - if Pr(E | H) = 0, then Pr(H | E) = 0 (E and H are mutually exclusive)
  - suppose that Pr(E | H1) = Pr(E | H2), so that the hypotheses H1 and H2 give the same information about a piece of evidence E; then
    Pr(H1 | E) / Pr(H2 | E) = Pr(H1) / Pr(H2)
  - in other words, these assumptions imply that the evidence E will not affect the relative probabilities of H1 and H2

30 Bayes' Rule - An Example
- Given (Cavity is the cause, Toothache the symptom):
  - P(Cavity) = 0.1
  - P(Toothache) = 0.05
  - P(Cavity | Toothache) = 0.8
- Bayes' rule tells us:
  P(Toothache | Cavity) = (0.8 x 0.05) / 0.1 = 0.4

31 Bayes' Rule - An Example
- Traffic light example
  - Pr(green) = 0.45; Pr(red) = 0.45; Pr(yellow) = 0.1
  - suppose we know that the police are perfect enforcers (i.e., we get a ticket if and only if the light is red when we enter the intersection)
  - now we enter the intersection without getting a ticket; what are the probabilities that the light was green, red, or yellow?
  - since we got no ticket, we know that the light could not have been red (in other words, Pr(red | no-ticket) = 0)
  - also, Pr(no-ticket | green) = Pr(no-ticket | yellow) = 1; using Bayes' rule we get:
    Pr(yellow | no-ticket) = Pr(no-ticket | yellow) · Pr(yellow) / Pr(no-ticket) = 0.1 / (0.45 + 0.1) = 2/11
  - similarly, we can show that Pr(green | no-ticket) = 9/11

32 Bayes' Rule - Another Example
- Medical diagnosis
  - suppose we know from statistical data that flu causes fever in 80% of cases, approximately 1 in 10,000 people have flu at a given time, and approximately 1 out of every 50 people is suffering from fever:
    Pr(fever | flu) = 0.8,  Pr(flu) = 0.0001,  Pr(fever) = 0.02
  - Given a patient with fever, does she have flu? Answer by applying Bayes' rule:
    Pr(flu | fever) = [Pr(fever | flu) · Pr(flu)] / Pr(fever) = 0.8 x 0.0001 / 0.02 = 0.004
  - Note that this is still very small, despite the strong rule flu ==> fever. This is the impact of the small prior probability Pr(flu).
- Why not just get Pr(flu | fever) from statistical data?
  - suppose there is a sudden flu epidemic (so Pr(flu) has gone up dramatically)
  - if Pr(flu | fever) was obtained directly from statistical data, the new information would be missed, leading to erroneous diagnoses; but we can get the correct probability using Bayes' rule
  - conclusion: for diagnosis problems, it is best to represent knowledge causally, i.e., as estimates of the probabilities of the symptoms given the diseases (plus the disease priors)

33 Bayes' Rule - Example (Continued)
- Medical diagnosis
  - In the previous example, consider the situation where we do not have any statistical information about the number of people having fever, i.e., Pr(fever) is unknown. Can we still solve the problem?
    Pr(fever) = Pr(fever | flu) · Pr(flu) + Pr(fever | ¬flu) · Pr(¬flu) = 0.8 x 0.0001 + Pr(fever | ¬flu) x 0.9999
  - so we can, if we can estimate the probability of fever given that the individual does not have flu
- Relative probabilities
  - Suppose we know that allergies also cause fever 60% of the time, and that about 1 in 1000 people suffer from allergies. In this case, we can still assess the relative likelihood of the diseases (flu or allergies) without having to assess the probability of fever directly:
    Pr(flu | fever) = [Pr(fever | flu) · Pr(flu)] / Pr(fever)
    Pr(allergies | fever) = [Pr(fever | allergies) · Pr(allergies)] / Pr(fever)
  - taking the ratio, Pr(fever) cancels: (0.6 x 0.001) / (0.8 x 0.0001) = 7.5, so it is 7.5 times more likely that the individual is suffering from allergies than from flu.
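A small Python check of the flu example and of the relative-likelihood trick (plain arithmetic on the figures given above):

```python
p_fever_given_flu, p_flu = 0.8, 0.0001
p_fever = 0.02

# Bayes' rule for the single piece of evidence (fever)
p_flu_given_fever = p_fever_given_flu * p_flu / p_fever
print(p_flu_given_fever)                               # ~0.004

# relative likelihood of allergies vs. flu; Pr(fever) cancels out
p_fever_given_allergies, p_allergies = 0.6, 0.001
ratio = (p_fever_given_allergies * p_allergies) / (p_fever_given_flu * p_flu)
print(ratio)                                           # ~7.5
```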

34 Bayes' Rule & Normalization
- Direct assessments of prior probabilities for evidence (or symptoms in diagnosis problems) may not be possible.
- We can avoid direct assessment using normalization:
  Pr(H | E) = Pr(E | H) · Pr(H) / Pr(E)
  Pr(¬H | E) = Pr(E | ¬H) · Pr(¬H) / Pr(E)
- Since Pr(H | E) + Pr(¬H | E) = 1, we obtain:
  Pr(E) = Pr(E | H) · Pr(H) + Pr(E | ¬H) · Pr(¬H)
- Substituting in the equation for Pr(H | E), we get:
  Pr(H | E) = Pr(E | H) · Pr(H) / [Pr(E | H) · Pr(H) + Pr(E | ¬H) · Pr(¬H)]
- This forces the conditional terms to sum to 1; so, at the cost of assessing Pr(E | ¬H), we can avoid assessing Pr(E) and still obtain exact probabilities using Bayes' rule.

35 Example: Normalization
- Suppose a blood test is 90% effective in detecting a disease, and it falsely diagnoses a healthy person as having the disease 3% of the time. If 10% of those tested have the disease, what is the probability that a person who tests positive actually has the disease? (i.e., find P(disease | positive))
  - P(disease) = 0.10
  - P(¬disease) = 0.90
  - P(positive | disease) = 0.90
  - P(positive | ¬disease) = 0.03
- P(disease | positive)
    = P(positive | disease) · P(disease) / [P(positive | disease) · P(disease) + P(positive | ¬disease) · P(¬disease)]
    = (0.90)(0.10) / ((0.90)(0.10) + (0.03)(0.90))
    = 0.77
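The blood-test computation as a one-function Python sketch (the helper name posterior is mine):

```python
def posterior(p_e_given_h, p_h, p_e_given_not_h):
    """Bayes' rule with the evidence term obtained by normalization."""
    num = p_e_given_h * p_h
    return num / (num + p_e_given_not_h * (1 - p_h))

print(posterior(0.90, 0.10, 0.03))   # ~0.769, i.e. about 0.77
```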

36 Bayes' Rule - Updating Beliefs
- Bayesian updating
  - as each new piece of evidence is observed, the belief in the unknown variable is multiplied by a factor that depends on the new evidence
  - suppose we have obtained, using Bayes' rule and evidence E, a probability for our diagnosis:
    Pr(H | E) = Pr(E | H) · Pr(H) / Pr(E)
  - now a new piece of evidence F is observed; we can apply Bayes' rule with E as the constant conditioning context:
    Pr(H | E ∧ F) = Pr(H | E) · Pr(F | E ∧ H) / Pr(F | E)
  - in general, Pr(F | E ∧ H) may be difficult to find; however, if the pieces of evidence are conditionally independent given H, i.e.,
    Pr(F | E ∧ H) = Pr(F | H)
    then we can simplify:
    Pr(H | E ∧ F) = Pr(H | E) · Pr(F | H) / Pr(F | E)
  - in fact, Pr(E ∧ F) can be eliminated by normalization, provided that we also assess Pr(E | ¬H) and Pr(F | ¬H)

37 Bayes' Rule - Example (Continued)
- In the previous example we had Pr(fever | flu) = 0.8, Pr(flu) = 0.0001, Pr(fever) = 0.02, allowing us to conclude via Bayes' rule that Pr(flu | fever) = 0.004.
- Now suppose we observe new evidence: the patient is also exhibiting another symptom, a sore throat. We know that flu causes a sore throat 50% of the time (i.e., Pr(sore | flu) = 0.5) and that 1/200 = 0.005 of people have sore throats. We can now update our diagnosis that she has flu.
- Assuming that the symptoms are conditionally independent given flu, Bayesian updating gives:
  Pr(flu | fever ∧ sore) = Pr(flu | fever) · Pr(sore | flu) / Pr(sore | fever)
- In general, it is difficult to find Pr(sore | fever); instead we can use normalization:
  - assuming that Pr(sore | ¬flu) is approximately the same as Pr(sore) = 0.005, we have:
  - Pr(flu | fever ∧ sore) is proportional to 0.004 x 0.5 = 0.002
  - Pr(¬flu | fever ∧ sore) is proportional to 0.996 x 0.005 = 0.00498
  - so normalization gives the updated probabilities:
    Pr(flu | fever ∧ sore) = 0.002 / (0.002 + 0.00498) ≈ 0.29
    Pr(¬flu | fever ∧ sore) ≈ 0.71
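Continuing the flu example in Python: the two unnormalized products from this slide, followed by the normalization step; the final numbers follow directly from the figures above:

```python
p_flu_given_fever = 0.004           # from the earlier Bayes' rule step

# unnormalized posteriors after also observing a sore throat, assuming the
# symptoms are conditionally independent and Pr(sore | ~flu) ~ Pr(sore) = 0.005
w_flu     = p_flu_given_fever * 0.5               # Pr(sore | flu) = 0.5  -> 0.002
w_not_flu = (1 - p_flu_given_fever) * 0.005       #                       -> 0.00498

alpha = 1 / (w_flu + w_not_flu)
print(alpha * w_flu, alpha * w_not_flu)           # ~0.287 and ~0.713
```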

38 Issues
- If a state is described by n propositions, then a belief state contains 2^n states.
- The joint distribution can be used to update probabilities when new evidence becomes available, but:
  - the joint distribution contains 2^n entries that may have to be changed
  - useful independence assumptions are not made explicit
  - modeling difficulty: many numbers must be entered in the first place
  - computational issue: memory size and time
- Solution: Bayesian networks
  - facilitate the description of a collection of beliefs by making explicit the causality relations and conditional independence among beliefs
  - provide a more efficient way (than joint distribution tables) to update belief strengths when new evidence is observed

39
- In the joint table above, Toothache and PCatch are independent given Cavity (or ¬Cavity), but this relation is hidden in the numbers!
- Bayesian networks explicitly represent independence among propositions to reduce the number of probabilities defining a belief state.
- Also called: belief networks, influence diagrams.

40 More on Conditional Independence
- P(Toothache, Cavity, PCatch) has 2^3 = 8 entries.
- If I have a cavity, the probability that the probe catches in it doesn't depend on whether I have a toothache:
  (1) P(PCatch | Toothache, Cavity) = P(PCatch | Cavity)
- The same independence holds if I haven't got a cavity:
  (2) P(PCatch | Toothache, ¬Cavity) = P(PCatch | ¬Cavity)
- PCatch is conditionally independent of Toothache given Cavity:
  P(PCatch | Toothache, Cavity) = P(PCatch | Cavity)
- Equivalent statements:
  P(Toothache | PCatch, Cavity) = P(Toothache | Cavity)
  P(Toothache, PCatch | Cavity) = P(Toothache | Cavity) · P(PCatch | Cavity)

41 More on Conditional Independence
- Write out the full joint distribution using the chain rule:
  P(Toothache, PCatch, Cavity)
    = P(Toothache | PCatch, Cavity) · P(PCatch, Cavity)
    = P(Toothache | PCatch, Cavity) · P(PCatch | Cavity) · P(Cavity)
    = P(Toothache | Cavity) · P(PCatch | Cavity) · P(Cavity)
  i.e., 2 + 2 + 1 = 5 independent numbers.
- In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n.
- Conditional independence is our most basic and robust form of knowledge about uncertain environments.

42 Bayesian Network
- The network has one arc from Cavity to Toothache and one from Cavity to PCatch: Cavity is the "cause" of both Toothache and PCatch, and the network represents these causality links explicitly.
- Give the prior probability distribution of Cavity, and the conditional probability tables of Toothache and PCatch:
  P(Cavity) = 0.2
  P(Toothache | Cavity) = 0.6,  P(Toothache | ¬Cavity) = 0.1
  P(PCatch | Cavity) = 0.9,     P(PCatch | ¬Cavity) = 0.02
- 5 probabilities, instead of 8.

43 A More Complex BN
- I'm at work; neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes the alarm is set off by minor earthquakes. Is there a burglar?
- Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
- Network topology reflects "causal" knowledge:
  - A burglar can set the alarm off
  - An earthquake can set the alarm off
  - The alarm can cause Mary to call
  - The alarm can cause John to call

44 A More Complex BN
- Structure (a directed acyclic graph, with causes above effects):
  Burglary -> Alarm <- Earthquake;  Alarm -> JohnCalls;  Alarm -> MaryCalls
- Intuitive meaning of an arc from x to y: "x has direct influence on y"

45 Foundations of Artificial Intelligence 45 BEP(A|…) TTFFTTFF TFTFTFTF 0.95 0.94 0.29 0.001 BurglaryEarthquake Alarm MaryCallsJohnCalls P(B) 0.001 P(E) 0.002 AP(J|…) TFTF 0.90 0.05 AP(M|…) TFTF 0.70 0.01 Size of the CPT for a node with k parents: 2 k A More Complex BN 10 probabilities, instead of 32

46 What does the BN encode?
- Each of the beliefs JohnCalls and MaryCalls is independent of Burglary and Earthquake given Alarm or ¬Alarm.
- For example, John does not observe any burglaries directly.

47 What does the BN encode?
- The beliefs JohnCalls and MaryCalls are independent given Alarm or ¬Alarm.
- For instance, the reasons why John and Mary may not call if there is an alarm are unrelated.
- In general: a node is independent of its non-descendants given its parents.

48 Locally Structured World
- A world is locally structured (or sparse) if each of its components interacts directly with relatively few other components.
- In a sparse world, the CPTs are small and the BN contains far fewer probabilities than the full joint distribution.
- If the number of entries in each CPT is bounded, i.e., O(1), then the number of probabilities in a BN is linear in n (the number of propositions) instead of 2^n for the joint distribution.
- But can we compute the full joint distribution of the propositions from it?

49 Calculation of Joint Probability
- Using the network and CPTs of slide 45:
  P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = ?

50 Calculation of Joint Probability
- P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
    = P(J ∧ M | A, ¬B, ¬E) · P(A ∧ ¬B ∧ ¬E)
    = P(J | A, ¬B, ¬E) · P(M | A, ¬B, ¬E) · P(A ∧ ¬B ∧ ¬E)   (J and M are independent given A)
- P(J | A, ¬B, ¬E) = P(J | A)   (J and ¬B ∧ ¬E are independent given A)
- P(M | A, ¬B, ¬E) = P(M | A)
- P(A ∧ ¬B ∧ ¬E) = P(A | ¬B, ¬E) · P(¬B | ¬E) · P(¬E) = P(A | ¬B, ¬E) · P(¬B) · P(¬E)   (¬B and ¬E are independent)
- Therefore P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A) · P(M | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)

51 Calculation of Joint Probability
- P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A) · P(M | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
                         = 0.9 x 0.7 x 0.001 x 0.999 x 0.998 ≈ 0.00062

52 Calculation of Joint Probability
- P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J | A) · P(M | A) · P(A | ¬B, ¬E) · P(¬B) · P(¬E)
                         = 0.9 x 0.7 x 0.001 x 0.999 x 0.998 ≈ 0.00062
- In general, for any assignment (x1, …, xn) to the propositions:
  P(x1 ∧ x2 ∧ … ∧ xn) = Π_{i=1,…,n} P(xi | parents(Xi))
  which yields the full joint distribution table.
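A minimal Python sketch of the burglary network's CPTs and of the product formula above (a plain dictionary encoding, not a BN library); it reproduces P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) ≈ 0.00062:

```python
p_B, p_E = 0.001, 0.002
p_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
p_J = {True: 0.90, False: 0.05}                      # P(J | A)
p_M = {True: 0.70, False: 0.01}                      # P(M | A)

def joint(j, m, a, b, e):
    """P(J=j, M=m, A=a, B=b, E=e) as a product of CPT entries."""
    pb = p_B if b else 1 - p_B
    pe = p_E if e else 1 - p_E
    pa = p_A[(b, e)] if a else 1 - p_A[(b, e)]
    pj = p_J[a] if j else 1 - p_J[a]
    pm = p_M[a] if m else 1 - p_M[a]
    return pj * pm * pa * pb * pe

print(joint(True, True, True, False, False))   # ~0.00062
```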

53 Querying the BN
- Consider the two-node network Cavity -> Toothache with P(Cavity) = 0.1, P(Toothache | Cavity) = 0.4, and P(Toothache | ¬Cavity) = 0.01111.
- The BN gives P(t | c). What about P(c | t)?
  P(Cavity | t) = P(Cavity ∧ t) / P(t) = P(t | Cavity) · P(Cavity) / P(t)   [Bayes' rule]
  P(c | t) = α P(t | c) · P(c)
- Querying a BN is just Bayes' rule applied on a larger scale.
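The same query, computed by normalization for the two-node Cavity network on this slide (a short sketch, with my own variable names):

```python
p_cavity = 0.1
p_t = {True: 0.4, False: 0.01111}   # P(Toothache | Cavity), P(Toothache | ~Cavity)

# unnormalized P(c, Toothache) for c = Cavity, ~Cavity, then normalize
w = {c: p_t[c] * (p_cavity if c else 1 - p_cavity) for c in (True, False)}
alpha = 1 / sum(w.values())
print({c: round(alpha * w[c], 3) for c in w})    # P(Cavity | Toothache) ~ 0.8
```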

54 Constructing Bayesian Networks
1. Choose an ordering of the variables X1, …, Xn
2. For i = 1 to n:
   - add Xi to the network
   - select parents from X1, …, Xi-1 such that P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)
   This choice of parents guarantees:
   P(X1, …, Xn) = Π_{i=1..n} P(Xi | X1, …, Xi-1)   (chain rule)
                = Π_{i=1..n} P(Xi | Parents(Xi))
- Note that the network structure depends on the order in which the variables are encountered in the construction process.

55 Construction Example
- Suppose we choose the ordering M, J, A, B, E:
  - P(M): does not depend on anything else, so no parent
  - P(J | M) = P(J)?  No
  - P(A | J, M) = P(A | J)?  P(A | J, M) = P(A)?  No
  - P(B | A, J, M) = P(B | A)?  Yes
  - P(B | A, J, M) = P(B)?  No
  - P(E | B, A, J, M) = P(E | A)?  No
  - P(E | B, A, J, M) = P(E | A, B)?  Yes

56 Some Applications of BNs
- Medical diagnosis, e.g., lymph-node diseases
- Troubleshooting of hardware/software systems
- Fraud/uncollectible debt detection
- Data mining
- Analysis of genetic sequences
- Data interpretation, computer vision, image understanding

57 Rational Agents and Uncertainty
- In the presence of uncertainty, rational agents must make choices as they decide on the actions that are most likely to achieve a goal.
  - the agent needs to have preferences among different possible outcomes
  - the notion of utility is used to reason about preferences
  - agents must assign utilities to various states, based on domain-specific information
- Decision theory
  - preferences (utilities) are combined with probabilities to provide a general theory of rational decision making:
    decision theory = probability theory + utility theory
  - principle of Maximum Expected Utility: "an agent is rational if and only if it chooses actions that yield the highest expected utility, averaged over all possible outcomes of the action"
  - so, when evaluating an action, the utility of a particular outcome is weighted by the probability that the outcome occurs
- A detailed design of a decision-theoretic agent is given in RN Chapters 16 & 17. We only briefly discuss the general framework.

58 General Framework
- An agent operates in a certain state space.
- There is no goal state; instead, states provide rewards (positive, negative, or null).
- A state's reward quantifies in a single unit system what the agent gets when it visits this state (a bag of gold, a sunny afternoon on the beach, a speeding ticket, etc.).
- Each action has several possible outcomes, each with some probability; sensing may also be imperfect.
- The agent's goal is to plan a strategy (here called a policy) to maximize the expected amount of reward collected.
- As usual, there are many variants...

59 Action Model
- Action a, applied in state s ∈ S:
  a(s) = {s1 (p1), s2 (p2), ..., sn (pn)}
  a probability distribution over the possible successor states
- Markov assumption: the action model a(s) does not depend on what happened prior to reaching s. (Otherwise, the prior history would have to be encoded in s.)

60 A Simple Example
- (Diagram: state S0 with action A1 leading to s1, s2, s3 with probabilities 0.2, 0.7, 0.1.)
- S0 describes many actual states of the real world. A1 reaches s1 in some of them, s2 in others, and s3 in the remaining ones.
- If the agent could return to S0 many times in independent ways, and each time it executed A1, then it would reach s1 20% of the time, s2 70% of the time, and s3 10% of the time.

61 Introducing Rewards...
- Assume that the agent receives rewards in some states (rewards can be positive or negative): here the rewards associated with s1, s2, and s3 are 100, 50, and 70.
- If the agent could execute A1 in S0 many times, the average (expected) reward that it would get is:
  U1(S0) = 100 x 0.2 + 50 x 0.7 + 70 x 0.1 = 20 + 35 + 7 = 62

62 ... and a Second Action...
- A second action A2 is also applicable in S0: it reaches s3 with probability 0.2 and a new state s4, with reward 80, with probability 0.8.
- U1(S0) = 62
- U2(S0) = 70 x 0.2 + 80 x 0.8 = 78
- If the agent chooses to execute A2, it will maximize the average collected reward.

63 ... and Action Costs
- We may associate costs with actions, which are subtracted from the rewards: here A1 costs 5 and A2 costs 25.
- U1(S0) = 62 - 5 = 57
- U2(S0) = 78 - 25 = 53
- When in S0, the agent will now gain more on average by executing A1 than by executing A2.
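The action-selection computation from the last three slides as a small Python sketch (the state names and dictionary layout are mine):

```python
reward = {'s1': 100, 's2': 50, 's3': 70, 's4': 80}

# action model in S0: cost plus a map from successor state to probability
actions = {
    'A1': {'cost': 5,  'outcomes': {'s1': 0.2, 's2': 0.7, 's3': 0.1}},
    'A2': {'cost': 25, 'outcomes': {'s3': 0.2, 's4': 0.8}},
}

def expected_utility(action):
    """Expected reward of the action's outcomes, minus the action's cost."""
    a = actions[action]
    return -a['cost'] + sum(p * reward[s] for s, p in a['outcomes'].items())

for name in actions:
    print(name, expected_utility(name))        # A1: 57, A2: 53
print(max(actions, key=expected_utility))      # A1
```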

64 Generalization
- Inputs:
  - initial state s0
  - action model
  - reward R(s) collected in each state s
- A state is terminal if it has no successor.
- Starting at s0, the agent keeps executing actions until it reaches a terminal state.
- Its goal is to maximize the expected sum of rewards collected (additive rewards).
- Assume that the same state can't be reached twice (no cycles).

65 Utility of a State
- The utility of a state s measures its desirability:
  - If s is terminal: U(s) = R(s)
  - If s is non-terminal: U(s) = R(s) + max_{a ∈ Appl(s)} Σ_{s' ∈ Succ(s,a)} P(s' | a, s) · U(s')
    (the reward of s augmented by the expected sum of rewards collected in future states)

66 Utility of a State
- U(s) = R(s) + max_{a ∈ Appl(s)} Σ_{s' ∈ Succ(s,a)} P(s' | a, s) · U(s'), where:
  - Appl(s) is the set of all actions applicable to state s
  - Succ(s, a) is the set of all possible states after applying a to s
  - P(s' | a, s) is the probability of being in s' after executing a in s

67 Utility with Action Costs
- U(s) = R(s) + max_{a ∈ Appl(s)} [ -cost(a) + Σ_{s' ∈ Succ(s,a)} P(s' | a, s) · U(s') ]

68 Optimal Policy
- A policy is a function that maps each state s into the action to execute if s is reached.
- The optimal policy π* is the policy that always leads to maximizing the expected sum of rewards collected in future states (the Maximum Expected Utility principle):
  π*(s) = arg max_{a ∈ Appl(s)} [ -cost(a) + Σ_{s' ∈ Succ(s,a)} P(s' | a, s) · U(s') ]
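Under the no-cycles assumption of slide 64, the utility equation can be evaluated by direct recursion; here is a hedged Python sketch (the data structures and function names are my own, s0's reward is assumed to be 0 in the usage example, and a general MDP solver would use value iteration instead):

```python
def utility(s, R, appl, succ, cost):
    """U(s) = R(s) + max_a [ -cost(a) + sum_s' P(s'|a,s) * U(s') ],
    evaluated by direct recursion (valid only because the state space is acyclic)."""
    acts = appl(s)
    if not acts:                      # terminal state: U(s) = R(s)
        return R(s)
    return R(s) + max(
        -cost(a) + sum(p * utility(s2, R, appl, succ, cost)
                       for s2, p in succ(s, a).items())
        for a in acts)

def optimal_policy(s, R, appl, succ, cost):
    """pi*(s): the applicable action with the highest expected utility."""
    return max(appl(s),
               key=lambda a: -cost(a) + sum(p * utility(s2, R, appl, succ, cost)
                                            for s2, p in succ(s, a).items()))

# tiny usage on the two-action example from the earlier slides
rewards = {'s0': 0, 's1': 100, 's2': 50, 's3': 70, 's4': 80}   # s0's reward assumed 0
outcomes = {'A1': {'s1': 0.2, 's2': 0.7, 's3': 0.1},
            'A2': {'s3': 0.2, 's4': 0.8}}
costs = {'A1': 5, 'A2': 25}
print(optimal_policy('s0',
                     R=lambda s: rewards[s],
                     appl=lambda s: list(outcomes) if s == 's0' else [],
                     succ=lambda s, a: outcomes[a],
                     cost=lambda a: costs[a]))   # -> 'A1' (expected utilities 57 vs. 53)
```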

69 Basic Decision Theoretic Agent

function DT-Agent(percept) returns an action
  static: a set of probabilistic beliefs about the state of the world
  1. calculate updated probabilities for the current state based on the available evidence, including the current percept and previous actions
  2. calculate outcome probabilities for actions, given action descriptions and the probabilities of the current state
  3. select the action with the highest expected utility, given the probabilities of outcomes and a utility function
  4. return action

