
1 Intro to AI Uncertainty Ruth Bergman Fall 2002

2 Why Not Use Logic? Suppose I want to write down rules about medical diagnosis: Diagnostic rules: ∀p Symptom(p, Toothache) ⇒ Disease(p, Cavity) Causal rules: ∀p Disease(p, Cavity) ⇒ Symptom(p, Toothache) Clearly, this isn't right: Diagnostic case: many other possible causes Causal rules: not all cavities cause pain Logic fails due to –Laziness: it is too much work to write correct rules –Theoretical Ignorance: we do not know the correct rules –Practical Ignorance: not all the information is available

3 Uncertainty The problem with pure FOL is that it deals only in black and white. The world isn't black and white because of uncertainty: 1. Uncertainty due to imprecision or noise 2. Uncertainty because we don't know everything about the domain 3. Uncertainty because in practice we often cannot acquire all the information we'd like. –As a result, we'd like to assign a degree of belief (or plausibility, or possibility) to any statement we make –note this is different from a degree of truth!

4 Ways of Handling Uncertainty MYCIN: operationalize uncertainty with the rules: a ⇒ b with certainty 0.7, and we know a with certainty 1 –ergo, we know b with certainty 0.7 –but if we also know a ⇒ c with certainty 0.6 and b v c ⇒ d with certainty 1 –do we know d with certainty .7, .6, .88, 1, ...? –now suppose a ⇒ ~e and ~e ⇒ ~d ... –In a rule-based system, such non-local dependencies are hard to catch
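A minimal sketch of this kind of certainty-factor bookkeeping (Python; the parallel-combination convention cf1 + cf2 − cf1·cf2 is one common choice, and the rule set is the hypothetical one above, not MYCIN's actual implementation):

```python
# Sketch: rule-based certainty-factor propagation (hypothetical rules).
# Chaining: CF(conclusion) = CF(rule) * CF(premise)
# Combining two independent positive supports: cf1 + cf2 - cf1*cf2

def chain(cf_rule: float, cf_premise: float) -> float:
    """Certainty of a conclusion reached through a single rule."""
    return cf_rule * cf_premise

def combine(cf1: float, cf2: float) -> float:
    """Combine two supposedly independent positive supports for the same conclusion."""
    return cf1 + cf2 - cf1 * cf2

cf_a = 1.0                                # we know a with certainty 1
cf_b = chain(0.7, cf_a)                   # a => b (0.7)  ->  b with 0.7
cf_c = chain(0.6, cf_a)                   # a => c (0.6)  ->  c with 0.6
cf_d = chain(1.0, combine(cf_b, cf_c))    # b v c => d (1.0)
print(cf_d)                               # 0.88 under these conventions

# The trouble: this treats b and c as independent supports for d, even though
# both derive from the same fact a, and non-local dependencies such as
# a => ~e, ~e => ~d are invisible to the purely local combination rule.
```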

5 Probability Problems such as these have led people to invent lots of calculi for uncertainty; probability still dominates. Basic idea: –I have some Degree of Belief (DoB) about some proposition A --- a prior probability –I receive evidence about A; the evidence is related to A by a conditional probability –From these two quantities, I can compute an updated DoB about A --- a posterior probability

6 Probability Review Basic probability is on propositions or propositional statements: –P(A) (A is a proposition) P(Accident), P(phonecall), P(Cold) –P(X = v) (X is a random variable; v a value) P(card = JackofClubs), P(weather = sunny), ... –P(A v B), P(A ^ B) = P(A, B), P(~A), ... –Referred to as the prior or unconditional probability Axioms: –0 <= P(A) <= 1 –P(sure proposition) = 1 –P(A v B) = P(A) + P(B) if A and B are mutually exclusive Law of total probability: P(A) = Σ_i P(A, B_i), if B_i, i = 1, ..., n, is a set of exhaustive and mutually exclusive propositions
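A tiny numerical check of the law of total probability; the joint values below are illustrative, not from the slides:

```python
# Illustrative check of P(A) = sum_i P(A, B_i) for an exhaustive,
# mutually exclusive partition B_1, B_2, B_3 (made-up numbers).
joint = {                       # P(A, B_i) for both values of A
    ("A", "B1"): 0.10, ("A", "B2"): 0.15, ("A", "B3"): 0.05,
    ("~A", "B1"): 0.20, ("~A", "B2"): 0.25, ("~A", "B3"): 0.25,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9   # all probabilities sum to 1

p_A = sum(p for (a, _), p in joint.items() if a == "A")
print(round(p_A, 2))            # 0.3 -- P(A), obtained by summing out the partition
```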

7 Probability Review The conditional probability of A given B: P(A | B) = P(A, B) / P(B), for P(B) > 0 The Bayesian point of view –the product rule P(A, B) = P(A | B) * P(B) The probability of an event A can be computed by conditioning it on a set of exhaustive and mutually exclusive events: P(A) = Σ_i P(A | B_i) * P(B_i) –More generally, P(A | E) = Σ_i P(A | B_i, E) * P(B_i | E) Independence: P(A | B) = P(A) –A is independent of B; conditional independence is the analogous statement in the presence of further evidence E: P(A | B, E) = P(A | E)

8 Probability Review The joint distribution of A and B –P(A, B) = x (equivalent to P(A ^ B) = x)

          A=1    A=2    A=3
  B = T   0.1    0.1    0.2    0.4
  B = F   0.2    0.1    0.3    0.6
          0.3    0.2    0.5    1

P(A=1, B=T) = 0.1
P(A=1) = 0.1 + 0.2 = 0.3
P(A=1 | B=T) = 0.1 / 0.4 = 0.25
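The same marginal and conditional can be read off programmatically; a short sketch, using the joint values as reconstructed in the table above:

```python
# Joint distribution P(A, B) from the slide, stored as a dictionary.
joint = {
    (1, True): 0.1, (2, True): 0.1, (3, True): 0.2,
    (1, False): 0.2, (2, False): 0.1, (3, False): 0.3,
}

p_A1 = sum(p for (a, b), p in joint.items() if a == 1)    # marginal P(A=1)
p_B_true = sum(p for (a, b), p in joint.items() if b)     # marginal P(B=T)
p_A1_given_B = joint[(1, True)] / p_B_true                # conditional P(A=1 | B=T)

print(round(p_A1, 2), round(p_B_true, 2), round(p_A1_given_B, 2))   # 0.3 0.4 0.25
```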

9 Bayes Theorem P(A, B) = P(A | B) P(B) = P(B | A) P(A), so P(A | B) = P(B | A) P(A) / P(B) Example: what is the probability of meningitis when a patient has a stiff neck? P(S|M) = 0.5, P(M) = 1/50000, P(S) = 1/20 P(M|S) = P(S|M) P(M) / P(S) = (0.5 * 1/50000) / (1/20) = 0.0002 More generally, P(A | B, E) = P(B | A, E) P(A | E) / P(B | E)
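The same calculation in a few lines of Python, using the numbers from the slide:

```python
# Bayes' theorem for the meningitis example.
p_s_given_m = 0.5          # P(StiffNeck | Meningitis)
p_m = 1 / 50000            # P(Meningitis), the prior
p_s = 1 / 20               # P(StiffNeck)

p_m_given_s = p_s_given_m * p_m / p_s
print(round(p_m_given_s, 6))   # 0.0002
```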

10 Alarm System Example A burglary alarm system is fairly reliable at detecting burglary. It may also respond to minor earthquakes. Neighbors John and Mary will call when they hear the alarm. John always calls when he hears the alarm; he sometimes confuses the telephone with the alarm and calls. Mary sometimes misses the alarm. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary.

11 Alarm System Example A burglary alarm system is fairly reliable at detecting burglary: P(Alarm|Burglary). It may also respond to minor earthquakes: P(Alarm|Earthquake). Neighbors John and Mary will call when they hear the alarm: P(JohnCalls|Alarm), P(MaryCalls|Alarm). John always calls when he hears the alarm; he sometimes confuses the telephone with the alarm and calls: P(JohnCalls|~Alarm). Mary sometimes misses the alarm. Given the evidence of who has or has not called, we would like to estimate the probability of a burglary: P(Burglary|JohnCalls, MaryCalls).

12 Influence Diagrams Another way to present this information is an influence diagram. (Diagram: Burglary → Alarm ← Earthquake, with Alarm → JohnCalls and Alarm → MaryCalls.)

13 Influence Diagrams 1. A set of random variables. 2. A set of directed arcs: an arc from X to Y means that X has influence on Y. 3. Each node has an associated conditional probability table. 4. The graph has no directed cycles. (Same diagram as above.)

14 Conditional Probability Tables The CPT for the Alarm node, whose parents are Burglary (B) and Earthquake (E):

  B   E    P(Alarm=T | B, E)   P(Alarm=F | B, E)
  T   T    0.95                0.05
  T   F    0.94                0.06
  F   T    0.29                0.71
  F   F    0.001               0.999

Each row contains the conditional probability for a possible combination of values of the parent nodes. Each row must sum to 1.
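One way such a table might be stored in code: a dictionary keyed by the parent values, holding only P(Alarm=T | B, E), since the complementary entry is implied (a sketch, not from the slides):

```python
# CPT for Alarm, keyed by (Burglary, Earthquake).  Only P(Alarm=True | B, E)
# is stored; P(Alarm=False | B, E) is 1 minus that, so each row sums to 1.
p_alarm = {
    (True, True):   0.95,
    (True, False):  0.94,
    (False, True):  0.29,
    (False, False): 0.001,
}

def prob_alarm(alarm: bool, burglary: bool, earthquake: bool) -> float:
    p_true = p_alarm[(burglary, earthquake)]
    return p_true if alarm else 1.0 - p_true

print(prob_alarm(True, False, False))    # 0.001
print(prob_alarm(False, False, False))   # 0.999
```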

15 Belief Network for the Alarm
P(B) = 0.001    P(E) = 0.002
P(A | B, E):  B=T, E=T: 0.95;  B=T, E=F: 0.94;  B=F, E=T: 0.29;  B=F, E=F: 0.001
P(J | A):  A=T: 0.90;  A=F: 0.05
P(M | A):  A=T: 0.70;  A=F: 0.01
(Network structure as before: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.)

16 The Semantics of Belief Networks The probability that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both John and Mary call: –P(J ^ M ^ A ^ ~B ^ ~E) = P(J | A) P(M | A) P(A | ~B ^ ~E) P(~B) P(~E) = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00062 More generally, we can write this as –P(x_1, ..., x_n) = Π_i P(x_i | Parents(X_i)) AKA Bayesian Networks
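A short sketch that encodes the tables from slide 15 and evaluates this chain-rule product (the variable names are my own):

```python
# Network parameters from slide 15, and the chain-rule product
# P(J, M, A, ~B, ~E) = P(J|A) P(M|A) P(A|~B,~E) P(~B) P(~E).
p_b = 0.001
p_e = 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(JohnCalls=T | Alarm)
p_m = {True: 0.70, False: 0.01}                      # P(MaryCalls=T | Alarm)

joint = (p_j[True] * p_m[True] *
         p_a[(False, False)] *        # P(Alarm=T | B=F, E=F)
         (1 - p_b) * (1 - p_e))
print(joint)                          # ~0.000628, which the slide reports as 0.00062
```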

17 Constructing Belief Networks 1. Choose the set of variables X_i that describe the domain 2. Choose an ordering for the variables (ideally, work backward from observables to root causes) 3. While there are variables left: 1. Pick a variable X_i and add it to the network 2. Set Parents(X_i) to the minimal set of nodes such that conditional independence holds 3. Define the conditional probability table for X_i Once you're done, it's likely you'll realize you need to fiddle a little bit!

18 Node Ordering The correct order in which to add nodes is: –Add the "root causes" first –Then the variables they influence –And so on… Alarm example: consider the orderings –MaryCalls, JohnCalls, Alarm, Burglary, Earthquake –MaryCalls, JohnCalls, Earthquake, Burglary, Alarm (Diagram: the alarm network shown earlier, for comparison.)

19 Probabilistic Inference Diagnostic inference (from effects to causes) –Given that JohnCalls, infer that P(B|J) = 0.016 Causal inference (from causes to effects) –Given Burglary, P(J|B) = 0.85 and P(M|B) = 0.66 Intercausal inference (between causes of a common effect) –Given Alarm, P(B|A) = 0.376 –If Earthquake is also true, P(B|A^E) = 0.003 Mixed inference (combining two or more of the above) –P(A|J ^ ~E) = 0.03 –P(B|J ^ ~E) = 0.017
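As a sanity check of the causal-inference figure, here is a small sketch that computes P(J | B) by summing out Earthquake and Alarm, using the slide-15 parameters (the 0.85 above is this value rounded):

```python
# Causal inference: P(JohnCalls=T | Burglary=T), summing out Earthquake and Alarm.
p_e = 0.002
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
p_j = {True: 0.90, False: 0.05}                      # P(JohnCalls=T | Alarm)

p_j_given_b = 0.0
for e, pe in ((True, p_e), (False, 1 - p_e)):        # sum over Earthquake
    for a in (True, False):                          # sum over Alarm
        pa = p_a[(True, e)] if a else 1 - p_a[(True, e)]   # Burglary = True
        p_j_given_b += pe * pa * p_j[a]

print(round(p_j_given_b, 3))   # 0.849, i.e. the 0.85 quoted above
```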

20 Inference in Belief Networks A query P(X | e) can be answered using P(X | e) = k P(X, e) = k Σ_y P(X, e, y) Example: P(Alarm=True | JohnCalls=True, Earthquake=False) Disadvantage: –The complexity of this algorithm for a network with n Boolean variables is O(2^n) –Repeated calculations can be eliminated using dynamic programming, but the time complexity remains exponential in the worst case
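A minimal enumeration sketch for the example query, built from the slide-15 parameters: it sums the full joint over the hidden variables (Burglary and MaryCalls) and normalizes. The function and variable names are my own.

```python
from itertools import product

# Network parameters from slide 15.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls=T | Alarm)
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls=T | Alarm)

def joint(b, e, a, j, m):
    """Full-joint probability via the chain rule over the network."""
    p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

def query_alarm(j, e):
    """P(Alarm | JohnCalls=j, Earthquake=e): enumerate hidden vars, then normalize."""
    unnorm = {a: sum(joint(b, e, a, j, m)
                     for b, m in product((True, False), repeat=2))
              for a in (True, False)}
    k = 1.0 / sum(unnorm.values())        # the normalizing constant
    return {a: k * p for a, p in unnorm.items()}

print(query_alarm(j=True, e=False))       # distribution over Alarm given the evidence
```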

21 Conditional Independence D-separation: if every undirected path from a set of nodes X to a set of nodes Y is d-separated by E, then X and Y are conditionally independent given E. A set of nodes E d-separates two sets of nodes X and Y if every undirected path from a node in X to a node in Y is blocked given E. (Diagram: paths from X to Y passing through nodes Z, with evidence set E.)

22 Conditional Independence An undirected path from X to Y is blocked given E if there is a node Z on the path such that: 1. Z is in E and the path has one arrow leading into Z and one arrow leading out of Z, or 2. Z is in E and both path arrows lead out of Z, or 3. Neither Z nor any descendant of Z is in E and both path arrows lead into Z. (Diagram: the three blocking configurations for a node Z on a path from X to Y, given evidence E.)

23 An Inference Algorithm for Belief Networks In order to develop an algorithm, we will assume our networks are singly connected –A network is singly connected if there is at most a single undirected path between any two nodes in the network; note this means that any two nodes can be d-separated by removing a single node –These are also known as polytrees. We will then consider a generic node X with parents U_1, ..., U_m and children Y_1, ..., Y_n –the parents of Y_i are Z_ij –Evidence above X is E_X^+; evidence below X is E_X^-

24 Singly Connected Network (Diagram: node X with parents U_1 ... U_m and the evidence E_X^+ above it, and children Y_1 ... Y_n, their other parents Z_ij, and the evidence E_X^- below it.)

25 Inference in Belief Networks P(X | E_X) = P(X | E_X^+, E_X^-) = k P(E_X^- | X, E_X^+) P(X | E_X^+) = k P(E_X^- | X) P(X | E_X^+) –the last step follows by noting that X d-separates its parents from its children Now apply the product rule to the second factor, conditioning on the parents U: P(X | E_X^+) = Σ_u P(X | u, E_X^+) P(u | E_X^+) = Σ_u P(X | u) Π_i P(u_i | E_{U_i\X}) –again, these last steps follow from conditional independence Note that we now have a recursive algorithm: the first term in the sum is just a table lookup; the second is what we started with, on a smaller set of nodes.

26 Inference in Belief Networks P(X | E) = k P(E_X^- | X) P(X | E_X^+) The evaluation of the first factor is similar, but more involved, yielding P(E_X^- | X) = k_2 Π_i Σ_{y_i} P(E_{Y_i}^- | y_i) Σ_{z_i} P(y_i | X, z_i) Π_j P(z_ij | E_{Z_ij\Y_i}) –P(E_{Y_i}^- | y_i) is a recursive instance of P(E_X^- | X) –P(y_i | X, z_i) is a conditional probability table entry for Y_i –P(z_ij | E_{Z_ij\Y_i}) is a recursive instance of the P(X | E) calculation

27 The Algorithm
Support-Except(X, V) returns P(X | E_{X\V})
  if Evidence(X) then return the point distribution for X
  else
    calculate P(E^-_{X\V} | X) = Evidence-Except(X, V)
    U ← Parents(X)
    if U is empty
      then return k P(E^-_{X\V} | X) P(X)
      else
        for each U_i in U, calculate and store P(U_i | E_{U_i\X}) = Support-Except(U_i, X)
        return k P(E^-_{X\V} | X) Σ_u P(X | u) Π_i P(u_i | E_{U_i\X})

28 The Algorithm
Evidence-Except(X, V) returns P(E^-_{X\V} | X)
  Y ← Children(X) − V
  if Y is empty then return a uniform distribution
  else
    for each Y_i in Y do
      calculate P(E^-_{Y_i} | y_i) = Evidence-Except(Y_i, null)
      Z_i ← Parents(Y_i) − X
      for each Z_ij in Z_i, calculate P(Z_ij | E_{Z_ij\Y_i}) = Support-Except(Z_ij, Y_i)
    return k_2 Π_i Σ_{y_i} P(E^-_{Y_i} | y_i) Σ_{z_i} P(y_i | X, z_i) Π_j P(z_ij | E_{Z_ij\Y_i})

29 The Call For a query node X, call Support-Except(X, null) to obtain P(X | E).

30 PathFinder Diagnostic system for lymph node diseases. Pathfinder IV is a Bayesian model: –8 hrs devising the vocabulary –35 hrs defining the topology –40 hrs to make 14,000 probability assessments –the most recent version appears to outperform the experts who designed it!

31 Other Uncertainty Calculi Dempster-Shafer Theory –Ignorance: there are sets which have no probability –In this case, the best you can do, in some cases, is bound the probability –D-S theory is one way of doing this Fuzzy Logic –Suppose we introduce a fuzzy membership function (a degree of membership) –Logical semantics are based on set membership –Thus, we get a logic with degrees of truth, e.g. John is a big man → bigman(John) with truth value 0.

32 Netica™ Application, the world's most widely used Bayesian network development software, was designed to be simple, reliable, and high performing. For managing uncertainty in business, engineering, medicine, or ecology, it is the tool of choice for many of the world's leading companies and government agencies.

