Artificial Intelligence Universitatea Politehnica Bucuresti 2007-2008 Adina Magda Florea


Lecture no. 7: Uncertain knowledge and reasoning
1. Probability theory
2. Bayesian networks
3. Certainty factors

1. Probability theory
1.1 Uncertain knowledge
∀p symptom(p, Toothache) → disease(p, cavity)
∀p symptom(p, Toothache) → disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
Predicate logic (PL) fails here because of:
- laziness
- theoretical ignorance
- practical ignorance
Probability theory → a degree of belief or plausibility of statements – a numerical measure in [0,1]
Degree of truth (fuzzy logic) ≠ degree of belief

1.2 Definitions
Unconditional or prior probability of A – the degree of belief in A in the absence of any other information – P(A)
A – random variable
Probability distribution – P(A), P(A,B)
Example:
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
P(Weather) = (0.1, 0.7, 0.2) – probability distribution
Conditional (posterior) probability – once the agent has obtained some evidence B for A – P(A|B)
P(Cavity | Toothache) = 0.8

Definitions - cont
Axioms of probability
The measure of the occurrence of an event (random variable) A – a function P: S → R satisfying the axioms:
- 0 ≤ P(A) ≤ 1
- P(S) = 1 (or P(true) = 1 and P(false) = 0)
- P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
From these, P(A ∨ ~A) = P(A) + P(~A) - P(false) = P(true), hence P(~A) = 1 - P(A)

Definitions - cont
A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)
P(e1 ∨ e2 ∨ e3 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en)
The probability of a proposition a is equal to the sum of the probabilities of the atomic events in which a holds:
e(a) – the set of atomic events in which a holds
P(a) = Σ P(ei), for ei ∈ e(a)

1.3 Product rule
Conditional probabilities can be defined in terms of unconditional probabilities.
The conditional probability of the occurrence of A given that event B occurs:
P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
P(A ∧ B) = P(A|B) * P(B)
For probability distributions:
P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
…
P(X,Y) = P(X|Y) * P(Y)
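A minimal Python sketch (not part of the original slides) illustrating the product rule numerically: P(Cavity | Toothache) = 0.8 comes from the definitions slide above, while P(Toothache) = 0.05 is an illustrative assumption.

```python
# Minimal sketch (not from the slides): the product rule in both directions.
p_toothache = 0.05                      # illustrative assumption for P(B)
p_cavity_given_toothache = 0.8          # P(A|B), from the slide above

# P(A ∧ B) = P(A|B) * P(B)
p_cavity_and_toothache = p_cavity_given_toothache * p_toothache   # 0.04
# Recover the conditional: P(A|B) = P(A ∧ B) / P(B)
print(p_cavity_and_toothache, p_cavity_and_toothache / p_toothache)
```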

1.4 Bayes' rule and its use
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
Bayes' rule (theorem):
P(B|A) = P(A|B) * P(B) / P(A)

Bayes' Theorem
hi – hypotheses (i = 1, …, k); e1, …, en – evidence
P(hi) – prior probability of hypothesis hi
P(hi | e1, …, en) – posterior probability of hi given the evidence
P(e1, …, en | hi) – probability of the evidence under hi
P(hi | e1, …, en) = P(e1, …, en | hi) * P(hi) / Σ j=1..k P(e1, …, en | hj) * P(hj)
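As a hedged illustration (not from the slides), the following Python sketch applies Bayes' theorem to a set of competing hypotheses; all priors and likelihoods are made-up values.

```python
# Minimal sketch (not from the slides): Bayes' theorem over k competing
# hypotheses. All numbers are illustrative.
priors      = {"h1": 0.6, "h2": 0.3, "h3": 0.1}   # P(h_i)
likelihoods = {"h1": 0.2, "h2": 0.8, "h3": 0.5}   # P(e | h_i)

# Unnormalized posteriors P(e | h_i) * P(h_i); the denominator P(e)
# is obtained by summing them (normalization).
unnorm = {h: likelihoods[h] * priors[h] for h in priors}
p_e = sum(unnorm.values())
posteriors = {h: v / p_e for h, v in unnorm.items()}
print(posteriors)   # e.g. P(h2 | e) ≈ 0.585
```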

Bayes' Theorem - cont
If e1, …, en are independent pieces of evidence (given each hypothesis hi), then
P(e1, …, en | hi) = Π j=1..n P(ej | hi)
This independence assumption was used, for example, in PROSPECTOR.

1.5 Inferences
Probability distribution P(Cavity, Tooth):

            Tooth    ~Tooth
Cavity      0.04     0.06
~Cavity     0.01     0.89

P(Cavity) = 0.04 + 0.06 = 0.1
P(Cavity ∨ Tooth) = 0.04 + 0.06 + 0.01 = 0.11
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth) = 0.04 / 0.05 = 0.8

Inferences
Probability distribution P(Cavity, Tooth, Catch):

                 Tooth               ~Tooth
            Catch    ~Catch      Catch    ~Catch
Cavity      0.108    0.012       0.072    0.008
~Cavity     0.016    0.064       0.144    0.576

P(Cavity) = 0.108 + 0.012 + 0.072 + 0.008 = 0.2
P(Cavity ∨ Tooth) = 0.2 + 0.016 + 0.064 = 0.28
P(Cavity | Tooth) = P(Cavity ∧ Tooth) / P(Tooth)
                  = [P(Cavity ∧ Tooth ∧ Catch) + P(Cavity ∧ Tooth ∧ ~Catch)] / P(Tooth)
                  = (0.108 + 0.012) / 0.2 = 0.6
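A small Python sketch (not from the slides) of inference by enumeration over this joint distribution; the eight probabilities are the ones in the table above, as reconstructed from the standard dentist example.

```python
# Minimal sketch (not from the slides): inference by enumeration over the
# full joint distribution P(Cavity, Tooth, Catch) shown above.
joint = {
    # (cavity, tooth, catch): probability
    (True,  True,  True):  0.108, (True,  True,  False): 0.012,
    (True,  False, True):  0.072, (True,  False, False): 0.008,
    (False, True,  True):  0.016, (False, True,  False): 0.064,
    (False, False, True):  0.144, (False, False, False): 0.576,
}

def prob(pred):
    """Sum the probabilities of all atomic events satisfying pred."""
    return sum(p for event, p in joint.items() if pred(*event))

p_cavity = prob(lambda c, t, k: c)                       # ≈ 0.2
p_cavity_or_tooth = prob(lambda c, t, k: c or t)         # ≈ 0.28
p_cavity_given_tooth = (prob(lambda c, t, k: c and t)
                        / prob(lambda c, t, k: t))       # 0.12 / 0.2 = 0.6
print(p_cavity, p_cavity_or_tooth, p_cavity_given_tooth)
```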

2. Bayesian networks
- Represent dependencies among random variables
- Give a short specification of the conditional probability distributions
- Many random variables are conditionally independent, which simplifies computations
- Graphical representation: a DAG encoding causal relationships among random variables
- Allows inferences based on the network structure

2.1 Definition of Bayesian networks
A BN is a DAG in which each node is annotated with quantitative probability information, namely:
- Nodes represent random variables (discrete or continuous)
- Directed links X → Y: X has a direct influence on Y; X is said to be a parent of Y
- Each node Xi has an associated conditional probability table, P(Xi | Parents(Xi)), that quantifies the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
Weather is independent of the others; Cavity → Toothache, Cavity → Catch

Bayesian network - example

Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Edges: Burglary → Alarm, Earthquake → Alarm, Alarm → JohnCalls, Alarm → MaryCalls

P(B) = 0.001        P(E) = 0.002

Conditional probability table for Alarm:
B   E   P(A | B, E)
T   T   0.95
T   F   0.94
F   T   0.29
F   F   0.001

A   P(J | A)        A   P(M | A)
T   0.9             T   0.7
F   0.05            F   0.01

2.2 Bayesian network semantics
A) Represents a probability distribution
B) Specifies conditional independence – used to build the network
A) Each value of the probability distribution can be computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1, …, xn) = Π i=1..n P(xi | Parents(Xi))
where Parents(xi) represent the specific values of Parents(Xi)

2.3 Building the network
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1, …, xn)
  = P(xn | xn-1, …, x1) * P(xn-1, …, x1)
  = …
  = P(xn | xn-1, …, x1) * P(xn-1 | xn-2, …, x1) * … * P(x2 | x1) * P(x1)
  = Π i=1..n P(xi | xi-1, …, x1)
We can see that P(Xi | Xi-1, …, X1) = P(Xi | Parents(Xi)) provided that Parents(Xi) ⊆ {Xi-1, …, X1}
The condition may be satisfied by labeling the nodes in an order consistent with the DAG
Intuitively, the parents of a node Xi must be all the nodes among Xi-1, …, X1 which have a direct influence on Xi.

Building the network - cont
- Pick a set of random variables that describe the problem
- Pick an ordering of those variables
- While there are still variables, repeat:
  (a) choose a variable Xi and add a node associated with Xi
  (b) assign Parents(Xi) ← a minimal set of nodes that already exist in the network, such that the conditional independence property is satisfied
  (c) define the conditional probability table for Xi
Because each node is linked only to previous nodes ⇒ DAG
P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) = P(MaryCalls | Alarm)

Compactness and node ordering
- A Bayesian network is far more compact than the full joint probability distribution
- Example of a locally structured (sparse) system: each component interacts directly with only a limited number of other components
- Usually associated with linear growth in complexity rather than exponential growth
- The order of adding the nodes is important
- The correct order in which to add nodes is to add the "root causes" first, then the variables they influence, and so on, until we reach the leaves

Different order of nodes
Nodes: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
Order of introduction: MaryCalls, JohnCalls, Alarm, Burglary, Earthquake
P(Burglary | Alarm, JohnCalls, MaryCalls) = P(Burglary | Alarm)
The network structure depends on the order in which the nodes are introduced.

2.4 Probabilistic inferences
Three basic structures for a node V connected to A and B:
- Serial connection A → V → B:   P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
- Diverging connection (common cause) A ← V → B:   P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
- Converging connection (common effect) A → V ← B:   P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)

Probabilistic inferences
(using the burglary network and the CPTs given above)
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) * P(M|A) * P(A|¬B ∧ ¬E) * P(¬B) * P(¬E)
                        = 0.9 * 0.7 * 0.001 * 0.999 * 0.998 ≈ 0.00062
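The same computation can be expressed as a short Python sketch (not from the slides); the CPT values are the ones reconstructed for the burglary network above.

```python
# Minimal sketch (not from the slides): the joint probability of one complete
# assignment in the burglary network, as the product of CPT entries.
P_B = {True: 0.001, False: 0.999}                 # P(Burglary)
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,   # P(Alarm=T | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.9, False: 0.05}                    # P(JohnCalls=T | Alarm)
P_M = {True: 0.7, False: 0.01}                    # P(MaryCalls=T | Alarm)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) = product of P(x_i | parents(x_i))."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_j = P_J[a] if j else 1 - P_J[a]
    p_m = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * p_a * p_j * p_m

# P(J ∧ M ∧ A ∧ ¬B ∧ ¬E)
print(joint(b=False, e=False, a=True, j=True, m=True))   # ≈ 0.00062
```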

Probabilistic inferences
(same network and CPTs as above)
P(A|B) = P(A|B,E) * P(E|B) + P(A|B,¬E) * P(¬E|B)
       = P(A|B,E) * P(E) + P(A|B,¬E) * P(¬E)        (B and E are independent)
       = 0.95 * 0.002 + 0.94 * 0.998 = 0.94002
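A corresponding Python sketch (not from the slides) for this conditioning step, again using the reconstructed CPT values:

```python
# Minimal sketch (not from the slides): P(Alarm | Burglary) obtained by
# summing out Earthquake.
P_E = {True: 0.002, False: 0.998}                 # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm=T | B, E)

# P(A | B) = sum_e P(A | B, e) * P(e)   (Burglary and Earthquake independent)
p_a_given_b = sum(P_A[(True, e)] * P_E[e] for e in (True, False))
print(p_a_given_b)   # 0.95*0.002 + 0.94*0.998 = 0.94002
```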

2.5 Different types of inferences
(in the burglary network)
- Diagnostic inferences (effect → cause): P(Burglary | JohnCalls)
- Causal inferences (cause → effect): P(JohnCalls | Burglary), P(MaryCalls | Burglary)
- Intercausal inferences (between causes of a common effect): P(Burglary | Alarm ∧ Earthquake)
- Mixed inferences:
  P(Alarm | JohnCalls ∧ ¬Earthquake) – diagnostic + causal
  P(Burglary | JohnCalls ∧ ¬Earthquake) – diagnostic + intercausal

3. Certainty factors
The MYCIN model
Certainty factors / confidence coefficients (CF)
A heuristic model of uncertain knowledge
In MYCIN – two probabilistic functions to model the degree of belief and the degree of disbelief in a hypothesis:
- a function to measure the degree of belief – MB
- a function to measure the degree of disbelief – MD
MB[h,e] – how much the belief in h increases based on evidence e
MD[h,e] – how much the disbelief in h increases based on evidence e

3.1 Belief functions
MB[h,e] = 1                                          if P(h) = 1
          (max(P(h|e), P(h)) - P(h)) / (1 - P(h))    otherwise
MD[h,e] = 1                                          if P(h) = 0
          (min(P(h|e), P(h)) - P(h)) / (0 - P(h))    otherwise
Certainty factor:
CF[h,e] = MB[h,e] - MD[h,e]
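As an illustration (not from the slides), the classical MYCIN belief functions can be written directly in Python; the prior and posterior used in the example call are made-up values.

```python
# Minimal sketch (not from the slides): the classical MYCIN belief functions,
# assuming CF = MB - MD as on the slide above.
def MB(p_h, p_h_given_e):
    """How much the belief in h increases based on evidence e."""
    if p_h == 1:
        return 1.0
    return (max(p_h_given_e, p_h) - p_h) / (1 - p_h)

def MD(p_h, p_h_given_e):
    """How much the disbelief in h increases based on evidence e."""
    if p_h == 0:
        return 1.0
    return (min(p_h_given_e, p_h) - p_h) / (0 - p_h)

def CF(p_h, p_h_given_e):
    return MB(p_h, p_h_given_e) - MD(p_h, p_h_given_e)

# Illustrative values: prior P(h) = 0.3, posterior P(h|e) = 0.8
print(CF(0.3, 0.8))   # evidence increases belief, CF ≈ 0.714
```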

Belief functions - features
- Value ranges: 0 ≤ MB[h,e] ≤ 1, 0 ≤ MD[h,e] ≤ 1, -1 ≤ CF[h,e] ≤ 1
- Hypotheses are sustained by independent pieces of evidence
- If h is sure, i.e. P(h|e) = 1, then MB[h,e] = 1, MD[h,e] = 0 and CF[h,e] = 1
- If the negation of h is sure, i.e. P(h|e) = 0, then MB[h,e] = 0, MD[h,e] = 1 and CF[h,e] = -1

Example in MYCIN
if (1) the type of the organism is gram-positive, and
   (2) the morphology of the organism is coccus, and
   (3) the growth of the organism is chain
then there is strong evidence (0.7) that the identity of the organism is streptococcus

Examples of facts in MYCIN:
(identity organism-1 pseudomonas 0.8)
(identity organism-2 e.coli 0.15)
(morphology organism-2 coccus 1.0)

3.2 Combining belief functions
(1) Incremental gathering of evidence
The same attribute value, h, is obtained through two separate paths of inference, with two separate CFs: CF[h,s1] and CF[h,s2]
The two different paths, corresponding to hypotheses s1 and s2, may be different branches of the search tree.
CF[h, s1&s2] = CF[h,s1] + CF[h,s2] - CF[h,s1] * CF[h,s2]
Example:
(identity organism-1 pseudomonas 0.8)
(identity organism-1 pseudomonas 0.7)
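A one-line Python sketch (not from the slides) of this combination formula, applied to the two pseudomonas facts above; it assumes both CFs are positive, which is the case covered by the formula on the slide.

```python
# Minimal sketch (not from the slides): combining two CFs for the same
# hypothesis obtained along separate inference paths (both CFs positive).
def combine(cf1, cf2):
    return cf1 + cf2 - cf1 * cf2

# The two pseudomonas facts above: CFs 0.8 and 0.7
print(combine(0.8, 0.7))   # 0.8 + 0.7 - 0.56 = 0.94
```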

Combining belief functions
(2) Conjunction of hypotheses
Applied for computing the CF associated with the premise of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 s1 cf1) (B b1 s2 cf2)
CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])

Combining belief functions
(3) Combining beliefs
An uncertain value is deduced by a rule whose input conditions are themselves based on uncertain values (obtained, for example, by applying other rules).
Allows the computation of the CF of the fact deduced by the rule, based on the rule's CF and the CF of the premises
CF[s,e] – belief in a hypothesis s based on previous evidence e
CF[h,s] – CF of h if s is sure
CF'[h,s] = CF[h,s] * CF[s,e]

Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1   (rule CF 0.7)
WM: (A a1 0.9) (B b1 0.6)
CF(premise) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premise) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
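The premise and conclusion steps can be put together in a short Python sketch (not from the slides) that reproduces the example above:

```python
# Minimal sketch (not from the slides): applying an uncertain rule.
# The premise CF is the minimum of the condition CFs; the conclusion CF is
# the premise CF multiplied by the rule CF.
def apply_rule(condition_cfs, rule_cf):
    premise_cf = min(condition_cfs)
    return premise_cf * rule_cf

# WM: (A a1 0.9), (B b1 0.6); rule "if A=a1 and B=b1 then C=c1" with CF 0.7
print(apply_rule([0.9, 0.6], 0.7))   # min(0.9, 0.6) * 0.7 = 0.42
```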

3.3 Limits of CF
The CF model of MYCIN assumes that the hypotheses are sustained by independent evidence
An example shows what happens if this condition is violated:
A: The sprinkler functioned last night
U: The grass is wet in the morning
P: Last night it rained

Limits of CF - cont
R1: if the sprinkler functioned last night
    then there is strong evidence (0.9) that the grass is wet in the morning
R2: if the grass is wet in the morning
    then there is strong evidence (0.8) that it rained last night
CF[U,A] = 0.9, therefore the evidence "sprinkler" sustains the hypothesis "wet grass" with CF = 0.9
CF[P,U] = 0.8, therefore the evidence "wet grass" sustains the hypothesis "rain" with CF = 0.8
CF[P,A] = 0.8 * 0.9 = 0.72, therefore the evidence "sprinkler" sustains the hypothesis "rain" with CF = 0.72 – an unintended conclusion, since the sprinkler already explains the wet grass
Solutions