Bayesian Networks CSE 473

© Daniel S. Weld 2 Last Time: basic notions, atomic events, probabilities, joint distribution, inference by enumeration, independence & conditional independence, Bayes' rule, Bayesian networks, statistical learning, dynamic Bayesian networks (DBNs), Markov decision processes (MDPs).

© Daniel S. Weld 3 Axioms of Probability Theory. All probabilities lie between 0 and 1: 0 ≤ P(A) ≤ 1, with P(true) = 1 and P(false) = 0. The probability of a disjunction is P(A ∨ B) = P(A) + P(B) - P(A ∧ B). [Venn diagram of A, B, and A ∧ B]

© Daniel S. Weld 4 Conditional Probability. P(A | B) is the probability of A given B; it assumes that B is the only information known. Defined by P(A | B) = P(A ∧ B) / P(B). [Venn diagram of A, B, and A ∧ B]

© Daniel S. Weld 5 Inference by Enumeration. P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.20, or 20% (summing the four atomic events in which toothache holds).
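A minimal Python sketch of inference by enumeration over the full joint. The table values are the standard dental-domain joint from Russell & Norvig; the variable ordering (toothache, catch, cavity) is a representational choice.

```python
# Inference by enumeration over the full joint distribution.
joint = {
    # (toothache, catch, cavity) -> probability
    (True,  True,  True):  0.108,
    (True,  False, True):  0.012,
    (True,  True,  False): 0.016,
    (True,  False, False): 0.064,
    (False, True,  True):  0.072,
    (False, False, True):  0.008,
    (False, True,  False): 0.144,
    (False, False, False): 0.576,
}

def p_toothache():
    # Marginalize: sum every atomic event in which toothache holds.
    return sum(p for (toothache, _, _), p in joint.items() if toothache)

print(p_toothache())  # ~0.2
```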

© Daniel S. Weld 6 Inference by Enumeration

© Daniel S. Weld 7 Problems?? Worst-case time: O(d^n), where d = max arity and n = number of random variables. Space complexity is also O(d^n), the size of the joint distribution. How do we obtain the d^n entries for the table?? And note that the values of Cavity and Catch are irrelevant when computing P(toothache).

© Daniel S. Weld 8 Independence. A and B are independent iff P(A | B) = P(A), or equivalently P(B | A) = P(B). These two constraints are logically equivalent. Therefore, if A and B are independent: P(A ∧ B) = P(A) P(B).

© Daniel S. Weld 9 Independence. [Venn diagram of A, B, and A ∧ B under independence]

© Daniel S. Weld 10 Independence. Complete independence is powerful but rare. What to do if it doesn't hold?

© Daniel S. Weld 11 Conditional Independence. [Venn diagram of A and B] A and B are not independent, since P(A | B) < P(A).

© Daniel S. Weld 12 Conditional Independence. [Venn diagram of A, B, and C] But A and B are made independent by ¬C: P(A | ¬C) = P(A | B, ¬C).

© Daniel S. Weld 13 Conditional Independence. Instead of 7 entries, only 5 are needed: the full joint over three binary variables has 2^3 - 1 = 7 free entries, but with Toothache and Catch conditionally independent given Cavity the joint is determined by P(cavity), P(toothache | cavity), P(toothache | ¬cavity), P(catch | cavity), and P(catch | ¬cavity).

© Daniel S. Weld 14 Conditional Independence II. P(catch | toothache, cavity) = P(catch | cavity). P(catch | toothache, ¬cavity) = P(catch | ¬cavity). Why only 5 entries in the table?
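A short sketch of how those 5 numbers reconstruct the whole 8-entry joint. The parameter values below are chosen to be consistent with the dental joint used earlier; they are illustrative, not from the slide.

```python
# Reconstructing the 8-entry dental joint from just 5 parameters,
# using: Toothache and Catch are conditionally independent given Cavity.
p_cavity = 0.2
p_toothache_given = {True: 0.6, False: 0.1}   # P(toothache | cavity)
p_catch_given     = {True: 0.9, False: 0.2}   # P(catch | cavity)

def joint_entry(toothache, catch, cavity):
    """P(toothache, catch, cavity) = P(t | cav) * P(c | cav) * P(cav)."""
    p_cav = p_cavity if cavity else 1 - p_cavity
    pt = p_toothache_given[cavity]
    pc = p_catch_given[cavity]
    return (pt if toothache else 1 - pt) * (pc if catch else 1 - pc) * p_cav

print(joint_entry(True, True, True))  # ~0.108, matching the full table
```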

© Daniel S. Weld 15 Power of Cond. Independence Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear!! Conditional independence is the most basic & robust form of knowledge about uncertain environments.

© Daniel S. Weld 16 Bayes Rule. P(H | E) = P(E | H) P(H) / P(E). Simple proof from the definition of conditional probability:
(1) P(E | H) = P(E ∧ H) / P(H) (def. cond. prob.)
(2) P(H | E) = P(E ∧ H) / P(E) (def. cond. prob.)
(3) P(E ∧ H) = P(E | H) P(H) (mult. by P(H) in line 1)
(4) P(H | E) = P(E | H) P(H) / P(E) (substitute #3 in #2). QED.

© Daniel S. Weld 17 Use to Compute Diagnostic Probability from Causal Probability. E.g., let M be meningitis and S be stiff neck, with P(S) = 0.1 and P(S | M) = 0.8. Then P(M | S) = P(S | M) P(M) / P(S) = 8 · P(M), which remains small because the prior P(M) for meningitis is tiny.
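A one-function sketch of the diagnostic computation. The slide does not show the prior, so P(M) = 0.0001 below is an assumed illustrative value.

```python
def bayes(p_e_given_h, p_h, p_e):
    """Bayes rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e

# Meningitis example: P(S | M) = 0.8 and P(S) = 0.1 from the slide;
# the prior P(M) = 0.0001 is an assumed value for illustration.
print(bayes(p_e_given_h=0.8, p_h=0.0001, p_e=0.1))  # 0.0008
```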

© Daniel S. Weld 18 Bayes' Rule & Cond. Independence. When the pieces of evidence E1, ..., En are conditionally independent given the cause, P(Cause | E1, ..., En) = α P(Cause) ∏i P(Ei | Cause) (the naive Bayes combination).

© Daniel S. Weld 19 Bayes Nets. In general, a joint distribution P over a set of variables (X1 × ... × Xn) requires exponential space for representation & inference. BNs provide a graphical representation of the conditional independence relations in P: usually quite compact; requires assessment of fewer parameters, and those quite natural (e.g., causal); efficient (usually) inference: query answering and belief update.

© Daniel S. Weld 20 Independence (in the extreme). If X1, X2, ..., Xn are mutually independent, then P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn), so the joint can be specified with n parameters, cf. the usual 2^n - 1 parameters required. While extreme independence is unusual, conditional independence is common. BNs exploit this conditional independence.
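A tiny sketch of that n-parameter representation for binary variables; the marginals below are arbitrary illustrative numbers.

```python
from math import prod

# Under full mutual independence, n numbers replace 2**n - 1.
marginals = [0.3, 0.7, 0.5]  # illustrative P(Xi = true), n = 3

def joint(assignment):
    """P(x1, ..., xn) = product of the individual marginals."""
    return prod(p if x else 1 - p for p, x in zip(marginals, assignment))

print(joint((True, False, True)))  # 0.3 * 0.3 * 0.5 = 0.045
```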

© Daniel S. Weld 21 An Example Bayes Net. Nodes: Earthquake, Burglary, Alarm, Radio, Nbr1Calls, Nbr2Calls. Priors: Pr(B=t), Pr(B=f). CPT Pr(A | E, B): e,b: 0.9 (0.1); e,¬b: 0.2 (0.8); ¬e,b: 0.85 (0.15); ¬e,¬b: 0.01 (0.99), where parenthesized values are Pr(A=f).

© Daniel S. Weld 22 Earthquake Example (con't). If I know whether the Alarm went off, no other evidence influences my degree of belief in Nbr1Calls: P(N1 | N2, A, E, B) = P(N1 | A); also P(N2 | N1, A, E, B) = P(N2 | A), and P(E | B) = P(E). By the chain rule we have P(N1, N2, A, E, B) = P(N1 | N2, A, E, B) · P(N2 | A, E, B) · P(A | E, B) · P(E | B) · P(B) = P(N1 | A) · P(N2 | A) · P(A | B, E) · P(E) · P(B). The full joint requires only 10 parameters (cf. 32).
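A sketch of that factorization as code. The Pr(A | E, B) entries come from the slide's CPT; the priors and the callers' CPTs are assumed illustrative values, since the slides do not give them.

```python
# Joint for the earthquake net via the chain-rule factorization
# P(N1, N2, A, E, B) = P(N1|A) P(N2|A) P(A|B,E) P(E) P(B).
p_b = 0.01                     # assumed prior P(Burglary)
p_e = 0.02                     # assumed prior P(Earthquake)
p_a = {(True, True): 0.9, (True, False): 0.2,      # P(A | E, B),
       (False, True): 0.85, (False, False): 0.01}  # keyed by (e, b)
p_n1 = {True: 0.9, False: 0.05}   # assumed P(Nbr1Calls | A)
p_n2 = {True: 0.7, False: 0.01}   # assumed P(Nbr2Calls | A)

def val(p_true, x):
    """P(X = x) given P(X = true)."""
    return p_true if x else 1 - p_true

def joint(n1, n2, a, e, b):
    return (val(p_n1[a], n1) * val(p_n2[a], n2) *
            val(p_a[(e, b)], a) * val(p_e, e) * val(p_b, b))

print(joint(True, True, True, False, True))  # one of the 32 atomic events
```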

© Daniel S. Weld 23 BNs: Qualitative Structure. The graphical structure of a BN reflects conditional independence among variables. Each variable X is a node in the DAG; edges denote direct probabilistic influence, usually interpreted causally. The parents of X are denoted Par(X). X is conditionally independent of all non-descendants given its parents; a graphical test exists for more general independence (the "Markov blanket").

© Daniel S. Weld 24 Given Parents, X is Independent of Non-Descendants

© Daniel S. Weld 25 For Example: Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls, Radio.

© Daniel S. Weld 26 Given Markov Blanket, X is Independent of All Other Nodes. MB(X) = Par(X) ∪ Childs(X) ∪ Par(Childs(X)).

© Daniel S. Weld 27 Conditional Probability Tables. Earthquake, Burglary, Alarm, Radio, Nbr1Calls, Nbr2Calls. Priors: Pr(B=t), Pr(B=f). CPT Pr(A | E, B): e,b: 0.9 (0.1); e,¬b: 0.2 (0.8); ¬e,b: 0.85 (0.15); ¬e,¬b: 0.01 (0.99).

© Daniel S. Weld 28 Conditional Probability Tables. For a complete specification of the joint distribution, quantify the BN: for each variable X, specify the CPT P(X | Par(X)); the number of parameters is locally exponential in |Par(X)|. If X1, X2, ..., Xn is any topological sort of the network, then we are assured: P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1) = P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1).

© Daniel S. Weld 29 Inference in BNs. The graphical independence representation yields efficient inference schemes. We generally want to compute Pr(X), or Pr(X | E) where E is (conjunctive) evidence. Computations are organized by network topology. One simple algorithm: variable elimination (VE).

© Daniel S. Weld 30 P(B | J=true, M=true). Earthquake, Burglary, Alarm, John, Mary, Radio. P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a).
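The same computation as a short enumeration sketch, following the slide's formula. As before, Pr(A | E, B) is from the earlier CPT, while the priors and the call CPTs are assumed illustrative numbers.

```python
# Enumeration for P(B | j, m), following the slide's formula:
# P(b | j, m) = alpha * P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a)
p_b, p_e = 0.01, 0.02                               # assumed priors
p_a = {(True, True): 0.9, (True, False): 0.2,       # P(A | E, B),
       (False, True): 0.85, (False, False): 0.01}   # keyed by (e, b)
p_j = {True: 0.9, False: 0.05}    # assumed P(John calls | A)
p_m = {True: 0.7, False: 0.01}    # assumed P(Mary calls | A)

def val(p_true, x):
    return p_true if x else 1 - p_true

def unnormalized(b):
    return val(p_b, b) * sum(
        val(p_e, e) * sum(val(p_a[(e, b)], a) * p_j[a] * p_m[a]
                          for a in (True, False))
        for e in (True, False))

scores = {b: unnormalized(b) for b in (True, False)}
alpha = 1.0 / sum(scores.values())
print({b: alpha * s for b, s in scores.items()})  # posterior over Burglary
```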

© Daniel S. Weld 31 Structure of Computation: Dynamic Programming.

© Daniel S. Weld 32 Variable Elimination. A factor is a function from some set of variables to numbers, e.g., f(E, A, N1); CPTs are factors, e.g., P(A | E, B) is a function of A, E, B. VE works by eliminating all variables in turn until only a factor over the query variable remains. To eliminate a variable: join all factors containing that variable (like a database join), then sum out the influence of the variable on the new factor. This exploits the product form of the joint distribution.

© Daniel S. Weld 33 Example of VE: P(N1). Network: Earthqk, Burgl, Alarm, N1, N2.
P(N1) = Σ_{N2,A,B,E} P(N1, N2, A, B, E)
= Σ_{N2,A,B,E} P(N1 | A) P(N2 | A) P(B) P(A | B, E) P(E)
= Σ_A P(N1 | A) Σ_{N2} P(N2 | A) Σ_B P(B) Σ_E P(A | B, E) P(E)
= Σ_A P(N1 | A) Σ_{N2} P(N2 | A) Σ_B P(B) f1(A, B)
= Σ_A P(N1 | A) Σ_{N2} P(N2 | A) f2(A)
= Σ_A P(N1 | A) f3(A)
= f4(N1)
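A minimal sketch of the two factor operations VE needs (multiply/join and sum out). The factor encoding and the numbers in the demo at the bottom are illustrative assumptions, not from the slides.

```python
from itertools import product

# Two factor operations are all VE needs: multiply (join) and sum out.
# A factor is (vars, table): table maps boolean assignment tuples
# (one entry per var, in order) to numbers.

def multiply(f, g):
    """Join two factors on their shared variables."""
    fv, ft = f
    gv, gt = g
    out_vars = fv + [v for v in gv if v not in fv]
    table = {}
    for assign in product([True, False], repeat=len(out_vars)):
        env = dict(zip(out_vars, assign))
        table[assign] = (ft[tuple(env[v] for v in fv)] *
                         gt[tuple(env[v] for v in gv)])
    return (out_vars, table)

def sum_out(var, f):
    """Eliminate var by summing its influence out of the factor."""
    fv, ft = f
    i = fv.index(var)
    out_vars = fv[:i] + fv[i + 1:]
    table = {}
    for assign, value in ft.items():
        key = assign[:i] + assign[i + 1:]
        table[key] = table.get(key, 0.0) + value
    return (out_vars, table)

# Demo with illustrative numbers: P(N1) = sum_A P(N1 | A) P(A).
f_a  = (["A"], {(True,): 0.1, (False,): 0.9})
f_n1 = (["N1", "A"], {(True, True): 0.9, (True, False): 0.05,
                      (False, True): 0.1, (False, False): 0.95})
print(sum_out("A", multiply(f_n1, f_a)))  # factor over N1 alone
```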

© Daniel S. Weld 34 Notes on VE. Each operation is simply a multiplication of factors followed by summing out a variable. Complexity is determined by the size of the largest factor (e.g., 3 variables, not 5, in the example): linear in the number of variables, exponential in the largest factor. The elimination ordering greatly impacts factor size; finding an optimal elimination ordering is NP-hard, so we rely on heuristics and special structure (e.g., polytrees). Practically, inference is much more tractable using structure of this sort.