Probabilistic Reasoning

Power of Cond. Independence. Often, using conditional independence reduces the storage complexity of the joint distribution from exponential to linear. Conditional independence is the most basic and robust form of knowledge about uncertain environments. © Daniel S. Weld

Bayes Rule. Simple proof from the definition of conditional probability:
1. P(H|E) = P(H,E) / P(E)   (def. cond. prob.)
2. P(E|H) = P(H,E) / P(H)   (def. cond. prob.)
3. P(H,E) = P(E|H) P(H)   (multiply line 2 by P(H))
4. P(H|E) = P(E|H) P(H) / P(E)   (substitute 3 into 1)   QED
© Daniel S. Weld

Use to Compute Diagnostic Probability from Causal Probability. E.g. let M be meningitis and S be stiff neck, with P(M) = 0.0001, P(S) = 0.1, P(S|M) = 0.8. Then P(M|S) = P(S|M) P(M) / P(S) = 0.8 * 0.0001 / 0.1 = 0.0008. © Daniel S. Weld
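As a quick sanity check on the arithmetic, here is a minimal Python sketch of the same diagnostic computation (the variable names are just illustrative, not from the slides):

```python
# Bayes' rule turns the causal probability P(S|M) into the diagnostic P(M|S):
# P(M|S) = P(S|M) * P(M) / P(S)
p_m = 0.0001        # P(M): prior probability of meningitis
p_s = 0.1           # P(S): prior probability of a stiff neck
p_s_given_m = 0.8   # P(S|M): causal direction

p_m_given_s = p_s_given_m * p_m / p_s
print(p_m_given_s)  # 0.0008
```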

Bayes’ Rule & Cond. Independence © Daniel S. Weld

Bayes Nets. In general, a joint distribution P over a set of variables (X1 x ... x Xn) requires exponential space for representation and inference. BNs provide a graphical representation of the conditional independence relations in P: they are usually quite compact, they require assessment of fewer parameters (and those parameters are quite natural, e.g. causal), and they allow (usually) efficient inference: query answering and belief update. © Daniel S. Weld

BN: What do the arrows really mean? The topology may happen to encode causal structure, but it is only guaranteed to encode conditional independence. © Daniel S. Weld

Independence. If X1, X2, ..., Xn are mutually independent, then P(X1, X2, ..., Xn) = P(X1) P(X2) ... P(Xn). The joint can then be specified with n parameters, cf. the usual 2^n - 1 parameters required. While full independence is unusual, conditional independence is common, and BNs exploit this conditional independence. © Daniel S. Weld

An Example Bayes Net. Nodes: Earthquake, Burglary, Alarm, Nbr1Calls, Nbr2Calls, with Earthquake and Burglary as parents of Alarm, and Alarm as parent of Nbr1Calls and Nbr2Calls.
P(E=t) = 0.002    P(B=t) = 0.001
P(A=t | E,B):   e,b: 0.95   e,¬b: 0.29   ¬e,b: 0.94   ¬e,¬b: 0.01
P(N1=t | A):   A=t: 0.9   A=f: 0.05
P(N2=t | A):   A=t: 0.7   A=f: 0.01
© Daniel S. Weld

Earthquake Example (con't). If I know whether the Alarm went off, no other evidence influences my degree of belief in Nbr1Calls: P(N1|N2,A,E,B) = P(N1|A); also P(N2|N1,A,E,B) = P(N2|A), and P(E|B) = P(E). By the chain rule we have P(N1,N2,A,E,B) = P(N1|N2,A,E,B) · P(N2|A,E,B) · P(A|E,B) · P(E|B) · P(B) = P(N1|A) · P(N2|A) · P(A|B,E) · P(E) · P(B). The full joint requires only 10 parameters (cf. 32): 10 = 2 + 2 + 4 + 1 + 1. © Daniel S. Weld
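To make the factorization concrete, here is a small Python sketch (not from the original slides) that stores the example network's CPTs as plain dictionaries and multiplies the five local terms for any full assignment:

```python
# CPTs of the example network (True = the event happened, False = it did not).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(A=t | B, E), keyed by (B, E)
       (False, True): 0.29, (False, False): 0.01}
P_N1 = {True: 0.9, False: 0.05}                    # P(N1=t | A), keyed by A
P_N2 = {True: 0.7, False: 0.01}                    # P(N2=t | A), keyed by A

def joint(n1, n2, a, e, b):
    """P(N1,N2,A,E,B) = P(N1|A) * P(N2|A) * P(A|B,E) * P(E) * P(B)."""
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_n1 = P_N1[a] if n1 else 1 - P_N1[a]
    p_n2 = P_N2[a] if n2 else 1 - P_N2[a]
    return p_n1 * p_n2 * p_a * P_E[e] * P_B[b]

# E.g. both neighbours call, the alarm went off, no earthquake, no burglary:
print(joint(True, True, True, False, False))
```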

BNs: Qualitative Structure. The graphical structure of a BN reflects conditional independence among variables. Each variable X is a node in the DAG. Edges denote direct probabilistic influence (usually interpreted causally); the parents of X are denoted Par(X). X is conditionally independent of all non-descendants given its parents. A graphical test exists for more general independence; see also the "Markov Blanket" below. © Daniel S. Weld

Given Parents, X is Independent of Non-Descendants © Daniel S. Weld

For Example. [Figure: the earthquake network extended with a Radio node: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls.] © Daniel S. Weld

Given Markov Blanket, X is Independent of All Other Nodes. MB(X) = Par(X) ∪ Children(X) ∪ Par(Children(X)). © Daniel S. Weld

Conditional Probability Tables.
Pr(B=t) = 0.05    Pr(B=f) = 0.95
Pr(A=t | E,B):   e,b: 0.9 (0.1)   e,¬b: 0.2 (0.8)   ¬e,b: 0.85 (0.15)   ¬e,¬b: 0.01 (0.99)   (values in parentheses are Pr(A=f | E,B))
Nodes: Earthquake, Burglary, Radio, Alarm, Nbr1Calls, Nbr2Calls.
© Daniel S. Weld

Conditional Probability Tables. For a complete specification of the joint distribution, quantify the BN: for each variable X, specify a CPT P(X | Par(X)); the number of parameters is locally exponential in |Par(X)|. If X1, X2, ..., Xn is any topological sort of the network, then we are assured:
P(Xn, Xn-1, ..., X1) = P(Xn | Xn-1, ..., X1) · P(Xn-1 | Xn-2, ..., X1) · ... · P(X2 | X1) · P(X1)
= P(Xn | Par(Xn)) · P(Xn-1 | Par(Xn-1)) · ... · P(X1)
© Daniel S. Weld
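The same idea in generic form: assuming a hypothetical representation where each CPT is indexed by a variable's own value and its parents' values, the joint of a full assignment is the product of local conditional terms taken in topological order. A minimal sketch, with a made-up two-variable example:

```python
# Assumed representation (not from the slides): cpt[x] maps (own value, tuple of
# parent values) to a probability; `order` is a topological ordering of the DAG.
def joint_probability(order, parents, cpt, assignment):
    p = 1.0
    for x in order:
        parent_vals = tuple(assignment[u] for u in parents[x])
        p *= cpt[x][(assignment[x], parent_vals)]
    return p

# Tiny usage example: X1 -> X2, both Boolean.
order = ['X1', 'X2']
parents = {'X1': (), 'X2': ('X1',)}
cpt = {'X1': {(True, ()): 0.3, (False, ()): 0.7},
       'X2': {(True, (True,)): 0.9, (False, (True,)): 0.1,
              (True, (False,)): 0.2, (False, (False,)): 0.8}}
print(joint_probability(order, parents, cpt, {'X1': True, 'X2': False}))  # 0.3 * 0.1
```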

Bayes Nets Representation Summary. Bayes nets compactly encode joint distributions. Guaranteed independencies of distributions can be deduced from the BN graph structure. D-separation gives precise conditional independence guarantees from the graph alone. A Bayes net's joint distribution may have further (conditional) independence that is not detectable until you inspect its specific distribution. © Daniel S. Weld

Inference in BNs. Inference: calculating some useful quantity from a joint probability distribution. The graphical independence representation yields efficient inference schemes. Examples: posterior probability P(Q | E1=e1, ..., Ek=ek); most likely explanation argmax_q P(Q=q | E=e). Computations are organized by network topology. © Daniel S. Weld

P(b|j,m) = P(b) P(e) P(a|b,e)P(j|a)P(m,a) P(B | J=true, M=true) Earthquake Burglary Radio Alarm John Mary P(b|j,m) = P(b) P(e) P(a|b,e)P(j|a)P(m,a) e a © Daniel S. Weld

Inference by Enumeration. Given unlimited time, inference in BNs is easy. Recipe: state the marginal probabilities you need; figure out ALL the atomic probabilities you need; calculate and combine them. E.g.: see the worked sketch below. © Daniel S. Weld

In this simple method, we only need the BN to synthesize the joint entries © Daniel S. Weld
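A brute-force sketch of this recipe for the earthquake network, in Python: to answer P(B | n1, n2), enumerate the joint over the hidden variables A and E, sum, and normalize. The CPT dictionaries repeat the illustrative encoding used earlier, so the sketch stands on its own:

```python
from itertools import product

# Same illustrative CPTs as in the earlier sketch.
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.01}
P_N1 = {True: 0.9, False: 0.05}
P_N2 = {True: 0.7, False: 0.01}

def joint(n1, n2, a, e, b):
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_n1 = P_N1[a] if n1 else 1 - P_N1[a]
    p_n2 = P_N2[a] if n2 else 1 - P_N2[a]
    return p_n1 * p_n2 * p_a * P_E[e] * P_B[b]

def p_burglary_given_both_call():
    """P(B | N1=t, N2=t): sum atomic events over hidden vars A, E, then normalize."""
    unnormalized = {b: sum(joint(True, True, a, e, b)
                           for a, e in product((True, False), repeat=2))
                    for b in (True, False)}
    z = sum(unnormalized.values())
    return {b: p / z for b, p in unnormalized.items()}

print(p_burglary_given_both_call())
```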

Inference by Enumeration? © Daniel S. Weld

Variable Elimination. Why is inference by enumeration so slow? You join up the whole joint distribution before you sum out the hidden variables, so you end up repeating a lot of work! Idea: interleave joining and marginalizing! This is called "Variable Elimination". It is still NP-hard, but usually much faster than inference by enumeration. We'll need some new notation to define VE. © Daniel S. Weld

Factor 1. Joint distribution: P(X,Y) has entries P(x,y) for all x, y and sums to 1. Selected joint: P(x,Y) is a slice of the joint distribution, with entries P(x,y) for fixed x and all y; it sums to P(x). © Daniel S. Weld

Factor 2. Family of conditionals: P(X | Y) is multiple conditionals, with entries P(x | y) for all x, y; it sums to |Y|. Single conditional: P(Y | x) has entries P(y | x) for fixed x and all y; it sums to 1. © Daniel S. Weld

Specified family: P(y | X) has entries P(y | x) for fixed y but for all x; it sums to ... who knows! In general, when we write P(Y1 ... YN | X1 ... XM), it is a "factor": a multi-dimensional array whose values are all P(y1 ... yN | x1 ... xM). Any assigned X or Y is a dimension missing (selected) from the array. © Daniel S. Weld

Example: Traffic Domain Random Variables R: Raining T: Traffic L: Late for class! First query: P(L) © Daniel S. Weld
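For the factor operations that follow, it helps to fix a concrete (assumed) encoding. In this sketch each factor is a pair (variables, table); the probability numbers are illustrative placeholders rather than values taken from the slides:

```python
# One possible encoding: a factor is a pair (variables, table), where `table`
# maps a tuple of values (one per variable, in order) to a probability.
# The numbers below are illustrative placeholders, not taken from the slides.
P_R = (('R',), {('+r',): 0.1, ('-r',): 0.9})                       # P(R)
P_T_given_R = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,  # P(T | R)
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
P_L_given_T = (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,  # P(L | T)
                            ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})
```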

Operation 1: Join Factors. First basic operation: joining factors. Combining factors is just like a database join: get all factors over the joining variable and build a new factor over the union of the variables involved. Example: join on R. Computation for each entry: pointwise products. © Daniel S. Weld
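One possible implementation of the join operation under the assumed factor encoding above; the example joins P(R) and P(T | R) into P(R, T) using the same illustrative numbers:

```python
from itertools import product

# A factor is a pair (variables, table); `table` maps a tuple of values
# (one per variable, in order) to a probability.
def join(f1, f2, domains):
    """Pointwise product of two factors over the union of their variables."""
    (vars1, t1), (vars2, t2) = f1, f2
    out_vars = vars1 + tuple(v for v in vars2 if v not in vars1)
    out_table = {}
    for values in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        out_table[values] = (t1[tuple(row[v] for v in vars1)] *
                             t2[tuple(row[v] for v in vars2)])
    return out_vars, out_table

# Join P(R) and P(T | R) into P(R, T), using the illustrative numbers.
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t']}
P_R = (('R',), {('+r',): 0.1, ('-r',): 0.9})
P_T_given_R = (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                            ('-r', '+t'): 0.1, ('-r', '-t'): 0.9})
print(join(P_R, P_T_given_R, domains)[1])
# roughly {('+r','+t'): 0.08, ('+r','-t'): 0.02, ('-r','+t'): 0.09, ('-r','-t'): 0.81}
```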

Example: Multiple Joins © Daniel S. Weld


Operation 2: Eliminate. Second basic operation: marginalization. Take a factor and sum out a variable; this shrinks the factor to a smaller one. It is a projection operation. Example: see the sketch below. © Daniel S. Weld
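A matching sketch of elimination (summing out) under the same assumed encoding; the example sums R out of the illustrative joint P(R, T):

```python
# Sum a variable out of a factor; the result is a smaller factor over the
# remaining variables (same (variables, table) encoding as above).
def eliminate(factor, var):
    variables, table = factor
    i = variables.index(var)
    out_vars = variables[:i] + variables[i + 1:]
    out_table = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out_table[key] = out_table.get(key, 0.0) + p
    return out_vars, out_table

# Sum R out of the illustrative joint P(R, T) to get P(T).
P_RT = (('R', 'T'), {('+r', '+t'): 0.08, ('+r', '-t'): 0.02,
                     ('-r', '+t'): 0.09, ('-r', '-t'): 0.81})
print(eliminate(P_RT, 'R')[1])  # roughly {('+t',): 0.17, ('-t',): 0.83}
```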

Multiple Elimination © Daniel S. Weld

P(L) : Marginalizing Early! © Daniel S. Weld

Marginalizing Early-2 © Daniel S. Weld

Evidence. If there is evidence, start with factors that select that evidence. With no evidence, we use the initial factors P(R), P(T|R), P(L|T); computing P(L | +r), the initial factors become P(+r), P(T | +r), P(L | T). We then eliminate all variables other than the query and the evidence. © Daniel S. Weld

Evidence II. The result will be a selected joint of query and evidence. E.g. for P(L | +r), we'd end up with a factor over (+r, L). To get our answer, just normalize this: divide each entry by their sum, P(+r). © Daniel S. Weld
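A sketch of the evidence-selection and normalization steps under the same assumed encoding; the final print shows P(L | +r) for the illustrative traffic numbers:

```python
# Evidence: restrict a factor to rows consistent with the observed value,
# then normalize at the end.  Same (variables, table) encoding as above.
def select(factor, var, value):
    variables, table = factor
    i = variables.index(var)
    return variables, {vals: p for vals, p in table.items() if vals[i] == value}

def normalize(factor):
    variables, table = factor
    z = sum(table.values())
    return variables, {vals: p / z for vals, p in table.items()}

# Illustrative joint P(R, L); selecting +r gives the factor over (+r, L),
# and normalizing it (dividing by P(+r) = 0.1) gives P(L | +r).
P_RL = (('R', 'L'), {('+r', '+l'): 0.026, ('+r', '-l'): 0.074,
                     ('-r', '+l'): 0.108, ('-r', '-l'): 0.792})
print(normalize(select(P_RL, 'R', '+r'))[1])
# roughly {('+r', '+l'): 0.26, ('+r', '-l'): 0.74}
```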

General Variable Elimination. Query: P(Q | E1=e1, ..., Ek=ek). Start with the initial factors: the local CPTs (but instantiated by evidence). While there are still hidden variables (not Q or evidence): pick a hidden variable H, join all factors mentioning H, and eliminate (sum out) H. Finally, join all remaining factors and normalize. © Daniel S. Weld
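Putting the pieces together, here is one possible end-to-end sketch of variable elimination under the assumed factor encoding (the helpers are reproduced so the sketch stands alone); the example answers P(L | +r) for the illustrative traffic network:

```python
from itertools import product

# Factors are (variables, table) pairs, as in the earlier sketches.
def join(f1, f2, domains):
    (v1, t1), (v2, t2) = f1, f2
    out_vars = v1 + tuple(v for v in v2 if v not in v1)
    table = {}
    for values in product(*(domains[v] for v in out_vars)):
        row = dict(zip(out_vars, values))
        k1 = tuple(row[v] for v in v1)
        k2 = tuple(row[v] for v in v2)
        if k1 in t1 and k2 in t2:          # rows removed by evidence are skipped
            table[values] = t1[k1] * t2[k2]
    return out_vars, table

def eliminate(factor, var):
    variables, table = factor
    i = variables.index(var)
    out = {}
    for values, p in table.items():
        key = values[:i] + values[i + 1:]
        out[key] = out.get(key, 0.0) + p
    return variables[:i] + variables[i + 1:], out

def select(factor, evidence):
    variables, table = factor
    keep = lambda vals: all(vals[variables.index(v)] == x
                            for v, x in evidence.items() if v in variables)
    return variables, {vals: p for vals, p in table.items() if keep(vals)}

def variable_elimination(query, evidence, factors, domains):
    """P(query | evidence) from a list of CPT factors."""
    factors = [select(f, evidence) for f in factors]
    hidden = {v for vars_, _ in factors for v in vars_} - {query} - set(evidence)
    for h in hidden:                               # any elimination order
        mentioning = [f for f in factors if h in f[0]]
        rest = [f for f in factors if h not in f[0]]
        joined = mentioning[0]
        for f in mentioning[1:]:
            joined = join(joined, f, domains)
        factors = rest + [eliminate(joined, h)]
    result = factors[0]
    for f in factors[1:]:
        result = join(result, f, domains)
    z = sum(result[1].values())
    return result[0], {vals: p / z for vals, p in result[1].items()}

# Traffic example with the illustrative numbers: P(L | R = +r).
domains = {'R': ['+r', '-r'], 'T': ['+t', '-t'], 'L': ['+l', '-l']}
cpts = [(('R',), {('+r',): 0.1, ('-r',): 0.9}),
        (('R', 'T'), {('+r', '+t'): 0.8, ('+r', '-t'): 0.2,
                      ('-r', '+t'): 0.1, ('-r', '-t'): 0.9}),
        (('T', 'L'), {('+t', '+l'): 0.3, ('+t', '-l'): 0.7,
                      ('-t', '+l'): 0.1, ('-t', '-l'): 0.9})]
print(variable_elimination('L', {'R': '+r'}, cpts, domains))
# roughly (('+r', L), {('+r','+l'): 0.26, ('+r','-l'): 0.74})
```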

Variable Elimination Bayes Rule © Daniel S. Weld

Example 2 © Daniel S. Weld


Notes on VE. Each operation is simply a multiplication of factors and summing out a variable. Complexity is determined by the size of the largest factor: linear in the number of variables, exponential in the largest factor. The elimination ordering greatly impacts factor size; finding an optimal elimination ordering is NP-hard, but there are heuristics and special structure (e.g., polytrees). On tree-structured graphs, variable elimination runs in polynomial time, like tree-structured CSPs. Practically, inference is much more tractable using structure of this sort. © Daniel S. Weld