Inference in Bayesian Networks

Inference in Bayesian Networks BMI/CS 576 www.biostat.wisc.edu/bmi576/ Colin Dewey cdewey@biostat.wisc.edu Fall 2008

The Inference Task in Bayesian Networks
Given: values for some variables in the network (the evidence), and a set of query variables
Do: compute the posterior distribution over the query variables
variables that are neither evidence variables nor query variables are hidden variables
the BN representation is flexible enough that any set of variables can serve as the evidence and any set can serve as the query

Inference by Enumeration
suppose we have the following BN with 5 Boolean variables: A and B are parents of C, and C is the parent of D and E
we use a to denote A = true, and ¬a to denote A = false
suppose we're given the query: P(a | d, e), "the probability a is true given that d and e are true"
from the graph structure we can compute this by summing over the possible values of the hidden variables B and C:
P(a | d, e) = α Σ_B Σ_C P(a) P(B) P(C | a, B) P(d | C) P(e | C)
where α is the normalizing constant

Inference by Enumeration
the CPTs for this network:
P(a) = 0.1    P(b) = 0.1
P(c | a, b) = 0.8    P(c | a, ¬b) = 0.5    P(c | ¬a, b) = 0.2    P(c | ¬a, ¬b) = 0.7
P(d | c) = 0.4    P(d | ¬c) = 0.3
P(e | c) = 0.9    P(e | ¬c) = 0.2
the enumeration sums the joint over the four settings of the hidden variables: (b, c), (b, ¬c), (¬b, c), (¬b, ¬c)

Inference by Enumeration
now do the equivalent calculation for P(¬a | d, e) and normalize, so that P(a | d, e) + P(¬a | d, e) = 1
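
To make the enumeration concrete, here is a minimal Python sketch for this network. The CPT values are taken from the tables reconstructed above, and the network structure (A and B are parents of C; C is the parent of D and E) is the one those tables imply; the helper names (joint, posterior) are illustrative, not part of the lecture.

```python
from itertools import product

# CPTs, keyed by parent values; values as read from the slide tables above
p_a = {True: 0.1, False: 0.9}                      # P(A)
p_b = {True: 0.1, False: 0.9}                      # P(B)
p_c = {(True, True): 0.8, (True, False): 0.5,      # P(C=T | A, B)
       (False, True): 0.2, (False, False): 0.7}
p_d = {True: 0.4, False: 0.3}                      # P(D=T | C)
p_e = {True: 0.9, False: 0.2}                      # P(E=T | C)

def joint(a, b, c, d, e):
    """P(A=a, B=b, C=c, D=d, E=e) as the product of the network's CPT entries."""
    pc = p_c[(a, b)] if c else 1.0 - p_c[(a, b)]
    pd = p_d[c] if d else 1.0 - p_d[c]
    pe = p_e[c] if e else 1.0 - p_e[c]
    return p_a[a] * p_b[b] * pc * pd * pe

# sum out the hidden variables B and C for each value of the query variable A,
# with the evidence D = true, E = true held fixed
unnormalized = {a: sum(joint(a, b, c, True, True)
                       for b, c in product([True, False], repeat=2))
                for a in (True, False)}
alpha = 1.0 / sum(unnormalized.values())
posterior = {a: alpha * v for a, v in unnormalized.items()}   # P(A | d, e)
print(posterior)
```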

Exact Inference
inference by enumeration is an exact method (i.e. it computes the exact answer to a given query)
it requires summing over a joint distribution whose size is exponential in the number of variables
can we do exact inference efficiently in large networks? in many cases, yes
key insight: save computation by pushing sums inward
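
Pushing the sums inward is just the distributive law. For the enumeration query above (using the structure implied by the CPTs), the brute-force sum

    Σ_B Σ_C P(a) P(B) P(C | a, B) P(d | C) P(e | C)

can be rewritten as

    P(a) Σ_B P(B) Σ_C P(C | a, B) P(d | C) P(e | C)

so each factor sits inside only the sums over variables it actually mentions, and no table over all five variables is ever built.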

Potentials
we can represent a CPT or a full joint distribution in this format
e.g. the CPT P(F | D, E) with P(f | d, e) = 0.8, P(f | d, ¬e) = 0.5, P(f | ¬d, e) = 0.2, P(f | ¬d, ¬e) = 0.7
can be written as a table over (D, E, F) whose entries are these values and their complements (0.2, 0.5, 0.8, 0.3 for ¬f)
when we (temporarily) ignore the meaning of the table, we call it a potential: a table of non-negative values with no further constraints
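
One simple way to hold such a table in code is sketched below. The Potential class and its layout are choices made for this transcript (the lecture does not prescribe a data structure); the numbers are the P(F | D, E) entries above.

```python
# a potential: a tuple of variable names plus a table of non-negative values,
# one entry per joint assignment (illustrative representation, not the lecture's)
class Potential:
    def __init__(self, variables, table):
        self.vars = tuple(variables)   # e.g. ("D", "E", "F")
        self.table = dict(table)       # maps tuples of values to non-negative floats

# the P(F | D, E) table above, written as a potential over (D, E, F)
phi_fde = Potential(("D", "E", "F"), {
    (True, True, True): 0.8,   (True, True, False): 0.2,
    (True, False, True): 0.5,  (True, False, False): 0.5,
    (False, True, True): 0.2,  (False, True, False): 0.8,
    (False, False, True): 0.7, (False, False, False): 0.3,
})
```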

Multiplying Potentials
the domain of (the variables in) the result is the union of the domains of the input potentials
for each cell in the result, multiply all input cells that agree on the variable settings
example: a potential over (A, B) with entries (a, b) = .1, (a, ¬b) = .2, (¬a, b) = .5, (¬a, ¬b) = .8
times a potential over (B, C) with entries (b, c) = .2, (b, ¬c) = .4, (¬b, c) = .3, (¬b, ¬c) = .5
gives a potential over (A, B, C): (a, b, c) = .02, (a, b, ¬c) = .04, (a, ¬b, c) = .06, (a, ¬b, ¬c) = .10, (¬a, b, c) = .10, (¬a, b, ¬c) = .20, (¬a, ¬b, c) = .24, (¬a, ¬b, ¬c) = .40
example: two potentials over the same variables (A, B) multiply cell by cell, e.g. .1 × .8 = .08, .5 × .7 = .35, .4 × .9 = .36
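
A sketch of this product operation, continuing the Potential class from the previous snippet (the function name multiply is an illustrative choice, and all variables are assumed Boolean):

```python
from itertools import product

def multiply(p1, p2):
    """Pointwise product of two potentials: the result ranges over the union of the
    inputs' variables, and each result cell multiplies the input cells that agree
    with it on the shared variables."""
    out_vars = p1.vars + tuple(v for v in p2.vars if v not in p1.vars)
    table = {}
    for values in product([True, False], repeat=len(out_vars)):
        assignment = dict(zip(out_vars, values))
        v1 = p1.table[tuple(assignment[v] for v in p1.vars)]
        v2 = p2.table[tuple(assignment[v] for v in p2.vars)]
        table[values] = v1 * v2
    return Potential(out_vars, table)
```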

Marginalizing and Normalizing Potentials
we can also marginalize (sum a variable out of) a potential
e.g. summing B out of the potential (a, b) = .1, (a, ¬b) = .5, (¬a, b) = .2, (¬a, ¬b) = .7 gives (a) = .6, (¬a) = .9
and we can normalize potentials
e.g. normalizing the same potential (total 1.5) gives (a, b) = .067, (a, ¬b) = .333, (¬a, b) = .133, (¬a, ¬b) = .467
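
Both operations in code, continuing the same Potential sketch (marginalize and normalize are illustrative names):

```python
def marginalize(p, var):
    """Sum a variable out of a potential."""
    idx = p.vars.index(var)
    out_vars = p.vars[:idx] + p.vars[idx + 1:]
    table = {}
    for values, weight in p.table.items():
        key = values[:idx] + values[idx + 1:]
        table[key] = table.get(key, 0.0) + weight
    return Potential(out_vars, table)

def normalize(p):
    """Rescale a potential so that its entries sum to 1."""
    z = sum(p.table.values())
    return Potential(p.vars, {k: v / z for k, v in p.table.items()})
```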

Key Observation
if A is not in the second potential, we can marginalize A out (sum it out) of the first potential before multiplying, resulting in a smaller intermediate table
example: take the potentials (a, b) = .1, (a, ¬b) = .2, (¬a, b) = .2, (¬a, ¬b) = .3 over (A, B) and (b, c) = .1, (b, ¬c) = .4, (¬b, c) = .3, (¬b, ¬c) = .1 over (B, C)
multiplying first gives a table over (A, B, C) (entries .01, .02, .04, .08, .06, .09, .02, .03); summing out A then gives (b, c) = .03, (b, ¬c) = .12, (¬b, c) = .15, (¬b, ¬c) = .05
summing A out first gives (b) = .3, (¬b) = .5, and multiplying by the (B, C) potential gives the same result: .03, .12, .15, .05
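
The same check in code, continuing the sketches above (values as read from this slide's example tables): both orders give the same table, but summing A out first never builds the larger (A, B, C) intermediate.

```python
phi1 = Potential(("A", "B"), {(True, True): 0.1, (True, False): 0.2,
                              (False, True): 0.2, (False, False): 0.3})
phi2 = Potential(("B", "C"), {(True, True): 0.1, (True, False): 0.4,
                              (False, True): 0.3, (False, False): 0.1})

early = multiply(marginalize(phi1, "A"), phi2)   # sum A out first: tables over (B), then (B, C)
late = marginalize(multiply(phi1, phi2), "A")    # multiply first: intermediate table over (A, B, C)
assert all(abs(early.table[k] - late.table[k]) < 1e-12 for k in early.table)
```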

Variable Elimination
consider answering the query P(D) with the following 4-variable network over A, B, C, D [network figure not recoverable from the transcript]
this entails computing the sum of the full joint distribution over A, B, and C: P(D) = Σ_A Σ_B Σ_C P(A, B, C, D)
but we can do this more efficiently by pushing the sums in
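
Since the slide's figure is not available, assume purely for illustration that the network is a chain A → B → C → D. Then the brute-force sum and the pushed-in version look like:

    P(D) = Σ_A Σ_B Σ_C P(A) P(B | A) P(C | B) P(D | C)
         = Σ_C P(D | C) Σ_B P(C | B) Σ_A P(A) P(B | A)

Each inner sum produces a small one-variable table that feeds the next sum, instead of one sum over every joint assignment of A, B, and C.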

Variable Elimination Procedure
given: a Bayes net, evidence for the variables in V, and a query variable Q
the initial potentials are the CPTs
repeat until only the query variable remains:
    choose a variable X to eliminate
    multiply all potentials that contain X
    if X is not in V, sum X out and replace the multiplied potentials with the new result
    else remove X based on the evidence
multiply the remaining potentials for Q
normalize the remaining potential to get the final distribution over Q
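
A compact Python sketch of this procedure, continuing the Potential, multiply, marginalize, and normalize sketches above. The names variable_elimination and reduce_evidence are illustrative, and all variables are assumed Boolean.

```python
def variable_elimination(potentials, query, evidence, order):
    """potentials: list of Potential (one per CPT); query: name of the query variable;
    evidence: dict mapping observed variables to values; order: elimination order."""
    potentials = list(potentials)
    for x in order:
        involved = [p for p in potentials if x in p.vars]
        if not involved:
            continue
        prod = involved[0]
        for p in involved[1:]:
            prod = multiply(prod, p)                        # multiply all potentials containing X
        if x in evidence:
            prod = reduce_evidence(prod, x, evidence[x])    # remove X based on the evidence
        else:
            prod = marginalize(prod, x)                     # sum X out
        potentials = [p for p in potentials if x not in p.vars] + [prod]
    result = potentials[0]
    for p in potentials[1:]:
        result = multiply(result, p)                        # multiply remaining potentials for Q
    assert result.vars == (query,)
    return normalize(result)                                # normalize to get P(Q | evidence)

def reduce_evidence(p, var, value):
    """Keep only the entries consistent with var = value, and drop var from the scope."""
    idx = p.vars.index(var)
    out_vars = p.vars[:idx] + p.vars[idx + 1:]
    table = {k[:idx] + k[idx + 1:]: v for k, v in p.table.items() if k[idx] == value}
    return Potential(out_vars, table)
```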

Variable Elimination Example
suppose the query is P(F | c); let's eliminate variables in the order A, B, C, D, E
the network: A is the parent of B and of C, B is the parent of D, C is the parent of E, and D and E are the parents of F
P(a) = 0.2
P(b | a) = 0.8    P(b | ¬a) = 0.3
P(c | a) = 0.1    P(c | ¬a) = 0.6
P(d | b) = 0.4    P(d | ¬b) = 0.7
P(e | c) = 0.5    P(e | ¬c) = 0.9
P(f | d, e) = 0.1    P(f | d, ¬e) = 0.3    P(f | ¬d, e) = 0.4    P(f | ¬d, ¬e) = 0.9 (the values used in the elimination steps below)

Variable Elimination: Eliminating A
1. multiply all potentials that involve A: P(A) × P(B | A) × P(C | A) gives a potential over (A, B, C)
   (a, b, c) = .016    (¬a, b, c) = .144    (a, b, ¬c) = .144    (¬a, b, ¬c) = .096
   (a, ¬b, c) = .004    (¬a, ¬b, c) = .336    (a, ¬b, ¬c) = .036    (¬a, ¬b, ¬c) = .224
2. eliminate A by summing it out, leaving a potential over (B, C):
   (b, c) = .16    (b, ¬c) = .24    (¬b, c) = .34    (¬b, ¬c) = .26

Variable Elimination: Eliminating B
1. multiply the potentials involving B: the (B, C) potential from the previous step × P(D | B) gives a potential over (B, C, D)
   (b, c, d) = .064    (¬b, c, d) = .238    (b, c, ¬d) = .096    (¬b, c, ¬d) = .102
   (b, ¬c, d) = .096    (¬b, ¬c, d) = .182    (b, ¬c, ¬d) = .144    (¬b, ¬c, ¬d) = .078
2. eliminate B by summing it out, leaving a potential over (C, D):
   (c, d) = .302    (c, ¬d) = .198    (¬c, d) = .278    (¬c, ¬d) = .222

Variable Elimination: Eliminating C
1. multiply the potentials involving C: the (C, D) potential from the previous step × P(E | C) gives a potential over (C, D, E)
   (c, d, e) = .151    (c, d, ¬e) = .151    (c, ¬d, e) = .099    (c, ¬d, ¬e) = .099
   (¬c, d, e) = .250    (¬c, d, ¬e) = .028    (¬c, ¬d, e) = .200    (¬c, ¬d, ¬e) = .022
2. we have evidence for C, so eliminate the entries corresponding to ¬c, leaving a potential over (D, E):
   (d, e) = .151    (d, ¬e) = .151    (¬d, e) = .099    (¬d, ¬e) = .099

Variable Elimination: Eliminating D
1. multiply the potentials involving D: the (D, E) potential from the previous step × P(F | D, E) gives a potential over (D, E, F)
   (d, e, f) = .015    (¬d, e, f) = .040    (d, e, ¬f) = .136    (¬d, e, ¬f) = .059
   (d, ¬e, f) = .045    (¬d, ¬e, f) = .089    (d, ¬e, ¬f) = .106    (¬d, ¬e, ¬f) = .010
2. sum out D, leaving a potential over (E, F):
   (e, f) = .055    (e, ¬f) = .195    (¬e, f) = .134    (¬e, ¬f) = .116

Variable Elimination: Eliminating E
1. eliminate E by summing it out of the (E, F) potential:
   (f) = .189    (¬f) = .311
2. normalize to get the final answer:
   P(f | c) = .378    P(¬f | c) = .622
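
The whole worked example as code, continuing the sketches above. The CPT values are those recovered from the example slides; running this reproduces the final distribution P(F | c) ≈ (0.378, 0.622).

```python
t, f = True, False
cpts = [
    Potential(("A",), {(t,): 0.2, (f,): 0.8}),
    Potential(("A", "B"), {(t, t): 0.8, (t, f): 0.2, (f, t): 0.3, (f, f): 0.7}),
    Potential(("A", "C"), {(t, t): 0.1, (t, f): 0.9, (f, t): 0.6, (f, f): 0.4}),
    Potential(("B", "D"), {(t, t): 0.4, (t, f): 0.6, (f, t): 0.7, (f, f): 0.3}),
    Potential(("C", "E"), {(t, t): 0.5, (t, f): 0.5, (f, t): 0.9, (f, f): 0.1}),
    Potential(("D", "E", "F"), {(t, t, t): 0.1, (t, t, f): 0.9, (t, f, t): 0.3, (t, f, f): 0.7,
                                (f, t, t): 0.4, (f, t, f): 0.6, (f, f, t): 0.9, (f, f, f): 0.1}),
]

answer = variable_elimination(cpts, query="F", evidence={"C": True},
                              order=["A", "B", "C", "D", "E"])
print(answer.table)   # approximately {(True,): 0.378, (False,): 0.622}
```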

Variable Elimination vs. Inference by Enumeration
inference by enumeration: multiply all CPTs in the Bayes net (in any order), eliminate some cells based on the evidence, then marginalize down to the query variable (summing out the others in any order)
variable elimination: multiply CPTs, but sum out a variable as soon as we have included all of the tables that use it, i.e. push the sums in as far as we can

Variable Elimination as a Graph Transformation
the initial graph is the undirected version of the Bayes net
when we eliminate a variable X, we create a potential that contains X and the set of variables Y that X appears with in potentials
eliminating X transforms the graph by adding edges between all variables in Y
[figure: the six-node example graph over A, B, C, D, E, F; for the ordering A, B, C, D, E we get the dashed (fill) edges added]

Variable Elimination as a Graph Transformation
the cliques of this induced graph correspond to the intermediate result potentials we calculated, e.g. the clique over B, C, D corresponds to the (B, C, D) potential computed when eliminating B
the sizes of the potentials correspond to the sizes of the cliques

Choosing the Elimination Ordering
keep in mind that the induced graph depends on the ordering
the time complexity is dominated by the largest potential we have to compute (the largest clique in the induced graph)
therefore, we want to find an ordering that leads to an induced graph with a small largest clique
finding the optimal ordering is NP-hard, but there are some reasonable heuristics

Greedy Elimination Ordering
given: a Bayes net represented as an undirected graph G, and an evaluation function f
initialize all nodes in G as unmarked
for k = 1 to the number of nodes:
    select the unmarked variable X that minimizes f(G, X)
    π(X) = k
    add edges in G between all unmarked neighbors of X
    mark X
return: the variable ordering π
one reasonable choice for f is min-fill: the number of edges that need to be added to the graph due to X's elimination
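
A small Python sketch of this greedy procedure with the min-fill score. The function names and the graph representation (a dict mapping each node to a set of neighbors) are choices made here, not from the lecture; "marking" a node is implemented by removing it from the working copy of the graph.

```python
def min_fill(graph, x):
    """Number of edges that would have to be added among x's neighbors if x were eliminated."""
    nbrs = list(graph[x])
    return sum(1 for i in range(len(nbrs)) for j in range(i + 1, len(nbrs))
               if nbrs[j] not in graph[nbrs[i]])

def greedy_ordering(graph, score=min_fill):
    """Greedy elimination ordering for an undirected graph given as {node: set of neighbors}.
    The score function plays the role of the evaluation function f above."""
    graph = {v: set(nbrs) for v, nbrs in graph.items()}   # work on a copy
    order = []
    while graph:
        x = min(graph, key=lambda v: score(graph, v))     # unmarked variable minimizing f(G, X)
        for u in graph[x]:                                # connect all remaining neighbors of x
            for w in graph[x]:
                if u != w:
                    graph[u].add(w)
        for u in graph[x]:
            graph[u].discard(x)
        del graph[x]                                      # mark x by removing it
        order.append(x)
    return order
```

For the six-node example network one would call greedy_ordering on its undirected skeleton; ties in the min-fill score are broken arbitrarily in this sketch.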

Comments on BN Inference
in general, the Bayes net inference problem is NP-hard (or harder)
however, exact inference is tractable for many real-world graphical models
there are many methods for approximate inference, which return an answer that is "close"
in general, the approximate inference problem is NP-hard as well
however, approximate inference works well for many real-world graphical models