
Exact Inference

Belief Update in Trees (Markov or Bayesian networks alike)

Belief Update in Poly-Trees

Belief Update in Poly-Trees

Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we compute them all at roughly twice the cost of computing a single one?
Note that P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a), where each gi collects the local probability tables over its variables.
[Figure: the Bayesian network over V, T, L, A, B, X, D with local tables p(v), p(t|v), p(l), p(b), p(a|t,l), p(x|a), p(d|a,b), and its moral graph annotated with the factors g1(t,v), g2(a,t,l), g3(d,a,b), g4(x,a).]

Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we compute them all at roughly twice the cost of computing a single one?
Recall that P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a).
Solution: keep all the partial sums on the links of the clique tree, in both directions (as is done in HMMs). Messages are sent inwards first.
[Figure: the clique tree over the cliques {T,V}, {A,T,L}, {D,A,B}, {X,A}, with the partial-sum messages passed along its links.]
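To make the "partial sums in both directions" idea concrete, here is a minimal Python sketch of the same two-pass scheme on a chain of pairwise factors (an HMM-like toy example; the factor tables and variable names are invented for illustration and are not the slide's network):

```python
import numpy as np

# Toy chain X1 - X2 - X3 (all binary) with pairwise factors g[i](x_i, x_{i+1}).
# The factor tables are arbitrary illustrative numbers.
g = [np.array([[1.0, 2.0],
               [3.0, 1.0]]),    # g1(x1, x2)
     np.array([[2.0, 1.0],
               [1.0, 4.0]])]    # g2(x2, x3)
n = len(g) + 1                  # number of variables

# Inward ("forward") partial sums: fwd[i] is a table over X_{i+1} summarizing
# everything to its left: fwd[i](x_{i+1}) = sum_{x_i} fwd[i-1](x_i) * g[i](x_i, x_{i+1}).
fwd = [np.ones(2)]
for i in range(n - 1):
    fwd.append(fwd[i] @ g[i])

# Outward ("backward") partial sums: bwd[i] is a table over X_{i+1} summarizing
# everything to its right.
bwd = [None] * n
bwd[n - 1] = np.ones(2)
for i in range(n - 2, -1, -1):
    bwd[i] = g[i] @ bwd[i + 1]

# Every marginal is the product of the two partial sums meeting at that variable,
# so all marginals together cost roughly twice the work of computing one.
Z = (fwd[0] * bwd[0]).sum()
for i in range(n):
    print(f"P(X{i+1}) =", fwd[i] * bwd[i] / Z)
```

Each marginal is obtained by multiplying the inward and outward partial sums that meet at that variable, which is exactly why all the marginals together cost about twice the work of computing a single one.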

Computing A posteriori Belief in General Bayesian Networks
Input: a Bayesian network, a set of nodes E with evidence E = e, and an ordering x1,…,xm of all variables not in E.
Output: P(x1, e) for every value x1 of X1 (from which P(x1 | e) is available by normalization).
The algorithm:
Set the evidence in all local probability tables that are defined over some variables from E.
Iteratively, in some "optimal or good" order:
- move all terms that do not depend on the current summation variable outside of the innermost sum;
- perform the innermost sum, producing a new term;
- insert the new term into the product.
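A minimal Python sketch of this elimination loop, assuming binary variables and representing every factor as a (scope, table) pair; the tiny network A → L, A → D and its numbers are invented for illustration and are not part of the slides:

```python
import numpy as np

# A factor is a (scope, table) pair; the CPT numbers below are made up.
p_a   = (('A',),      np.array([0.6, 0.4]))
p_l_a = (('A', 'L'),  np.array([[0.7, 0.3],
                                [0.2, 0.8]]))
p_d_a = (('A', 'D'),  np.array([[0.9, 0.1],
                                [0.5, 0.5]]))

def eliminate(factors, var):
    """Multiply all factors whose scope mentions `var`, then sum `var` out."""
    touching = [f for f in factors if var in f[0]]
    rest     = [f for f in factors if var not in f[0]]
    all_vars = tuple(dict.fromkeys(v for sc, _ in touching for v in sc))
    letters  = {v: chr(ord('a') + i) for i, v in enumerate(all_vars)}
    out_scope = tuple(v for v in all_vars if v != var)
    spec = ','.join(''.join(letters[v] for v in sc) for sc, _ in touching) \
           + '->' + ''.join(letters[v] for v in out_scope)
    new_table = np.einsum(spec, *[t for _, t in touching])
    return rest + [(out_scope, new_table)]

# Query P(L, D=d0): first set the evidence D = d0 in every table over D,
# then eliminate the remaining non-query variables (here only A).
d0 = 0
factors = [p_a, p_l_a, (('A',), p_d_a[1][:, d0])]
factors = eliminate(factors, 'A')

# Multiply the remaining factors (all over L) into the answer.
result = np.ones(2)
for _, table in factors:
    result = result * table
print("P(L, D=d0) =", result)
print("P(L | D=d0) =", result / result.sum())
```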

Belief Update
Suppose we obtain evidence D = d0. We wish to compute P(l, d0) for every value l of L.
Good summation order (variable A is summed last):
P(l, d0) = Σ_{a,t,x} P(a,t,x,l,d0) = Σ_a p(a) p(l|a) p(d0|a) [Σ_t p(t|a)] [Σ_x p(x|a)]
Bad summation order (variable A is summed first):
P(l, d0) = Σ_{a,t,x} P(a,t,x,l,d0) = Σ_x Σ_t Σ_a p(a) p(l|a) p(d0|a) p(t|a) p(x|a)
The bad order yields a high-dimensional temporary table (over the remaining variables L, T, X). How do we choose a reasonable order?
[Figure: the network in which A is the parent of T, L, X, and D.]
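To see why the order matters, suppose for illustration that every variable has k values. With the good order, each intermediate term (Σ_t p(t|a) and Σ_x p(x|a)) is a table over A alone, of size k; with the bad order, the innermost sum over A produces a temporary table over all the remaining variables L, T, X, of size k^3, and in larger networks this gap grows exponentially with the number of variables that co-occur with the summed-out variable.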

A Graph-Theoretic View
Eliminating a vertex v from an undirected graph G: make N_G(v) into a clique, then remove v and its incident edges from G. Here N_G(v) is the set of vertices adjacent to v in G.
An elimination sequence of G is an ordering of all of its vertices.
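A minimal Python sketch of this elimination step, representing the undirected graph by adjacency sets (the example graph is invented for illustration):

```python
def eliminate_vertex(adj, v):
    """Connect all neighbours of v into a clique, then remove v from the graph.
    `adj` maps each vertex to the set of its neighbours and is modified in place."""
    neighbours = adj[v]
    for u in neighbours:
        adj[u] |= neighbours - {u}   # make N_G(v) a clique
        adj[u].discard(v)            # drop the edges incident to v
    del adj[v]
    return neighbours                # the neighbourhood that became a clique

# Example: a 4-cycle a-b-c-d-a; eliminating `a` adds the fill-in edge b-d.
adj = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
clique = eliminate_vertex(adj, 'a')
print("clique formed:", clique)   # {'b', 'd'}
print("remaining graph:", adj)    # edges b-c, c-d, and the new fill-in edge b-d
```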

Treewidth
The width w_s of an elimination sequence s is the size of the largest clique formed during the elimination process, minus 1; namely, w_s = max_v |N_G(v)|, where each neighborhood is taken in the graph at the moment v is eliminated.
The treewidth tw of a graph G is the minimum width over all elimination sequences: tw = min_s w_s.
Examples: all trees have tw = 1; graphs whose cycles are isolated (vertex-disjoint) have tw = 2; a clique of size n has tw = n-1; a chordal graph has tw equal to the size of its largest clique, minus 1.
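Using the same adjacency-set representation, a small Python sketch that computes the width of a given elimination sequence and, for very small graphs only, the treewidth by trying every sequence:

```python
from itertools import permutations
from copy import deepcopy

def width(adj, sequence):
    """Width of an elimination sequence: the largest |N_G(v)| seen when each v is
    eliminated, with neighbourhoods taken in the current, partially eliminated graph."""
    adj = deepcopy(adj)
    w = 0
    for v in sequence:
        nbrs = adj[v]
        w = max(w, len(nbrs))
        for u in nbrs:                      # make N_G(v) a clique, then remove v
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
        del adj[v]
    return w

def treewidth_bruteforce(adj):
    """Minimum width over all elimination sequences -- feasible only for tiny graphs."""
    return min(width(adj, seq) for seq in permutations(adj))

# A 4-cycle has treewidth 2; a tree (here a star) has treewidth 1.
cycle = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
star  = {'r': {'x', 'y', 'z'}, 'x': {'r'}, 'y': {'r'}, 'z': {'r'}}
print(treewidth_bruteforce(cycle))  # 2
print(treewidth_bruteforce(star))   # 1
```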

Observations
Theorem. Computing a posteriori probabilities in a Markov network G, or in a Bayesian network whose moral graph is G, is at most exponential in the width of the elimination sequence used.
Theorem. Computing a posteriori probabilities in chordal graphs is polynomial in the size of the input (which already includes a table over the largest clique).
Theorem. Finding an elimination sequence whose width equals the treewidth, or even just deciding whether tw = c for a given c, is NP-hard.
Simple heuristic. At each step eliminate a vertex v that forms the smallest clique, namely one that minimizes |N_G(v)|.
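The simple heuristic above, often called the min-degree heuristic, as a Python sketch over the same adjacency-set representation; the width it returns is only an upper bound on the treewidth:

```python
from copy import deepcopy

def greedy_min_degree(adj):
    """At each step eliminate a vertex v minimizing |N_G(v)| in the current graph.
    Returns (elimination order, width achieved); the width upper-bounds the treewidth."""
    adj = deepcopy(adj)
    order, w = [], 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # vertex forming the smallest clique
        nbrs = adj[v]
        w = max(w, len(nbrs))
        for u in nbrs:
            adj[u] |= nbrs - {u}
            adj[u].discard(v)
        del adj[v]
        order.append(v)
    return order, w

cycle = {'a': {'b', 'd'}, 'b': {'a', 'c'}, 'c': {'b', 'd'}, 'd': {'a', 'c'}}
print(greedy_min_degree(cycle))   # e.g. (['a', 'b', 'c', 'd'], 2)
```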

Results about treewidth
Theorem(s). There are several algorithms that approximate the treewidth tw to within a small constant factor a, with time complexity Poly(n)·c^tw, where c is a constant and n is the number of vertices.
Comment. One such algorithm will be presented next week by a student.
Observation. The above theorem is "practical" if the constants a and c are low enough, because inference itself already requires time at most Poly(n)·k^tw, where k is the size of the largest variable domain.
Observation. There are other cost functions for optimizing complexity that take the number of states of each variable into account.
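For a sense of scale (illustrative numbers only, not from the slides): with binary variables (k = 2) a treewidth of 20 gives tables of roughly 2^20 ≈ 10^6 entries, which is manageable, whereas with k = 10 the same treewidth gives about 10^20 entries, which is hopeless; this is why both the base c of the treewidth algorithm and the domain size k matter in practice.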

Update all variables in a general network
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we still compute them all at roughly twice the cost of computing a single one?
Note that P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a) g5(l,b,s).
[Figure: the Bayesian network over V, S, L, T, A, B, X, D with local tables p(v), p(t|v), p(s), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b), and its moral graph annotated with the factors g1(t,v), g2(a,t,l), g3(d,a,b), g4(x,a), g5(l,s,b).]

Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0).
Recall that P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a) g5(l,b,s).
Solution: keep all the partial sums on the links of the clique tree, in both directions (as is done in HMMs). Messages are sent inwards first.
[Figure: the clique tree over the cliques {T,V}, {A,T,L}, {A,L,B}, {L,B,S}, {D,A,B}, {X,A}, with the partial-sum messages passed along its links.]

Global conditioning
Fixing the values of A and B yields a graph that is an I-map of P(a, b, C, D, …) for those fixed values of A and B.
Fixing values at the start of the summation can shrink the tables formed by variable elimination; in this way space is traded for time.
Special case: fix a set of nodes that "breaks all loops". This method is called cutset conditioning.
[Figure: a multiply connected network over A, B, C, D, E, I, J, K, L, M, and the simplified graph obtained after fixing A = a and B = b.]
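A toy Python sketch of the cutset-conditioning computation on a single-loop pairwise model (the model and all numbers are invented for illustration): fixing the cutset variable A breaks the loop, the remaining chain is summed out separately for each value of A, and the partial results are accumulated and checked against brute-force enumeration:

```python
import numpy as np
from itertools import product

# A loopy pairwise model on the cycle A-B-C-D-A (all binary); the factor
# tables are arbitrary illustrative numbers.
fAB = np.array([[1., 2.], [3., 1.]])
fBC = np.array([[2., 1.], [1., 3.]])
fCD = np.array([[1., 4.], [2., 1.]])
fDA = np.array([[3., 1.], [1., 2.]])

# Brute force, for reference: P(C) is proportional to the sum over all other variables.
pC = np.zeros(2)
for a, b, c, d in product(range(2), repeat=4):
    pC[c] += fAB[a, b] * fBC[b, c] * fCD[c, d] * fDA[d, a]
pC /= pC.sum()

# Cutset conditioning: fixing A breaks the single loop, leaving the chain
# B-C-D, which is summed out cheaply for each value of A.
pC_cutset = np.zeros(2)
for a in range(2):
    msg_from_B = fAB[a, :] @ fBC          # sum_b fAB(a,b) fBC(b,c)  -> table over C
    msg_from_D = fCD @ fDA[:, a]          # sum_d fCD(c,d) fDA(d,a)  -> table over C
    pC_cutset += msg_from_B * msg_from_D  # accumulate the contribution of A = a
pC_cutset /= pC_cutset.sum()

print(pC, pC_cutset)   # the two answers agree
```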

Cutset conditioning
Fixing the values of A, B, and L breaks all loops. But can we choose fewer variables that break all loops? Are some variables better choices than others?
This optimization question translates to the well-known feedback vertex set (FVS) problem: choose a set of vertices of least total weight that intersects every cycle of a given weighted undirected graph G.
[Figure: the same network over A, B, C, D, E, I, J, K, L, M.]
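A simple greedy heuristic for finding a small loop cutset, sketched in Python (illustrative only: it ignores vertex weights and is not one of the optimized FVS algorithms from the literature): repeatedly prune vertices of degree at most one, and whenever none remain, move a highest-degree vertex into the cutset.

```python
from copy import deepcopy

def greedy_loop_cutset(adj):
    """Greedy heuristic (not optimal): prune degree<=1 vertices, otherwise move a
    highest-degree vertex into the cutset, until the graph is exhausted."""
    adj = deepcopy(adj)
    cutset = []

    def remove(v):
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]

    while adj:
        leaves = [v for v in adj if len(adj[v]) <= 1]
        if leaves:                      # degree <= 1 vertices lie on no cycle
            for v in leaves:
                remove(v)
        else:                           # the remaining graph still contains a cycle
            v = max(adj, key=lambda u: len(adj[u]))
            cutset.append(v)
            remove(v)
    return cutset

# Two triangles sharing the vertex 'a': {'a'} is a loop cutset.
adj = {'a': {'b', 'c', 'd', 'e'}, 'b': {'a', 'c'}, 'c': {'a', 'b'},
       'd': {'a', 'e'}, 'e': {'a', 'd'}}
print(greedy_loop_cutset(adj))   # ['a']
```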

The Noisy Or-Gate Model

Approximate Inference
Gibbs sampling
Loopy belief propagation
Bounded conditioning
Likelihood weighting
Variational methods