From Variable Elimination to Junction Trees

From Variable Elimination to Junction Trees Yaniv Hamo and Mark Silberstein

Variable Elimination – what is it and why we need it

A three-node chain R → S → P (Reference → Submit HW → Pass course) with the following tables:

  P(R):          exists  0.1    not exists  0.9

  P(S | R):              R = exists   R = not exists
            S = yes          0.8           0.4
            S = no           0.2           0.6

  P(P | S):              S = yes      S = no
            pass             0.9          0.5
            fail             0.1          0.5

Variable elimination is needed for answering questions such as "so, do I pass this course or not?"

So, do I pass this course or not? We want to compute P(p). By definition:
P(p) = Σ_r Σ_s P(r) P(s|r) P(p|s)
In our case (chain): P(p) = Σ_r P(r) Σ_s P(s|r) P(p|s)
P(p) = 0.1*(0.8*0.9 + 0.2*0.5) + 0.9*(0.4*0.9 + 0.6*0.5) = 0.676
We essentially eliminated nodes R and S.
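For concreteness, here is a minimal Python sketch of this computation. The dictionary-based tables and the function name marginal_pass are our own illustration, not from the slides; the numbers are the CPTs above.

```python
# Chain R -> S -> P with the CPTs from the slide (representation is ours).
p_r = {"exists": 0.1, "not exists": 0.9}               # P(R)
p_s_given_r = {                                        # P(S | R)
    ("yes", "exists"): 0.8, ("yes", "not exists"): 0.4,
    ("no",  "exists"): 0.2, ("no",  "not exists"): 0.6,
}
p_p_given_s = {                                        # P(P | S)
    ("pass", "yes"): 0.9, ("pass", "no"): 0.5,
    ("fail", "yes"): 0.1, ("fail", "no"): 0.5,
}

def marginal_pass(outcome):
    """P(P = outcome) by eliminating S (inner sum) and then R (outer sum)."""
    return sum(
        p_r[r] * sum(p_s_given_r[(s, r)] * p_p_given_s[(outcome, s)]
                     for s in ("yes", "no"))
        for r in ("exists", "not exists")
    )

print(marginal_pass("pass"))  # ~0.676, as on the slide
```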

The General Case – Inference

A network describes a unique probability distribution P. We use "inference" as a name for the process of computing answers to queries about P. There are many types of queries we might ask; most of these involve evidence.
Evidence e is an assignment of values to a set E of variables in the domain. Without loss of generality, E = {X_{k+1}, …, X_n}.
The simplest query: compute the probability of the evidence. This is often referred to as computing the likelihood of the evidence.

Another example of Variable Elimination

The "Asia" network over eight variables: Visit to Asia (V), Smoking (S), Tuberculosis (T), Lung Cancer (L), Bronchitis (B), Abnormality in Chest (A), X-Ray (X), Dyspnea (D), with edges V→T, S→L, S→B, T→A, L→A, A→X, A→D, B→D.

We are interested in P(d). Need to eliminate: v, s, x, t, l, a, b.
Initial factors: P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
Brute force: P(d) = Σ_v Σ_s Σ_x Σ_t Σ_l Σ_a Σ_b P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)

Eliminate variables in order v, s, x, t, l, a, b. First, eliminate v:
f_v(t) = Σ_v P(v) P(t|v)
Remaining factors: P(s) P(l|s) P(b|s) f_v(t) P(a|t,l) P(x|a) P(d|a,b)
[ Note: f_v(t) = P(t). In general, the result of elimination is not necessarily a probability term ]

Next, eliminate s:
f_s(b,l) = Σ_s P(s) P(l|s) P(b|s)
Remaining factors: f_v(t) f_s(b,l) P(a|t,l) P(x|a) P(d|a,b)
[ Note: the result of elimination may be a function of several variables ]

Next, eliminate x:
f_x(a) = Σ_x P(x|a)
Remaining factors: f_v(t) f_s(b,l) f_x(a) P(a|t,l) P(d|a,b)
[ Note: f_x(a) = 1 for all values of a ]

Next, eliminate t:
f_t(a,l) = Σ_t f_v(t) P(a|t,l)
Remaining factors: f_s(b,l) f_x(a) f_t(a,l) P(d|a,b)

Next, eliminate l:
f_l(a,b) = Σ_l f_s(b,l) f_t(a,l)
Remaining factors: f_x(a) f_l(a,b) P(d|a,b)

Next, eliminate a:
f_a(b,d) = Σ_a f_x(a) f_l(a,b) P(d|a,b)
Remaining factors: f_a(b,d)

Finally, eliminate b:
f_b(d) = Σ_b f_a(b,d)
and P(d) = f_b(d).
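Every step above follows the same pattern: multiply the factors that mention the variable, then sum it out. Below is a minimal, generic Python sketch of that loop. The (vars, table) factor representation and all function names are our own, for illustration; variables are assumed binary for brevity, and every variable in the elimination order is assumed to appear in some factor.

```python
from itertools import product

def multiply(f, g):
    """Pointwise product of two factors; a factor is (vars, table)."""
    fv, ft = f
    gv, gt = g
    vs = fv + [v for v in gv if v not in fv]
    table = {}
    for assign in product((0, 1), repeat=len(vs)):   # binary variables
        env = dict(zip(vs, assign))
        table[assign] = (ft[tuple(env[v] for v in fv)]
                         * gt[tuple(env[v] for v in gv)])
    return (vs, table)

def sum_out(f, var):
    """Marginalize var out of factor f."""
    fv, ft = f
    idx = fv.index(var)
    keep = [v for v in fv if v != var]
    table = {}
    for assign, val in ft.items():
        key = assign[:idx] + assign[idx + 1:]
        table[key] = table.get(key, 0.0) + val
    return (keep, table)

def eliminate(factors, order):
    """Sum-product variable elimination over a list of factors."""
    for var in order:
        touched = [f for f in factors if var in f[0]]
        factors = [f for f in factors if var not in f[0]]
        prod = touched[0]
        for f in touched[1:]:
            prod = multiply(prod, f)       # combine factors mentioning var
        factors.append(sum_out(prod, var)) # then sum var out
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result
```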

Intermediate factors

In our previous example (order v, s, x, t, l, a, b), the largest intermediate factor involves only three variables. With a different ordering, e.g. eliminating a first, we would create a factor over five variables (t, l, x, b, d). Complexity is exponential in the size of these factors!

Notes about variable elimination:
- The actual computation is done in the elimination steps.
- The computation depends on the order of elimination.
- For each query we need to compute everything again! Many redundant calculations.

The idea: compute the joint over clusters C_i, small subsets of U (typically made of a variable and its parents); the clusters are not necessarily disjoint. Calculate P(C_i) for each cluster. To compute P(X) for X ∈ C_i we then need far fewer operations: P(X) = Σ_{C_i \ {X}} P(C_i).
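A tiny illustration with made-up numbers: once a cluster marginal P(A, B) is available, a single-variable marginal is a sum over the rest of the cluster rather than over the whole joint.

```python
# Hypothetical cluster marginal P(A, B); the numbers are made up.
p_ab = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# P(A) = sum over C_i \ {A}: two additions instead of a sum over all of U.
p_a = {a: sum(v for (x, b), v in p_ab.items() if x == a) for a in (0, 1)}
print(p_a)  # {0: 0.5, 1: 0.5}
```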

Junction Trees The junction tree algorithms generalize Variable Elimination to the efficient, simultaneous execution of a large class of queries. Theoretical background was shown in the previous lecture

Constructing Junction Trees:
1. Moralize the graph (if directed).
2. Choose a node ordering and find the cliques generated by variable elimination. This gives a triangulation of the graph.
3. Build a junction graph from the eliminated cliques.
4. Find an appropriate spanning tree.

Step 1: Moralization

Given G = (V, E), build the moral graph GM:
1. For all w ∈ V: for all u, v ∈ pa(w), add an edge u-v ("marry" the parents).
2. Undirect all edges.
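A minimal sketch of moralization, assuming the DAG is given as a mapping from each node to its set of parents (this representation and the function name are our choice, not from the slides).

```python
def moralize(parents):
    """Moral graph of a DAG given as {node: set_of_parents}."""
    adj = {v: set() for v in parents}
    for w, pa in parents.items():
        pa = list(pa)
        for u in pa:                        # undirect every parent-child edge
            adj[u].add(w)
            adj[w].add(u)
        for i in range(len(pa)):            # "marry" all pairs of parents of w
            for j in range(i + 1, len(pa)):
                adj[pa[i]].add(pa[j])
                adj[pa[j]].add(pa[i])
    return adj

# Example: the Asia network from the earlier slide.
asia = {"V": set(), "S": set(), "T": {"V"}, "L": {"S"}, "B": {"S"},
        "A": {"T", "L"}, "X": {"A"}, "D": {"A", "B"}}
moral = moralize(asia)   # adds T-L (parents of A) and A-B (parents of D)
```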

Step 2: Triangulation

Add edges to GM to obtain a triangulated graph GT, such that there is no cycle of length ≥ 4 that does not contain a chord.

Step 2: Triangulation (cont.)

Each elimination ordering triangulates the graph, not necessarily in the same way.
[Figure: the same graph over nodes A-H triangulated in several different ways by different elimination orderings.]

Step 2: Triangulation (cont.)

Intuitively, triangulations with as few fill-ins as possible are preferred, since they leave us with small cliques (small probability tables). A common heuristic (a sketch of one possible implementation follows below):
Repeat until no nodes remain:
- Find the node whose elimination would require the least number of fill-ins (may be zero).
- Eliminate that node, and note the need for a fill-in edge between any two non-adjacent neighbors.
Add the fill-in edges to the original graph.
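A sketch of the min-fill heuristic under the adjacency-dict representation used in the moralization sketch above; the function and variable names are ours.

```python
def min_fill_triangulate(adj):
    """Greedy min-fill elimination; returns (fill_in_edges, elimination_cliques)."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    remaining = set(adj)
    fill_ins, cliques = [], []

    def fill_count(v):
        nbrs = [u for u in adj[v] if u in remaining]
        return sum(1 for i, a in enumerate(nbrs)
                   for b in nbrs[i + 1:] if b not in adj[a])

    while remaining:
        v = min(remaining, key=fill_count)            # fewest fill-ins first
        nbrs = [u for u in adj[v] if u in remaining]
        cliques.append(frozenset([v, *nbrs]))         # clique induced by v
        for i, a in enumerate(nbrs):                  # connect non-adjacent
            for b in nbrs[i + 1:]:                    # neighbors of v
                if b not in adj[a]:
                    fill_ins.append((a, b))
                    adj[a].add(b)
                    adj[b].add(a)
        remaining.remove(v)
    return fill_ins, cliques
```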

Eliminate the vertex that requires the least number of edges to be added. Elimination trace for the example graph (from GM to GT):

  step   vertex removed   induced clique   added edges
   1          h                egh              -
   2          g                ceg              -
   3          f                def              -
   4          c                ace              a-e
   5          b                abd              a-d
   6          d                ade              -
   7          e                ae               -
   8          a                a                -

Step 3: Junction Graph A junction graph for an undirected graph G is an undirected, labeled graph. The nodes are the cliques in G. If two cliques intersect, they are joined in the junction graph by an edge labeled with their intersection.
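With the elimination cliques in hand, building the junction graph is a pairwise-intersection pass. This short sketch (names ours) also records the separator size as the edge weight, which Step 4 will need.

```python
def junction_graph(cliques):
    """Edges (weight, c1, c2) between every pair of intersecting cliques."""
    cliques = [frozenset(c) for c in cliques]
    edges = []
    for i, c1 in enumerate(cliques):
        for c2 in cliques[i + 1:]:
            sep = c1 & c2
            if sep:                        # label the edge with the separator
                edges.append((len(sep), c1, c2))
    return edges
```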

[Figure: the Bayesian network G = (V, E), its moral graph GM, its triangulated graph GT, and the junction graph GJ (not complete). The cliques are abd, ace, ade, ceg, def, egh; the separators are the edge labels, e.g. ceg ∩ egh = eg.]

Step 4: Junction Tree

A junction tree is a sub-graph of the junction graph that:
- is a tree,
- contains all the cliques (a spanning tree),
- satisfies the running intersection property: for each pair of nodes U, V, all nodes on the path between U and V contain U ∩ V (as seen in the previous part of the lecture).

Step 4: Junction Tree (cont.)

Theorem: An undirected graph is triangulated if and only if its junction graph has a junction tree.
Definition: The weight of a link in a junction graph is the number of variables in its label. The weight of a junction tree is the sum of the weights of its links.
Theorem: A sub-tree of the junction graph of a triangulated graph is a junction tree if and only if it is a spanning tree of maximal weight.

There are several methods to find a maximal-weight spanning tree. Kruskal's algorithm: successively choose a link of maximal weight unless it creates a cycle.
[Figure: the junction graph GJ (not complete), with cliques abd, ade, ace, ceg, egh, def and separators ad, ae, ce, de, eg, and the resulting junction tree GJT.]
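A minimal sketch of Kruskal's algorithm adapted to maximum weight, using union-find to detect cycles; the edges produced by the junction_graph sketch above serve as input, and all names are ours.

```python
def max_weight_spanning_tree(cliques, edges):
    """Kruskal's algorithm, heaviest link first; edges are (weight, c1, c2)."""
    parent = {frozenset(c): frozenset(c) for c in cliques}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    tree = []
    for w, c1, c2 in sorted(edges, key=lambda e: -e[0]):
        r1, r2 = find(c1), find(c2)
        if r1 != r2:                 # keep the link unless it creates a cycle
            parent[r1] = r2
            tree.append((c1, c2))
    return tree

# Usage: tree = max_weight_spanning_tree(cliques, junction_graph(cliques))
```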

Another example Compute the elimination cliques (the order here is f, d, e, c, b, a). Form the complete junction graph over the maximal elimination cliques and find a maximum-weight spanning tree.

Junction Trees and Elimination Order

We can use different orderings in variable elimination; the choice affects efficiency. Each ordering corresponds to a junction tree. Just as some elimination orderings are more efficient than others, some junction trees are better than others. (Recall our mention of heuristics for triangulation.)

OK, I have this tree, now what?

A separator S divides the remaining variables into two groups; the variables in each group appear on one side of the cluster tree.
[Figure: cluster tree with clusters T,V / A,L,T / A,L,B / B,L,S / X,A / A,B,D and separators T / A,L / B,L / A,B / A.]
Examples:
{A,B}: {L, S, T, V} & {D, X}
{A,L}: {T, V} & {B, D, S, X}
{B,L}: {S} & {A, D, T, V, X}
{A}: {X} & {B, D, L, S, T, V}
{T}: {V} & {A, B, D, L, S, X}

Elimination in Junction Trees

Let X and Y be the partition induced by a separator S.
Observation: eliminating all variables in X results in a factor f_X(S).
Proof: since S is a separator, only variables in S are adjacent to variables in X.
Note: the same factor would result, regardless of the elimination order.
[Figure: clusters A and B joined by separator S, with factors f_X(S) and f_Y(S) on the two sides.]

Recursive Elimination in Junction Trees

How do we compute f_X(S)? By recursive decomposition along the cluster tree (a sketch of this recursion appears below):
- Let X1 and X2 be the disjoint partitioning of X \ C implied by the separators S1 and S2.
- Eliminate X1 to get f_X1(S1).
- Eliminate X2 to get f_X2(S2).
- Eliminate the variables in C \ S to get f_X(S).
[Figure: cluster C with separators S, S1, S2 partitioning the variables into X1, X2, and Y.]
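A sketch of this recursion, reusing multiply and sum_out from the variable-elimination sketch earlier. The tree is assumed to map each cluster (a frozenset of variables) to its neighboring clusters, and potentials maps each cluster to its local factor; all of these names and the representation are our own illustration.

```python
def message(tree, potentials, src, dst):
    """Factor f_X(S) passed from cluster src toward its neighbor dst."""
    f = potentials[src]
    for nbr in tree[src]:
        if nbr != dst:                 # recurse into the subtree behind nbr
            f = multiply(f, message(tree, potentials, nbr, src))
    # eliminate everything in src not shared with dst, i.e. the set C \ S
    for v in [v for v in f[0] if v not in dst]:
        f = sum_out(f, v)
    return f
```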