
1 Exact Inference

2 Belief Update in Trees (Markov or Bayesian networks alike)

3 Belief Update in Poly-Trees

4 Belief Update in Poly-Trees

5 Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we compute them all at the cost of computing one term twice?
Note that: P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a), where each gi is the product of the CPTs assigned to one clique of the moral graph, e.g. g1(t,v) = p(v) p(t|v).
[Figure: the Bayesian network over V, T, L, A, B, X, D with CPTs p(v), p(t|v), p(l), p(b), p(a|t,l), p(x|a), p(d|a,b), and its moral graph labeled with the grouped factors g1(t,v), g2(a,t,l), g3(d,a,b), g4(x,a).]

6 Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we compute them all at the cost of computing one term twice?
Recall that: P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a)
[Figure: a cluster tree with clusters T,V – A,T,L – X,A – D,A,B, with the partial sums stored on the connecting links.]
Solution: keep all partial sums on the links, in both directions (as is done in HMMs). Messages are sent inwards first.
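The same two-pass idea is what the forward-backward algorithm does on an HMM-style chain. A minimal sketch (plain numpy, made-up numbers, a chain rather than the poly-tree in the figure): partial sums are kept in both directions, so every single-node marginal comes out at the cost of the two passes.

```python
import numpy as np

# Two-pass ("forward-backward") message passing on a chain x1 - x2 - ... - xn.
# T[i] is the pairwise factor between x_i and x_{i+1}; e[i] is a local evidence
# factor on x_i.  All numbers are made up for illustration.
n, k = 5, 3
rng = np.random.default_rng(0)
T = [rng.random((k, k)) for _ in range(n - 1)]
e = [rng.random(k) for _ in range(n)]

# Inward/forward messages: alpha[i] is the partial sum arriving at x_i from the left.
alpha = [None] * n
alpha[0] = e[0]
for i in range(1, n):
    alpha[i] = e[i] * (alpha[i - 1] @ T[i - 1])

# Outward/backward messages: beta[i] is the partial sum arriving at x_i from the right.
beta = [None] * n
beta[n - 1] = np.ones(k)
for i in range(n - 2, -1, -1):
    beta[i] = T[i] @ (e[i + 1] * beta[i + 1])

# Every single-node marginal is now available from the stored partial sums.
marginals = [a * b / np.sum(a * b) for a, b in zip(alpha, beta)]
print(np.round(marginals[2], 3))
```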

7 Computing A posteriori Belief in General Bayesian Networks
Input: a Bayesian network, a set of nodes E with evidence E=e, and an ordering X1,…,Xm of all variables not in E, where X1 is the query variable.
Output: P(x1, e) for every value x1 of X1 {from which P(x1|e) is readily available}.
Computing the query:
Set the evidence in all local probability tables that are defined over some variables from E.
Iteratively (in some "optimal or good" order):
Move all irrelevant terms outside of the innermost sum.
Perform the innermost sum, getting a new term.
Insert the new term into the product.
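A minimal, dictionary-based sketch of this elimination loop. The factor representation, helper names, and the toy chain A -> B -> C with evidence C = 1 are illustrative assumptions, not taken from the slides; only the pattern (multiply the factors mentioning a variable, sum it out, insert the new term) follows the procedure above.

```python
from itertools import product

# Factors are (variables, table) pairs; table maps a tuple of values (one per
# variable, each in {0, 1} here) to a number.

def multiply(f1, f2):
    """Product of two factors over the union of their variables."""
    v1, t1 = f1
    v2, t2 = f2
    vs = list(dict.fromkeys(v1 + v2))          # union, order preserved
    table = {}
    for vals in product((0, 1), repeat=len(vs)):
        assign = dict(zip(vs, vals))
        table[vals] = (t1[tuple(assign[v] for v in v1)] *
                       t2[tuple(assign[v] for v in v2)])
    return vs, table

def sum_out(f, var):
    """Perform the innermost sum over `var`, getting a new term."""
    vs, t = f
    keep = [v for v in vs if v != var]
    idx = vs.index(var)
    table = {}
    for vals, p in t.items():
        key = vals[:idx] + vals[idx + 1:]
        table[key] = table.get(key, 0.0) + p
    return keep, table

def eliminate(factors, order):
    """Sum out the variables in `order`; return the product of what remains."""
    for var in order:
        related = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        prod = related[0]
        for f in related[1:]:
            prod = multiply(prod, f)
        factors = rest + [sum_out(prod, var)]   # insert the new term
    result = factors[0]
    for f in factors[1:]:
        result = multiply(result, f)
    return result

# Toy chain A -> B -> C with evidence C = 1; query P(A, C=1).
pA = (['A'], {(0,): 0.6, (1,): 0.4})
pBgA = (['B', 'A'], {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8})
# The evidence C = 1 is already "set" into this table: only the C=1 column is kept.
pC1gB = (['B'], {(0,): 0.1, (1,): 0.9})
print(eliminate([pA, pBgA, pC1gB], order=['B']))
```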

8 Belief Update Suppose we get evidence D = d0
We wish to compute P(l, d0) for every value l of L.
[Figure: the network with A a parent of T, L, X and D.]
Good summation order (variable A is summed last):
P(l, d0) = Σ_{a,t,x} P(a,t,x,l,d0) = Σ_a p(a) p(l|a) p(d0|a) [Σ_t p(t|a)] [Σ_x p(x|a)]
Bad summation order (variable A is summed first):
P(l, d0) = Σ_{a,t,x} P(a,t,x,l,d0) = Σ_x Σ_t Σ_a p(a) p(l|a) p(d0|a) p(t|a) p(x|a)
which yields a high-dimensional temporary table.
How to choose a reasonable order?
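A tiny numeric check of the two orders, with made-up binary CPTs. The good order never builds an intermediate table larger than one dimension; the bad order's innermost sum leaves a temporary table over (l, t, x) before anything can be collapsed.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 2  # all variables binary; the CPT numbers are made up

pA = rng.dirichlet(np.ones(k))              # p(a)
pLgA = rng.dirichlet(np.ones(k), size=k)    # p(l|a), row index a
pTgA = rng.dirichlet(np.ones(k), size=k)    # p(t|a)
pXgA = rng.dirichlet(np.ones(k), size=k)    # p(x|a)
pDgA = rng.dirichlet(np.ones(k), size=k)    # p(d|a)
d0 = 1

# Good order: sum T and X away first (each sum is over a 1-D slice),
# then sum A last.  No intermediate table has more than k entries.
sum_t = pTgA.sum(axis=1)                    # sum_t p(t|a) = 1 for every a
sum_x = pXgA.sum(axis=1)                    # sum_x p(x|a) = 1 for every a
good = np.einsum('a,al,a,a,a->l', pA, pLgA, pDgA[:, d0], sum_t, sum_x)

# Bad order: sum A first.  The innermost sum leaves a table over (l, t, x),
# a k**3-entry temporary, which is only then collapsed over t and x.
temp = np.einsum('a,al,a,at,ax->ltx', pA, pLgA, pDgA[:, d0], pTgA, pXgA)
bad = temp.sum(axis=(1, 2))

print(np.allclose(good, bad))               # True: same answer, different cost
```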

9 A Graph-Theoretic View
Eliminating vertex v from an undirected graph G – the process of making N_G(v) a clique and removing v and its incident edges from G. N_G(v) is the set of vertices that are adjacent to v in G. An elimination sequence of G – an ordering of all its vertices.

10 Treewidth The width w_s of an elimination sequence s is the size of
the largest clique (minus 1) formed in the elimination process, namely w_s = max_v |N_G(v)|, with the neighborhood taken in the graph at the step v is eliminated. The treewidth tw of a graph G is the minimum width among all elimination sequences, namely tw = min_s w_s. Examples. All trees have tw = 1, all graphs with isolated cycles have tw = 2, cliques of size n have tw = n-1. Examples. Chordal graphs have tw equal to the size of their largest clique (minus 1).
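A small plain-Python sketch of these two definitions. The function name and the toy graphs are illustrative, not from the slides; the two prints match the tree and isolated-cycle examples above.

```python
# Width of an elimination sequence: eliminate vertices in the given order,
# each time connecting the current neighbours of the eliminated vertex into a
# clique, and record the largest neighbourhood met along the way.
def elimination_width(edges, order):
    adj = {v: set() for v in order}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    width = 0
    for v in order:
        nbrs = adj[v]
        width = max(width, len(nbrs))
        for a in nbrs:                      # make N_G(v) a clique
            for b in nbrs:
                if a != b:
                    adj[a].add(b)
        for a in nbrs:                      # remove v and its incident edges
            adj[a].discard(v)
        del adj[v]
    return width

# A tree (a path on 4 vertices): this leaf-first order gives width 1 = tw.
print(elimination_width([(1, 2), (2, 3), (3, 4)], order=[1, 2, 3, 4]))          # 1
# A 4-cycle: width 2 for this order, matching tw = 2 for isolated cycles.
print(elimination_width([(1, 2), (2, 3), (3, 4), (4, 1)], order=[1, 2, 3, 4]))  # 2
```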

11 Observations
Theorem. Computing an a posteriori probability in a Markov graph G, or in a BN for which G is the moral graph, is at most exponential in the width of the elimination sequence used.
Theorem. Computing an a posteriori probability in chordal graphs is polynomial in the size of the input (namely, in the size of the largest clique).
Theorem. Finding an elimination sequence that achieves the treewidth, or even just deciding whether tw ≤ c, is NP-hard.
Simple heuristic. At each step eliminate a vertex v that produces the smallest clique, namely one that minimizes |N_G(v)|.
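A minimal sketch of that greedy heuristic in plain Python. The function name is illustrative, and the example edge list is my reconstruction of the moral graph of the V, T, L, A, B, X, D network from slide 5 (the extra edges T-L and A-B come from "marrying" the parents of A and D); it is an assumption, not copied from the slides.

```python
# Greedy "min-degree" elimination: repeatedly eliminate the vertex with the
# fewest current neighbours, turning its neighbourhood into a clique.
# Returns the elimination order found and the width it induces.
def min_degree_order(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    order, width = [], 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # vertex with smallest |N_G(v)|
        nbrs = adj[v]
        width = max(width, len(nbrs))
        for a in nbrs:                            # make the neighbourhood a clique
            adj[a] |= nbrs - {a}
            adj[a].discard(v)                     # and drop the eliminated vertex
        del adj[v]
        order.append(v)
    return order, width

# Reconstructed moral graph of the network from slide 5.
moral = [('V', 'T'), ('T', 'A'), ('L', 'A'), ('T', 'L'),
         ('A', 'X'), ('A', 'D'), ('B', 'D'), ('A', 'B')]
print(min_degree_order(moral))   # an order of width 2
```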

12 Results about treewidth
Theorem(s). There are several algorithms that approximate the treewidth tw within a small constant factor a, at a time complexity of Poly(n)·c^tw, where c is a constant and n is the number of vertices.
Comment. One such algorithm will be presented next week by a student.
Observation. The above theorem is "practical" if the constants a and c are low enough, because inference itself requires complexity of at most Poly(n)·k^tw, where k is the size of the largest domain.
Observation. There are other cost functions for optimizing complexity that take the number of states into account.

13 Update all variables in a general network
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0). Can we still compute them all at the cost of computing one term twice?
Note that: P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a) g5(l,b,s)
[Figure: the network, now including S, with CPTs p(v), p(t|v), p(s), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b), and its moral graph labeled with the grouped factors g1(t,v), g2(a,t,l), g3(d,a,b), g4(x,a), g5(l,b,s).]

14 Update all variables given the evidence
Desired query (update all beliefs): P(V|D=d0), …, P(X|D=d0).
Recall that: P(v,…,x) = g1(t,v) g2(a,t,l) g3(d,a,b) g4(x,a) g5(l,b,s)
[Figure: the cluster tree with clusters T,V – A,T,L – X,A – D,A,B – A,L,B – L,B,S, with the partial sums stored on the connecting links.]
Solution: keep all partial sums on the links, in both directions (as is done in HMMs). Messages are sent inwards first.

15 Global conditioning Fixing the values of A & B
[Figure: a loopy network over A, B, C, D, E, I, J, K, L, M, redrawn with A and B replaced by their fixed values a and b.]
This transformation yields an I-map of Prob(a,b,C,D,…) for fixed values of A and B. Fixing values at the beginning of the summation can decrease the tables formed by variable elimination; this way space is traded for time. Special case: choose to fix a set of nodes that "breaks all loops". This method is called cutset conditioning.
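A tiny numeric illustration of the idea, on a made-up pairwise model over a 4-cycle (not the network in the figure): conditioning on one loop-breaking variable reduces the rest to a chain that is eliminated with small tables, and summing the conditioned results over the cutset values reproduces the brute-force answer.

```python
# Pairwise potentials on a 4-cycle A-B-C-D-A (all variables binary).
# The numbers are arbitrary; only the conditioning pattern matters.
phi = {
    ('A', 'B'): {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0},
    ('B', 'C'): {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 4.0},
    ('C', 'D'): {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 1.0},
    ('D', 'A'): {(0, 0): 5.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0},
}

def joint(a, b, c, d):
    return (phi[('A', 'B')][(a, b)] * phi[('B', 'C')][(b, c)] *
            phi[('C', 'D')][(c, d)] * phi[('D', 'A')][(d, a)])

# Brute force: unnormalized marginal of C by summing the full joint.
brute = [sum(joint(a, b, c, d) for a in (0, 1) for b in (0, 1) for d in (0, 1))
         for c in (0, 1)]

# Cutset conditioning: fixing A breaks the single loop; for each value a the
# remaining model over B, C, D is a chain, eliminated with 1-D partial sums.
cond = [0.0, 0.0]
for a in (0, 1):
    for c in (0, 1):
        mb = sum(phi[('A', 'B')][(a, b)] * phi[('B', 'C')][(b, c)] for b in (0, 1))
        md = sum(phi[('C', 'D')][(c, d)] * phi[('D', 'A')][(d, a)] for d in (0, 1))
        cond[c] += mb * md

print(brute, cond)   # the two unnormalized marginals agree
```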

16 Cutset conditioning
Fixing the values of A & B & L breaks all loops. But can we choose fewer variables to break all loops? Are some variables better choices than others?
This optimization question translates to the well-known FVS (feedback vertex set) problem: choose a set of vertices of least weight that intersects every cycle of a given weighted undirected graph G.
[Figure: the same loopy network over A, B, C, D, E, I, J, K, L, M.]

17 The Noisy Or-Gate Model
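The body of this slide did not survive extraction. As a minimal reminder of the model (my sketch, not the slide's original content): in a noisy-OR gate each active parent independently fails to turn the child on with its own inhibition probability, so P(X=0 | u) is the product of the inhibition probabilities of the active parents (times an optional leak term).

```python
import numpy as np

# Minimal noisy-OR CPD sketch.  q[i] is the probability that parent i, when
# active, fails to activate X; q0 is an optional leak inhibitor (1.0 = no leak).
def noisy_or_prob_x1(u, q, q0=1.0):
    """P(X=1 | parent states u) under a (leaky) noisy-OR model."""
    p_off = q0 * np.prod([q[i] for i, ui in enumerate(u) if ui == 1])
    return 1.0 - p_off

q = [0.2, 0.4, 0.9]                    # made-up inhibition probabilities
print(noisy_or_prob_x1([1, 0, 1], q))  # 1 - 0.2*0.9 = 0.82
```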

18 Approximate Inference
Gibbs sampling
Loopy belief propagation
Bounded conditioning
Likelihood weighting
Variational methods
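Of the methods listed, likelihood weighting is the simplest to sketch: sample the non-evidence variables from their CPTs in topological order and weight each sample by the probability of the evidence values it was forced to take. A minimal sketch on a made-up two-edge fragment A -> L, A -> D with evidence D = 1 (the network and numbers are illustrative assumptions):

```python
import random

# Likelihood weighting on a tiny network A -> L and A -> D, evidence D = 1.
pA1 = 0.3
pL1_given_A = {0: 0.2, 1: 0.7}
pD1_given_A = {0: 0.1, 1: 0.8}

random.seed(0)
num, den = 0.0, 0.0
for _ in range(100_000):
    a = 1 if random.random() < pA1 else 0            # sample non-evidence vars
    l = 1 if random.random() < pL1_given_A[a] else 0
    w = pD1_given_A[a]                               # weight by P(evidence | parents)
    num += w * l
    den += w
print("estimate P(L=1 | D=1):", num / den)

# Exact value for comparison.
def joint(a, l):
    return ((pA1 if a else 1 - pA1) *
            (pL1_given_A[a] if l else 1 - pL1_given_A[a]) * pD1_given_A[a])

exact = (joint(0, 1) + joint(1, 1)) / sum(joint(a, l) for a in (0, 1) for l in (0, 1))
print("exact    P(L=1 | D=1):", exact)
```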

