Global Approximate Inference
Eran Segal, Weizmann Institute
General Approximate Inference Strategy
- Define a class Q of simpler distributions
- Search for a particular instance Q ∈ Q that is "close" to P
- Answer queries using inference in Q
Cluster Graph
A cluster graph K for factors F is an undirected graph in which:
- Nodes are associated with subsets of variables C_i ⊆ U
- The graph is family preserving: each factor φ ∈ F is associated with one node C_i such that Scope[φ] ⊆ C_i
- Each edge C_i–C_j is associated with a sepset S_{i,j} = C_i ∩ C_j
A cluster tree over factors F that satisfies the running intersection property is called a clique tree.
Clique Tree Inference
Example: a Bayesian network over C, D, I, S, G, L, J, H with factors P(C), P(D|C), P(I), P(G|I,D), P(S|I), P(L|G), P(J|L,S), P(H|G,J).
Clique tree (cliques 1-5): {C,D} – {G,I,D} – {G,S,I} – {G,J,S,L} – {H,G,J}, with sepsets {D}, {G,I}, {G,S}, {G,J}.
Verify: the graph is a tree and is family preserving, and it satisfies the running intersection property.
Message Passing: Belief Propagation
Initialize the clique tree:
- For each clique C_i, set β_i ← the product of the factors assigned to C_i
- For each edge C_i–C_j, set μ_{i,j} ← 1
While uncalibrated edges exist:
- Select an edge C_i–C_j and send a message from C_i to C_j:
  - Marginalize the clique over the sepset: σ_{i→j} ← Σ_{C_i∖S_{i,j}} β_i
  - Update the belief at C_j: β_j ← β_j · σ_{i→j} / μ_{i,j}
  - Update the sepset at C_i–C_j: μ_{i,j} ← σ_{i→j}
(A runnable sketch of this scheme follows.)
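A minimal sketch of the belief-update scheme above, assuming binary variables, dense factor tables, and sepsets initialized to all-ones factors; the names (Factor, send_message) are illustrative, not from the lecture:

```python
# Belief-update message passing on a clique tree (sketch).
import itertools
from collections import defaultdict

class Factor:
    def __init__(self, variables, table=None):
        self.vars = list(variables)
        # table maps full assignments (tuples of 0/1) to potential values
        self.table = table if table is not None else {
            a: 1.0 for a in itertools.product([0, 1], repeat=len(self.vars))}

    def marginalize(self, keep):
        """Sum out every variable not in `keep`."""
        idx = [self.vars.index(v) for v in keep]
        out = defaultdict(float)
        for a, val in self.table.items():
            out[tuple(a[i] for i in idx)] += val
        return Factor(list(keep), dict(out))

    def _project(self, a, other):
        return tuple(a[self.vars.index(v)] for v in other.vars)

    def multiply_by(self, other):        # beta <- beta * sigma
        for a in self.table:
            self.table[a] *= other.table[self._project(a, other)]

    def divide_by(self, other):          # beta <- beta / mu (0/0 := 0)
        for a in self.table:
            d = other.table[self._project(a, other)]
            self.table[a] = self.table[a] / d if d else 0.0

def send_message(beliefs, sepsets, i, j):
    """One belief-update message over edge C_i - C_j.
    beliefs: {clique_id: Factor}; sepsets: {frozenset((i, j)): Factor}."""
    edge = frozenset((i, j))
    sigma = beliefs[i].marginalize(sepsets[edge].vars)  # marginalize onto sepset
    beliefs[j].multiply_by(sigma)                       # update belief at C_j
    beliefs[j].divide_by(sepsets[edge])                 # divide out old mu_{i,j}
    sepsets[edge] = sigma                               # update the sepset
```

Repeatedly calling send_message over all edges, in both directions, until the sepsets stop changing calibrates the tree.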
Clique Tree Invariant
Belief propagation can be viewed as reparameterizing the joint distribution:

$$P_F(U) \;=\; \frac{\prod_i \beta_i(C_i)}{\prod_{(i,j)} \mu_{i,j}(S_{i,j})}$$

- Upon calibration we showed that this invariant holds
- Initially the invariant holds, since each β_i is the product of its assigned factors and each μ_{i,j} = 1
- At each update step the invariant is also maintained:
  - A message only changes β_j and μ_{i,j}, so most terms remain unchanged
  - We need to show β_j^new / μ_{i,j}^new = β_j^old / μ_{i,j}^old, but this is exactly the message passing step
- Belief propagation reparameterizes P at each step
Global Approximate Inference: Outline
- Inference as optimization
- Generalized belief propagation
  - Define algorithm
  - Constructing cluster graphs
  - Analyze approximation guarantees
- Propagation with approximate messages
  - Factorized messages
  - Approximate message propagation
- Structured variational approximations
The Energy Functional
- Suppose we want to approximate P with Q
- Represent P by its factors: P_F(U) = (1/Z) ∏_{φ∈F} φ
- Define the energy functional F[P̃_F, Q] (see below)
- Then minimizing D(Q‖P_F) is equivalent to maximizing F[P̃_F, Q], and ln Z ≥ F[P̃_F, Q] (since D(Q‖P_F) ≥ 0)
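The definition itself appeared as an image on the original slide; in the standard form it reads:

$$F[\tilde{P}_F, Q] \;=\; \sum_{\phi \in F} \mathbf{E}_Q[\ln \phi] \;+\; H_Q(U), \qquad \ln Z \;=\; F[\tilde{P}_F, Q] \;+\; D(Q \,\|\, P_F),$$

so for any Q the energy functional lower-bounds ln Z, with equality exactly when Q = P_F.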
Inference as Optimization
We show that inference can be viewed as maximizing the energy functional F[P̃_F, Q]:
- Define a distribution Q over clique potentials
- Transform F[P̃_F, Q] into an equivalent factored form F'[P̃_F, Q]
- Show that if Q maximizes F'[P̃_F, Q] subject to constraints under which Q represents calibrated potentials, then there exist factors that satisfy the inference message passing equations
Defining Q
- Recall that throughout BP the clique tree invariant holds: P_F(U) = ∏_i β_i / ∏_{(i,j)} μ_{i,j}
- Define Q as a reparameterization of P given by the clique and sepset potentials: Q = {β_1, ..., β_k} ∪ {μ_{i,j}}
- Since D(Q‖P_F) = 0 for such a reparameterization, we show that calibrating Q is equivalent to maximizing F[P̃_F, Q]
Factored Energy Functional
Define the factored energy functional F'[P̃_F, Q] (see below).
Theorem: if Q is a set of calibrated potentials for a clique tree T, then F[P̃_F, Q] = F'[P̃_F, Q].
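The definition appeared as an image; in the standard form it reads:

$$F'[\tilde{P}_F, Q] \;=\; \sum_i \mathbf{E}_{C_i \sim \beta_i}[\ln \psi_i] \;+\; \sum_i H_{\beta_i}(C_i) \;-\; \sum_{(i,j) \in E_T} H_{\mu_{i,j}}(S_{i,j}),$$

where ψ_i is the product of the factors assigned to clique C_i. Unlike F, the factored form involves only expectations and entropies over individual cliques and sepsets, so it is tractable to evaluate.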
Inference as Optimization
Optimization task: find Q = {β_i} ∪ {μ_{i,j}} that maximizes F'[P̃_F, Q], subject to the calibration constraints μ_{i,j} = Σ_{C_i∖S_{i,j}} β_i for every edge C_i–C_j, and Σ_{C_i} β_i = 1.
Theorem: the fixed points of Q for the above optimization satisfy the message passing equations.
This suggests an iterative optimization procedure, and it is identical to belief propagation!
Outline recap. Next: Generalized Belief Propagation (define algorithm)
Generalized Belief Propagation
Perform belief propagation in a cluster graph with loops.
Example: for a Bayesian network over A, B, C, D, compare
- a cluster tree: {A,B,D} – {B,C,D} with sepset {B,D}
- a cluster graph with loops: {A,B}, {B,C}, {C,D}, {A,D} arranged in a cycle, with sepsets {B}, {C}, {D}, {A}
Generalized Belief Propagation
Performing belief propagation in a cluster graph with loops (as in the cluster graph above) raises problems:
- Inference may be incorrect: evidence can be double counted around a loop
- Unlike BP on trees, convergence is not guaranteed
- Potentials in the calibrated cluster graph are not guaranteed to be marginals of P
Generalized Cluster Graph
A generalized cluster graph K for factors F is an undirected graph in which:
- Nodes are associated with subsets of variables C_i ⊆ U
- The graph is family preserving: each factor φ ∈ F is associated with one node C_i such that Scope[φ] ⊆ C_i
- Each edge C_i–C_j is associated with a subset S_{i,j} ⊆ C_i ∩ C_j
The only change from a cluster graph: the sepset may be any subset of C_i ∩ C_j rather than the full intersection.
Generalized Cluster Graph
A generalized cluster graph obeys the running intersection property if, for each X ∈ C_i and X ∈ C_j, there is exactly one path between C_i and C_j for which X ∈ S for each sepset S along the path.
- Equivalently: all edges associated with X form a tree that spans all the clusters that contain X
- Note: some of these clusters may be connected by more than one path, just not more than one path that carries X
Example: the cluster graph over {A,B}, {B,C}, {C,D}, {A,D} with sepsets {B}, {C}, {D}, {A} obeys the property (a checking procedure is sketched below).
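A small sketch of checking this property directly from its spanning-tree characterization; the input format (clusters as a dict of variable sets, edges as a dict of sepsets) is an assumption for illustration:

```python
# Check the generalized running intersection property: for every
# variable x, the edges whose sepset contains x must form a tree
# spanning all clusters that contain x.
def obeys_rip(clusters, edges):
    """clusters: {node_id: set_of_vars}; edges: {(i, j): sepset_set}."""
    all_vars = set().union(*clusters.values())
    for x in all_vars:
        nodes = {i for i, c in clusters.items() if x in c}
        x_edges = [(i, j) for (i, j), s in edges.items() if x in s]
        # A spanning tree on |nodes| vertices has |nodes| - 1 edges ...
        if len(x_edges) != len(nodes) - 1:
            return False
        # ... and must be connected and acyclic (checked via union-find).
        parent = {i: i for i in nodes}
        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i
        for i, j in x_edges:
            if x not in clusters[i] or x not in clusters[j]:
                return False  # sepset mentions a variable its cluster lacks
            ri, rj = find(i), find(j)
            if ri == rj:
                return False  # cycle among the edges carrying x
            parent[ri] = rj
        if len({find(i) for i in nodes}) != 1:
            return False      # edges carrying x do not connect all clusters
    return True
```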
Calibrated Cluster Graph
A generalized cluster graph is calibrated if for each edge C_i–C_j the beliefs agree on the sepset:

$$\sum_{C_i \setminus S_{i,j}} \beta_i \;=\; \sum_{C_j \setminus S_{i,j}} \beta_j$$

- This is weaker than in clique trees, since S_{i,j} may be a strict subset of the intersection between C_i and C_j
- If the cluster graph satisfies the running intersection property, then the marginal on any variable X is the same in every cluster that contains X
GBP is Efficient
Example: a 3×3 Markov grid network over X_11, ..., X_33, with a cluster graph containing one cluster per grid edge (e.g., {X_11,X_12}, {X_11,X_21}, ...) connected through singleton sepsets.
- Note: a clique tree for an n×n grid has cliques of size n+1, so exact inference is exponential in n
- A round of GBP passes one small message per cluster graph edge, so its cost is linear in the size of the grid
Outline recap. Next: Generalized Belief Propagation (constructing cluster graphs)
Constructing Cluster Graphs
- When constructing clique trees, all constructions give the same (exact) result but differ in computational complexity
- In GBP, different cluster graphs can vary in both computational complexity and approximation quality
Transforming Pairwise MNs
A pairwise Markov network over a graph H has:
- A set of node potentials {φ[X_i] : i = 1, ..., n}
- A set of edge potentials {φ[X_i,X_j] : (X_i,X_j) ∈ H}
Construction: one cluster per edge potential and one singleton cluster per variable, connecting each singleton {X_i} to every edge cluster that contains X_i.
Example: the 3×3 grid network over X_11, ..., X_33 becomes a cluster graph with edge clusters {X_11,X_12}, {X_11,X_21}, ... and singleton clusters X_11, ..., X_33.
Transforming Bayesian Networks
Construction (the Bethe approximation):
- One "large" cluster per CPD
- One singleton cluster per variable
- Connect a singleton cluster to a large cluster if the variable appears in the CPD
- The resulting graph obeys the running intersection property
Example: CPD clusters {A,B,C}, {A,B,D}, {B,D,F} with singleton clusters A, B, C, D, F (a construction sketch follows).
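A sketch of the Bethe construction described above; the input (a list of factor scopes) and function name are illustrative:

```python
# Bethe cluster graph: one large cluster per factor scope, one
# singleton cluster per variable, and an edge with sepset {x}
# whenever x appears in the factor. RIP holds automatically,
# since the edges carrying each variable x form a star around
# the singleton cluster {x}.
def bethe_cluster_graph(factor_scopes):
    clusters, edges = {}, {}
    variables = set().union(*factor_scopes)
    for i, scope in enumerate(factor_scopes):
        clusters[('factor', i)] = set(scope)      # large cluster per factor
    for x in variables:
        clusters[('var', x)] = {x}                # singleton cluster per variable
    for i, scope in enumerate(factor_scopes):
        for x in scope:
            edges[(('factor', i), ('var', x))] = {x}  # sepset {x}
    return clusters, edges

# Example: the slide's CPD scopes yield clusters {A,B,C}, {A,B,D},
# {B,D,F} plus singletons A, B, C, D, F.
clusters, edges = bethe_cluster_graph(
    [{'A', 'B', 'C'}, {'A', 'B', 'D'}, {'B', 'D', 'F'}])
```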
Outline recap. Next: Generalized Belief Propagation (analyze approximation guarantees)
Generalized Belief Propagation
GBP maintains the distribution invariant P_F(U) = ∏_i β_i / ∏_{(i,j)} μ_{i,j}, since each message passing step maintains the invariant, exactly as in BP.
Generalized Belief Propagation
If GBP converges, the cluster graph K is calibrated, and:
- Each subtree T of K is calibrated, with edge potentials corresponding to marginals of the distribution P_T(U) defined by the potentials in T (since P_T(U) is represented by a calibrated tree)
Generalized Belief Propagation
However, the calibrated cluster graph potentials are not marginals of P_F(U).
Example: in the loopy cluster graph over {A,B}, {B,C}, {C,D}, {A,D} (clusters 1-4), information can flow around the cycle, unlike in the tree over {A,B}, {B,C}, {C,D} (clusters 1-3), so calibrated beliefs need not match the true marginals.
Inference as Optimization (recap)
Find Q that maximizes F'[P̃_F, Q] subject to the calibration constraints; fixed points of the optimization satisfy the message passing equations, and the iterative procedure is identical to belief propagation.
GBP as Optimization
Optimization task: find Q that maximizes F'[P̃_F, Q], subject to the calibration constraints μ_{i,j} = Σ_{C_i∖S_{i,j}} β_i over the cluster graph edges.
Theorem: the fixed points of Q for the above optimization are the fixed points of GBP.
- Note: S_{i,j} is only a subset of the intersection between C_i and C_j, so the agreement constraints are weaker than in a clique tree
- The resulting iterative optimization procedure is GBP
GBP as Optimization
Clique trees:
- F[P̃_F, Q] = F'[P̃_F, Q]
- The iterative procedure (BP) is guaranteed to converge
- The convergence point represents the marginal distributions of P_F
Cluster graphs:
- F[P̃_F, Q] = F'[P̃_F, Q] does not hold: F' is only an approximation of F
- The iterative procedure (GBP) is not guaranteed to converge
- A convergence point does not necessarily represent the marginal distributions of P_F
GBP in Practice
Dealing with non-convergence:
- Often only small portions of the network fail to converge: stop inference and use the current beliefs
- Use intelligent message passing scheduling:
  - Tree reparameterization (TRP) selects entire trees and calibrates them while keeping all other beliefs fixed
  - Focus attention on uncalibrated regions of the graph
Outline recap. Next: Propagation with approximate messages (factorized messages)
Propagation with Approximate Messages
General idea:
- Perform BP (or GBP) as before, but propagate messages that are only approximate
Modular approach:
- The general inference scheme remains the same
- Many different approximate message computations can be plugged in
Factorized Messages
Example: a 3×3 Markov grid network and a clique tree over it (cliques 1-3), in which calibration involves sending messages that are joint over three variables (a column of the grid).
- Keep the internal structure of the clique tree cliques
- Idea: simplify the messages using a factored representation, e.g., a product of single-variable factors
Computational Savings
Answering queries in clique 2:
- Exact inference: exponential in the joint space of clique 2
- Approximate inference with factored messages:
  - Notice that the subnetwork of clique 2 together with the factored incoming messages is a tree
  - Perform efficient exact inference on this subtree to answer queries
Factor Sets
A factor set φ = {φ_1, ..., φ_k} provides a compact representation of the high-dimensional factor φ_1 · ... · φ_k.
Belief propagation with factor sets:
- Multiplication of factor sets is easy: simply take the union of the factors in each factor set being multiplied
- Marginalization of a factor set requires inference in the simplified network
- Example: computing the message from clique 2 to clique 3 in the grid clique tree above (a sketch of these operations follows)
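A sketch of the two factor-set operations, reusing the Factor class from the earlier belief propagation sketch; a factor set is simply a list of factors whose implicit product is the represented factor:

```python
# Multiplication of factor sets is just the union of the component
# factor lists -- no tables are ever multiplied out.
def multiply_factor_sets(*factor_sets):
    return [f for fs in factor_sets for f in fs]

# Marginalization has no such shortcut: summing variables out of a
# factor set means running inference (e.g. variable elimination) in
# the network defined by its components, which is exact and efficient
# whenever that network is a tree, as in the grid example above.
```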
Outline recap. Next: Propagation with approximate messages (approximate message propagation)
Approximate Message Propagation
Input:
- A clique tree (or cluster graph)
- An assignment of the original factors to clusters/cliques
- The factorized form of each cluster/clique, which can be represented by a network for each edge C_i–C_j that specifies the factorization (in the previous examples we assumed an empty network, i.e., fully factored messages)
Two strategies for approximate message propagation:
- A sum-product message passing scheme
- Belief update messages
Sum-Product Propagation
Same propagation scheme as in exact inference:
- Select a root
- Propagate messages towards the root: each cluster collects messages from its neighbors and sends an outgoing message when possible
- Propagate messages back from the root
- Each message passing step performs inference within a cluster (a sketch of one message follows)
- Terminates in a fixed number of iterations
Note: the final marginals at each variable are not exact
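A sketch of one exact sum-product message, reusing the Factor class from the earlier sketch; with factorized messages, each incoming message would itself be a factor set and the marginalization would be replaced by approximate inference in the cluster:

```python
# delta_{i->j}: multiply the initial clique potential by the messages
# from all neighbors except j, then marginalize onto the sepset
# between C_i and C_j.
def sum_product_message(psi_i, incoming, sepset_vars):
    product = Factor(psi_i.vars, dict(psi_i.table))  # copy, don't mutate
    for message in incoming:                         # all neighbors except j
        product.multiply_by(message)
    return product.marginalize(sepset_vars)
```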
Message Passing: Belief Propagation
Same as BP, but with approximate messages:
- Initialize the clique tree: for each clique C_i set β_i to the product of its assigned factors; for each edge C_i–C_j set μ_{i,j} ← 1
- While uncalibrated edges exist: select C_i–C_j and send a message from C_i to C_j: marginalize the clique over the sepset (this is where the approximation enters), update the belief at C_j, and update the sepset at C_i–C_j
- The two message passing schemes differ in where approximate inference is performed
Outline recap. Next: Structured variational approximations
Structured Variational Approximations
- Select a simple family of distributions Q
- Find Q ∈ Q that maximizes F[P̃_F, Q]
Mean Field Approximation
Q(X) = ∏_i Q(X_i)
- Q loses much of the information in P_F
- The approximation is computationally attractive:
  - Every query in Q is simple to compute
  - Q is easy to represent
Example: P_F is a 3×3 Markov grid network; Q is the fully disconnected mean field network over the same variables X_11, ..., X_33.
Mean Field Approximation
Under a mean field Q, the energy functional is easy to compute, even for networks where inference is complex.
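The formula appeared as an image on the original slide; under the mean field family it takes the standard form:

$$F[\tilde{P}_F, Q] \;=\; \sum_{\phi \in F}\;\sum_{\mathbf{c} \in \mathrm{Val}(\mathrm{Scope}[\phi])} \Big(\prod_{X_i \in \mathrm{Scope}[\phi]} Q(c_i)\Big)\,\ln \phi(\mathbf{c}) \;+\; \sum_i H_{Q}(X_i).$$

Each term involves only the (small) scope of a single factor plus singleton entropies, so the functional is computable even when inference in P_F is intractable.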
Mean Field Maximization
Maximizing the energy functional for mean field:
- Find Q(X) = ∏_i Q(X_i) that maximizes F[P̃_F, Q]
- Subject to, for all i: Σ_{x_i} Q(x_i) = 1
Mean Field Maximization
Theorem: Q(X_i) is a stationary point of the mean field optimization given Q(X_1), ..., Q(X_{i-1}), Q(X_{i+1}), ..., Q(X_n) if and only if

$$Q(x_i) \;=\; \frac{1}{Z_i} \exp\Big\{ \sum_{\phi} \mathbf{E}_Q[\ln \phi \mid x_i] \Big\}$$

Proof: to optimize Q(X_i), define the Lagrangian

$$L_i \;=\; F[\tilde{P}_F, Q] \;+\; \lambda_i \Big( \sum_{x_i} Q(x_i) - 1 \Big),$$

where λ_i corresponds to the constraint that Q(X_i) is a distribution. We now compute the derivative of L_i.
Mean Field Maximization
Computing the derivative of the Lagrangian with respect to Q(x_i):

$$\frac{\partial L_i}{\partial Q(x_i)} \;=\; \sum_{\phi} \mathbf{E}_Q[\ln \phi \mid x_i] \;-\; \ln Q(x_i) \;-\; 1 \;+\; \lambda_i$$

Setting the derivative to zero, and rearranging terms, we get:

$$\ln Q(x_i) \;=\; \lambda_i - 1 + \sum_{\phi} \mathbf{E}_Q[\ln \phi \mid x_i]$$

Taking exponents of both sides we get:

$$Q(x_i) \;=\; \frac{1}{Z_i} \exp\Big\{ \sum_{\phi} \mathbf{E}_Q[\ln \phi \mid x_i] \Big\}, \qquad Z_i = e^{\,1-\lambda_i}.$$
Mean Field Maximization: Intuition
We can thus rewrite Q(x_i) as:

$$Q(x_i) \;=\; \frac{1}{Z_i} \exp\Big\{ \mathbf{E}_{v \sim Q}\big[\ln P(x_i \mid v)\big] \Big\},$$

where v ranges over assignments to all variables other than X_i.
Mean Field Maximization: Intuition
- Q(x_i) is the geometric average of P(x_i | v), taken relative to the probability distribution Q
- In this sense, each marginal is "consistent" with the other marginals
- In P_F we can also express marginals this way: P(x_i) = Σ_v P(v) P(x_i | v), an arithmetic average with respect to P_F
Mean Field: Algorithm
Simplify the update from a sum over all factors to a sum over only the factors whose scope contains X_i, since terms that do not involve x_i can be absorbed into the normalizing constant:

$$Q(x_i) \;=\; \frac{1}{Z_i} \exp\Big\{ \sum_{\phi : X_i \in \mathrm{Scope}[\phi]} \mathbf{E}_Q[\ln \phi \mid x_i] \Big\}$$

- Note: Q(x_i) does not appear on the right-hand side, so we can solve for the optimal Q(x_i) in one step
- Note: the step is only optimal given all the other Q(X_j)
- This suggests an iterative, coordinate-ascent algorithm (sketched below)
- Convergence to a local maximum is guaranteed, since each step improves the energy functional
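A minimal sketch of this coordinate-ascent algorithm for a pairwise MRF over binary variables; the input format (node_pot, edge_pot) and all names are illustrative assumptions:

```python
# Mean field coordinate ascent: repeatedly apply the one-step-optimal
# update Q(x_i) proportional to exp{ sum of expected log factors }.
import math

def mean_field(node_pot, edge_pot, iters=50):
    """node_pot: {i: [phi(0), phi(1)]}; edge_pot: {(i, j): 2x2 table}."""
    q = {i: [0.5, 0.5] for i in node_pot}            # initial marginals
    nbrs = {i: [] for i in node_pot}
    for (i, j) in edge_pot:
        nbrs[i].append((j, (i, j)))
        nbrs[j].append((i, (i, j)))
    for _ in range(iters):
        for i in node_pot:                           # one coordinate update
            logits = []
            for x in (0, 1):
                s = math.log(node_pot[i][x])
                # expected log edge potentials under the neighbors' current Q
                for j, e in nbrs[i]:
                    for y in (0, 1):
                        val = edge_pot[e][x][y] if e[0] == i else edge_pot[e][y][x]
                        s += q[j][y] * math.log(val)
                logits.append(s)
            z = sum(math.exp(l) for l in logits)
            q[i] = [math.exp(l) / z for l in logits]  # normalize in one step
    return q

# Example: a two-node chain with an attractive edge potential.
q = mean_field({0: [1.0, 2.0], 1: [1.0, 1.0]},
               {(0, 1): [[2.0, 1.0], [1.0, 2.0]]})
```

Each inner update is the one-step-optimal solution given the other marginals, so every sweep increases the energy functional until a local maximum is reached.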
Markov Network Approximations
- We can use families Q of increasing complexity
- As long as Q is "easy" (inference in Q is feasible), efficient update equations can be derived
Example: P_F is a 3×3 Markov grid network; Q is a simpler network over X_11, ..., X_33, such as the mean field network.