Loopy Belief Propagation: a summary
What is inference? Given: –Observed variables $Y$ –Hidden variables $X$ –Some model of $P(X,Y)$ We want to make some analysis of $P(X|Y)$: –Estimate the marginal $P(S|Y)$ for $S \subseteq X$ –Minimum Mean Squared Error estimate (MMSE): this is just $E[X|Y]$ –Maximum A-Posteriori configuration (MAP) –$N$ most likely configurations –Minimum Variance Unbiased Estimate (MVUE)
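A minimal brute-force sketch of these quantities, assuming a tiny made-up joint table over two binary hidden variables x1, x2 and one binary observation y. Enumeration like this is only feasible for very small models; the rest of the talk is about avoiding it.

```python
from itertools import product

# Hypothetical unnormalized joint P(x1, x2, y) over binary variables.
# The numbers are made up purely for illustration.
joint = {
    (0, 0, 0): 4.0, (0, 0, 1): 1.0,
    (0, 1, 0): 2.0, (0, 1, 1): 3.0,
    (1, 0, 0): 1.0, (1, 0, 1): 2.0,
    (1, 1, 0): 1.0, (1, 1, 1): 6.0,
}


def posterior(y):
    """Return P(x1, x2 | Y=y) as a dict keyed by (x1, x2)."""
    unnorm = {(x1, x2): joint[(x1, x2, y)] for x1, x2 in product((0, 1), repeat=2)}
    z = sum(unnorm.values())
    return {x: p / z for x, p in unnorm.items()}


post = posterior(y=1)

# Marginal P(x1 | y): sum out x2.
p_x1 = [sum(p for (x1, _), p in post.items() if x1 == v) for v in (0, 1)]

# MMSE estimate E[X | y]: the posterior mean of each hidden variable.
mmse = tuple(sum(p * x[i] for x, p in post.items()) for i in range(2))

# MAP configuration: the single most probable (x1, x2).
map_config = max(post, key=post.get)

# The N most likely configurations (here N = 2).
top_n = sorted(post, key=post.get, reverse=True)[:2]

print(p_x1, mmse, map_config, top_n)
```

With y = 1 the MAP configuration is (1, 1), while the MMSE estimate is (2/3, 0.75): a posterior mean, which in general is not itself a valid configuration of discrete variables.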
Representing Structure in $P(X,Y)$ Often, $P(X,Y) = \prod_k \psi_k(X_{C_k})$, where $X_{C_k} \subseteq X \cup Y$. Markov Random Field: $P(X) = f_1(x_1,x_2,x_3) \cdot f_2(x_3,x_4) \cdot f_3(x_3,x_5) / Z$ Bayes Net: $P(X) = P(x_1) \cdot P(x_2) \cdot P(x_3|x_1,x_2) \cdot P(x_4|x_3) \cdot P(x_5|x_3)$ Factor Graph: $P(X) = f_1(x_1,x_2,x_3) \cdot f_2(x_3,x_4) \cdot f_3(x_3,x_5) \cdot f_4(x_1) \cdot f_5(x_2) / Z$
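One common way to store a factor graph is as a list of (scope, table) pairs. The sketch below assumes binary variables and made-up factor values for the factor-graph form of $P(X)$ on this slide, and computes the normalizer $Z$ by brute-force enumeration.

```python
from itertools import product

# Hypothetical factor tables for the factor graph
# P(X) = f1(x1,x2,x3) f2(x3,x4) f3(x3,x5) f4(x1) f5(x2) / Z,
# stored as (scope, function) pairs over binary variables x1..x5.
factors = [
    ((1, 2, 3), lambda a, b, c: 1.0 + a + b + 2 * c),
    ((3, 4), lambda c, d: 2.0 if c == d else 1.0),
    ((3, 5), lambda c, e: 3.0 if c == e else 1.0),
    ((1,), lambda a: 0.7 if a else 0.3),
    ((2,), lambda b: 0.4 if b else 0.6),
]
variables = (1, 2, 3, 4, 5)


def unnormalized(x):
    """Product of all factors for an assignment x given as a dict {variable: value}."""
    value = 1.0
    for scope, f in factors:
        value *= f(*(x[v] for v in scope))
    return value


# Z sums the unnormalized product over every joint assignment (2**5 of them here).
assignments = [dict(zip(variables, vals)) for vals in product((0, 1), repeat=len(variables))]
Z = sum(unnormalized(x) for x in assignments)


def prob(x):
    """Normalized probability P(X = x)."""
    return unnormalized(x) / Z


print(prob({1: 1, 2: 0, 3: 1, 4: 1, 5: 1}))
```

The Bayes-net form needs no $Z$, because each conditional probability table is already normalized.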
Sum-Product Algorithm aka belief update Suppose the factor graph is a tree. For the tree to the left, we have: $P(X) \propto f_1(x_1,x_2)\, f_2(x_2,x_3,x_4)\, f_3(x_3,x_5)\, f_4(x_4,x_6)$ Then marginalization (for example, computing $P(x_1)$) can be sped up by exploiting the factorization: $P(x_1) \propto \sum_{x_2,x_3,x_4,x_5,x_6} f_1(x_1,x_2)\, f_2(x_2,x_3,x_4)\, f_3(x_3,x_5)\, f_4(x_4,x_6) = \sum_{x_2,x_3,x_4} f_1(x_1,x_2)\, f_2(x_2,x_3,x_4) \Big(\sum_{x_5} f_3(x_3,x_5)\Big) \Big(\sum_{x_6} f_4(x_4,x_6)\Big)$ Applied recursively, this quickly computes every single-variable marginal $P(x_n)$ of a tree-structured graph.
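A small sketch of why the factorization pays off, assuming binary variables and random positive factor tables (seeded for reproducibility): it computes $P(x_1)$ for this tree both by brute force over $x_2, \dots, x_6$ and with the sums over the leaves $x_5$ and $x_6$ pushed inside.

```python
import random
from itertools import product

random.seed(0)
S = (0, 1)  # binary states, assumed for illustration

# Random positive tables for the (hypothetical) factors of the tree
# P(X) ∝ f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6).
f1 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}
f2 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=3)}
f3 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}
f4 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}


def brute_force(x1):
    """Sum the full product over x2..x6: 2**5 terms for each value of x1."""
    return sum(f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]
               for x2, x3, x4, x5, x6 in product(S, repeat=5))


# Push the sums over the leaves x5 and x6 inside; each is computed once.
g3 = {x3: sum(f3[x3, x5] for x5 in S) for x3 in S}
g4 = {x4: sum(f4[x4, x6] for x6 in S) for x4 in S}


def factorized(x1):
    """Same quantity, but only 2**3 terms remain in the outer sum."""
    return sum(f1[x1, x2] * f2[x2, x3, x4] * g3[x3] * g4[x4]
               for x2, x3, x4 in product(S, repeat=3))


def normalize(values):
    total = sum(values)
    return [v / total for v in values]


p_brute = normalize([brute_force(v) for v in S])
p_fact = normalize([factorized(v) for v in S])
print(p_brute, p_fact)
```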
Message Passing for Sum-Product We can compute every marginal $P(x_n)$ quickly using a system of message passing: Message from variable node $n$ to factor node $m$: $\nu_{n \to m}(x_n) = \prod_{i \in N(n) \setminus m} \mu_{i \to n}(x_n)$ Message from factor node $m$ to variable node $n$: $\mu_{m \to n}(x_n) = \sum_{x_{N(m) \setminus n}} \Big[ f_m(x_{N(m)}) \prod_{i \in N(m) \setminus n} \nu_{i \to m}(x_i) \Big]$ Marginal $P(x_n)$: $P(x_n) \propto \prod_{m \in N(n)} \mu_{m \to n}(x_n)$ Each node $n$ can pass a message to a neighbor $m$ only once it has received messages from all of its other neighbors. Intuitively, the message from $n$ to $m$ is (up to normalization) the marginal of the variable they share under the sub-model made of the factors in $S_n$, the subtree on $n$'s side of the edge $(n, m)$.
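The message rules can be sketched in Python as follows, assuming the tree from the previous slide with binary variables and made-up factor tables (all names are illustrative). Rather than an explicit schedule, each message is computed recursively and memoized with `functools.lru_cache`, which automatically respects the rule that a node sends to $m$ only after hearing from its other neighbors; the recursion terminates only because the graph is a tree.

```python
from functools import lru_cache
from itertools import product
from math import prod

# Hypothetical tree factor graph with the structure of the previous slide:
# f1(x1,x2) f2(x2,x3,x4) f3(x3,x5) f4(x4,x6); binary variables, made-up tables.
DOMAIN = (0, 1)  # states double as list indices below
FACTORS = {
    "f1": (("x1", "x2"), lambda a, b: 2.0 if a == b else 1.0),
    "f2": (("x2", "x3", "x4"), lambda b, c, d: 1.0 + b + c * d),
    "f3": (("x3", "x5"), lambda c, e: 3.0 if c != e else 1.0),
    "f4": (("x4", "x6"), lambda d, g: 1.0 + 2 * d * g),
}
VAR_NEIGHBORS = {}  # variable name -> list of adjacent factor names
for name, (scope, _) in FACTORS.items():
    for v in scope:
        VAR_NEIGHBORS.setdefault(v, []).append(name)


@lru_cache(maxsize=None)
def var_to_factor(n, m):
    """nu_{n->m}(x_n): product of the messages from n's other factors (1 at leaves)."""
    return tuple(prod(factor_to_var(k, n)[x] for k in VAR_NEIGHBORS[n] if k != m)
                 for x in DOMAIN)


@lru_cache(maxsize=None)
def factor_to_var(m, n):
    """mu_{m->n}(x_n): sum over m's other variables of f_m times their incoming messages."""
    scope, f = FACTORS[m]
    out = [0.0] * len(DOMAIN)
    for values in product(DOMAIN, repeat=len(scope)):
        term = f(*values)
        for v, x in zip(scope, values):
            if v != n:
                term *= var_to_factor(v, m)[x]
        out[values[scope.index(n)]] += term
    return tuple(out)


def marginal(n):
    """P(x_n) ∝ product of all factor-to-variable messages arriving at n."""
    belief = [prod(factor_to_var(m, n)[x] for m in VAR_NEIGHBORS[n]) for x in DOMAIN]
    total = sum(belief)
    return [b / total for b in belief]


def brute_force_marginal(n):
    """Reference answer by enumerating every joint assignment (exponential cost)."""
    names = list(VAR_NEIGHBORS)
    belief = [0.0] * len(DOMAIN)
    for values in product(DOMAIN, repeat=len(names)):
        a = dict(zip(names, values))
        belief[a[n]] += prod(f(*(a[v] for v in scope)) for scope, f in FACTORS.values())
    total = sum(belief)
    return [b / total for b in belief]


print(marginal("x3"), brute_force_marginal("x3"))
```

`brute_force_marginal` is included only as a check; on a tree the two agree up to floating-point rounding.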
Max-Product Algorithm aka belief revision Instead of summing over $P(X)$, we take the maximum to get the "maximal" (max-marginal) instead of the marginal: $M(x_1) = \max_{x_2,x_3,x_4,x_5,x_6} f_1(x_1,x_2)\, f_2(x_2,x_3,x_4)\, f_3(x_3,x_5)\, f_4(x_4,x_6) = \max_{x_2,x_3,x_4} f_1(x_1,x_2)\, f_2(x_2,x_3,x_4) \Big(\max_{x_5} f_3(x_3,x_5)\Big) \Big(\max_{x_6} f_4(x_4,x_6)\Big)$ Use the same message-passing system, with sums replaced by maxima, to compute the maximal of each variable. This quickly computes the Maximum A-Posteriori configuration of a tree graph: each variable takes the value that maximizes its maximal (assuming the maximum is unique).
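The max-product version of the same sketch, again assuming binary variables and random positive tables: pushing the maxima inside gives the same maximal $M(x_1)$ as brute force, and its argmax matches the first coordinate of the brute-force MAP configuration.

```python
import random
from itertools import product

random.seed(1)
S = (0, 1)  # binary states, assumed for illustration

# Random positive tables for the hypothetical factors f1..f4 of the tree.
f1 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}
f2 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=3)}
f3 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}
f4 = {k: random.uniform(0.1, 1.0) for k in product(S, repeat=2)}


def score(x):
    """Unnormalized probability of a full assignment x = (x1, ..., x6)."""
    x1, x2, x3, x4, x5, x6 = x
    return f1[x1, x2] * f2[x2, x3, x4] * f3[x3, x5] * f4[x4, x6]


def brute_force_max(x1):
    """Maximal M(x1) by maximizing over all 2**5 settings of x2..x6."""
    return max(score((x1,) + rest) for rest in product(S, repeat=5))


# Push the maxima over the leaves x5 and x6 inside, exactly as the sums were.
h3 = {x3: max(f3[x3, x5] for x5 in S) for x3 in S}
h4 = {x4: max(f4[x4, x6] for x6 in S) for x4 in S}


def factorized_max(x1):
    return max(f1[x1, x2] * f2[x2, x3, x4] * h3[x3] * h4[x4]
               for x2, x3, x4 in product(S, repeat=3))


# The MAP value of x1 maximizes its maximal (ties are assumed away).
x1_map = max(S, key=factorized_max)
best = max(product(S, repeat=6), key=score)  # brute-force MAP for comparison
print(x1_map, best)
```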
Computational Cost of Max-Product and Sum-Product Each message is a vector of size $M$, where $M$ is the number of states of the random variable. –usually pretty small Each variable → factor node message requires $(N-2)M$ multiplies, where $N$ is the number of neighbors of the variable node. –that's tiny Each factor → variable node message requires summation over $N-1$ variables, each with $M$ states, where $N$ is the number of neighbors of the factor node. Total computation per message is $O(N \cdot M^N)$. –not bad, as long as there aren't any hub-like factor nodes with many neighbors.
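A worked count, assuming $N$ is the number of neighbors of the node that sends the message (a variable node in the first line, a factor node in the second). The factor-to-variable line shows where the $O(N \cdot M^N)$ comes from; the last line is a small numeric example.

```latex
\begin{align*}
\text{variable} \to \text{factor:}\quad
  & \nu_{n\to m}(x_n) = \prod_{i \in N(n)\setminus m} \mu_{i\to n}(x_n)
  && \Rightarrow\ M \text{ entries} \times (N-2) \text{ multiplies} = (N-2)\,M \\
\text{factor} \to \text{variable:}\quad
  & \mu_{m\to n}(x_n) = \sum_{x_{N(m)\setminus n}} f_m(x_{N(m)}) \prod_{i \in N(m)\setminus n} \nu_{i\to m}(x_i)
  && \Rightarrow\ \underbrace{M}_{\text{values of } x_n} \cdot \underbrace{M^{N-1}}_{\text{terms per value}} \cdot \underbrace{(N-1)}_{\text{multiplies per term}} = (N-1)\,M^N \\
\text{example } (M = 2,\ N = 3):\quad
  & (3-1) \cdot 2^3 = 16 \text{ multiplies for one factor-to-variable message}
\end{align*}
```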
What if the graph is not a tree? Several alternative methods: –Gibbs sampling –Expectation Maximization –Variational methods –Elimination Algorithm –Junction-Tree algorithm –Loopy Belief Propagation
Loopy Belief Propagation Just apply the BP rules in spite of the loops. In each iteration, each node sends all of its messages in parallel, computed from the messages it received in the previous iteration. Seems to work well for some applications, e.g. decoding turbo codes.
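A minimal sketch of loopy BP with a parallel (flooding) schedule on a pairwise Markov random field, i.e. a factor graph with only unary and pairwise factors. The potentials are made up, and the `damping` parameter is an illustrative addition (damping is one commonly used trick against non-convergence); with `damping=0.0` it is plain LBP. `exact_marginals` is brute force, included only for comparison.

```python
from itertools import product
from math import prod


def loopy_bp(unary, edges, pairwise, iters=100, tol=1e-10, damping=0.0):
    """Sum-product on a pairwise MRF, recomputing every message in parallel each iteration.

    unary[i][a] is the potential of x_i = a; pairwise[(i, j)][a][b] is the potential of
    x_i = a, x_j = b on edge (i, j). damping in [0, 1) mixes in the previous message.
    Returns (beliefs, converged).
    """
    nbrs = {i: [] for i in range(len(unary))}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def psi(i, j, a, b):
        # Edge potential for x_i = a, x_j = b, whichever way the edge was stored.
        return pairwise[(i, j)][a][b] if (i, j) in pairwise else pairwise[(j, i)][b][a]

    # msgs[(i, j)][b]: message from i to j, evaluated at x_j = b; start uniform.
    msgs = {(i, j): [1.0 / len(unary[j])] * len(unary[j]) for i in nbrs for j in nbrs[i]}
    converged = False
    for _ in range(iters):
        new = {}
        for (i, j), old in msgs.items():
            # Uses only the previous iteration's messages: a parallel ("flooding") update.
            out = [sum(unary[i][a] * psi(i, j, a, b)
                       * prod(msgs[(k, i)][a] for k in nbrs[i] if k != j)
                       for a in range(len(unary[i])))
                   for b in range(len(unary[j]))]
            z = sum(out)
            new[(i, j)] = [damping * o + (1 - damping) * v / z for o, v in zip(old, out)]
        delta = max(abs(a - b) for key in msgs for a, b in zip(msgs[key], new[key]))
        msgs = new
        if delta < tol:
            converged = True
            break

    beliefs = []
    for i in nbrs:
        b = [unary[i][a] * prod(msgs[(k, i)][a] for k in nbrs[i]) for a in range(len(unary[i]))]
        beliefs.append([v / sum(b) for v in b])
    return beliefs, converged


def exact_marginals(unary, edges, pairwise):
    """Brute-force marginals for comparison (exponential in the number of variables)."""
    marg = [[0.0] * len(u) for u in unary]
    for x in product(*(range(len(u)) for u in unary)):
        w = prod(unary[i][x[i]] for i in range(len(unary)))
        w *= prod(pairwise[(i, j)][x[i]][x[j]] for i, j in edges)
        for i, xi in enumerate(x):
            marg[i][xi] += w
    return [[v / sum(m) for v in m] for m in marg]


# A single loop 0-1-2-3-0 with made-up, weakly coupled potentials.
unary = [[1.0, 2.0], [1.5, 1.0], [1.0, 1.0], [2.0, 1.0]]
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pairwise = {e: [[1.2, 1.0], [1.0, 1.2]] for e in edges}
beliefs, converged = loopy_bp(unary, edges, pairwise)
exact = exact_marginals(unary, edges, pairwise)
print(converged, beliefs, exact)
```

On this weakly coupled single loop the beliefs come out close to the exact marginals, though on loopy graphs they are not exact in general; on a tree (e.g. after dropping the edge (3, 0)) the same code reproduces the exact marginals.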
Trouble with LBP May not converge –A variety of tricks (such as damping the message updates) can help Cycling error: old information is mistaken for new Convergence error: unlike in a tree, neighbors need not be independent; however, LBP treats them as if they were. Bolt & van der Gaag, "On the convergence error in loopy propagation" (2004).
Good news about MAP in LBP For a single loop, the MAP values are correct –although the "maximals" are not If LBP converges, the resulting MAP configuration has higher probability than any other configuration in its "Single Loops and Trees" (SLT) neighborhood (Figure: example SLT neighborhoods on a grid.) Weiss, Freeman, "On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs" (2001)
MMSE in LBP If $P(X)$ is jointly Gaussian and LBP converges, the resulting marginal means are exact (so the MMSE estimate is correct), although the marginal variances in general are not. For pairwise-connected Markov random fields, the fixed points of LBP correspond to stationary points of the Bethe free energy. Weiss, Freeman, "Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology" (2001) Yedidia, Freeman, Weiss, "Bethe free energy, Kikuchi approximations, and belief propagation algorithms" (2001)
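A minimal sketch of Gaussian BP in information form (scalar precision/potential messages, in the style usually called GaBP), assuming a made-up 3-variable precision matrix `J` whose graph is a single loop and a potential vector `h`, so that the posterior mean $\mu$ solves $J\mu = h$. `J` is chosen diagonally dominant, a known sufficient condition for convergence. Only the means should be trusted; the returned variances are BP's estimates and in general differ from the true ones.

```python
def gaussian_bp(J, h, iters=200, tol=1e-12):
    """Gaussian BP for x ~ N(mean = J^{-1} h, cov = J^{-1}), in information form.

    J is a symmetric precision matrix (list of lists), h the potential vector.
    Each message i -> j is a scalar precision P[(i, j)] and potential H[(i, j)].
    Returns (means, variances, converged).
    """
    n = len(J)
    nbrs = {i: [j for j in range(n) if j != i and J[i][j] != 0.0] for i in range(n)}
    P = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}  # message precisions
    H = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}  # message potentials
    converged = False
    for _ in range(iters):
        newP, newH = {}, {}
        for (i, j) in P:
            # Combine node i's local term with messages from every neighbor except j.
            p = J[i][i] + sum(P[(k, i)] for k in nbrs[i] if k != j)
            q = h[i] + sum(H[(k, i)] for k in nbrs[i] if k != j)
            newP[(i, j)] = -J[i][j] ** 2 / p
            newH[(i, j)] = -J[i][j] * q / p
        delta = max(abs(newP[e] - P[e]) + abs(newH[e] - H[e]) for e in P)
        P, H = newP, newH
        if delta < tol:
            converged = True
            break
    prec = [J[i][i] + sum(P[(k, i)] for k in nbrs[i]) for i in range(n)]
    means = [(h[i] + sum(H[(k, i)] for k in nbrs[i])) / prec[i] for i in range(n)]
    return means, [1.0 / p for p in prec], converged


# Made-up, diagonally dominant precision matrix with a loop 0-1-2-0.
J = [[4.0, 1.0, 0.5],
     [1.0, 3.0, 1.0],
     [0.5, 1.0, 5.0]]
h = [1.0, 2.0, 3.0]
means, variances, converged = gaussian_bp(J, h)
print(converged, means, variances)
```

A quick check is that the returned means satisfy $J\mu = h$ up to rounding, which needs no matrix inverse.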
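Free Energy Suppose we were able to compute the marginals of a probability distribution $b(X)$ that closely approximated $P(X|Y)$. We would want $b(X)$ to resemble $P(X|Y)$ as much as possible. The free energy $F$ of $b(X)$ is the Kullback-Leibler divergence between $b(X)$ and $P(X|Y)$: $F[b] = \sum_X b(X) \ln \frac{b(X)}{P(X|Y)}$ (if the unnormalized product of factors is used in place of $P(X|Y)$, $F$ shifts by the constant $-\ln Z$). However, $F$ is difficult to compute, since it sums over every joint configuration of $X$. Also, the $b(X)$ we are working with is often ill-defined: we typically only have its local marginals, not a full joint distribution.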
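A small numeric sketch of this free energy, assuming a made-up unnormalized posterior over two binary variables. It shows that $KL(b \,\|\, P)$ is positive for a factorized approximation $b$ and zero for $b = P$, and that using the unnormalized target only shifts $F$ by the constant $-\ln Z$.

```python
from math import log

# Hypothetical unnormalized posterior p~(x1, x2) over two binary variables.
p_tilde = {(0, 0): 4.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}
Z = sum(p_tilde.values())
p = {x: v / Z for x, v in p_tilde.items()}  # normalized P(X|Y)


def kl(b, q):
    """KL(b || q) = sum_x b(x) ln(b(x) / q(x)); states with b(x) = 0 contribute 0."""
    return sum(bx * log(bx / q[x]) for x, bx in b.items() if bx > 0)


def free_energy(b):
    """Free energy against the unnormalized target: equals KL(b || p) - ln Z."""
    return kl(b, p_tilde)


# A fully factorized approximation b(x1, x2) = b1(x1) b2(x2), built here from the true
# marginals for illustration; it cannot represent the correlation between x1 and x2.
b1 = [p[0, 0] + p[0, 1], p[1, 0] + p[1, 1]]
b2 = [p[0, 0] + p[1, 0], p[0, 1] + p[1, 1]]
b_factored = {(a, c): b1[a] * b2[c] for a in (0, 1) for c in (0, 1)}

print(kl(b_factored, p), kl(p, p))
```

By Gibbs' inequality the divergence is zero only when $b$ equals $P(X|Y)$, so minimizing $F$ over a restricted family of $b$ gives the closest approximation in that family.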
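Kikuchi Free Energy We can approximate the total free energy using the Kikuchi free energy. 1) Select a set of clusters of nodes of the factor graph: a) all nodes must be in at least one cluster; b) for each factor node in a cluster, all adjacent variable nodes must also be included. 2) For each cluster of variables $S_i$, compute its free energy $F[b(S_i)]$, and sum them together. a) $F[b(S_i)] = \sum_{x_{S_i}} b(x_{S_i}) \ln \frac{b(x_{S_i})}{\prod_{a \in S_i} f_a(x_a)}$ is the local free energy of $b(S_i)$ with respect to the factors inside the cluster. 3) Now we have double-counted the intersections between the sets $S_i$. Subtract the free energy of the intersections, add back that of their intersections, and so on; in general $F_K = \sum_r c_r F[b(S_r)]$ with counting numbers $c_r = 1 - \sum_{s \supset r} c_s$. Bethe free energy is the Kikuchi free energy obtained when the clusters are the individual factors with their adjacent variables (for a pairwise MRF, all connected pairs of nodes), so that the intersections are single variables.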
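A sketch checking one known property of the Bethe free energy: on a tree-structured factor graph, evaluated at the exact marginals, it equals $-\ln Z$. It assumes a made-up three-variable chain x1 - fa - x2 - fb - x3, where each factor cluster contributes $\sum b_a \ln(b_a / f_a)$ and each variable of degree $d_i$ contributes $(1 - d_i) \sum b_i \ln b_i$.

```python
from itertools import product
from math import log

# Hypothetical chain factor graph x1 - fa - x2 - fb - x3 with binary variables.
fa = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 2.0}  # fa(x1, x2)
fb = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}  # fb(x2, x3)

# Exact joint and normalizer by enumeration.
joint = {x: fa[x[0], x[1]] * fb[x[1], x[2]] for x in product((0, 1), repeat=3)}
Z = sum(joint.values())
p = {x: v / Z for x, v in joint.items()}

# Exact cluster beliefs b_a(x1, x2), b_b(x2, x3) and the intersection belief b_2(x2).
b_a = {(i, j): sum(p[i, j, k] for k in (0, 1)) for i, j in product((0, 1), repeat=2)}
b_b = {(j, k): sum(p[i, j, k] for i in (0, 1)) for j, k in product((0, 1), repeat=2)}
b_2 = {j: sum(p[i, j, k] for i in (0, 1) for k in (0, 1)) for j in (0, 1)}


def cluster_free_energy(belief, factor):
    """sum_x b(x) ln(b(x) / f(x)): average energy of the cluster minus its entropy."""
    return sum(v * log(v / factor[x]) for x, v in belief.items() if v > 0)


def neg_entropy(belief):
    """sum_x b(x) ln b(x), the free energy of a cluster containing no factors."""
    return sum(v * log(v) for v in belief.values() if v > 0)


# Bethe counting numbers: 1 for each factor cluster, 1 - degree for each variable.
# x1 and x3 have degree 1 (counting number 0); x2 has degree 2 (counting number -1).
F_bethe = cluster_free_energy(b_a, fa) + cluster_free_energy(b_b, fb) - neg_entropy(b_2)

print(F_bethe, -log(Z))
```

On graphs with loops the Bethe free energy is in general only an approximation to $-\ln Z$.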
More advanced algorithms Greater accuracy, at a price Generalized Belief Propagation algorithms have been developed to minimize Kikuchi free energy (Yedidia, Freeman, Weiss, 2004) –The junction-tree algorithm is a special case Alan Yuille (2000) has devised a message-passing algorithm that minimizes Bethe free energy and is guaranteed to converge. Other groups are working on fast & robust Bethe minimization (Pretti & Pelizzola 2003).