Belief Propagation and its Generalizations Shane Oldenburger

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

Recall from the Jointree Algorithm
We separate evidence e into:
e+: evidence pertaining to ancestors
e-: evidence pertaining to descendants
BEL(X) = P(X|e) = P(X|e+, e-) = P(e-|X, e+) P(X|e+) / P(e-|e+) = α P(e-|X) P(X|e+) = α λ(X) π(X)
π: messages from parents
λ: messages from children
α: normalization constant

Pearl's Belief Propagation Algorithm: Initialization
Nodes with evidence: λ(x_i) = 1 where x_i = e_i, 0 otherwise; π(x_i) = 1 where x_i = e_i, 0 otherwise
Nodes with no parents: π(x_i) = p(x_i) (the prior probabilities)
Nodes with no children: λ(x_i) = 1

Pearl's BP algorithm
Iterate. For each X:
If all π messages from the parents of X have arrived, combine them into π(X)
If all λ messages from the children of X have arrived, combine them into λ(X)
If π(X) has been computed and all λ messages other than the one from Y_i have arrived, calculate and send message π_{X→Y_i} to child Y_i
If λ(X) has been computed and all π messages other than the one from U_i have arrived, calculate and send message λ_{X→U_i} to parent U_i
Compute BEL(X) = α λ(X) π(X)
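As a concrete illustration of this π/λ message passing, here is a minimal Python sketch for a small tree-structured network with binary variables. The network, CPTs, and evidence below are made-up assumptions, not taken from the slides; a full polytree implementation would also need multi-parent nodes and an explicit message schedule.

```python
import numpy as np

# A minimal sketch of Pearl's message passing on a tree-structured Bayesian
# network (every node has at most one parent) with binary variables.
# The structure, CPTs, and evidence are illustrative assumptions.

parent = {"A": None, "B": "A", "C": "A"}
children = {"A": ["B", "C"], "B": [], "C": []}
prior = {"A": np.array([0.6, 0.4])}                 # P(A)
cpt = {"B": np.array([[0.9, 0.1],                   # P(B|A): rows = A, cols = B
                      [0.2, 0.8]]),
       "C": np.array([[0.7, 0.3],                   # P(C|A)
                      [0.4, 0.6]])}
evidence = {"C": 1}                                 # observe C = 1 (assumption)

def evid(x):
    """Evidence indicator: 1 on the observed state, else all ones."""
    if x not in evidence:
        return np.ones(2)
    e = np.zeros(2); e[evidence[x]] = 1.0
    return e

def lam(x):
    """lambda(x): diagnostic support from x's own evidence and its subtree."""
    l = evid(x)
    for c in children[x]:
        l = l * (cpt[c] @ lam(c))      # lambda message from child c to x
    return l

def pi(x):
    """pi(x): causal support reaching x from its ancestors."""
    if parent[x] is None:
        return prior[x]
    p = parent[x]
    msg = pi(p) * evid(p)              # pi message from parent p to x ...
    for c in children[p]:
        if c != x:                     # ... includes lambda from x's siblings
            msg = msg * (cpt[c] @ lam(c))
    return msg @ cpt[x]                # sum over parent states: sum_u P(x|u) msg(u)

def belief(x):
    b = lam(x) * pi(x)                 # BEL(x) = alpha * lambda(x) * pi(x)
    return b / b.sum()

for node in ["A", "B", "C"]:
    print(node, belief(node))          # posterior marginals given the evidence
```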

Example of data propagation in a simple tree

BP properties
Exact for polytrees: only one path between any two nodes; each node X separates the graph into two disjoint graphs (e+, e-)
But most graphs of interest are not polytrees – what do we do?
Exact inference: cutset conditioning, the jointree method
Approximate inference: loopy BP

In the simple tree example, a finite number of messages were passed
In a graph with loops, messages may be passed around indefinitely: stop when the beliefs converge, or stop after some number of iterations
Loopy BP tends to achieve good empirical results, e.g. on low-level computer vision problems and error-correcting codes (turbo codes, Gallager codes)

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

Markov Random Fields
BP algorithms have been developed for many graphical models
Pairwise Markov Random Fields are used in this paper for ease of presentation
An MRF consists of "observable" nodes and "hidden" nodes
Since it is pairwise, each observable node is connected to exactly one hidden node, and each hidden node is connected to at most one observable node

Markov Random Fields
Two hidden variables x_i and x_j are connected by a "compatibility function" ψ_ij(x_i, x_j)
Hidden variable x_i is connected to observable variable y_i by an "evidence function" φ_i(x_i, y_i) = φ_i(x_i)
The joint probability for a pairwise MRF is p({x}) = (1/Z) ∏_ij ψ_ij(x_i, x_j) ∏_i φ_i(x_i)
The BP algorithm for pairwise MRFs is similar to that for Bayesian networks
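To make the pairwise-MRF version concrete, here is a minimal sketch of the standard sum-product updates, run as loopy BP with the convergence check mentioned earlier. The four-node loop, the compatibility matrices ψ, the evidence terms φ, and the stopping threshold are illustrative assumptions, not the paper's.

```python
import numpy as np

# A minimal sketch of (loopy) sum-product BP on a small pairwise MRF with
# binary hidden variables. Graph, potentials, and schedule are assumptions.

nodes = [0, 1, 2, 3]
edges = [(0, 1), (1, 3), (3, 2), (2, 0)]                        # one loop
psi = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}    # psi_ij(x_i, x_j)
rng = np.random.default_rng(0)
phi = {i: rng.random(2) + 0.1 for i in nodes}                   # phi_i(x_i)

neighbors = {i: [] for i in nodes}
for a, b in edges:
    neighbors[a].append(b)
    neighbors[b].append(a)

def psi_of(i, j):
    """Compatibility indexed as [x_i, x_j] regardless of stored orientation."""
    return psi[(i, j)] if (i, j) in psi else psi[(j, i)].T

# one message per directed edge, initialized uniformly
msgs = {(i, j): np.ones(2) / 2 for a, b in edges for (i, j) in [(a, b), (b, a)]}

for sweep in range(100):
    new = {}
    for (i, j) in msgs:
        # m_ij(x_j) ∝ sum_{x_i} phi_i(x_i) psi_ij(x_i,x_j) * prod of messages into i except from j
        incoming = np.ones(2)
        for k in neighbors[i]:
            if k != j:
                incoming = incoming * msgs[(k, i)]
        m = (phi[i] * incoming) @ psi_of(i, j)
        new[(i, j)] = m / m.sum()
    delta = max(np.abs(new[k] - msgs[k]).max() for k in msgs)
    msgs = new
    if delta < 1e-8:                       # stop when the messages have converged
        break

for i in nodes:                            # one-node beliefs b_i(x_i) ∝ phi_i * prod_k m_ki
    b = phi[i].copy()
    for k in neighbors[i]:
        b = b * msgs[(k, i)]
    print(i, b / b.sum())
```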

Conversion between graphical models
We can limit ourselves to considering pairwise MRFs:
Any pairwise MRF or BN can be converted to an equivalent "factor graph"
Any factor graph can be converted into an equivalent pairwise MRF or BN

An intermediary model
A factor graph is composed of "variable" nodes, represented by circles, and "function" nodes, represented by squares
Factor graphs are a generalization of Tanner graphs, in which each "function" node is a parity check on the variables connected to it
A function node in a factor graph can be any arbitrary function of the variables connected to it

From pairwise MRF to BN

From BN to pairwise MRF

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

Gibbs Free Energy
The Gibbs free energy is the change in the energy of a system from an initial state to a final state of some process (e.g. a chemical reaction)
For a chemical reaction, if the Gibbs free energy is negative then the reaction is "spontaneous", or "allowed"
If the Gibbs free energy is positive, the reaction is "not allowed"

Gibbs free energy
Instead of the energy difference of a chemical process, we want to define the Gibbs free energy in terms of the difference between a target probability distribution p and an approximate probability distribution b
Define the "distance" between p({x}) and b({x}) as D(b({x}) || p({x})) = Σ_{x} b({x}) ln[b({x}) / p({x})]
This is known as the Kullback-Leibler distance
Boltzmann's law: p({x}) = (1/Z) e^(-E({x})/T)
Generally assumed by statistical physicists; here we will use Boltzmann's law as our definition of the "energy" E
T acts as a unit scale parameter; let T = 1
Substituting Boltzmann's law into our distance measure: D(b({x}) || p({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) + ln Z

Gibbs free energy
Our distance measure: D(b({x}) || p({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) + ln Z
This is zero (i.e. p = b) when G(b({x})) = Σ_{x} b({x}) E({x}) + Σ_{x} b({x}) ln b({x}) = U(b({x})) - S(b({x})) is minimized, at its minimum value F = -ln Z
G: "Gibbs free energy"
F: "Helmholtz free energy"
U: "average energy"
S: "entropy"
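A quick numeric sanity check of the identity above, on a made-up eight-state distribution (an assumption for illustration): the KL distance D(b||p) should equal G(b) - F.

```python
import numpy as np

# Check D(b || p) = G(b) - F, where p({x}) = exp(-E({x})) / Z (Boltzmann, T = 1),
# G(b) = U(b) - S(b), and F = -ln Z. Energies and b are random placeholders.

rng = np.random.default_rng(0)
E = rng.normal(size=8)                  # energies of 8 joint states (assumption)
Z = np.exp(-E).sum()
p = np.exp(-E) / Z                      # target distribution

b = rng.random(8); b /= b.sum()         # some approximate distribution

D = np.sum(b * np.log(b / p))           # Kullback-Leibler distance
U = np.sum(b * E)                       # average energy
S = -np.sum(b * np.log(b))              # entropy
G = U - S                               # Gibbs free energy
F = -np.log(Z)                          # Helmholtz free energy

print(D, G - F)                         # equal up to rounding; D = 0 iff b = p, i.e. G = F
```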

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

Bethe approximation
We would like to write the Gibbs free energy in terms of one- and two-node beliefs b_i and b_ij
Due to the pairwise nature of pairwise MRFs, b_i and b_ij are sufficient to compute the average energy U:
U = -Σ_ij Σ_{xi,xj} b_ij(x_i,x_j) ln ψ_ij(x_i,x_j) - Σ_i Σ_{xi} b_i(x_i) ln φ_i(x_i)
The exact marginal probabilities p_i and p_ij yield the same form, so this average energy is exact if the one- and two-node beliefs are exact

Bethe approximation
The entropy term is more problematic; usually we must settle for an approximation
The entropy can be computed exactly if the joint belief can be explicitly expressed in terms of one- and two-node beliefs:
b({x}) = ∏_ij b_ij(x_i,x_j) / ∏_i b_i(x_i)^(q_i - 1), where q_i = #neighbors of x_i
Then the Bethe approximation to the entropy is
S_Bethe = -Σ_ij Σ_{xi,xj} b_ij(x_i,x_j) ln b_ij(x_i,x_j) + Σ_i (q_i - 1) Σ_{xi} b_i(x_i) ln b_i(x_i)
For singly connected networks this is exact, and minimizing G_Bethe = U - S_Bethe yields the exact marginal probabilities p
For graphs with loops, this is only an approximation (but usually a good one)
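The following sketch evaluates U, S_Bethe, and G_Bethe = U - S_Bethe on a tiny made-up chain (an assumption for illustration). Because the chain is singly connected, G_Bethe evaluated at the exact marginals reproduces F = -ln Z, as claimed above.

```python
import numpy as np
from itertools import product

# A small numeric check: on a loop-free pairwise MRF, G_Bethe at the exact
# marginals equals the exact free energy F = -ln Z. Chain and potentials are made up.

edges = [(0, 1), (1, 2)]
psi = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}   # psi_ij(x_i, x_j)
phi = {i: np.array([0.7, 0.3]) for i in range(3)}              # phi_i(x_i)
q = {0: 1, 1: 2, 2: 1}                                         # number of neighbors

# brute-force joint p({x}) and exact one- and two-node marginals
states = list(product([0, 1], repeat=3))
weights = np.array([phi[0][x[0]] * phi[1][x[1]] * phi[2][x[2]]
                    * psi[(0, 1)][x[0], x[1]] * psi[(1, 2)][x[1], x[2]]
                    for x in states])
Z = weights.sum()
p = weights / Z
b_i = {i: np.array([p[[s[i] == v for s in states]].sum() for v in (0, 1)])
       for i in range(3)}
b_ij = {(i, j): np.array([[p[[(s[i], s[j]) == (u, v) for s in states]].sum()
                           for v in (0, 1)] for u in (0, 1)]) for (i, j) in edges}

# Bethe average energy and entropy computed from the one- and two-node beliefs
U = -sum((b_ij[e] * np.log(psi[e])).sum() for e in edges) \
    - sum((b_i[i] * np.log(phi[i])).sum() for i in b_i)
S_bethe = -sum((b_ij[e] * np.log(b_ij[e])).sum() for e in edges) \
          + sum((q[i] - 1) * (b_i[i] * np.log(b_i[i])).sum() for i in b_i)

print(U - S_bethe, -np.log(Z))   # G_Bethe = U - S_Bethe equals F = -ln Z here
```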

Equivalence of BP and Bethe
The Bethe approximation is exact for pairwise MRFs when the graph contains no loops, so the Bethe free energy is minimized by the correct marginals
BP gives the correct marginals when the graph contains no loops
Thus, when there are no loops, the BP beliefs are the global minimum of the Bethe free energy
We can say more: a set of beliefs gives a BP fixed point in any graph iff they are local stationary points of the Bethe free energy
This can be shown by adding Lagrange multipliers to G_Bethe to enforce the marginalization constraints, as sketched below
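A compressed version of that Lagrange-multiplier argument, in the notation above (this is the standard derivation sketched here for convenience, not text from the slides):

```latex
L = G_{\mathrm{Bethe}}
  + \sum_{(ij)} \sum_{x_i} \lambda_{ji}(x_i)\Big[ b_i(x_i) - \sum_{x_j} b_{ij}(x_i,x_j) \Big]
  + \sum_{(ij)} \sum_{x_j} \lambda_{ij}(x_j)\Big[ b_j(x_j) - \sum_{x_i} b_{ij}(x_i,x_j) \Big]
  + \text{(normalization terms)}

\frac{\partial L}{\partial b_{ij}(x_i,x_j)} = 0
  \;\Longrightarrow\;
  b_{ij}(x_i,x_j) \propto \psi_{ij}(x_i,x_j)\, e^{\lambda_{ji}(x_i) + \lambda_{ij}(x_j)}

\frac{\partial L}{\partial b_i(x_i)} = 0
  \;\Longrightarrow\;
  b_i(x_i)^{\,q_i-1} \propto \phi_i(x_i)^{-1}\, e^{\sum_{j \in N(i)} \lambda_{ji}(x_i)}
```

Identifying λ_{ji}(x_i) = ln[φ_i(x_i) ∏_{k∈N(i)\j} m_{ki}(x_i)] turns these stationarity conditions into the usual BP belief equations, so BP fixed points and stationary points of G_Bethe coincide.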

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

Kikuchi approximation
The Kikuchi approximation is an improvement on, and a generalization of, the Bethe approximation
Given this association between BP and the Bethe approximation to the Gibbs free energy, can we use better approximation methods to craft better BP algorithms?

Cluster variational method
The free energy is approximated as a sum of the local free energies of a set of regions of nodes
The "cluster variational method" provides a way to select the set of regions:
Begin with a basic set of clusters that includes every interaction and every node
Subtract the free energies of over-counted intersection regions
Add back the over-counted intersections of intersections, etc.
Bethe is the Kikuchi approximation in which the basic clusters are the pairs of connected hidden nodes

Cluster variational method
Bethe regions involve one or two nodes
Define the local free energy of a single node: G_i(b_i) = Σ_{xi} b_i(x_i) [ln b_i(x_i) + E_i(x_i)]
Define the local free energy of two nodes: G_ij(b_ij) = Σ_{xi,xj} b_ij(x_i,x_j) [ln b_ij(x_i,x_j) + E_ij(x_i,x_j)]
Then for the regions corresponding to Bethe (a 2x3 grid with nodes 1-6),
G_Bethe = G_12 + G_23 + G_45 + G_56 + G_14 + G_25 + G_36 - G_1 - G_3 - G_4 - G_6 - 2G_2 - 2G_5

Cluster variational method
For the Kikuchi example (two overlapping 2x2 clusters on the same grid), regions involve four nodes
Extend the same logic as before and define the local free energy of a four-node region, e.g.
G_1245(b_1245) = Σ_{x1,x2,x4,x5} b_1245(x_1,x_2,x_4,x_5) [ln b_1245(x_1,x_2,x_4,x_5) + E_1245(x_1,x_2,x_4,x_5)]
Then for the Kikuchi regions shown, G_Kikuchi = G_1245 + G_2356 - G_25

A more general example
Now we have basic regions [1245], [2356], [4578], [5689]
Intersection regions [25], [45], [56], [58], and the intersection-of-intersections region [5]
Then we have G_Kikuchi = G_1245 + G_2356 + G_4578 + G_5689 - G_25 - G_45 - G_56 - G_58 + G_5
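The region counting numbers (+1 for the basic clusters, -1 for their intersections, +1 for the intersection of intersections) can be generated mechanically. The short sketch below is an illustration, not code from the paper: one standard rule is c_r = 1 minus the sum of the counting numbers of all regions strictly containing r, which reproduces the combination of G terms above.

```python
# Compute Kikuchi region counting numbers for the example's basic clusters.
basic = [frozenset({1, 2, 4, 5}), frozenset({2, 3, 5, 6}),
         frozenset({4, 5, 7, 8}), frozenset({5, 6, 8, 9})]

# collect all intersections, intersections of intersections, etc.
regions = set(basic)
frontier = set(basic)
while frontier:
    new = {a & b for a in frontier for b in regions if a != b and (a & b)}
    new -= regions
    regions |= new
    frontier = new

# counting numbers, processed from the largest regions down:
# c_r = 1 - sum of c_s over all regions s that strictly contain r
counts = {}
for r in sorted(regions, key=len, reverse=True):
    counts[r] = 1 - sum(c for s, c in counts.items() if r < s)

for r in sorted(regions, key=len, reverse=True):
    print(sorted(r), counts[r])
# The four basic clusters get +1, the four pairwise intersections get -1, and
# [5] gets +1, matching G_Kikuchi = G_1245 + G_2356 + G_4578 + G_5689
#                                  - G_25 - G_45 - G_56 - G_58 + G_5
```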

Outline The BP algorithm MRFs – Markov Random Fields Gibbs free energy Bethe approximation Kikuchi approximation Generalized BP

We show how to construct a GBP algorithm for this example
First find the intersections, intersections of intersections, etc. of the basic clusters
Basic: [1245], [2356], [4578], [5689]
Intersections: [25], [45], [56], [58]
Intersection of intersections: [5]

Region Graph
Next, organize the regions into the region graph: a hierarchy of regions and their "direct" subregions
"Direct" subregions are subregions not contained in any other subregion of the same region
e.g. [5] is a subregion of [1245], but since it is also contained in the subregion [25], it is not a direct subregion of [1245]
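A small sketch (again just an illustration, not the paper's code) of building the region-graph edges for this example, where each region points to its "direct" subregions:

```python
# Region-graph edges: r -> s where s is a subregion of r that is not contained
# in any other subregion of r.
regions = [frozenset({1, 2, 4, 5}), frozenset({2, 3, 5, 6}),
           frozenset({4, 5, 7, 8}), frozenset({5, 6, 8, 9}),
           frozenset({2, 5}), frozenset({4, 5}), frozenset({5, 6}),
           frozenset({5, 8}), frozenset({5})]

def direct_subregions(r):
    subs = [s for s in regions if s < r]
    # keep s only if no other subregion of r strictly contains it
    return [s for s in subs if not any(s < t for t in subs)]

for r in regions:
    for s in direct_subregions(r):
        print(sorted(r), "->", sorted(s))
# e.g. [1,2,4,5] -> [2,5] and [4,5]; [2,5] -> [5]; but no edge [1,2,4,5] -> [5],
# because [5] is already contained in the subregion [2,5].
```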

Messages
Construct messages from all regions r to their direct subregions s; these correspond to the edges of the region graph
Consider the message from region [1245] to subregion [25]: it is a message from the nodes not in the subregion (1, 4) to those in the subregion (2, 5), written m_{14→25}

Belief Equations
Construct belief equations for every region r: b_r({x}_r) is proportional to the product of each compatibility matrix and evidence term completely contained in r, together with the messages arriving from outside r
b_5 = k [φ_5] [m_{2→5} m_{4→5} m_{6→5} m_{8→5}]
b_45 = k [φ_4 φ_5 ψ_45] [m_{12→45} m_{78→45} m_{2→5} m_{6→5} m_{8→5}]
b_1245 = k [φ_1 φ_2 φ_4 φ_5 ψ_12 ψ_14 ψ_25 ψ_45] [m_{36→25} m_{78→45} m_{6→5} m_{8→5}]

Belief Equations
b_5 = k [φ_5] [m_{2→5} m_{4→5} m_{6→5} m_{8→5}]

Belief Equations
b_45 = k [φ_4 φ_5 ψ_45] [m_{12→45} m_{78→45} m_{2→5} m_{6→5} m_{8→5}]

Belief Equations
b_1245 = k [φ_1 φ_2 φ_4 φ_5 ψ_12 ψ_14 ψ_25 ψ_45] [m_{36→25} m_{78→45} m_{6→5} m_{8→5}]

Enforcing Marginalization
Now we need to enforce the marginalization condition relating each pair of regions that share an edge in the hierarchy, e.g. between [5] and [45]:
b_5(x_5) = Σ_{x4} b_45(x_4, x_5)

Message Update
Substituting the belief equations into the marginalization condition gives the message update rule:
m_{4→5}(x_5) = k Σ_{x4} φ_4(x_4) ψ_45(x_4,x_5) m_{12→45}(x_4,x_5) m_{78→45}(x_4,x_5)
The collection of belief equations and message update rules defines our GBP algorithm
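A tiny numeric sketch of that single update follows; the potentials and incoming region messages are random placeholders, purely an assumption for illustration.

```python
import numpy as np

# One GBP message update: the message from [4] into [5] that makes
# b_5(x_5) = sum_{x_4} b_45(x_4, x_5) hold. Binary variables, random inputs.

rng = np.random.default_rng(1)
phi4 = rng.random(2)                 # phi_4(x_4)
psi45 = rng.random((2, 2))           # psi_45(x_4, x_5)
m_12_to_45 = rng.random((2, 2))      # m_{12->45}(x_4, x_5)
m_78_to_45 = rng.random((2, 2))      # m_{78->45}(x_4, x_5)

# m_{4->5}(x_5) ∝ sum_{x_4} phi_4(x_4) psi_45(x_4,x_5) m_{12->45}(x_4,x_5) m_{78->45}(x_4,x_5)
m_4_to_5 = np.einsum("a,ab,ab,ab->b", phi4, psi45, m_12_to_45, m_78_to_45)
m_4_to_5 /= m_4_to_5.sum()           # the constant k just normalizes the message
print(m_4_to_5)
```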

Complexity of GBP
Bad news: running time grows exponentially with the size of the basic clusters chosen
Good news: if the basic clusters encompass the shortest loops in the graphical model, usually nearly all of the error from BP is eliminated
This usually requires only a small additional amount of computation compared to BP