Probabilistic Graphical Models


1 Probabilistic Graphical Models
Variational Inference II: Mean Field and Generalized MF
Eric Xing
Lecture 17, November 5, 2009
Reading:

2 Inference Problems
Compute the likelihood of observed data.
Compute the marginal distribution $p(x_A)$ over a particular subset of nodes $A$.
Compute the conditional distribution $p(x_A \mid x_B)$ for disjoint subsets $A$ and $B$.
Compute a mode of the density, $\hat{x} = \arg\max_x p(x)$.
Methods we have: message passing (forward-backward, max-product/BP, junction tree).
Brute-force methods are intractable because the individual computations are carried out independently; recursive message passing (elimination, sum-product/max-product, junction tree) is efficient for simple graphical models, like trees, because intermediate terms are shared. A small sketch of this contrast follows.
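To see the gap concretely, here is a minimal sketch (my own construction, not from the lecture; the chain length, state count, and random potentials are arbitrary choices) that computes a single-node marginal in a small chain MRF by brute-force enumeration and by forward/backward message passing:

```python
# Brute force vs. message passing on a chain MRF: the messages share
# intermediate sums, turning O(K^n) work into O(n K^2).
import itertools
import numpy as np

np.random.seed(0)
n, K = 6, 3                                          # 6 nodes, 3 states each
psi = [np.random.rand(K, K) for _ in range(n - 1)]   # pairwise potentials

def brute_marginal(s):
    # sum the unnormalized joint over all K**n configurations
    p = np.zeros(K)
    for x in itertools.product(range(K), repeat=n):
        w = np.prod([psi[i][x[i], x[i + 1]] for i in range(n - 1)])
        p[x[s]] += w
    return p / p.sum()

def mp_marginal(s):
    # forward and backward messages meet at node s
    fwd = np.ones(K)
    for i in range(s):
        fwd = psi[i].T @ fwd                         # absorb nodes 0..s-1
    bwd = np.ones(K)
    for i in range(n - 2, s - 1, -1):
        bwd = psi[i] @ bwd                           # absorb nodes s+1..n-1
    p = fwd * bwd
    return p / p.sum()

assert np.allclose(brute_marginal(2), mp_marginal(2))
```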

3 Exponential Family GMs
Canonical parameterization: $p(x; \theta) = \exp\{\langle \theta, \phi(x) \rangle - A(\theta)\}$, with canonical parameters $\theta$, sufficient statistics $\phi(x)$, and log-normalization function $A(\theta)$.
Effective canonical parameters: $\Omega = \{\theta \in \mathbb{R}^d : A(\theta) < +\infty\}$.
Regular family: $\Omega$ is an open set.
Minimal representation: there does not exist a nonzero vector $a$ such that $\langle a, \phi(x) \rangle$ is a constant.

4 Mean Parameterization
The mean parameter associated with a sufficient statistic $\phi_\alpha$ is defined as $\mu_\alpha = E_p[\phi_\alpha(X)]$.
Realizable mean parameter set: $\mathcal{M} = \{\mu \in \mathbb{R}^d : \exists p \ \text{s.t.} \ E_p[\phi(X)] = \mu\}$.
$\mathcal{M}$ is a convex subset of $\mathbb{R}^d$; in the discrete case it is the convex hull of $\{\phi(x) : x \in \mathcal{X}\}$, a convex polytope when $\mathcal{X}$ is finite. (A worked example follows.)
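To make both parameterizations concrete, here is a hedged example (mine, not from the slides, assuming a two-node Ising model with sufficient statistics $\phi(x) = (x_1, x_2, x_1 x_2)$) that computes the log-partition function and the mean parameters by explicit enumeration:

```python
# A two-node Ising model as a minimal exponential family: canonical
# parameters theta, log-normalizer A(theta), mean parameters mu = E[phi(X)].
import itertools
import numpy as np

theta = np.array([0.5, -0.3, 1.0])           # canonical parameters

def phi(x):                                  # sufficient statistics
    return np.array([x[0], x[1], x[0] * x[1]])

configs = list(itertools.product([0, 1], repeat=2))
A = np.log(sum(np.exp(theta @ phi(x)) for x in configs))   # log-normalizer
probs = [np.exp(theta @ phi(x) - A) for x in configs]

# Mean parameters: for this discrete family the realizable set M is the
# convex hull of {phi(x)} over all configurations, a polytope.
mu = sum(p * phi(x) for p, x in zip(probs, configs))
print(A, mu)
```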

5 Convex Polytope
Convex hull representation: $\mathcal{M} = \operatorname{conv}\{\phi(x) : x \in \mathcal{X}\}$.
Half-plane based representation: by the Minkowski-Weyl theorem, any polytope can be characterized by a finite collection of linear inequality constraints, $\mathcal{M} = \{\mu \in \mathbb{R}^d : \langle a_j, \mu \rangle \ge b_j, \ j \in J\}$ with $J$ finite.

6 Conjugate Duality
Duality between MLE and Max-Ent: the conjugate dual of the log-partition function is $A^*(\mu) = \sup_{\theta} \{\langle \theta, \mu \rangle - A(\theta)\}$, the negative entropy of the matching distribution for $\mu$ in the interior of $\mathcal{M}$.
For all $\mu \in \mathcal{M}^\circ$, there is a unique canonical parameter $\theta(\mu)$ satisfying $\mu = E_{\theta(\mu)}[\phi(X)]$.
The log-partition function has the variational form $A(\theta) = \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu \rangle - A^*(\mu)\}$. (*)
For all $\theta \in \Omega$, the supremum in (*) is attained uniquely at the $\mu \in \mathcal{M}^\circ$ specified by the moment-matching conditions $\mu = E_\theta[\phi(X)]$.
This gives a bijection between $\Omega$ and $\mathcal{M}^\circ$ for a minimal exponential family. (A numeric check follows.)
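As a small numeric check of the variational form (*), consider the Bernoulli family, where $A(\theta) = \log(1 + e^{\theta})$ and $A^*(\mu) = \mu \log \mu + (1 - \mu) \log(1 - \mu)$; the grid search below is my own illustration, not part of the lecture:

```python
# Verify A(theta) = sup_mu { theta*mu - A*(mu) } for a Bernoulli variable,
# and that the maximizer is the dually coupled mu = sigmoid(theta).
import numpy as np

theta = 1.3
A = np.log(1 + np.exp(theta))

mus = np.linspace(1e-6, 1 - 1e-6, 100001)
Astar = mus * np.log(mus) + (1 - mus) * np.log(1 - mus)   # negative entropy
objective = theta * mus - Astar

print(A, objective.max())                                 # should agree
print(mus[objective.argmax()], 1 / (1 + np.exp(-theta)))  # maximizer vs. sigmoid
```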

7 Variational Inference In General
An umbrella term that refers to various mathematical tools for optimization-based formulations of problems, as well as associated techniques for their solution.
General idea: express a quantity of interest as the solution of an optimization problem.
The optimization problem can be relaxed in various ways: approximate the function to be optimized, or approximate the set over which the optimization takes place.
Variational methods develop in parallel with MCMC as the other major family of approximate inference techniques.

8 Bethe Variational Problem (BVP)
We already have two ingredients: a convex (polyhedral) outer bound $L(G) \supseteq \mathcal{M}(G)$, and the Bethe approximate entropy $H_{\mathrm{Bethe}}(\tau) = \sum_{s} H_s(\tau_s) - \sum_{(s,t)} I_{st}(\tau_{st})$.
Combining the two ingredients, we have a simple structured problem, $\max_{\tau \in L(G)} \{\langle \theta, \tau \rangle + H_{\mathrm{Bethe}}(\tau)\}$: the objective is differentiable and the constraint set is a simple polytope.
Sum-product is the solver! (A helper for evaluating $H_{\mathrm{Bethe}}$ is sketched below.)
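For reference, here is a hedged helper (my construction; the dictionary-based interface is an assumption) that evaluates the Bethe approximate entropy from singleton and pairwise pseudomarginals:

```python
# Bethe entropy: H_Bethe(tau) = sum_s H(tau_s) - sum_(s,t) I(tau_st),
# where I is the mutual information of the pairwise pseudomarginal.
import numpy as np

def entropy(p):
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def bethe_entropy(node_marg, edge_marg):
    # node_marg: {s: tau_s (vector)}; edge_marg: {(s, t): tau_st (matrix)}.
    # Assumes local consistency: tau_st's row/column sums match tau_s, tau_t.
    H = sum(entropy(tau) for tau in node_marg.values())
    for tau_st in edge_marg.values():
        I = (entropy(tau_st.sum(axis=1)) + entropy(tau_st.sum(axis=0))
             - entropy(tau_st))
        H -= I
    return H
```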

9 Connection to Sum-Product Alg.
Applying the Lagrangian method to the BVP yields the sum-product updates; sum-product and the Bethe variational problem are tied together (Yedidia et al., 2002):
For any graph G, any fixed point of the sum-product updates specifies a pair $(\tau^*, \lambda^*)$ such that the stationarity conditions of the BVP are satisfied.
For a tree-structured MRF, the solution is unique: $\tau^*$ corresponds to the exact singleton and pairwise marginal distributions of the MRF, and the optimal value of the BVP is equal to $A(\theta)$. (A compact implementation sketch follows.)
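Here is a compact sum-product sketch (my own, hedged; the potential-table interface and toy example are assumptions): on a tree it recovers exact marginals, and on a loopy graph its fixed points correspond to stationary points of the BVP, per the Yedidia et al. result above:

```python
# Loopy sum-product on a pairwise MRF with tabular potentials.
import numpy as np

def sum_product(nodes, edges, node_pot, edge_pot, iters=50):
    # msg[(u, v)]: message from u to v, initialized uniform
    msg = {}
    for s, t in edges:
        msg[(s, t)] = np.ones(len(node_pot[t]))
        msg[(t, s)] = np.ones(len(node_pot[s]))
    for _ in range(iters):
        for (u, v) in list(msg):
            # node potential at u times incoming messages, excluding v's
            prod = node_pot[u].copy()
            for (a, b) in msg:
                if b == u and a != v:
                    prod = prod * msg[(a, u)]
            # pairwise potential oriented as [x_u, x_v]
            pot = edge_pot[(u, v)] if (u, v) in edge_pot else edge_pot[(v, u)].T
            m = pot.T @ prod              # sum over x_u
            msg[(u, v)] = m / m.sum()     # normalize for stability
    beliefs = {}
    for v in nodes:
        b = node_pot[v].copy()
        for (a, w) in msg:
            if w == v:
                b = b * msg[(a, v)]
        beliefs[v] = b / b.sum()
    return beliefs

# toy usage on a 3-node chain (a tree, so the beliefs are exact marginals)
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
node_pot = {v: np.array([1.0, 2.0]) for v in nodes}
edge_pot = {e: np.array([[2.0, 1.0], [1.0, 2.0]]) for e in edges}
print(sum_product(nodes, edges, node_pot, edge_pot))
```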

10 Inexactness of Bethe and Sum-Product
Inexactness has two sources: the Bethe entropy approximation, which differs from the exact entropy, and the pseudo-marginal outer bound, which is a strict inclusion $\mathcal{M}(G) \subsetneq L(G)$ on graphs with cycles.
Example: on the single cycle over nodes 1-2-3-4 there exist pseudomarginals in $L(G)$ that are locally consistent but realizable by no joint distribution.

11 Kikuchi Approximation
Recall: the Bethe variational method uses a tree-based (Bethe) approximation to the entropy, and a tree-based outer bound on the marginal polytope.
The Kikuchi method extends these tree-based approximations to more general hyper-trees.
Generalized pseudomarginal set: pseudomarginals over the hyperedges, subject to normalization constraints and marginalization (local consistency) constraints between nested hyperedges.
A hyper-tree based approximate entropy then yields a hyper-tree based generalization of the BVP.

12 Summary So Far
We formulated the inference problem as a variational optimization problem.
The Bethe and Kikuchi free energies are approximations to the negative entropy.

13 Next Step …
We will develop a set of lower-bound methods.

14 Tractable Subgraph
Given a GM with a graph G, a subgraph F is tractable if we can perform exact inference on it.
Example: (the original figure is not reproduced in this transcript; standard choices are the fully disconnected subgraph and an embedded tree).

15 Mean Parameterization
For an exponential family GM defined with graph G and sufficient statistics $\phi$, the realizable mean parameter set is $\mathcal{M}(G; \phi) = \{\mu : \exists p \ \text{s.t.} \ E_p[\phi(X)] = \mu\}$.
For a given tractable subgraph F, the subset of mean parameters realizable by distributions that factorize according to F is of interest: $\mathcal{M}(F; \phi) = \{\tau : \tau = E_q[\phi(X)] \ \text{for some} \ q \ \text{that respects} \ F\}$.
Inner approximation: $\mathcal{M}(F; \phi) \subseteq \mathcal{M}(G; \phi)$.

16 Optimizing a Lower Bound
Any mean parameter $\tau \in \mathcal{M}^\circ$ yields a lower bound on the log-partition function: $A(\theta) \ge \langle \theta, \tau \rangle - A^*(\tau)$.
Moreover, equality holds iff $\theta$ and $\tau$ are dually coupled, i.e., $\tau = E_\theta[\phi(X)]$.
Proof idea: Jensen's inequality.
Optimizing the lower bound gives $\max_{\tau \in \mathcal{M}(F)} \{\langle \theta, \tau \rangle - A^*(\tau)\}$; this is itself an inference problem! (A numeric illustration follows.)
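A numeric illustration (my own construction, reusing the two-node Ising family from the earlier example): for a fully factorized $\tau$, $-A^*(\tau)$ is the sum of node entropies, so $\langle \theta, \tau \rangle + H(\tau)$ never exceeds $A(\theta)$:

```python
# Check the bound A(theta) >= <theta, tau> + H(tau) for factorized tau.
import itertools
import numpy as np

theta = np.array([0.5, -0.3, 1.0])           # (theta_1, theta_2, theta_12)

def phi(x):
    return np.array([x[0], x[1], x[0] * x[1]])

configs = list(itertools.product([0, 1], repeat=2))
A = np.log(sum(np.exp(theta @ phi(x)) for x in configs))

def lower_bound(t1, t2):
    tau = np.array([t1, t2, t1 * t2])        # factorized mean parameters
    H = -(t1 * np.log(t1) + (1 - t1) * np.log(1 - t1)
          + t2 * np.log(t2) + (1 - t2) * np.log(1 - t2))
    return theta @ tau + H

for t1, t2 in [(0.5, 0.5), (0.7, 0.4), (0.62, 0.55)]:
    assert lower_bound(t1, t2) <= A          # holds for every feasible tau
```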

17 Mean Field Methods In General
However, the lower bound can't be explicitly evaluated in general, because the dual function $A^*$ typically lacks an explicit form.
Mean field methods: approximate the lower bound by approximating the realizable mean parameter set with a tractable inner subset $\mathcal{M}(F)$, on which the dual has an explicit form $A^*_F$.
The MF optimization problem: $\max_{\tau \in \mathcal{M}(F)} \{\langle \theta, \tau \rangle - A^*_F(\tau)\}$.
Still a lower bound? Yes: on $\mathcal{M}(F)$ the tractable dual agrees with $A^*$, so every feasible $\tau$ still certifies $A(\theta) \ge \langle \theta, \tau \rangle - A^*_F(\tau)$.

18 KL-divergence
The Kullback-Leibler divergence between two exponential family distributions with the same sufficient statistics can be written in three forms:
Primal form (canonical parameters): $D(\theta_1 \| \theta_2) = A(\theta_2) - A(\theta_1) - \langle \mu_1, \theta_2 - \theta_1 \rangle$.
Mixed form: $D(\mu_1 \| \theta_2) = A(\theta_2) + A^*(\mu_1) - \langle \mu_1, \theta_2 \rangle$.
Dual form (mean parameters): $D(\mu_1 \| \mu_2) = A^*(\mu_1) - A^*(\mu_2) - \langle \theta_2, \mu_1 - \mu_2 \rangle$.
(A small consistency check on these forms follows.)
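Here is a small consistency check (mine, for the Bernoulli family): the dual coupling $A^*(\mu_1) = \langle \theta_1, \mu_1 \rangle - A(\theta_1)$ makes the direct KL computation and the mixed form agree:

```python
# Direct KL(p1 || p2) vs. the mixed form A(theta2) + A*(mu1) - <theta2, mu1>.
import numpy as np

def A(theta):                   # Bernoulli log-partition function
    return np.log(1 + np.exp(theta))

def Astar(mu):                  # its conjugate (negative entropy)
    return mu * np.log(mu) + (1 - mu) * np.log(1 - mu)

theta1, theta2 = 0.8, -0.4
mu1 = 1 / (1 + np.exp(-theta1))              # mean parameter dual to theta1

kl_direct = mu1 * (theta1 - theta2) - A(theta1) + A(theta2)
kl_mixed = A(theta2) + Astar(mu1) - theta2 * mu1
assert np.isclose(kl_direct, kl_mixed)
```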

19 Mean Field and KL-divergence
Optimizing the lower bound, $\max_{\tau \in \mathcal{M}(F)} \{\langle \theta, \tau \rangle - A^*(\tau)\}$, is equivalent to minimizing the mixed-form KL divergence $D(\tau \| \theta) = A(\theta) + A^*(\tau) - \langle \tau, \theta \rangle$, since $A(\theta)$ does not depend on $\tau$.
Therefore, we are doing $\mathrm{KL}(q \| p)$ minimization.

20 Naïve Mean Field
Fully factorized variational distribution: $q(x) = \prod_{s \in V} q_s(x_s)$.

21 Naïve Mean Field for Ising Model
Sufficient statistics and mean parameters: for $x_s \in \{0, 1\}$, $\phi(x) = (x_s, s \in V; \ x_s x_t, (s,t) \in E)$, with $\mu_s = E[X_s] = p(X_s = 1)$ and $\mu_{st} = E[X_s X_t] = p(X_s = 1, X_t = 1)$.
Naïve mean field: the fully factorized $q$ forces $\tau_{st} = \tau_s \tau_t$.
Realizable mean parameter subset: $\mathcal{M}_F(G) = \{\tau : 0 \le \tau_s \le 1, \ \tau_{st} = \tau_s \tau_t\}$.
Entropy: $H(q) = -\sum_s [\tau_s \log \tau_s + (1 - \tau_s) \log(1 - \tau_s)]$.
Optimization problem: $\max_{\tau \in [0,1]^{|V|}} \{\sum_s \theta_s \tau_s + \sum_{(s,t) \in E} \theta_{st} \tau_s \tau_t + \sum_s H_s(\tau_s)\}$.

22 Naïve Mean Field for Ising Model
Coordinate ascent on the optimization problem above yields the update rule $\tau_s \leftarrow \sigma(\theta_s + \sum_{t \in N(s)} \theta_{st} \tau_t)$, where $\sigma$ is the logistic function.
$\tau_t$ resembles a "message" sent from node $t$ to $s$; $\{\tau_t : t \in N(s)\}$ forms the "mean field" applied to $s$ from its neighborhood. (An implementation sketch follows.)
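An implementation sketch of the naïve mean field updates for the $\{0,1\}$ Ising model (my own code; the dense-matrix encoding of $\theta_{st}$ and the 4-cycle test case are illustrative assumptions):

```python
# Naive mean field for a {0,1} Ising model: cycle the coordinate updates
# tau_s <- sigmoid(theta_s + sum_t theta_st * tau_t) to a fixed point.
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def naive_mean_field(theta_node, theta_edge, iters=500, tol=1e-8):
    # theta_node: (n,) singleton parameters.
    # theta_edge: (n, n) symmetric pairwise parameters, zero diagonal;
    # a zero entry means "no edge".
    n = len(theta_node)
    tau = np.full(n, 0.5)                 # uniform initialization
    for _ in range(iters):
        old = tau.copy()
        for s in range(n):                # one coordinate at a time
            tau[s] = sigmoid(theta_node[s] + theta_edge[s] @ tau)
        if np.abs(tau - old).max() < tol:
            break
    return tau

# toy 4-cycle matching the Ising setup on these slides
theta_node = np.array([0.2, -0.1, 0.3, 0.0])
theta_edge = np.zeros((4, 4))
for s, t in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    theta_edge[s, t] = theta_edge[t, s] = 0.5
print(naive_mean_field(theta_node, theta_edge))
```

Because the objective is non-convex (see the next slide), different initializations of $\tau$ can converge to different fixed points.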

23 Non-Convexity of Mean Field
Mean field optimization is always non-convex for any exponential family in which the state space $\mathcal{X}$ is finite.
Reason: the finite convex hull $\mathcal{M}(G)$ contains all the extreme points $\phi(x)$, and these also lie in $\mathcal{M}_F(G)$; if $\mathcal{M}_F(G)$ were a convex set, it would equal $\mathcal{M}(G)$, contradicting tractability.
Despite non-convexity, mean field has been used successfully in practice.

24 Structured Mean Field
Mean field theory is general: it applies to any tractable subgraph.
Naïve mean field is based on the fully unconnected subgraph.
Variants based on structured subgraphs (e.g., chains or trees) can be derived.

25 Other Notations
The mean field update can be written in mean parameterization form (as on the previous slides) or in distribution form, $q_s(x_s) \propto \exp\{E_{q_{-s}}[\log p(x)]\}$, where $q_{-s}$ denotes the variational factors of all other nodes.
Naïve mean field for the Ising model in distribution form: $q_s(x_s) \propto \exp\{x_s(\theta_s + \sum_{t \in N(s)} \theta_{st} \tau_t)\}$.

26 Examples to add
GMF for Ising models; factorial HMM; Bayesian Gaussian model; latent Dirichlet allocation.

27 Summary
Message-passing algorithms (e.g., belief propagation, mean field) solve approximate versions of the exact variational principle in exponential families.
There are two distinct components to the approximations: an inner or outer bound on the mean parameter set $\mathcal{M}$, and an approximation to the entropy function.
BP: polyhedral outer bound and non-convex Bethe entropy approximation.
MF: non-convex inner bound and exact form of entropy.
Kikuchi: tighter polyhedral outer bound and better entropy approximation.

