Probabilistic Graphical Models
Variational Inference II: Mean Field and Generalized MF
Eric Xing
Lecture 17, November 5, 2009
© Eric Xing @ CMU, 2005-2009
Inference Problems
- Compute the likelihood of observed data
- Compute the marginal distribution over a particular subset of nodes
- Compute the conditional distribution for disjoint subsets A and B
- Compute a mode of the density
Methods we have:
- Message passing (forward-backward, max-product/BP, junction tree)
- Brute-force methods are intractable: individual computations are treated as independent
- Recursive message passing (sum-product/max-product, junction tree, elimination) is efficient for simple graphical models, like trees, because intermediate terms are shared across computations
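The sharing of intermediate terms can be made concrete on a toy chain. The following sketch (with illustrative potentials, not an example from the slides) computes a marginal both by brute-force enumeration and by eliminating variables in order:

```python
from itertools import product

# A minimal sketch (toy numbers): computing the marginal p(x3) on a
# binary chain x1 - x2 - x3, first by brute force and then by
# eliminating variables in order (message passing).
psi12 = [[1.0, 2.0], [3.0, 1.0]]   # pairwise potential on (x1, x2)
psi23 = [[2.0, 1.0], [1.0, 4.0]]   # pairwise potential on (x2, x3)

# Brute force: enumerate all 2^3 configurations (exponential in n).
num = [0.0, 0.0]
for x1, x2, x3 in product((0, 1), repeat=3):
    num[x3] += psi12[x1][x2] * psi23[x2][x3]
brute = [v / sum(num) for v in num]

# Elimination: push the sum over x1 inward, sharing the intermediate
# term m12 across all values of x3 (linear in n for a chain).
m12 = [sum(psi12[x1][x2] for x1 in (0, 1)) for x2 in (0, 1)]
m23 = [sum(m12[x2] * psi23[x2][x3] for x2 in (0, 1)) for x3 in (0, 1)]
elim = [v / sum(m23) for v in m23]
print(brute, elim)   # the two marginals agree
```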
Exponential Family GMs: Canonical Parameterization
p(x; θ) = exp{⟨θ, φ(x)⟩ − A(θ)}
- Canonical parameters θ, sufficient statistics φ(x), log-normalization function A(θ)
- Effective canonical parameters: the parameters obtained after collecting shared terms across cliques
- Regular family: the domain {θ : A(θ) < ∞} is an open set
- Minimal representation: there does not exist a nonzero vector a such that ⟨a, φ(x)⟩ is a constant
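As a minimal sketch of the canonical form (using the Bernoulli family, the simplest member), the log-normalizer A(θ) = log(1 + e^θ) is exactly what makes p(x; θ) = exp{θx − A(θ)} sum to one:

```python
import math

# Bernoulli in canonical exponential-family form:
# p(x; theta) = exp(theta * x - A(theta)), sufficient statistic phi(x) = x,
# log-normalizer A(theta) = log(1 + e^theta).
def A(theta):
    return math.log(1.0 + math.exp(theta))

def p(x, theta):
    return math.exp(theta * x - A(theta))

theta = 1.3                     # an arbitrary canonical parameter
total = p(0, theta) + p(1, theta)
print(total)                    # A(theta) makes the distribution normalize to 1
```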
Mean Parameterization
- The mean parameter associated with a sufficient statistic φ is defined as μ = E_p[φ(X)]
- Realizable mean parameter set: M = {μ : μ = E_p[φ(X)] for some distribution p}
- M is a convex subset of R^d
- M is the convex hull of {φ(x)} in the discrete case: a convex polytope when the state space is finite
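A quick numerical sketch of a standard fact used throughout this lecture: the mean parameter is the gradient of the log-normalizer, μ = ∇A(θ). For the Bernoulli family this gradient is the sigmoid:

```python
import math

# For the Bernoulli family p(x; theta) = exp(theta*x - A(theta)), the mean
# parameter mu = E[phi(X)] = E[X] equals dA/dtheta = sigmoid(theta).
def A(theta):
    return math.log(1.0 + math.exp(theta))

theta = 0.7
mu = math.exp(theta) / (1.0 + math.exp(theta))       # E[X] computed directly
grad = (A(theta + 1e-6) - A(theta - 1e-6)) / 2e-6    # central finite difference
print(mu, grad)                                      # the two agree
```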
Convex Polytope
- Convex hull representation: M = conv{φ(x)}
- Half-space based representation: M as the intersection of finitely many linear inequalities
- Minkowski-Weyl Theorem: any polytope can be characterized by a finite collection of linear inequality constraints
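A small sketch of the two representations for the marginal polytope of a single binary edge (s, t), in coordinates (μ_s, μ_t, μ_st). The hull view uses the four vertices realized by point masses on {0,1}²; the half-space view uses four standard linear inequalities. Random convex combinations of the vertices must satisfy all inequalities:

```python
import random

# Hull representation: vertices phi(x) for the four configurations of (x_s, x_t).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]

# Half-space representation of the same polytope (Minkowski-Weyl).
def in_halfspaces(mu_s, mu_t, mu_st, eps=1e-9):
    return (mu_st >= -eps and mu_st <= mu_s + eps
            and mu_st <= mu_t + eps
            and mu_s + mu_t - mu_st <= 1 + eps)

random.seed(0)
ok = True
for _ in range(1000):
    w = [random.random() for _ in vertices]
    w = [wi / sum(w) for wi in w]                       # random convex weights
    point = [sum(wi * v[k] for wi, v in zip(w, vertices)) for k in range(3)]
    ok = ok and in_halfspaces(*point)
print(ok)   # True: every hull point satisfies the inequalities
```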
Conjugate Duality
Duality between MLE and Max-Ent:
A*(μ) = sup_θ {⟨θ, μ⟩ − A(θ)}
- For all μ in the interior of M, there is a unique canonical parameter θ(μ) satisfying μ = E_{θ(μ)}[φ(X)], and A*(μ) is the negative entropy of p_{θ(μ)}
- The log-partition function has the variational form A(θ) = sup_{μ ∈ M} {⟨θ, μ⟩ − A*(μ)} (*)
- For all θ, the supremum in (*) is attained uniquely at the μ specified by the moment-matching conditions μ = E_θ[φ(X)]
- Bijection between θ and μ for a minimal exponential family
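The variational form (*) can be checked numerically for the Bernoulli family, where A*(μ) is the negative entropy μ log μ + (1 − μ) log(1 − μ). A grid search over μ recovers A(θ) = log(1 + e^θ), attained at the moment-matching point μ = sigmoid(θ):

```python
import math

# Dual function for the Bernoulli family: negative entropy.
def A_star(mu):
    return mu * math.log(mu) + (1 - mu) * math.log(1 - mu)

theta = 0.9
grid = [i / 10000 for i in range(1, 10000)]          # mu in (0, 1)
best_mu = max(grid, key=lambda mu: theta * mu - A_star(mu))
sup_val = theta * best_mu - A_star(best_mu)

exact_A = math.log(1 + math.exp(theta))              # log-partition function
moment_match = math.exp(theta) / (1 + math.exp(theta))   # sigmoid(theta)
print(sup_val, exact_A)        # variational form recovers A(theta)
print(best_mu, moment_match)   # supremum attained at the moment-matching mu
```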
Variational Inference In General
- An umbrella term that refers to various mathematical tools for optimization-based formulations of problems, as well as associated techniques for their solution
- General idea: express a quantity of interest as the solution of an optimization problem
- The optimization problem can be relaxed in various ways:
  - Approximate the function to be optimized
  - Approximate the set over which the optimization takes place
- Stands alongside MCMC as a major family of approximate inference methods
Bethe Variational Problem (BVP)
We already have:
- a convex (polyhedral) outer bound on the marginal polytope
- the Bethe approximate entropy
Combining the two ingredients, we have a simple structured problem (differentiable, and the constraint set is a simple polytope). Sum-product is the solver!
Connection to Sum-Product Alg.
- The Lagrangian method applied to the BVP yields the sum-product updates
- Sum-product and the Bethe variational problem (Yedidia et al., 2002): for any graph G, any fixed point of the sum-product updates specifies a pair of pseudomarginals and Lagrange multipliers satisfying the stationarity conditions of the BVP
- For a tree-structured MRF, the solution is unique: the pseudomarginals correspond to the exact singleton and pairwise marginal distributions of the MRF, and the optimal value of the BVP is equal to the log-partition function A(θ)
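The tree case of this theorem is easy to verify on a toy chain (a hypothetical example with illustrative potentials, not the slides' figure): beliefs built from converged sum-product messages equal the exact marginals.

```python
from itertools import product

# Pairwise potentials on a 3-node binary chain x1 - x2 - x3.
psi = {(1, 2): [[1.0, 2.0], [3.0, 1.0]],
       (2, 3): [[2.0, 1.0], [1.0, 4.0]]}

# On a chain, one forward and one backward message suffice.
m1to2 = [sum(psi[(1, 2)][x1][x2] for x1 in (0, 1)) for x2 in (0, 1)]
m3to2 = [sum(psi[(2, 3)][x2][x3] for x3 in (0, 1)) for x2 in (0, 1)]

# Belief at node 2 = product of incoming messages, normalized.
belief2 = [m1to2[x2] * m3to2[x2] for x2 in (0, 1)]
belief2 = [b / sum(belief2) for b in belief2]

# Exact marginal of x2 by brute-force enumeration.
exact = [0.0, 0.0]
for x1, x2, x3 in product((0, 1), repeat=3):
    exact[x2] += psi[(1, 2)][x1][x2] * psi[(2, 3)][x2][x3]
exact = [e / sum(exact) for e in exact]
print(belief2, exact)   # identical on a tree
```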
Inexactness of Bethe and Sum-Product
Two sources of error:
- the Bethe entropy approximation is not the exact entropy
- the pseudomarginal outer bound is a strict inclusion: for graphs with cycles, the marginal polytope is strictly contained in the locally consistent set L(G)
Example (figure omitted): a small four-node graph with cycles admits locally consistent pseudomarginals that are not globally realizable
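The strict inclusion can be exhibited with the standard counterexample (sketched here on a 3-node cycle rather than the slides' four-node figure): uniform node pseudomarginals with perfectly anti-correlated edge pseudomarginals are locally consistent but globally unrealizable.

```python
from itertools import product

tau_node = [0.5, 0.5]                 # tau_s(0), tau_s(1), same at each node
tau_edge = [[0.0, 0.5], [0.5, 0.0]]   # tau_st(x_s, x_t), same on every edge

# Local consistency: each row of the edge pseudomarginal sums to the
# node pseudomarginal, so tau lies in the outer bound L(G).
row_sums = [sum(tau_edge[a][b] for b in (0, 1)) for a in (0, 1)]

# Global realizability would require every edge of the 3-cycle to
# disagree with probability 1, but no configuration of three binary
# nodes disagrees on all three edges -- so tau is NOT in the marginal
# polytope M(G).
all_disagree = [cfg for cfg in product((0, 1), repeat=3)
                if cfg[0] != cfg[1] and cfg[1] != cfg[2] and cfg[0] != cfg[2]]
print(row_sums, all_disagree)   # [0.5, 0.5] []
```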
Kikuchi Approximation
- Recall: the Bethe variational method uses a tree-based (Bethe) approximation to the entropy, and a tree-based outer bound on the marginal polytope
- The Kikuchi method extends these tree-based approximations to more general hyper-trees:
  - Generalized pseudomarginal set, with normalization and marginalization constraints
  - Hyper-tree based approximate entropy
- Together these give a hyper-tree based generalization of the BVP
Summary So Far
- Formulate the inference problem as a variational optimization problem
- The Bethe and Kikuchi free energies are approximations in which the exact (negative) entropy is replaced by a tractable surrogate
Next Step …
We will develop a set of lower-bound methods
Tractable Subgraph
- Given a GM with a graph G, a subgraph F is tractable if we can perform exact inference on it
- Examples: the fully disconnected subgraph, or a spanning tree of G
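Tractability here is concrete: on a chain (a tree, hence tractable), the log-partition function is computable by a forward recursion in O(n) rather than O(2^n). A sketch with illustrative Ising parameters:

```python
import math
from itertools import product

n = 5
theta_s = [0.3, -0.1, 0.2, 0.0, -0.4]   # node parameters (illustrative)
theta_st = 0.5                          # shared edge coupling (illustrative)

# Forward recursion: alpha[x] accumulates the sum over all prefixes
# ending in state x.
alpha = [math.exp(theta_s[0] * x) for x in (0, 1)]
for s in range(1, n):
    alpha = [sum(alpha[xp] * math.exp(theta_s[s] * x + theta_st * xp * x)
                 for xp in (0, 1)) for x in (0, 1)]
logZ_chain = math.log(sum(alpha))

# Brute force over all 2^n configurations, for comparison.
Z = 0.0
for cfg in product((0, 1), repeat=n):
    e = sum(theta_s[s] * cfg[s] for s in range(n))
    e += sum(theta_st * cfg[s] * cfg[s + 1] for s in range(n - 1))
    Z += math.exp(e)
print(logZ_chain, math.log(Z))   # identical up to floating point
```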
Mean Parameterization
- For an exponential family GM defined with graph G and sufficient statistics φ, the realizable mean parameter set is M(G) = {μ : μ = E_p[φ(X)] for some p}
- For a given tractable subgraph F, the subset of mean parameters realizable by distributions that respect F is of interest: M_F(G) ⊆ M(G)
- M_F(G) is an inner approximation to M(G)
Optimizing a Lower Bound
- Any mean parameter μ yields a lower bound on the log-partition function: A(θ) ≥ ⟨θ, μ⟩ − A*(μ)
- Moreover, equality holds iff θ and μ are dually coupled, i.e., μ = E_θ[φ(X)]
- Proof idea: Jensen's inequality, A(θ) ≥ E_q[⟨θ, φ(X)⟩] + H(q) for any distribution q
- Optimizing the lower bound gives max_{μ ∈ M_F(G)} {⟨θ, μ⟩ − A*(μ)}. This is an inference!
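The Jensen bound can be checked numerically. For a small loopy Ising model (a 4-cycle with illustrative parameters), any fully factorized q with means μ gives ⟨θ, μ⟩ + H(q) ≤ A(θ):

```python
import math
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]     # a 4-cycle (illustrative)
th_s = [0.2, -0.3, 0.1, 0.4]
th_st = {e: 0.6 for e in edges}

def logZ():
    # Exact log-partition function by brute force.
    Z = 0.0
    for cfg in product((0, 1), repeat=4):
        e = sum(th_s[s] * cfg[s] for s in range(4))
        e += sum(th_st[(s, t)] * cfg[s] * cfg[t] for s, t in edges)
        Z += math.exp(e)
    return math.log(Z)

def bound(mu):
    # <theta, mu> + H(q) for a product q; E_q[x_s x_t] = mu_s * mu_t.
    val = sum(th_s[s] * mu[s] for s in range(4))
    val += sum(th_st[(s, t)] * mu[s] * mu[t] for s, t in edges)
    H = -sum(m * math.log(m) + (1 - m) * math.log(1 - m) for m in mu)
    return val + H

mu = [0.6, 0.4, 0.55, 0.65]          # an arbitrary interior point
print(bound(mu), logZ())             # bound(mu) <= logZ()
```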
Mean Field Methods In General
- However, the lower bound can't be explicitly evaluated in general, because the dual function A* typically lacks an explicit form
- Mean field methods:
  - approximate the lower bound, using the dual function restricted to the tractable family, where it has an explicit form
  - approximate the realizable mean parameter set by the inner bound M_F(G)
- The MF optimization problem: max_{τ ∈ M_F(G)} {⟨θ, τ⟩ − A*_F(τ)}
- Still a lower bound? Yes: on M_F(G) the restricted dual agrees with A*, so the MF value still lower-bounds A(θ)
KL-divergence
Kullback-Leibler divergence for two exponential family distributions with the same sufficient statistics:
- Primal form: D(θ1 ‖ θ2) = A(θ2) − A(θ1) − ⟨μ1, θ2 − θ1⟩, where μ1 = ∇A(θ1)
- Mixed form: D(μ1 ‖ θ2) = A(θ2) + A*(μ1) − ⟨μ1, θ2⟩
- Dual form: D(μ1 ‖ μ2) = A*(μ1) − A*(μ2) − ⟨θ2, μ1 − μ2⟩, where θ2 = ∇A*(μ2)
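The primal form is a one-line check for two Bernoulli distributions in canonical form:

```python
import math

# Verify the primal form D(theta1 || theta2) = A(theta2) - A(theta1)
# - <mu1, theta2 - theta1> against the KL divergence computed directly.
def A(th):
    return math.log(1 + math.exp(th))

def sigmoid(th):
    return 1 / (1 + math.exp(-th))

th1, th2 = 0.8, -0.5            # arbitrary canonical parameters
mu1 = sigmoid(th1)              # mean parameter of the first distribution

primal = A(th2) - A(th1) - mu1 * (th2 - th1)

# Direct KL between the two Bernoulli distributions.
p1, p2 = mu1, sigmoid(th2)
direct = p1 * math.log(p1 / p2) + (1 - p1) * math.log((1 - p1) / (1 - p2))
print(primal, direct)           # the two expressions agree
```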
Mean Field and KL-divergence
- Optimizing the lower bound is equivalent to minimizing a KL-divergence: the gap A(θ) − (⟨θ, μ⟩ − A*(μ)) equals the mixed-form divergence D(μ ‖ θ)
- Therefore, mean field is doing KL(q ‖ p) minimization over the tractable family
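The gap-equals-KL identity can be verified numerically for the Bernoulli family, with arbitrary test values of θ and μ:

```python
import math

def A(th):
    return math.log(1 + math.exp(th))

def A_star(mu):                 # dual function: negative entropy
    return mu * math.log(mu) + (1 - mu) * math.log(1 - mu)

theta, mu = 0.4, 0.7            # arbitrary test values

# Gap between A(theta) and the lower bound <theta, mu> - A*(mu).
gap = A(theta) - (theta * mu - A_star(mu))

# Mixed-form KL: KL(q_mu || p_theta), computed directly.
p = math.exp(theta) / (1 + math.exp(theta))      # mean of p_theta
kl = mu * math.log(mu / p) + (1 - mu) * math.log((1 - mu) / (1 - p))
print(gap, kl)                  # gap == KL, so maximizing the bound minimizes KL
```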
Naïve Mean Field
Fully factorized variational distribution: q(x) = ∏_{s ∈ V} q_s(x_s)
Naïve Mean Field for Ising Model
- Sufficient statistics and mean parameters: φ(x) = (x_s, s ∈ V; x_s x_t, (s,t) ∈ E), with μ_s = E[X_s] = P(X_s = 1) and μ_st = E[X_s X_t]
- Naïve mean field: q(x) = ∏_s q_s(x_s), so μ_st = μ_s μ_t
- Realizable mean parameter subset: {μ : 0 ≤ μ_s ≤ 1, μ_st = μ_s μ_t}
- Entropy: H(q) = −Σ_s [μ_s log μ_s + (1 − μ_s) log(1 − μ_s)]
- Optimization problem: max_{μ ∈ [0,1]^m} Σ_s θ_s μ_s + Σ_{(s,t)} θ_st μ_s μ_t + H(q)
Naïve Mean Field for Ising Model (cont.)
- Coordinate ascent on the optimization problem yields the update rule μ_s ← σ(θ_s + Σ_{t ∈ N(s)} θ_st μ_t), where σ is the sigmoid function
- μ_t resembles a "message" sent from node t to s
- Σ_{t ∈ N(s)} θ_st μ_t forms the "mean field" applied to s from its neighborhood
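The update rule is a few lines of code. A sketch on a 4-cycle Ising model with illustrative parameters, compared against brute-force marginals:

```python
import math
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0)]   # a 4-cycle (illustrative)
th_s = [0.5, -0.2, 0.3, 0.1]
th_st = 0.4                                # shared coupling (illustrative)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Neighbor lists from the edge set.
nbrs = {s: [t for e in edges for t in e if s in e and t != s] for s in range(4)}

# Naive mean-field coordinate ascent: mu_s <- sigmoid(theta_s + sum theta_st mu_t).
mu = [0.5] * 4
for _ in range(200):
    for s in range(4):
        mu[s] = sigmoid(th_s[s] + sum(th_st * mu[t] for t in nbrs[s]))

# Exact marginals by brute-force enumeration, for comparison.
Z, marg = 0.0, [0.0] * 4
for cfg in product((0, 1), repeat=4):
    w = math.exp(sum(th_s[s] * cfg[s] for s in range(4))
                 + sum(th_st * cfg[s] * cfg[t] for s, t in edges))
    Z += w
    for s in range(4):
        marg[s] += w * cfg[s]
marg = [m / Z for m in marg]
print(mu)     # mean-field approximation
print(marg)   # exact marginals (close but not identical)
```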
Non-Convexity of Mean Field
- Mean field optimization is always non-convex for any exponential family in which the state space is finite
- Reason: M is the convex hull of finitely many extreme points φ(x), and each extreme point is realized by a deterministic, fully factorized distribution, so each lies in M_F(G); if M_F(G) were a convex set, it would therefore contain all of M, contradicting that it is a strict subset
- Mean field has nevertheless been used successfully in practice
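The non-convexity shows up in practice as multiple fixed points. A sketch with a hypothetical two-node Ising model chosen so the effective field vanishes: with strong coupling, the naive mean-field update converges to different fixed points from nearby initializations.

```python
import math

th_st = 8.0                 # strong coupling (illustrative)
th_s = -th_st / 2           # centers the model so mu = 0.5 is a fixed point

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def run(mu0, iters=500):
    # Symmetric two-node update: both nodes share the same mean mu.
    mu = mu0
    for _ in range(iters):
        mu = sigmoid(th_s + th_st * mu)
    return mu

low, high = run(0.49), run(0.51)
print(low, high)   # two distinct stable fixed points from nearby starts
```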
Structured Mean Field
- Mean field theory applies to any tractable subgraph
- Naïve mean field is based on the fully disconnected subgraph
- Variants based on structured subgraphs (e.g., chains or trees) can be derived
Other Notations
- Mean parameterization form (as above) vs. distribution form, where the optimization is written directly over the factorized variational distribution q
- The naïve mean field updates for the Ising model can be written equivalently in either form
Examples
- GMF for Ising models
- Factorial HMM
- Bayesian Gaussian model
- Latent Dirichlet allocation
Summary
- Message-passing algorithms (e.g., belief propagation, mean field) solve approximate versions of the exact variational principle in exponential families
- There are two distinct components to the approximations:
  - inner or outer bounds to the marginal polytope
  - various approximations to the entropy function
- BP: polyhedral outer bound and non-convex Bethe entropy approximation
- MF: non-convex inner bound and exact form of the entropy
- Kikuchi: tighter polyhedral outer bound and better entropy approximation