Expectation-Maximization & Belief Propagation

Expectation-Maximization & Belief Propagation Alan Yuille Dept. Statistics UCLA

Goal of this Talk. The goal is to introduce the Expectation-Maximization (EM) and Belief Propagation (BP) algorithms. EM is one of the major algorithms used for inference in models with hidden/missing/latent variables.

Example: Geman and Geman

Images are piecewise smooth. Assume that images are smooth except at sharp discontinuities (edges). Justification comes from the statistics of real images (Zhu & Mumford).

Graphical Model & Potential. The graphical model is an undirected graph (a hidden Markov model over the image). The potential: if the gradient of u becomes too large, the line process is activated and the smoothness term is cut.
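The slide's potential is not reproduced in the transcript. A standard form of this kind of energy, often called a weak-membrane energy, with assumed symbols (data d, image u, binary line process l on neighbouring pairs, weights lambda and mu), is:

```latex
E(u,\ell \mid d) \;=\; \sum_i \frac{(u_i - d_i)^2}{2\sigma^2}
\;+\; \lambda \sum_{\langle ij\rangle} (1-\ell_{ij})\,(u_i - u_j)^2
\;+\; \mu \sum_{\langle ij\rangle} \ell_{ij},
\qquad \ell_{ij}\in\{0,1\}.
```

When lambda (u_i - u_j)^2 exceeds mu it becomes cheaper to set l_ij = 1, which cuts the smoothness term, exactly the behaviour described above.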

The Posterior Distribution. We apply Bayes' rule to obtain the posterior distribution:
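In the assumed notation above, the posterior combines the likelihood and the prior:

```latex
P(u,\ell \mid d) \;=\; \frac{P(d \mid u)\,P(u,\ell)}{P(d)}
\;\propto\; \exp\{-E(u,\ell \mid d)\},
\qquad
P(u \mid d) \;=\; \sum_{\ell} P(u,\ell \mid d).
```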

Line Process: Off and On. Illustration of the line process: when the line process is off there is no edge and neighbouring values are smoothed; when it is on there is an edge and the smoothing is cut.

Choice of Task. What do we want to estimate?

Expectation Maximization.

Expectation-Maximization
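The equations on this slide are not in the transcript. For a generic model with observed data d, hidden variables h, and unknowns theta (notation assumed here), the two steps are usually written as:

```latex
\text{E-step:}\quad
Q(\theta,\theta^{\text{old}}) \;=\; \sum_{h} P(h \mid d,\theta^{\text{old}})\,\log P(d,h \mid \theta),
\qquad
\text{M-step:}\quad
\theta^{\text{new}} \;=\; \arg\max_{\theta}\, Q(\theta,\theta^{\text{old}}).
```

Each iteration is guaranteed not to decrease log P(d | theta).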

Back to the Geman & Geman model
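The slide's derivation is not in the transcript, so here is a minimal 1-D sketch of EM for a weak-membrane style model of this kind. The energy form, the parameter names (lam, mu, sigma2) and the 1-D chain setting are assumptions for illustration; the point is that the E-step for the binary line process is analytic and the M-step reduces to a linear system, as the summary slide below states.

```python
import numpy as np

def em_weak_membrane_1d(d, lam=10.0, mu=1.0, sigma2=0.1, n_iters=20):
    """EM for a 1-D weak-membrane model (illustrative sketch; parameters assumed).

    Energy: sum_i (u_i - d_i)^2 / (2*sigma2)
          + lam * sum_i (1 - l_i) * (u_{i+1} - u_i)^2
          + mu  * sum_i l_i,
    where the binary line variables l_i are treated as hidden.
    """
    n = len(d)
    u = d.copy()                                    # initialise the estimate with the data
    for _ in range(n_iters):
        # E-step (analytic): posterior probability that each edge is "cut".
        diff2 = (u[1:] - u[:-1]) ** 2
        q = 1.0 / (1.0 + np.exp(mu - lam * diff2))  # q_i = P(l_i = 1 | u)

        # M-step: minimise the expected energy over u, which is a linear system.
        w = lam * (1.0 - q)                         # effective smoothness weights
        A = np.diag(np.full(n, 1.0 / sigma2))
        for i in range(n - 1):
            A[i, i] += 2.0 * w[i]
            A[i + 1, i + 1] += 2.0 * w[i]
            A[i, i + 1] -= 2.0 * w[i]
            A[i + 1, i] -= 2.0 * w[i]
        u = np.linalg.solve(A, d / sigma2)
    return u, q

# Example: denoise a noisy step signal; q should be large only at the step.
rng = np.random.default_rng(0)
d = np.concatenate([np.zeros(50), np.ones(50)]) + 0.1 * rng.standard_normal(100)
u, q = em_weak_membrane_1d(d)
```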

Image Example

Neural Networks and the Brain. An early variant of this algorithm was formulated as a Hopfield network (Koch, Marroquin, Yuille, 1987). It is just possible that a variant of this algorithm is implemented in V1 (Prof. Tai Sing Lee, CMU).

EM for a Mixture of Two Gaussians. A mixture model has the form:
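The formula on the slide is not in the transcript; a two-component Gaussian mixture is standardly written (notation assumed) as:

```latex
p(x) \;=\; \pi\,\mathcal{N}(x;\mu_1,\sigma_1^2) \;+\; (1-\pi)\,\mathcal{N}(x;\mu_2,\sigma_2^2),
\qquad 0 \le \pi \le 1 .
```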

EM for a Mixture of Two Gaussians. Each observation has been generated by one of two Gaussians, but we know neither the parameters (mean and variance) of the Gaussians nor which Gaussian generated each observation. Colours indicate the assignment of points to clusters (red and blue); intermediate colours (e.g. purple) represent probabilistic assignments. The ellipses represent the current parameter values of each cluster.
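As a concrete companion to the figure, here is a minimal 1-D sketch of EM for this mixture (the initialisation scheme and variable names are my own assumptions): the E-step computes the soft assignments, i.e. the "colours" above, and the M-step re-estimates the parameters.

```python
import numpy as np

def em_two_gaussians(x, n_iters=50):
    """EM for a 1-D mixture of two Gaussians (illustrative sketch)."""
    pi = 0.5                                       # mixing weight of component 1
    mu = np.array([x.min(), x.max()], dtype=float)
    var = np.array([x.var(), x.var()], dtype=float)

    def normal(x, m, v):
        return np.exp(-(x - m) ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)

    for _ in range(n_iters):
        # E-step: responsibility of component 1 for each data point.
        p1 = pi * normal(x, mu[0], var[0])
        p2 = (1 - pi) * normal(x, mu[1], var[1])
        r = p1 / (p1 + p2)

        # M-step: re-estimate mixing weight, means and variances.
        pi = r.mean()
        mu[0] = (r * x).sum() / r.sum()
        mu[1] = ((1 - r) * x).sum() / (1 - r).sum()
        var[0] = (r * (x - mu[0]) ** 2).sum() / r.sum()
        var[1] = ((1 - r) * (x - mu[1]) ** 2).sum() / (1 - r).sum()
    return pi, mu, var, r

# Example: data drawn from two well-separated Gaussians.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 0.5, 300)])
print(em_two_gaussians(x)[:3])
```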

Expectation-Maximization: Summary. We can apply EM to any inference problem with hidden variables, subject to two questions: (1) Can we actually perform the E and M steps? For the image problem, the E-step was analytic and the M-step required solving linear equations. (2) Does the algorithm converge to the global maximum of P(u|d)? This is true for some problems, but not for all.

Expectation-Maximization: Summary. For an important class of problems, EM has a nice symbiotic relationship with dynamic programming (see next lecture). Mathematically, the EM algorithm falls into a class of optimization techniques known as majorization (in statistics) and variational bounding (in machine learning). Majorization (De Leeuw) is considerably older.

Belief Propagation (BP) and Message Passing. BP is an inference algorithm that is exact for graphical models defined on trees. It is similar to dynamic programming (see next lecture). When applied to graphs with closed loops it is often known as "loopy BP". Empirically, it is often a successful approximate algorithm on such graphs, but it tends to degrade badly as the number of closed loops increases.

BP and Message Passing. We define a distribution on an undirected graph. BP comes in two forms: (I) sum-product and (II) max-product. Sum-product (Pearl) is used for estimating the marginal distributions of the variables x.
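The distribution on the slide is not shown in the transcript; a standard pairwise undirected model (notation assumed, and used in the message updates below) is:

```latex
P(x) \;=\; \frac{1}{Z}\,\prod_{i}\psi_i(x_i)\,\prod_{\langle ij\rangle}\psi_{ij}(x_i,x_j).
```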

Message Passing: Sum Product. Sum-product proceeds by passing messages between nodes.
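The update on the slide is not in the transcript; in the assumed notation above, the sum-product message from node i to a neighbouring node j is:

```latex
m_{i\to j}(x_j) \;=\; \sum_{x_i}\,\psi_{ij}(x_i,x_j)\,\psi_i(x_i)
\prod_{k\in N(i)\setminus j} m_{k\to i}(x_i).
```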

Message Passing: Max Product. The max-product algorithm (Gallager) also uses messages, but it replaces the sum by a max. The update rule is:
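In the same assumed notation, the max-product update replaces the sum with a max:

```latex
m_{i\to j}(x_j) \;=\; \max_{x_i}\;\psi_{ij}(x_i,x_j)\,\psi_i(x_i)
\prod_{k\in N(i)\setminus j} m_{k\to i}(x_i).
```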

Beliefs and Messages. We construct "beliefs", estimates of the marginal probabilities, from the messages (see below). For graphical models defined on trees (i.e. no closed loops): (i) sum-product converges to the marginals of the distribution P(x); (ii) max-product converges to the maximum-probability states of P(x). But this is not very special, because other algorithms achieve the same results (see next lecture).
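In the assumed notation, the belief at node i is the normalised product of its local potential and all incoming messages,

```latex
b_i(x_i) \;\propto\; \psi_i(x_i)\prod_{k\in N(i)} m_{k\to i}(x_i),
```

and on a chain, the simplest tree, the whole algorithm fits in a few lines. The following sketch is my own illustration, not code from the lecture; on a tree it returns the exact marginals.

```python
import numpy as np

def sum_product_chain(psi_node, psi_edge):
    """Sum-product BP on a chain x_1 - x_2 - ... - x_n (illustrative sketch).

    psi_node: list of n arrays of shape (K,), the unary potentials.
    psi_edge: list of n-1 arrays of shape (K, K), the pairwise potentials.
    Returns the single-node marginals (exact, since a chain is a tree).
    """
    n = len(psi_node)
    fwd = [None] * n                    # message arriving at node i from the left
    bwd = [None] * n                    # message arriving at node i from the right
    fwd[0] = np.ones_like(psi_node[0])
    bwd[-1] = np.ones_like(psi_node[-1])
    for i in range(1, n):
        m = psi_edge[i - 1].T @ (psi_node[i - 1] * fwd[i - 1])
        fwd[i] = m / m.sum()            # normalise for numerical stability
    for i in range(n - 2, -1, -1):
        m = psi_edge[i] @ (psi_node[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    beliefs = []
    for i in range(n):
        b = psi_node[i] * fwd[i] * bwd[i]   # belief = potential times incoming messages
        beliefs.append(b / b.sum())
    return beliefs

# Example: four binary nodes with an attractive pairwise potential.
rng = np.random.default_rng(0)
psi_node = [rng.random(2) + 0.5 for _ in range(4)]
psi_edge = [np.array([[2.0, 1.0], [1.0, 2.0]]) for _ in range(3)]
print(sum_product_chain(psi_node, psi_edge))
```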

Loopy BP The major interest in BP is that it performs well empirically when applied to graphs with closed loops. But: (i) convergence is not guaranteed (the algorithm can oscillate) (ii) the resulting beliefs are only approximations to the correct marginals.

Bethe Free Energy. There is one major theoretical result (Yedidia et al.): the fixed points of BP correspond to extrema of the Bethe free energy. The Bethe free energy is one of a family of approximations to the true free energy.
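For reference (this is the standard pairwise form from Yedidia, Freeman and Weiss, reproduced here rather than taken from the slide), with d_i denoting the degree of node i:

```latex
F_{\text{Bethe}}
= \sum_{\langle ij\rangle}\sum_{x_i,x_j} b_{ij}(x_i,x_j)\,
  \ln\frac{b_{ij}(x_i,x_j)}{\psi_{ij}(x_i,x_j)\,\psi_i(x_i)\,\psi_j(x_j)}
\;-\;\sum_i (d_i-1)\sum_{x_i} b_i(x_i)\,\ln\frac{b_i(x_i)}{\psi_i(x_i)} .
```

Minimising F_Bethe subject to the beliefs being normalised and pairwise-consistent recovers the BP fixed-point conditions.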

BP without messages. Use the beliefs to construct local approximations B(.) to the distribution, then update the beliefs by repeated marginalization.

BP without messages. Local approximations (consistent on trees):
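Presumably the local approximation referred to here is the standard factorisation of the distribution in terms of node and edge beliefs (an assumption; the slide's own equation is not in the transcript), which is exact when the graph is a tree:

```latex
P(x) \;=\; \prod_i b_i(x_i)\,
\prod_{\langle ij\rangle}\frac{b_{ij}(x_i,x_j)}{b_i(x_i)\,b_j(x_j)}.
```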

Another Viewpoint of BP. There is also a relationship between BP and Markov Chain Monte Carlo (MCMC). BP is like a deterministic form of the Gibbs sampler. MCMC will be described in later lectures.

Summary of BP. BP gives exact results on trees (similar to dynamic programming). It gives surprisingly good approximate results on graphs with loops. There are no guarantees of convergence, but the fixed points of BP correspond to extrema of the Bethe free energy. BP can be formulated without messages. BP is like a deterministic version of the Gibbs sampler in MCMC.