Extending Expectation Propagation for Graphical Models Yuan (Alan) Qi Joint work with Tom Minka

Motivation
Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.
Inference techniques on graphical models often sacrifice efficiency for accuracy, or sacrifice accuracy for efficiency.
We need a method that better balances the trade-off between accuracy and efficiency.

Motivation
[Figure: error versus computational time, contrasting current techniques with what we want]

Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

Graphical Models
Directed (Bayesian networks) and undirected (Markov networks)
[Figure: a small directed network and a small undirected network, each over variables x1, x2, y1, y2]

Inference on Graphical Models
Bayesian inference techniques:
– Belief propagation (BP): Kalman filtering/smoothing, the forward-backward algorithm
– Monte Carlo: particle filters/smoothers, MCMC
Loopy BP: typically efficient, but not accurate on general loopy graphs
Monte Carlo: accurate, but often not efficient

Expectation Propagation in a Nutshell
Approximate a probability distribution by a product of simpler parametric terms:
$p(\mathbf{x}) = \prod_a f_a(\mathbf{x}) \approx q(\mathbf{x}) = \prod_a \tilde{f}_a(\mathbf{x})$
For directed graphs the exact terms are the conditionals, $f_a(\mathbf{x}) = p(x_a \mid \mathrm{pa}(x_a))$; for undirected graphs they are the clique potentials, $f_a(\mathbf{x}) = \psi_a(\mathbf{x}_a)$.
Each approximation term $\tilde{f}_a(\mathbf{x})$ lives in an exponential family (e.g. Gaussian).

EP in a Nutshell
Each approximate term is refined by minimizing the following KL divergence via moment matching:
$\tilde{f}_a^{\,\mathrm{new}}(\mathbf{x}) = \arg\min_{\tilde{f}_a} \operatorname{KL}\!\left( f_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \,\Big\|\, \tilde{f}_a(\mathbf{x})\, q^{\setminus a}(\mathbf{x}) \right)$
where the leave-one-out (cavity) approximation is $q^{\setminus a}(\mathbf{x}) = q(\mathbf{x}) / \tilde{f}_a(\mathbf{x})$.
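To make the moment-matching update concrete, here is a minimal numerical sketch of one EP site update for a single scalar variable with Gaussian approximating terms. It is an illustration only, not the talk's implementation: the logistic factor, the grid-based moment computation, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def gauss_pdf(x, m, v):
    """Density of N(m, v) evaluated on a grid x."""
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

def ep_site_update(f_a, m_q, v_q, m_a, v_a, grid):
    """One EP update for a scalar variable with Gaussian site terms.
    q(x) = N(m_q, v_q) is the current posterior approximation and
    N(m_a, v_a) plays the role of the approximate term for factor f_a."""
    # Leave-one-out (cavity) distribution: q(x) divided by the current site,
    # formed in precision (natural) parameters.
    prec_cav = 1.0 / v_q - 1.0 / v_a
    h_cav = m_q / v_q - m_a / v_a
    v_cav, m_cav = 1.0 / prec_cav, h_cav / prec_cav

    # Tilted distribution proportional to f_a(x) * cavity(x); match its
    # first two moments on a dense grid (cheap in one dimension).
    dx = grid[1] - grid[0]
    tilted = f_a(grid) * gauss_pdf(grid, m_cav, v_cav)
    Z = tilted.sum() * dx
    m_new = (grid * tilted).sum() * dx / Z
    v_new = ((grid - m_new) ** 2 * tilted).sum() * dx / Z

    # Recover the refreshed site term: new posterior divided by the cavity.
    prec_a = 1.0 / v_new - 1.0 / v_cav
    h_a = m_new / v_new - m_cav / v_cav
    return m_new, v_new, h_a / prec_a, 1.0 / prec_a

# Example: prior N(0, 4) refined against one logistic "observation" factor.
grid = np.linspace(-10.0, 10.0, 2001)
f_a = lambda x: 1.0 / (1.0 + np.exp(-3.0 * x))
print(ep_site_update(f_a, m_q=0.0, v_q=4.0, m_a=0.0, v_a=1e6, grid=grid))
```

In full EP this update is swept repeatedly over all factors until the site parameters stop changing.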

Limitations of Plain EP
It can be difficult or expensive to analytically compute the moments needed to minimize the desired KL divergence.
It can be expensive to compute and maintain a valid approximating distribution q(x) that is coherent under marginalization.
– Tree-structured $q(\mathbf{x})$: $q(\mathbf{x}) = \prod_{(i,j)\in T} q(x_i, x_j) \Big/ \prod_i q(x_i)^{d_i - 1}$, where $d_i$ is the degree of node $i$ in the tree $T$.

Three Extensions
1. Instead of choosing the approximate term to minimize the KL divergence $\operatorname{KL}\!\left( f_a\, q^{\setminus a} \,\|\, \tilde{f}_a\, q^{\setminus a} \right)$, use other criteria.
2. Use numerical approximation to compute moments: quadrature or Monte Carlo (see the sketch below).
3. Allow the tree-structured q(x) to be non-coherent during the iterations; it only needs to be coherent at the end.
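As a sketch of extension 2, the grid integration used earlier can be replaced by Monte Carlo when quadrature is awkward. The following self-normalized importance sampler, with the cavity Gaussian as the proposal, is an illustrative assumption rather than the talk's code.

```python
import numpy as np

def mc_moments(f_a, m_cav, v_cav, n_samples=20_000, rng=None):
    """Estimate the mean and variance of the tilted distribution
    proportional to f_a(x) * N(x; m_cav, v_cav) by self-normalized
    importance sampling, using the cavity Gaussian as the proposal."""
    rng = np.random.default_rng(0) if rng is None else rng
    x = m_cav + np.sqrt(v_cav) * rng.standard_normal(n_samples)
    w = f_a(x)                 # importance weights: target / proposal = f_a
    w = w / w.sum()
    mean = np.sum(w * x)
    var = np.sum(w * (x - mean) ** 2)
    return mean, var

# Same logistic factor and cavity N(0, 4) as in the previous sketch.
f_a = lambda x: 1.0 / (1.0 + np.exp(-3.0 * x))
print(mc_moments(f_a, m_cav=0.0, v_cav=4.0))
```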

Efficiency vs. Accuracy
[Figure: error versus computational time for loopy BP (factorized EP), Monte Carlo, and the targeted extended EP]

Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

Object Tracking
Guess the position of an object given noisy observations.
[Figure: an object's trajectory with noisy observations]

Bayesian Network (random walk)
e.g. we want the distribution of the x's given the y's
[Figure: a chain of hidden states x1 → x2 → … → xT, each xt emitting an observation yt]

Approximation
Factorized and Gaussian in x: $q(\mathbf{x}) = \prod_t q(x_t)$, with each $q(x_t)$ Gaussian.

Message Interpretation
The belief at $x_t$ factors as $q(x_t) = (\text{forward msg}) \times (\text{observation msg}) \times (\text{backward msg})$.
[Figure: node xt with its observation yt, showing the forward, backward, and observation messages]

Extensions of EP
Instead of matching moments, use any method for approximate filtering.
– Examples: statistical linearization, the unscented Kalman filter (UKF), a mixture of Kalman filters
This turns any deterministic filtering method into a smoothing method!
All of these methods can be interpreted as finding linear/Gaussian approximations to the original terms (see the unscented-transform sketch below).
Use quadrature or Monte Carlo for term approximations.
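As one concrete instance of the "any approximate filter" idea, the sketch below implements the unscented transform, the moment-approximation step inside a UKF: it returns a Gaussian (mean and covariance) approximation for a nonlinearly transformed state. The function name, parameter choices, and example map are assumptions for illustration.

```python
import numpy as np

def unscented_moments(g, mean, cov, kappa=1.0):
    """Approximate the mean and covariance of y = g(x) for x ~ N(mean, cov)
    with the unscented transform (sigma points plus weighted moments)."""
    n = mean.shape[0]
    scale = n + kappa
    L = np.linalg.cholesky(scale * cov)                 # sigma-point offsets
    sigma = np.vstack([mean, mean + L.T, mean - L.T])   # 2n + 1 sigma points
    w = np.full(2 * n + 1, 1.0 / (2.0 * scale))
    w[0] = kappa / scale

    y = np.array([g(s) for s in sigma])                 # push points through g
    mean_y = w @ y
    diff = y - mean_y
    cov_y = (w[:, None] * diff).T @ diff                # weighted covariance
    return mean_y, cov_y

# Example: propagate N(0, I) in 2-D through a mildly nonlinear map.
g = lambda x: np.array([np.sin(x[0]) + x[1], 0.5 * x[0] ** 2])
print(unscented_moments(g, np.zeros(2), np.eye(2)))
```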

Example: Poisson Tracking
$y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$.

Poisson Tracking Model
[Model equations: a Gaussian random walk on the latent state $x_t$, with Poisson observations $y_t \sim \mathrm{Poisson}(e^{x_t})$ as on the previous slide]
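The following sketch implements the forward (filtering) half of such an extended-EP treatment for the assumed model above, replacing each Poisson observation term by a Gaussian via grid-based moment matching. The step variance q = 0.1, the grid settings, and all other constants are illustrative assumptions, not values from the talk.

```python
import numpy as np

def poisson_adf_filter(y, q=0.1, m0=0.0, v0=1.0, width=8.0, n_grid=2001):
    """Forward pass for the assumed model
        x_t = x_{t-1} + N(0, q),   y_t ~ Poisson(exp(x_t)),
    replacing each observation term by a moment-matched Gaussian."""
    means, variances = [], []
    m, v = m0, v0
    for y_t in y:
        # Predict through the random walk (the Gaussian stays Gaussian).
        m_pred, v_pred = m, v + q
        # Moment-match the tilted distribution Poisson(y_t | e^x) * N(x; m_pred, v_pred).
        x = np.linspace(m_pred - width, m_pred + width, n_grid)
        dx = x[1] - x[0]
        log_tilted = y_t * x - np.exp(x) - 0.5 * (x - m_pred) ** 2 / v_pred
        tilted = np.exp(log_tilted - log_tilted.max())
        Z = tilted.sum() * dx
        m = (x * tilted).sum() * dx / Z
        v = ((x - m) ** 2 * tilted).sum() * dx / Z
        means.append(m)
        variances.append(v)
    return np.array(means), np.array(variances)

# Simulate counts from the assumed model and filter them.
rng = np.random.default_rng(1)
x_true = np.cumsum(rng.normal(0.0, np.sqrt(0.1), size=100))
y_obs = rng.poisson(np.exp(x_true))
m_filt, v_filt = poisson_adf_filter(y_obs)
```

In the full extended-EP smoother described in the talk, a matching backward pass refines these Gaussian terms and the sweeps are iterated; the forward step above already shows where the non-Gaussian Poisson term is replaced by a moment-matched Gaussian.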

Extended EP vs. Monte Carlo: Accuracy
[Figure: posterior mean and variance estimates from extended EP compared with Monte Carlo]

Accuracy/Efficiency Tradeoff

Bayesian Network for Wireless Signal Detection
[Figure: a dynamic Bayesian network with hidden symbols s1, …, sT and channel coefficients x1, …, xT generating observations y1, …, yT]
s i : transmitted signals
x i : channel coefficients for digital wireless communications
y i : received noisy observations
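For concreteness, one flat-fading formulation consistent with the variables above is sketched here; it is an assumption on our part, since the talk's exact equations are not in the transcript.

```latex
% A flat-fading formulation consistent with the slide's variables
% (an illustrative assumption; the talk's exact equations are not in the transcript):
\begin{align*}
x_t &= \alpha\, x_{t-1} + w_t,  & w_t &\sim \mathcal{N}(0, \sigma_w^2)        \quad\text{(channel dynamics)}\\
y_t &= s_t\, x_t + \epsilon_t,  & \epsilon_t &\sim \mathcal{N}(0, \sigma_\epsilon^2) \quad\text{(received signal)}
\end{align*}
```

Because $s_t$ ranges over a finite symbol alphabet, the exact posterior over the channel coefficients is a mixture of Kalman filters, which motivates the mixture-of-Kalman-filter treatment on the next slide.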

Extended-EP Joint Signal Detection and Channel Estimation
Turn a mixture of Kalman filters into a smoothing method.
Smooth over a window of the last L observations; the observations before the window act as a prior for the current estimate.

Computational Complexity
Expectation propagation: O(nLd²)
Stochastic mixture of Kalman filters: O(LMd²)
Rao-Blackwellised particle smoothers: O(LMNd²)
n: number of EP iterations (typically 4 or 5)
d: dimension of the parameter vector
L: smoothing window length
M: number of samples in filtering (often larger than 500)
N: number of samples in smoothing (larger than 50)
EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
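As a check on the quoted speed-up: the cost ratio between the Rao-Blackwellised particle smoother and EP is $\frac{LMNd^2}{nLd^2} = \frac{MN}{n}$, and with the representative values above ($M = 500$, $N = 50$, $n = 5$) this gives $\frac{500 \times 50}{5} = 5{,}000$.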

Experimental Results
EP outperforms particle smoothers (Chen, Wang, Liu 2000) in efficiency with comparable accuracy.
[Figure: results plotted against signal-to-noise ratio]

Bayesian Networks for Adaptive Decoding
[Figure: a dynamic Bayesian network with information bits e1, …, eT and channel states x1, …, xT generating observations y1, …, yT]
The information bits e t are coded by a convolutional error-correcting encoder.

EP Outperforms Viterbi Decoding
[Figure: decoding results plotted against signal-to-noise ratio]

Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

Inference on Loopy Graphs
Problem: estimate the marginal distributions of the variables indexed by the nodes of a loopy graph, e.g., p(x_i), i = 1, ..., 16.
[Figure: a loopy graph over the 16 variables x_1, ..., x_16]

4-node Loopy Graph
The joint distribution is a product of pairwise potentials, one for each edge:
$p(\mathbf{x}) \propto \prod_{(i,j)\in E} \psi_{ij}(x_i, x_j)$
We want to approximate it by a simpler distribution $q(\mathbf{x})$ (a brute-force baseline is sketched below).
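To make the inference target concrete, here is a small brute-force baseline that builds random pairwise potentials on the 4-node loop and computes the exact single-node marginals by enumerating all states. The potentials and variable names are hypothetical; the point is only to show the quantities that BP and TreeEP approximate.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]          # the 4-node loop
# Random positive pairwise potentials, one (2 x 2) table per edge.
psi = {e: rng.uniform(0.5, 2.0, size=(2, 2)) for e in edges}

# p(x) is proportional to the product of the edge potentials; with only
# four binary variables we can simply enumerate all 2**4 joint states.
states = list(itertools.product([0, 1], repeat=4))
joint = np.array([np.prod([psi[(i, j)][x[i], x[j]] for (i, j) in edges])
                  for x in states])
joint /= joint.sum()

# Exact single-node marginals p(x_i): the targets that BP / TreeEP approximate.
marginals = np.zeros((4, 2))
for p_x, x in zip(joint, states):
    for i, xi in enumerate(x):
        marginals[i, xi] += p_x
print(marginals)
```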

BP vs. TreeEP
[Figure: the approximating structures used by BP (fully factorized) and TreeEP (a spanning tree)]

Junction Tree Representation
[Figure: the original distribution p(x), the tree approximation q(x), and its junction tree]

Two Kinds of Edges
On-tree edges, e.g., (x1, x4): exactly incorporated into the junction tree
Off-tree edges, e.g., (x1, x2): approximated by projecting them onto the tree structure

KL Minimization
KL minimization reduces to moment matching: match the single-node and pairwise marginals of the tilted distribution and of the tree approximation q(x).
This reduces to exact inference on single loops.
– Use cutset conditioning.
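The moment-matching claim rests on a standard projection property, sketched here in LaTeX under the tree-structured family from the earlier slide: minimizing KL from the tilted distribution to a tree-structured q is achieved by copying the single and pairwise marginals onto the tree.

```latex
% Sketch of the standard projection property the slide invokes: for the
% tree-structured family q(x) = \prod_{(i,j) \in T} q(x_i, x_j) / \prod_i q(x_i)^{d_i - 1},
% the KL projection of the tilted distribution \hat{p} copies its marginals onto the tree:
\begin{equation*}
q^\star = \arg\min_{q \,\in\, \mathcal{Q}_T} \mathrm{KL}\!\left(\hat{p} \,\|\, q\right)
\;\;\Longrightarrow\;\;
q^\star(x_i, x_j) = \hat{p}(x_i, x_j)\ \ \forall (i,j) \in T,
\qquad
q^\star(x_i) = \hat{p}(x_i)\ \ \forall i .
\end{equation*}
```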

Matching Marginals on the Graph
(1) Incorporate edge (x3, x4)
(2) Incorporate edge (x6, x7)
[Figure: the junction tree cliques, e.g. (x1 x2), (x1 x3), (x1 x4), (x3 x5), (x3 x6), (x5 x7), updated as each off-tree edge is incorporated]

Drawbacks of Global Propagation
Updates all the cliques even when incorporating only one off-tree edge
– Computationally expensive
Stores each off-tree data message as a whole tree
– Requires a large amount of memory

Solution: Local Propagation
Allow q(x) to be non-coherent during the iterations; it only needs to be coherent at the end.
Exploit the junction tree representation: propagate information only locally, within the minimal loop (subtree) that is directly connected to the off-tree edge.
– Reduces computational complexity
– Saves memory

(1) Incorporate edge (x3, x4)
(2) Propagate evidence
(3) Incorporate edge (x6, x7)
[Figure: the junction tree cliques touched by each local step]
On this simple graph, local propagation runs roughly 2 times faster and uses roughly 2 times less memory to store messages than plain EP.

New Interpretation of TreeEP
Marries EP with the junction tree algorithm
Can perform inference efficiently over hypertrees and hypernodes

4-node Graph
[Figure: accuracy comparison on the 4-node loopy graph]
TreeEP = the proposed method
GBP = generalized belief propagation on triangles
TreeVB = variational tree
BP = loopy belief propagation = factorized EP
MF = mean field

Fully-Connected Graphs
Results are averaged over 10 graphs with randomly generated potentials.
TreeEP performs the same as or better than all other methods in both accuracy and efficiency!

8x8 grids, 10 trials

Method            FLOPS         Error
Exact             30,000        0
TreeEP            300,000       –
BP/double-loop    15,500,000    –
GBP               17,500,000    –

TreeEP versus BP and GBP
TreeEP is always more accurate than BP, and is often faster.
TreeEP is much more efficient than GBP, and more accurate on some problems.
TreeEP converges more often than BP and GBP.

Outline
Background on expectation propagation (EP)
Extending EP on Bayesian networks for dynamic systems
– Poisson tracking
– Signal detection for wireless communications
Tree-structured EP on loopy graphs
Conclusions and future work

Conclusions
Extensions of EP on graphical models:
– Instead of minimizing KL divergence, use other sensible criteria to generate messages. This effectively turns any deterministic filtering method into a smoothing method.
– Use quadrature to approximate messages.
– Use local propagation to save computation and memory in tree-structured EP.

Conclusions
Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency.
[Figure: error versus computational time, with extended EP below the curve of state-of-the-art techniques]

Future Work
More extensions of EP:
– How to choose a sensible approximation family (e.g., which tree structure)
– More flexible approximations: mixtures of EP?
– Error bounds?
– Bayesian conditional random fields
– EP for optimization (generalizing max-product)
More real-world applications, e.g., classification of gene expression data.

Classifying Colon Cancer Data by Predictive Automatic Relevance Determination
The task: distinguish normal and cancer samples.
The dataset: 22 normal and 40 cancer samples with 2000 features per sample. The dataset was randomly split 100 times into 50 training and 12 test samples.
SVM results are from Li et al. 2002.

End Contact information: