Focused Belief Propagation for Query-Specific Inference. Anton Chechetka, Carlos Guestrin. Carnegie Mellon. 14 May 2010.

Query-specific inference problem
[Figure: a graphical model whose variables are split into "known" (evidence), "query", and "not interesting" regions.]
Goal: use information about the query to speed up convergence of belief propagation for the query marginals.

Graphical models
Talk focus: pairwise Markov random fields. Paper: arbitrary factor graphs (more general).
The probability is a product of local potentials over the graphical structure:
  $P(X) \propto \prod_{(i,j)} f_{ij}(x_i, x_j)$
Compact representation, but exact inference is intractable. Approximate inference often works well in practice.
[Figure: a pairwise MRF over variables r, s, j, k, i, h, u; each node is a variable, each edge carries a potential.]

(Loopy) belief propagation
Passing messages along edges. Update rule:
  $m_{j \to i}(x_i) = \sum_{x_j} f_{ji}(x_j, x_i) \prod_{k \in \Gamma(j) \setminus i} m_{k \to j}(x_j)$
Variable belief:
  $P(x_i) \propto \prod_{j \in \Gamma(i)} m_{j \to i}(x_i)$
Result: all single-variable beliefs.

(Loopy) belief propagation
Update rule:
  $m_{j \to i}(x_i) = \sum_{x_j} f_{ji}(x_j, x_i) \prod_{k \in \Gamma(j) \setminus i} m_{k \to j}(x_j)$
Message dependencies are local: updating $m_{j \to i}$ only requires the messages $m_{k \to j}$ into $j$.
Round-robin schedule: fix a message order, then apply updates in that order until convergence.
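
To make the schedule concrete, here is a minimal runnable sketch of round-robin loopy BP on a pairwise MRF. The data layout (dicts keyed by directed edges, NumPy arrays for potentials) is my illustrative assumption, not the authors' implementation:

```python
import numpy as np

def bp_round_robin(potentials, neighbors, n_states, n_iters=50):
    """Round-robin loopy BP on a pairwise MRF.

    potentials[(j, i)]: n_states x n_states array f_ji(x_j, x_i)
    neighbors[j]:       list of neighbors of variable j
    """
    # One message per directed edge, initialized uniform.
    msgs = {(j, i): np.full(n_states, 1.0 / n_states)
            for j in neighbors for i in neighbors[j]}
    order = list(msgs)  # fixed update order
    for _ in range(n_iters):
        for (j, i) in order:
            # m_{j->i}(x_i) = sum_{x_j} f_ji(x_j, x_i) prod_{k in G(j)\i} m_{k->j}(x_j)
            prod = np.ones(n_states)
            for k in neighbors[j]:
                if k != i:
                    prod *= msgs[(k, j)]
            new = potentials[(j, i)].T @ prod
            msgs[(j, i)] = new / new.sum()  # normalize
    # Belief: P(x_i) proportional to prod_{j in G(i)} m_{j->i}(x_i)
    beliefs = {}
    for i in neighbors:
        b = np.ones(n_states)
        for j in neighbors[i]:
            b *= msgs[(j, i)]
        beliefs[i] = b / b.sum()
    return beliefs
```

With a round-robin schedule every message gets updated equally often, which is exactly the inefficiency the next slides address.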

Dynamic update prioritization
A fixed update sequence is not the best option: dynamic update scheduling can speed up convergence.
- Tree-Reweighted BP [Wainwright et al., AISTATS 2003]
- Residual BP [Elidan et al., UAI 2006]
Residual BP: apply the largest change first. A large change is an informative update; a small change is wasted computation.

Residual BP [Elidan et al., UAI 2006]
Define the residual of a message as the size of its pending change:
  $r_{j \to i} = \| m^{new}_{j \to i} - m^{old}_{j \to i} \|$
Repeat: pick the edge with the largest residual and apply its update (replace the old message with the new one).
This spends more effort on the difficult parts of the model, but it knows nothing about the query.
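
A sketch of the residual-BP scheduling loop, assuming helper callables `recompute` (returns the would-be new message) and `deps` (which messages depend on an updated one); the lazy stale-entry heap is my bookkeeping choice, not necessarily Elidan et al.'s:

```python
import heapq
import numpy as np

def residual_bp(msgs, recompute, deps, tol=1e-6, max_updates=10**6):
    """Residual BP: always apply the message update with the largest residual.

    msgs:      dict edge -> current message vector, edge = (j, i)
    recompute: recompute(edge, msgs) -> the would-be new message for edge
    deps:      deps[edge] -> edges whose residuals change once edge is updated
    """
    def residual(e):
        return float(np.abs(recompute(e, msgs) - msgs[e]).max())

    # Max-heap via negated residuals; stale entries are skipped lazily.
    heap = [(-residual(e), e) for e in msgs]
    heapq.heapify(heap)
    for _ in range(max_updates):
        if not heap:
            break
        neg_r, edge = heapq.heappop(heap)
        r = residual(edge)
        if r < -neg_r - 1e-12:           # stale entry: residual shrank meanwhile
            heapq.heappush(heap, (-r, edge))
            continue
        if r < tol:                       # largest residual is tiny: converged
            break
        msgs[edge] = recompute(edge, msgs)
        for dep in deps[edge]:            # downstream residuals have changed
            heapq.heappush(heap, (-residual(dep), dep))
    return msgs
```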

Our contributions
- Using weighted residuals to prioritize updates
- Defining message weights that reflect the importance of the message to the query
- Computing importance weights efficiently
- Experiments: faster convergence on large relational models

Why edge importance weights?
[Figure: two candidate updates. One has a large residual but no influence on the query (wasted computation); the other has a smaller residual but does influence the query in the future. Which to update?]
Residual BP maximizes the immediate residual reduction.
Our work maximizes the approximate eventual effect on P(query).

Query-specific BP
Update rule: pick the edge with the largest weighted residual
  $A_{j \to i} \cdot \| m^{new}_{j \to i} - m^{old}_{j \to i} \|,$
where $A_{j \to i}$ is the edge importance, and apply its update (old message replaced by new). The importance weight is the only change relative to residual BP!
Rest of the talk: defining and computing edge importance.
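
The change to the residual-BP sketch above is one line: scale each residual by the edge importance before it enters the priority queue. A hedged sketch, with `A` the importance map computed later in the talk:

```python
import numpy as np

def weighted_priority(edge, msgs, recompute, A):
    """Query-specific priority: edge importance times message residual.
    With A[edge] = 1 for every edge this reduces to plain residual BP."""
    residual = float(np.abs(recompute(edge, msgs) - msgs[edge]).max())
    return A[edge] * residual
```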

Edge importance: base case
We want $A_{j \to i}$ to approximate the eventual effect of the update on P(Q).
Base case: edge $j \to i$ directly connected to the query. Then
  $\| P^{new}(Q) - P^{old}(Q) \| \le \| m^{new}_{j \to i} - m^{old}_{j \to i} \|$
(the change in the query belief is bounded by the change in the message, and the bound is tight), so set $A_{j \to i} = 1$.

mjimji sup mrjmrj Edge one step away from the query: A r  j =?? mjimji sup mrjmrj Edge importance one step away query r sj k i h u ||  P(Q)||||  m ||  change in query belief change in message  can compute in closed form looking at only f ji [Mooij, Kappen; 2007] message importance jiji rjrj ||  m ||  over values of all other messages

Edge importance: general case
Base case: $A_{j \to i} = 1$. One step away: $A_{r \to j} = \sup \| \partial m_{j \to i} / \partial m_{r \to j} \|$.
The direct generalization for an edge $s \to h$ further from the query would be $\sup \| \partial P(Q) / \partial m_{s \to h} \|$, but that is expensive to compute and the bound may be infinite.
Instead, chain the one-step bounds along a path $\pi$ from the edge to the query:
  $\mathrm{sensitivity}(\pi) = \sup \left\| \frac{\partial m_{j \to i}}{\partial m_{r \to j}} \right\| \cdot \sup \left\| \frac{\partial m_{r \to j}}{\partial m_{h \to r}} \right\| \cdot \sup \left\| \frac{\partial m_{h \to r}}{\partial m_{s \to h}} \right\|$
the maximum impact along the path $\pi$.

Edge importance: general case (continued)
Define the importance as the strongest chain of influence:
  $A_{s \to h} = \max_{\text{paths } \pi \text{ from } h \text{ to the query}} \mathrm{sensitivity}(\pi)$
But there are a lot of paths in a graph; trying out every one is intractable.

Efficient edge importance computation
Key observations about $\mathrm{sensitivity}(\pi)$, e.g. for $\pi = (h \to r \to j \to i)$:
- it decomposes into individual edge contributions (a product of per-edge terms);
- every per-edge term is always $\le 1$, so sensitivity always decreases as the path grows.
Therefore Dijkstra's (shortest-paths) algorithm will efficiently find max-sensitivity paths, and hence importance weights, for every edge.
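
Because sensitivity is a product of per-edge factors in (0, 1], finding the max-sensitivity path is a single-source shortest-path problem rooted at the query. A minimal sketch, assuming `edge_sensitivity[(j, i)]` holds the per-edge bound (e.g., from `edge_strength` above):

```python
import heapq

def importance_weights(query, neighbors, edge_sensitivity):
    """Max-sensitivity node values via Dijkstra's algorithm from the query.

    Returns V with V[i] = max over paths from i to the query of the product
    of per-edge sensitivities; the importance of any edge j -> i is then
    A[(j, i)] = V[i], so edges directly into the query get weight 1.
    """
    V = {query: 1.0}
    heap = [(-1.0, query)]               # max-heap on value via negation
    while heap:
        neg_v, i = heapq.heappop(heap)
        if -neg_v < V.get(i, 0.0):       # stale entry
            continue
        for j in neighbors[i]:
            # Extending the path over edge j -> i multiplies the value by a
            # factor <= 1, so values only shrink: greedy expansion is safe.
            v = V[i] * edge_sensitivity[(j, i)]
            if v > V.get(j, 0.0):
                V[j] = v
                heapq.heappush(heap, (-v, j))
    return V
```

Setting `A[(j, i)] = V[i]` for every directed edge then supplies the weights used by the query-specific scheduler.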

Query-specific BP
  $A_{j \to i} = \max_{\text{paths } \pi \text{ from } i \text{ to the query}} \mathrm{sensitivity}(\pi)$
Algorithm: run Dijkstra's algorithm starting at the query to get the edge weights; then repeatedly pick the edge with the largest weighted residual and update it.
More effort goes to the parts of the model that are both difficult and relevant: the weights take into account not only the graphical structure, but also the strength of the dependencies.

Outline
- Using weighted residuals to prioritize updates
- Defining message weights that reflect the importance of the message to the query
- Computing importance weights efficiently
  - As an initialization step before residual BP
  - Restoring the anytime behavior of residual BP
- Experiments: faster convergence on large relational models

Big picture
Before: plain BP can be stopped at any point and returns estimates of all marginals, the anytime property.
After: a Dijkstra's pass followed by BP returns the query marginals, but the long initialization means the anytime property is broken!

Interleaving BP and Dijkstra's
Dijkstra's expands the highest-weight edges first, so we can pause it at any time and get the most relevant submodel around the query.
Alternate: Dijkstra's, then BP, then Dijkstra's, then BP, ... growing the submodel toward the full model.
Question: when to stop BP and resume Dijkstra's?

Interleaving: when to resume Dijkstra's
[Figure: edges around the query marked "expanded on previous iteration", "just expanded", "not yet expanded".]
Dijkstra's expands the highest-weight edges first, so every not-yet-expanded edge has importance $A \le \min_{\text{expanded edges}} A$.
Suppose every residual is bounded by some constant $M$. Then $M \cdot \min_{\text{expanded}} A$ is an upper bound on the priority of any unexpanded edge. As long as the actual priority of the best expanded edge exceeds this bound, there is no need to expand further at this point.
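
A sketch of the resulting test (names illustrative): keep doing BP on the expanded submodel while its best priority provably dominates anything Dijkstra's could still reveal.

```python
def should_resume_dijkstra(top_expanded_priority,
                           min_expanded_importance,
                           max_residual_bound):
    """Unexpanded edges have importance <= min_expanded_importance (Dijkstra's
    expands highest-weight edges first) and residual <= max_residual_bound,
    so their priority is at most the product below; resume expansion only
    when that bound could beat the best expanded edge."""
    unexpanded_bound = max_residual_bound * min_expanded_importance
    return top_expanded_priority < unexpanded_bound
```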

Anytime query-specific BP
Query-specific BP: run Dijkstra's algorithm to completion, then all BP updates.
Anytime QSBP: interleave Dijkstra's expansions with BP updates, yielding the same BP update sequence!

Experiments: the setup
Relational model: semi-supervised people recognition in a collection of images [with Denver Dash and Matthai, Intel].
- One variable per person
- Agreement potentials for people who look alike
- Disagreement potentials for people in the same image
- 2K variables, 900K factors
Error measure: KL divergence from the fixed point of BP.

Experiments: convergence speed
[Plot: error vs. computation, lower is better, for Residual BP, Query-Specific BP, and Anytime Query-Specific BP (with Dijkstra's interleaving); the query-specific variants converge faster.]

Conclusions
- Prioritize updates by weighted residuals, taking query information into account
- Importance weights depend on both the graphical structure and the strength of local dependencies
- Efficient computation of importance weights
- Much faster convergence on large relational models
Thank you!