Message Passing Algorithms for Optimization


Message Passing Algorithms for Optimization Nicholas Ruozzi Advisor: Sekhar Tatikonda Yale University

The Problem Minimize a real-valued objective function \[f:\prod_i \mathcal{X}_i \rightarrow \mathbb{R}\cup \{\infty\}\] that factorizes as a sum of potentials over a collection A, a multiset whose elements are subsets of the indices 1,…,n: \[f(x) = \sum_i \phi_i(x_i) + \sum_{\alpha\in A} \psi_\alpha(x_\alpha)\]

Corresponding Graph (Figure: factor graph over variables 1, 2, 3, with a factor node for each potential, e.g. \(\psi_{12}(x_1, x_2)\) connecting variables 1 and 2.)

Local Message Passing Algorithms Pass messages on this graph to minimize f: a distributed message passing algorithm, ideal for large scientific problems, sensor networks, etc. (Figure: the same factor graph over variables 1, 2, 3.)

The Min-Sum Algorithm Messages at time t: \[m^t_{i\rightarrow\alpha}(x_i) = \phi_i(x_i) + \sum_{\beta\in\partial i\setminus\alpha} m^{t-1}_{\beta\rightarrow i}(x_i)\] \[m^t_{\alpha\rightarrow i}(x_i) = \min_{x_{\alpha\setminus i}}\Big[\psi_\alpha(x_\alpha) + \sum_{j\in\alpha\setminus i} m^{t-1}_{j\rightarrow\alpha}(x_j)\Big]\]

Computing Beliefs The min-marginal corresponding to the ith variable is given by \(f_i(x_i) = \min_{x' : x'_i = x_i} f(x')\) Beliefs approximate the min-marginals: \(b^t_i(x_i) = \phi_i(x_i) + \sum_{\alpha\in\partial i} m^t_{\alpha\rightarrow i}(x_i)\) Estimate the optimal assignment as \(x^*_i \in \arg\min_{x_i} b^t_i(x_i)\)
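On a tree the converged beliefs are exactly the min-marginals, so the belief argmins recover an optimizer. A minimal Python sketch of min-sum on a three-variable chain (the potentials here are illustrative, not from the talk):

```python
from itertools import product

K = range(3)  # each variable takes values {0, 1, 2}
phi = {1: [0.0, 1.5, 3.0], 2: [2.0, 0.0, 1.0], 3: [1.0, 2.0, 0.0]}
psi = {(1, 2): lambda a, b: abs(a - b), (2, 3): lambda a, b: (a - b) ** 2}
edges = [(1, 2), (2, 3)]

# factor-to-variable messages m[(edge, i)], initialized to zero
m = {(e, i): [0.0] * 3 for e in edges for i in e}

def var_to_factor(i, e):
    # phi_i plus messages from every other factor containing variable i
    return [phi[i][x] + sum(m[(g, i)][x] for g in edges if i in g and g != e)
            for x in K]

for _ in range(10):  # a handful of sweeps suffices on a tree
    for (i, j) in edges:
        vi = var_to_factor(i, (i, j))
        vj = var_to_factor(j, (i, j))
        m[((i, j), j)] = [min(psi[(i, j)](xi, xj) + vi[xi] for xi in K) for xj in K]
        m[((i, j), i)] = [min(psi[(i, j)](xi, xj) + vj[xj] for xj in K) for xi in K]

# beliefs b_i(x_i) = phi_i(x_i) + sum of incoming messages
b = {i: [phi[i][x] + sum(m[(e, i)][x] for e in edges if i in e) for x in K]
     for i in (1, 2, 3)}
xstar = tuple(min(K, key=lambda x: b[i][x]) for i in (1, 2, 3))

def f(x1, x2, x3):
    return (phi[1][x1] + phi[2][x2] + phi[3][x3]
            + psi[(1, 2)](x1, x2) + psi[(2, 3)](x2, x3))

best = min(product(K, repeat=3), key=lambda x: f(*x))
```

For this instance `xstar` matches the brute-force minimizer `best`, and each belief's minimum equals the minimum of f, as expected on a tree.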

Min-Sum: Convergence Properties Iterations do not necessarily converge Always converges when the factor graph is a tree Converged estimates need not correspond to the optimal solution Performs well empirically

Previous Work Prior work focused on two aspects of message passing algorithms Convergence Coordinate ascent schemes Not necessarily local message passing algorithms Correctness No combinatorial characterization of failure modes Concerned only with global optimality

Contributions A new local message passing algorithm Parameterized family of message passing algorithms Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a global optimum Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a local optimum

Contributions What makes a graphical model “good”? Combinatorial understanding of the failure modes of the splitting algorithm via graph covers Can be extended to other iterative algorithms Techniques for handling objective functions for which the known convergent algorithms fail Reparameterization centric approach

Publications Convergent and correct message passing schemes for optimization problems over graphical models Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010 Fixing Max-Product: A Unified Look at Message Passing Algorithms (invited talk) Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 2010 Unconstrained minimization of quadratic functions via min-sum Proceedings of the Conference on Information Sciences and Systems (CISS), Princeton, NJ/USA, March 2010 Graph covers and quadratic minimization Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing, September 2009 s-t paths using the min-sum algorithm Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, September 2008

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

The Problem Minimize a real-valued objective function that factorizes as a sum of potentials over a collection A, a multiset whose elements are subsets of the indices 1,…,n

Factorizations Some factorizations are better than others If \(x_i\) takes one of k values, this requires at most \(2k^2 + k\) operations \begin{eqnarray*} \min_x f(x) & = & \min_x [\phi_2(x_2) + \psi_{12}(x_1, x_2) + \psi_{23}(x_2, x_3)]\\ & = & \min_{x_2} [\phi_2(x_2) + \min_{x_1}\psi_{12}(x_1, x_2) + \min_{x_3}\psi_{23}(x_2, x_3)]\\ \end{eqnarray*} \[f(x) = \phi_2(x_2) + \psi_{12}(x_1, x_2) + \psi_{23}(x_2, x_3)\]

Factorizations Some factorizations are better than others Suppose the inner minimizations \(\min_{x_1}\psi_{12}(x_1, x_2)\) and \(\min_{x_3}\psi_{23}(x_2, x_3)\) are already known for each \(x_2\) Then only k operations are needed to compute the minimum value! \begin{eqnarray*} \min_x f(x) & = & \min_x [\phi_2(x_2) + \psi_{12}(x_1, x_2) + \psi_{23}(x_2, x_3)]\\ & = & \min_{x_2} [\phi_2(x_2) + \min_{x_1}\psi_{12}(x_1, x_2) + \min_{x_3}\psi_{23}(x_2, x_3)]\\ \end{eqnarray*} \[f(x) = \phi_2(x_2) + \psi_{12}(x_1, x_2) + \psi_{23}(x_2, x_3)\]
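The savings can be checked numerically: distributing the mins over the chain returns the same value as brute-force enumeration while replacing the \(k^3\) table scan with roughly \(2k^2 + k\) minimizations. A small sketch with illustrative random potentials:

```python
from itertools import product
import random

random.seed(0)
k = 4  # each variable takes one of k values; the potentials are illustrative
phi2 = [random.random() for _ in range(k)]
psi12 = [[random.random() for _ in range(k)] for _ in range(k)]
psi23 = [[random.random() for _ in range(k)] for _ in range(k)]

# brute force: k^3 evaluations of f
brute = min(phi2[x2] + psi12[x1][x2] + psi23[x2][x3]
            for x1, x2, x3 in product(range(k), repeat=3))

# distributing the mins: min_{x2}[phi2 + min_{x1} psi12 + min_{x3} psi23]
dp = min(phi2[x2]
         + min(psi12[x1][x2] for x1 in range(k))
         + min(psi23[x2][x3] for x3 in range(k))
         for x2 in range(k))
```

Both computations agree to machine precision; only the operation count differs.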

Reparameterizations We can rewrite the objective function as \[f(x) = \sum_i \Big[\phi_i(x_i) + \sum_{\alpha\in\partial i} m_{\alpha i}(x_i)\Big] + \sum_\alpha\Big[\psi_\alpha(x_\alpha) - \sum_{i\in\alpha} m_{\alpha i}(x_i)\Big]\] This does not change the objective function as long as the messages are real-valued at each x The objective function is reparameterized in terms of the messages

Reparameterizations We can rewrite the objective function as \[f(x) = \sum_i \Big[\phi_i(x_i) + \sum_{\alpha\in\partial i} m_{\alpha i}(x_i)\Big] + \sum_\alpha\Big[\psi_\alpha(x_\alpha) - \sum_{i\in\alpha} m_{\alpha i}(x_i)\Big]\] The reparameterization has the same factor graph as the original factorization Many message passing algorithms produce a reparameterization upon convergence

The Splitting Reparameterization Let c be a vector of non-zero reals If c is a vector of positive integers, then we could view this as a factorization in two ways: Over the same factor graph as the original potentials Over a factor graph where each potential has been “split” into several pieces \begin{eqnarray*} f(x) & = & \sum_i [\phi_i(x_i) + \sum_{\alpha\in\partial i} c_ic_\alpha m_{\alpha i}(x_i)] + \sum_\alpha[\psi_\alpha(x_\alpha) - \sum_{i\in\alpha} c_ic_\alpha m_{\alpha i}(x_i)]\\ & = & \sum_i c_i[\frac{\phi_i(x_i)}{c_i} + \sum_{\alpha\in\partial i} c_\alpha m_{\alpha i}(x_i)] + \sum_\alpha c_\alpha[\frac{\psi_\alpha(x_\alpha)}{c_\alpha} - \sum_{i\in\alpha} c_im_{\alpha i}(x_i)]\\ \end{eqnarray*}

The Splitting Reparameterization (Figure: the original factor graph over variables 1, 2, 3, and the factor graph resulting from “splitting” each of the pairwise potentials 3 times.)

The Splitting Reparameterization Beliefs: \(b_i(x_i) = \frac{\phi_i(x_i)}{c_i} + \sum_{\alpha\in\partial i} c_\alpha m_{\alpha i}(x_i)\), with the factor beliefs \(b_\alpha\) defined analogously from the messages Reparameterization: \[f(x) = \sum_i c_ib_i(x_i) + \sum_\alpha c_\alpha[b_\alpha(x_\alpha) - \sum_{k\in\alpha} c_kb_k(x_k)]\]
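The splitting identity can be verified numerically: for any non-zero c and arbitrary real messages, the \(c_i c_\alpha m_{\alpha i}\) terms cancel and the objective is unchanged. A sketch on a single pairwise factor (all numeric values illustrative):

```python
import random

random.seed(1)
K = range(3)
a = (1, 2)  # a single pairwise factor over variables 1 and 2
phi = {1: [random.random() for _ in K], 2: [random.random() for _ in K]}
psi = [[random.random() for _ in K] for _ in K]
c = {1: 0.7, 2: 1.3, a: 2.0}                        # any non-zero reals
msg = {i: [random.random() for _ in K] for i in a}  # messages m_{alpha i}

def f(x1, x2):
    return phi[1][x1] + phi[2][x2] + psi[x1][x2]

def reparam(x1, x2):
    # sum_i c_i [phi_i/c_i + c_a m_{a i}] + c_a [psi_a/c_a - sum_i c_i m_{a i}]
    x = {1: x1, 2: x2}
    total = sum(c[i] * (phi[i][x[i]] / c[i] + c[a] * msg[i][x[i]]) for i in a)
    total += c[a] * (psi[x1][x2] / c[a] - sum(c[i] * msg[i][x[i]] for i in a))
    return total

max_diff = max(abs(f(x1, x2) - reparam(x1, x2)) for x1 in K for x2 in K)
```

`max_diff` is zero up to floating-point round-off, for any choice of the messages or of the non-zero vector c.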

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

Lower Bounds Can lower bound the objective function with these reparameterizations: Find the collection of messages that maximize this lower bound Lower bound is a concave function of the messages Use coordinate ascent or subgradient methods \begin{eqnarray*} \min_x f(x) & = & \min_x \Big[\sum_i c_ib_i(x_i) + \sum_\alpha c_\alpha[b_\alpha(x_\alpha) - \sum_{k\in\alpha} c_kb_k(x_k)]\Big]\\ & \geq & \sum_i \min_x \Big[c_ib_i(x_i)\Big] + \sum_\alpha \min_x \Big[c_\alpha[b_\alpha(x_\alpha) - \sum_{k\in\alpha} c_kb_k(x_k)]\Big] \end{eqnarray*}

Lower Bounds and the MAP LP The MAP linear program is equivalent to minimizing f Its dual provides a lower bound on f Messages are a side-effect of certain dual formulations

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

The Splitting Algorithm A local message passing algorithm for the splitting reparameterization Contains the min-sum algorithm as a special case For the integer case, can be derived from the min-sum update equations

The Splitting Algorithm For certain choices of c, an asynchronous version of the splitting algorithm can be shown to be a block coordinate ascent scheme for the lower bound: For example: \begin{eqnarray*} \min_x f(x) & \geq & \sum_i \min_x \Big[c_i(1-\sum_{\alpha\in\partial i}c_\alpha)b_i(x_i)\Big] + \sum_\alpha \min_x \Big[c_\alpha b_\alpha(x_\alpha)\Big] \end{eqnarray*}

Asynchronous Splitting Algorithm (Figure: three animation frames of asynchronous message updates on a factor graph over variables 1, 2, 3.)

Coordinate Ascent Guaranteed to converge Does not necessarily maximize the lower bound Can get stuck in a suboptimal configuration Can be shown to converge to the maximum in restricted cases Pairwise-binary objective functions

Other Ascent Schemes Many other ascent algorithms are possible over different lower bounds: TRW-S [Kolmogorov 2007] MPLP [Globerson and Jaakkola 2007] Max-Sum Diffusion [Werner 2007] Norm-product [Hazan 2010] Not all coordinate ascent schemes are local

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

Constructing the Solution Construct an estimate, x*, of the optimal assignment from the beliefs by choosing \(x^*_i \in \arg\min_{x_i} b_i(x_i)\) For certain choices of the vector c, if each argmin is unique, then x* minimizes f A simple choice of c guarantees both convergence and correctness (if the argmins are unique)

Correctness If the argmins are not unique, then we may not be able to construct a solution When does the algorithm converge to the correct minimizing assignment?

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

Graph Covers A graph H covers a graph G if there is a homomorphism from H to G that is a bijection on neighborhoods (Figure: graph G on nodes 1, 2, 3 and a 2-cover of G on nodes 1, 2, 3, 1', 2', 3'.)

Graph Covers Each node of the cover is assigned the potential of the node it covers: the potentials are “lifted” (Figure: graph G and a 2-cover of G with copied potentials.)

Graph Covers The lifted potentials define a new objective function 2-cover objective function \[f(x) = \psi_{12}(x_1, x_2) + \psi_{23}(x_2, x_3) + \psi_{13}(x_1, x_3)\] \begin{eqnarray*} f_2(x,x') & = & \psi_{12}(x_1,x_2) + \psi_{23}(x_2,x_3) + \psi_{13}(x'_1, x_3)\\ &&+\: \psi_{12}(x'_1,x'_2) + \psi_{23}(x'_2,x'_3) + \psi_{13}(x_1, x'_3) \end{eqnarray*}
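The lift property can be checked directly: setting x' = x gives \(f_2(x, x) = 2 f(x)\), so the cover's minimum is at most twice the original's, and it can be strictly smaller. A sketch with illustrative binary potentials on the triangle (chosen so the strict gap appears, since the 2-cover of a triangle is a 6-cycle):

```python
from itertools import product

K = (0, 1)
# illustrative pairwise potentials that penalize equal endpoints
def psi12(a, b): return 1.0 if a == b else 0.0
def psi23(a, b): return 2.0 if a == b else 0.0
def psi13(a, b): return 3.0 if a == b else 0.0

def f(x1, x2, x3):
    return psi12(x1, x2) + psi23(x2, x3) + psi13(x1, x3)

def f2(x1, x2, x3, y1, y2, y3):
    # 2-cover objective: the two copies of psi13 cross between the copies
    return (psi12(x1, x2) + psi23(x2, x3) + psi13(y1, x3)
            + psi12(y1, y2) + psi23(y2, y3) + psi13(x1, y3))

min_f = min(f(*x) for x in product(K, repeat=3))
min_f2 = min(f2(*x) for x in product(K, repeat=6))
lift_ok = all(f2(x1, x2, x3, x1, x2, x3) == 2 * f(x1, x2, x3)
              for x1, x2, x3 in product(K, repeat=3))
```

Here every lifted assignment doubles the cost, yet the cover's minimum (0, a proper 2-coloring of the 6-cycle) is strictly below twice the triangle's minimum (1, since a triangle has no proper 2-coloring).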

Graph Covers Indistinguishability: for any cover and any choice of initial messages on the original graph, there exists a choice of initial messages on the cover such that the messages passed by the splitting algorithm are identical on both graphs For choices of c that guarantee correctness, any assignment that uniquely minimizes each belief must also minimize the objective function corresponding to any finite cover

Maximum Weight Independent Set (Figures: a weighted graph G and a 2-cover of G; node weights lift to the cover, and an independent set in the cover need not be a lift of an independent set in G, so the cover can admit a heavier solution.)
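The slides' figures did not survive transcription; a standard illustration of the phenomenon (an assumption here, not necessarily the deck's original example) is the unit-weight triangle, whose 2-cover is the 6-cycle: the cover's maximum weight independent set is strictly heavier than twice the original's.

```python
from itertools import product

def mwis(edges, weights):
    # brute-force maximum weight independent set
    n = len(weights)
    best = 0.0
    for s in product((0, 1), repeat=n):
        if any(s[u] and s[v] for u, v in edges):
            continue  # not an independent set
        best = max(best, sum(w for i, w in enumerate(weights) if s[i]))
    return best

# triangle G with unit weights: the best independent set is a single node
tri_edges = [(0, 1), (1, 2), (0, 2)]
w = [1.0, 1.0, 1.0]

# a 2-cover of the triangle is the 6-cycle, with the weights lifted
cover_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
w2 = w + w
```

On the 6-cycle, alternating nodes form an independent set of weight 3 > 2 × 1, so no message passing algorithm that cannot distinguish G from its covers can certify the optimum on G.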

More Graph Covers If covers of the factor graph have different solutions: the splitting algorithm cannot converge to the correct answer for choices of c that guarantee correctness, and the min-sum algorithm may converge to an assignment that is optimal on a cover There are applications for which the splitting algorithm always works Minimum cuts, shortest paths, and more…

Graph Covers Suppose f factorizes over a set with corresponding factor graph G and the choice of c guarantees correctness Theorem: the splitting algorithm can only converge to beliefs that have unique argmins if f is uniquely minimized at an assignment x* and the objective function corresponding to every finite cover H of G has a unique minimum that is a lift of x*

Graph Covers This result suggests that There is a close link between “good” factorizations and the difficulty of a problem Convergent and correct algorithms are not ideal for all applications Convex functions can be covered by functions that are not convex

Outline Reparameterizations Finding a Minimizing Assignment Lower Bounds Convergent Message Passing Finding a Minimizing Assignment Graph covers Quadratic Minimization

Quadratic Minimization \[f(x) = \frac{1}{2}x^T\Gamma x - h^Tx\] A symmetric positive definite \(\Gamma\) implies a unique minimum Minimized at \(x^* = \Gamma^{-1}h\)

Quadratic Minimization For a positive definite matrix, min-sum convergence implies a correct solution: Min-sum is not guaranteed to converge for all symmetric positive definite matrices \[f(x_1,...,x_n) = \Big[\sum_i \frac{\Gamma_{ii}}{2}x_i^2 -h_ix_i\Big] + \sum_{i>j} \Gamma_{ij}x_ix_j\]

Quadratic Minimization A symmetric matrix \(\Gamma\) is scaled diagonally dominant if there exists w > 0 such that for each row i: \[|\Gamma_{ii}|w_i > \sum_{j\neq i} |\Gamma_{ij}|w_j\] Theorem: \(\Gamma\) is scaled diagonally dominant iff every finite cover of \(\Gamma\) is positive definite

Quadratic Minimization Scaled diagonal dominance is a sufficient condition for the convergence of other iterative methods Gauss-Seidel, Jacobi, and min-sum Suggests a generalization of scaled diagonal dominance for arbitrary convex functions Purely combinatorial! Empirically, the splitting algorithm can always be made to converge for this problem
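As a sketch of this convergence behavior, Jacobi iteration on a diagonally dominant quadratic contracts to the unique minimizer \(\Gamma^{-1}h\) (the matrix below is illustrative; with w = (1, 1, 1) it satisfies the scaled diagonal dominance condition):

```python
# Jacobi iteration for minimizing f(x) = (1/2) x^T Gamma x - h^T x,
# i.e. solving the stationarity condition Gamma x = h.
Gamma = [[4.0, 1.0, 1.0],
         [1.0, 3.0, 1.0],
         [1.0, 1.0, 5.0]]   # each diagonal exceeds its off-diagonal row sum
h = [1.0, 2.0, 3.0]
n = 3

x = [0.0] * n
for _ in range(200):
    # x_i <- (h_i - sum_{j != i} Gamma_ij x_j) / Gamma_ii
    x = [(h[i] - sum(Gamma[i][j] * x[j] for j in range(n) if j != i)) / Gamma[i][i]
         for i in range(n)]

# residual of Gamma x = h at the iterate
residual = max(abs(sum(Gamma[i][j] * x[j] for j in range(n)) - h[i])
               for i in range(n))
```

Diagonal dominance makes the update a contraction in the max norm (factor at most 2/3 here), so the residual is driven to numerical zero; the same condition is what the talk uses to explain Gauss-Seidel, Jacobi, and min-sum convergence.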

Conclusion General strategy for minimization Reparameterization Lower bounds Convergent and correct message passing algorithms Correctness is too strong Algorithms cannot distinguish graph covers Can fail to hold even for convex problems

Conclusion Open questions Deep relationship between “hardness” of a problem and its factorizations Convergence and correctness criteria for the min-sum algorithm Rates of convergence

Questions? A draft of the thesis is available online at: http://cs-www.cs.yale.edu/homes/nruozzi/Papers/ths2.pdf