OPTIMIZATION WITH PARITY CONSTRAINTS: FROM BINARY CODES TO DISCRETE INTEGRATION
Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman*
*Cornell University   +IBM Watson Research Center
UAI 2013

High-dimensional integration
High-dimensional integrals arise in statistics, ML, and physics:
- Expectations / model averaging
- Marginalization
- Partition function / ranking models / parameter learning
Curse of dimensionality: quadrature involves a weighted sum over an exponential number of items (e.g., units of volume).
[Figure: an n-dimensional hypercube of side L, with volume elements L, L^2, L^3, ..., L^n]

Discrete Integration
We are given:
- a set of 2^n items
- non-negative weights w
Goal: compute the total weight Z = Σ_x w(x).
The weight function is compactly specified in factored form (Bayes net, factor graph, CNF, ...).
Example 1: n = 2 variables, sum over 4 items.
Example 2: n = 100 variables, sum over ≈ 10^30 items (intractable).
[Figure: 2^n items, drawn with size proportional to weight]
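As a toy illustration of the task (not from the talk; the factored weight function below is made up for the example), brute-force summation is only feasible for very small n:

```python
# Minimal sketch: brute-force discrete integration over all 2^n assignments
# of n binary variables, with a toy factored weight function given by
# pairwise factor tables. Feasible for n = 2, hopeless for n = 100.
from itertools import product

def weight(x, factors):
    """x: tuple of 0/1 values; factors: list of (i, j, table) pairwise factors."""
    w = 1.0
    for i, j, table in factors:
        w *= table[x[i]][x[j]]
    return w

def brute_force_Z(n, factors):
    return sum(weight(x, factors) for x in product([0, 1], repeat=n))

# Example 1 from the slide: n = 2 variables, sum over 4 items.
factors = [(0, 1, [[2.0, 0.5], [0.5, 2.0]])]
print(brute_force_Z(2, factors))
```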

Hardness
0/1 weights case:
- Is there at least one "1"? → SAT
- How many "1"s? → #SAT
- NP-complete vs. #P-complete: much harder.
General weights:
- Find the heaviest item (combinatorial optimization, MAP)
- Sum the weights (discrete integration)
[ICML-13] WISH: Approximate Discrete Integration via Optimization, e.g., partition function via MAP inference.
MAP inference is often fast in practice: relaxations / bounds, pruning.
[Figure: complexity hierarchy from easy to hard: P, NP, PH, P^#P, PSPACE, EXP]

WISH: Integration by Hashing and Optimization
The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor:
- Outer loop over the n variables
- For each level, repeat log(n) times: MAP inference on the model augmented with random parity constraints
- Aggregate the MAP inference solutions
[Figure: original graphical model over n binary variables σ ∈ {0,1}^n, augmented with parity check nodes enforcing A σ = b (mod 2)]

Visual working of the algorithm
[Figure: the function to be integrated is optimized under 1, 2, 3, ... random parity constraints; each level is repeated log(n) times and the median M_1, M_2, M_3, ... is taken; the estimate aggregates the unconstrained mode M_0 and the medians as M_0 + M_1×1 + M_2×2 + M_3×4 + ..., over n levels]
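Putting the two slides above together, a minimal Python sketch of WISH might look as follows. This is illustrative, not the authors' implementation: `map_with_parity` is an assumed MAP oracle (e.g., an ILP or branch-and-bound solver), and the exact repetition count and constants from the theorem are glossed over.

```python
# Minimal sketch of WISH (illustrative; not the authors' code).
import math
import random
from statistics import median

def wish(w, n, map_with_parity, delta=0.001):
    """Estimate Z = sum_x w(x) over x in {0,1}^n.
    map_with_parity(w, n, A, b) is an assumed oracle returning
    max_x w(x) subject to A x = b (mod 2)."""
    T = math.ceil(math.log(n / delta))     # O(log(n/delta)) repetitions per level
    M = []                                 # M[i]: median MAP value with i parity constraints
    for i in range(n + 1):
        vals = []
        for _ in range(T):
            # Random dense parity constraints: A in {0,1}^{i x n}, b in {0,1}^i
            A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
            b = [random.randint(0, 1) for _ in range(i)]
            vals.append(map_with_parity(w, n, A, b))
        M.append(median(vals))
    # Aggregate: mode (no constraints) plus geometrically weighted medians
    return M[0] + sum(M[i] * 2 ** (i - 1) for i in range(1, n + 1))
```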

Accuracy Guarantees
Theorem [ICML-13]: With probability at least 1-δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization).
Theorem [ICML-13]: The approximation factor can be improved to (1+ε) by adding extra variables and factors. Example: a factor-2 approximation with 4n variables.
Remark: this is faster than enumeration only when the combinatorial optimization is efficient.

Summary of contributions
Introduction and previous work:
- WISH: approximate discrete integration via optimization
- Partition function / marginalization via MAP inference
- Accuracy guarantees
MAP inference subject to parity constraints:
- Tractable cases and approximations
- Integer Linear Programming formulation
- New family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (reaching a constant-factor approximation at optimality)
Sparsity of the parity constraints:
- Techniques to improve solution time and bound quality
- Experimental improvements over variational techniques

MAP INFERENCE WITH PARITY CONSTRAINTS
Hardness, approximations, and bounds

Making WISH more scalable
Would approximations to the optimization (MAP inference with parity constraints) be useful? YES.
Bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral):
- Lower bounds (local search) on MAP → lower bounds on Z
- Upper bounds (LP, SDP relaxations) on MAP → upper bounds on Z
- Constant-factor approximations of MAP → constant-factor approximations of Z
Question: are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?

Error correcting codes
Communication over a noisy channel: Alice sends x = 0100|1, Bob receives y = 0110|1. The redundant parity check bit is 1 = 0 XOR 1 XOR 0 XOR 0; at the receiver, the check bit 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0, so Bob knows there has been a transmission error.
What was the message actually sent by Alice? It must be:
- a valid codeword
- as close as possible to the received message y
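As a concrete toy version of the slide's example (the bit strings are taken from the slide; the helper function is illustrative):

```python
# Toy parity-check example mirroring the slide.
def parity_bit(bits):
    """XOR of all data bits: the redundant parity check bit."""
    p = 0
    for b in bits:
        p ^= b
    return p

sent_data, sent_check = [0, 1, 0, 0], parity_bit([0, 1, 0, 0])   # Alice sends 0100|1
recv_data, recv_check = [0, 1, 1, 0], 1                          # Bob receives 0110|1

print(sent_check)                            # 1
print(parity_bit(recv_data))                 # 0: parity of the received data bits
print(recv_check != parity_bit(recv_data))   # True: transmission error detected
```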

Decoding a binary code
Max-likelihood decoding: given the received string y and the noisy channel model, find the most likely transmitted string x; the transmitted string must be a valid codeword.
Our more general case: a more complex probabilistic model, max w(x) subject to A x = b (mod 2), which is equivalent to MAP inference on the augmented model.
MAP inference is NP-hard to approximate within any constant factor [Stern, Arora, ...].
Yet LDPC decoding is routinely solved in practice: 10GBase-T Ethernet, Wi-Fi (802.11n), digital TV, ...
[Figure: ML-decoding graphical model with parity check nodes vs. the more general augmented model; noisy channel mapping x = 0100|1 to y = 0110|1]

Decoding via Integer Programming
MAP inference subject to parity constraints can be encoded as an Integer Linear Program (ILP):
- Standard MAP encoding
- Compact (polynomial) encoding of the parity polytope by Yannakakis for the parity constraints
LP relaxation (relax the integrality constraints): polynomial-time upper bounds.
ILP solving strategy: cuts + branching + LP relaxations. Solving a sequence of LP relaxations yields upper and lower bounds that improve over time.
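For concreteness, one simple textbook way to encode a single parity constraint A_i x ≡ b_i (mod 2) in an ILP uses an auxiliary integer variable z_i (a minimal sketch; the compact polynomial-size description of the parity polytope due to Yannakakis used in the talk is more involved):

\[ \sum_{j \,:\, A_{ij} = 1} x_j \;=\; 2 z_i + b_i, \qquad x_j \in \{0,1\}, \; z_i \in \mathbb{Z}_{\ge 0}. \]

Relaxing the integrality constraints (0 ≤ x_j ≤ 1, z_i ≥ 0) gives the LP relaxation and hence the polynomial-time upper bounds mentioned above.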

Iterative bound tightening
Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time.
Recall: bounds on the optimization (MAP) → (probabilistic) bounds on the partition function Z. This yields a new family of bounds.
WISH: when MAP is solved to optimality (LowerBound = UpperBound), a constant-factor approximation of Z is guaranteed.
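As a rough sketch of how per-level MAP bounds would be aggregated into bounds on Z, following the same aggregation as the WISH estimate above (names are illustrative; the constant factors from the theorem are omitted):

```python
# Sketch: aggregate per-level MAP bounds into (probabilistic) bounds on Z.
# lower[i] / upper[i] are lower/upper bounds on the median MAP value with
# i random parity constraints, i = 0, ..., n (e.g., an ILP solver's
# incumbent solution and its LP-relaxation bound).
def bounds_on_Z(lower, upper):
    n = len(lower) - 1
    z_lower = lower[0] + sum(lower[i] * 2 ** (i - 1) for i in range(1, n + 1))
    z_upper = upper[0] + sum(upper[i] * 2 ** (i - 1) for i in range(1, n + 1))
    return z_lower, z_upper
```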

SPARSITY OF THE PARITY CONSTRAINTS
Improving solution time and bound quality

Inducing sparsity
Observations:
- Problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes).
- The quality of the LP relaxation depends on A and b, not just on the solution space.
- Elementary row operations (e.g., summing two equations) do not change the solution space but do affect the LP relaxation.
Procedure (see the sketch below):
1) Reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over a finite field).
2) Greedily apply elementary row operations.
[Figure: parity check nodes for the original matrix A vs. an equivalent but sparser A in row-echelon form]
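A minimal sketch of step 1, Gaussian elimination over GF(2) on the augmented system [A | b] (illustrative only, not the authors' code; the greedy row-operation step 2 is omitted):

```python
# Reduce the parity system A x = b (mod 2) to row-echelon form over GF(2).
# Rows are lists of 0/1; b is carried as the last column of each row.
def gf2_row_echelon(A, b):
    rows = [a[:] + [bi] for a, bi in zip(A, b)]   # augmented matrix [A | b]
    m, n = len(rows), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Find a row with a 1 in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, m) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate this column from all rows below (XOR = addition mod 2).
        for r in range(pivot_row + 1, m):
            if rows[r][col] == 1:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    A_new = [row[:-1] for row in rows]
    b_new = [row[-1] for row in rows]
    return A_new, b_new   # same solution set; per the talk, typically easier to solve
```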

Improvements from sparsity
- The quality of the LP relaxations significantly improves.
- Integer solutions are found faster (better lower bounds).
[Figure: improvements from sparsification using the IBM CPLEX ILP solver on a 10x10 Ising grid; the upper bound improves, and without sparsification the solver fails to find integer solutions (lower bounds)]

Generating sparse constraints
WISH is based on universal hashing: randomly generate A in {0,1}^{i×n} and b in {0,1}^i, then optimize over the solutions of A x = b (mod 2) (parity constraints). The function A x + b (mod 2) is:
- uniform over {0,1}^i
- pairwise independent: given distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent.
Suppose we instead generate a sparse matrix A, with at most k variables per parity constraint (up to k ones per row of A). Then A x + b (mod 2) is still uniform, but no longer pairwise independent. E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables: lots of correlation (knowing A x = b tells me a lot about A y = b).
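For illustration, the two constraint generators can be sketched as follows (not the authors' code; exactly k ones per sparse row are used for simplicity, whereas the talk allows up to k):

```python
# Sketch: random parity constraints A x = b (mod 2), dense vs. sparse.
import random

def dense_parity_constraints(i, n):
    """Universal hashing: every entry of A and b is an independent fair coin.
    A x + b (mod 2) is uniform over {0,1}^i and pairwise independent."""
    A = [[random.randint(0, 1) for _ in range(n)] for _ in range(i)]
    b = [random.randint(0, 1) for _ in range(i)]
    return A, b

def sparse_parity_constraints(i, n, k):
    """k ones per row: empirically easier MAP queries, but A x + b (mod 2)
    is only uniform, no longer pairwise independent (so only the
    lower-bound guarantee survives)."""
    A = []
    for _ in range(i):
        row = [0] * n
        for j in random.sample(range(n), k):
            row[j] = 1
        A.append(row)
    b = [random.randint(0, 1) for _ in range(i)]
    return A, b
```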

Using sparse parity constraints
Theorem: With probability at least 1-δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function.
PRO: "easier" MAP inference queries. For example, random parity constraints of length 1 (on a single variable) are equivalent to MAP with some variables fixed.
CON: we lose the upper bound; the output can underestimate the partition function.
CON: no constant-factor approximation anymore.

MAP with sparse parity constraints
Evaluation of MAP inference with sparse constraints: ILP and Branch & Bound outperform message-passing (BP, MP, and MPLP).
[Figure: results on a 10x10 attractive Ising grid and a 10x10 mixed Ising grid]

Experimental results
ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW).

Experimental results (2)
ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW).

Conclusions
[ICML-13] WISH: discrete integration reduced to a small number of optimization instances (MAP), with strong (probabilistic) accuracy guarantees; MAP inference is still NP-hard.
Scalability: approximations and bounds
- Connection with max-likelihood decoding
- ILP formulation + sparsity (Gaussian sparsification & uniform hashing)
- New family of probabilistic, polynomial-time computable upper and lower bounds on the partition function that can be iteratively tightened (reaching a constant-factor approximation)
Future work:
- Extension to continuous integrals and variables
- Sampling from high-dimensional probability distributions

Extra slides