
1 OPTIMIZATION WITH PARITY CONSTRAINTS: FROM BINARY CODES TO DISCRETE INTEGRATION. Stefano Ermon*, Carla P. Gomes*, Ashish Sabharwal+, and Bart Selman*. *Cornell University, +IBM Watson Research Center. UAI 2013.

2 High-dimensional integration. High-dimensional integrals arise throughout statistics, ML, and physics: expectations / model averaging, marginalization, partition functions / ranking models / parameter learning. Curse of dimensionality: quadrature involves a weighted sum over an exponential number of items, e.g., the L^n unit cells of an n-dimensional hypercube with side length L.

3 Discrete Integration. We are given a set of 2^n items and non-negative weights w; the goal is to compute the total weight. The weight function is compactly specified in factored form (Bayes net, factor graph, CNF, …). Example 1: n = 2 variables, sum over 4 items with weights 5, 0, 2, 1; goal: compute 5 + 0 + 2 + 1 = 8. Example 2: n = 100 variables, sum over 2^100 ≈ 10^30 items (intractable by enumeration).
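A minimal sketch of the task (the factor tables are hypothetical, chosen so the four item weights match the slide's 5, 0, 2, 1): the weight of an assignment is a product of factors, and the discrete integral sums that weight over all 2^n assignments, which is feasible only for tiny n.

```python
from itertools import product

# Toy factored weight function on n = 2 binary variables (hypothetical tables):
# w(x1, x2) = f1(x1) * f2(x1, x2), each factor given as a lookup table.
f1 = {0: 1.0, 1: 2.0}
f2 = {(0, 0): 5.0, (0, 1): 0.0, (1, 0): 1.0, (1, 1): 0.5}

def weight(x):
    x1, x2 = x
    return f1[x1] * f2[(x1, x2)]

# Discrete integration by brute force: sum w(x) over all 2^n assignments.
Z = sum(weight(x) for x in product([0, 1], repeat=2))
print(Z)  # 8.0 here; for n = 100 there would be 2^100 terms, so enumeration is hopeless
```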

4 Hardness. 0/1 weights case: is there at least one "1"? → SAT; how many "1"s? → #SAT. NP-complete vs. #P-complete: counting is much harder (P ⊆ NP ⊆ PH ⊆ P^#P ⊆ PSPACE ⊆ EXP, from easy to hard). General weights: finding the heaviest item is combinatorial optimization (MAP); summing the weights is discrete integration. [ICML-13] WISH: Approximate Discrete Integration via Optimization, e.g., the partition function via MAP inference. MAP inference is often fast in practice thanks to relaxations/bounds and pruning.

5 WISH: Integration by Hashing and Optimization. The algorithm requires only O(n log n) MAP queries to approximate the partition function within a constant factor. Outer loop over the n binary variables σ ∈ {0,1}^n; for each constraint count, repeat log(n) times: run MAP inference on the original graphical model augmented with random parity constraints (parity check nodes enforcing A σ = b (mod 2)); finally, aggregate the MAP inference solutions.
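A sketch of the inner step, using a brute-force oracle as a stand-in for a real MAP solver (assumptions: `weight(x)` is the toy factored weight function from the earlier sketch; a practical implementation would call an optimizer rather than enumerate):

```python
import random
from itertools import product

def random_parity_constraints(i, n, rng):
    """Draw A in {0,1}^(i x n) and b in {0,1}^i uniformly at random."""
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(i)]
    b = [rng.randint(0, 1) for _ in range(i)]
    return A, b

def satisfies(x, A, b):
    """Check A x = b (mod 2) for a 0/1 assignment x."""
    return all(sum(a_j * x_j for a_j, x_j in zip(row, x)) % 2 == b_i
               for row, b_i in zip(A, b))

def constrained_map(weight, n, A, b):
    """MAP value of the augmented model: max weight over assignments with A x = b (mod 2).
    Brute force for illustration only; WISH assumes an efficient MAP solver here."""
    feasible = (x for x in product([0, 1], repeat=n) if satisfies(x, A, b))
    return max((weight(x) for x in feasible), default=0.0)
```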

6 How it works (visual view of the algorithm). For i = 1, …, n: add i random parity constraints to the function being integrated, run MAP inference, and repeat log(n) times to take the median M_i. With M_0 the mode (the unconstrained MAP value), the estimate of the integral is M_0 + M_1·1 + M_2·2 + M_3·4 + … = M_0 + Σ_i M_i·2^{i-1}.
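Putting the pieces together, a compact sketch of the WISH estimator (assumptions: the helpers `random_parity_constraints` and `constrained_map` from the previous sketch, and a simplistic choice of the repetition count `T`; the paper's exact choice of T and its accuracy analysis are not reproduced here):

```python
import math
import random
from statistics import median

def wish(weight, n, T=None, seed=0):
    """WISH estimate of Z = sum_x weight(x) using O(n log n) MAP queries."""
    rng = random.Random(seed)
    T = T or max(1, math.ceil(math.log(n + 1)))  # repetitions per constraint level (sketch)
    # M_0: mode of the unconstrained model.
    m0 = constrained_map(weight, n, A=[], b=[])
    estimate = m0
    for i in range(1, n + 1):
        samples = []
        for _ in range(T):
            A, b = random_parity_constraints(i, n, rng)
            samples.append(constrained_map(weight, n, A, b))
        estimate += median(samples) * 2 ** (i - 1)
    return estimate
```

This sketch only makes the estimator's structure concrete; each inner call still uses the brute-force oracle, whereas the whole point of WISH is to hand that optimization to an efficient MAP solver.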

7 Accuracy Guarantees. Theorem [ICML-13]: with probability at least 1−δ (e.g., 99.9%), WISH computes a 16-approximation of the partition function (discrete integral) by solving Θ(n log n) MAP inference queries (optimization). Theorem [ICML-13]: the approximation factor can be improved to (1+ε) by adding extra variables and factors; for example, a factor-2 approximation using 4n variables. Remark: this is faster than enumeration only when the combinatorial optimization is efficient.

8 Summary of contributions. Introduction and previous work: WISH, approximate discrete integration via optimization; partition function / marginalization via MAP inference; accuracy guarantees. MAP inference subject to parity constraints: tractable cases and approximations; an Integer Linear Programming formulation; a new family of polynomial-time (probabilistic) upper and lower bounds on the partition function that can be iteratively tightened (eventually reaching a constant-factor approximation). Sparsity of the parity constraints: techniques to improve solution time and bound quality; experimental improvements over variational techniques.

9 MAP INFERENCE WITH PARITY CONSTRAINTS: hardness, approximations, and bounds.

10 Making WISH more scalable. Would approximations to the optimization (MAP inference with parity constraints) be useful? Yes: bounds on MAP (optimization) translate to bounds on the partition function Z (discrete integral). Lower bounds on MAP (e.g., from local search) give lower bounds on Z; upper bounds on MAP (e.g., LP or SDP relaxations) give upper bounds on Z; constant-factor approximations of MAP give a constant factor on Z. Question: are there classes of problems where we can efficiently approximate the optimization (MAP inference) in the inner loop of WISH?
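A minimal sketch of this observation (the per-level numbers are hypothetical, and the estimator shape is the one sketched above): replacing each exact MAP value M_i by a lower or upper bound yields a lower or upper estimate of Z.

```python
def wish_estimate_from_bounds(map_bounds):
    """map_bounds[i] is a bound (lower or upper) on the MAP value with i parity constraints,
    for i = 0..n. Plugging in lower bounds gives a lower estimate of Z; upper bounds, an upper one."""
    return map_bounds[0] + sum(m * 2 ** (i - 1) for i, m in enumerate(map_bounds) if i >= 1)

# Hypothetical numbers for n = 2: per-level MAP lower and upper bounds from a solver.
lower = [5.0, 2.0, 1.0]   # e.g., best integer solutions found so far
upper = [5.0, 3.0, 2.0]   # e.g., LP relaxation values
print(wish_estimate_from_bounds(lower), wish_estimate_from_bounds(upper))  # 9.0 12.0
```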

11 Error correcting codes. Communication over a noisy channel: Alice sends x = 0100|1, where the redundant parity check bit is 0 XOR 1 XOR 0 XOR 0 = 1; Bob receives y = 0110|1. Bob: there has been a transmission error, since the received parity check bit 1 ≠ 0 XOR 1 XOR 1 XOR 0 = 0. What was the message actually sent by Alice? It must be a valid codeword, and as close as possible to the received message y.
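A tiny sketch of the parity check on this example (the bit values are the ones on the slide):

```python
def parity(bits):
    """XOR of a list of 0/1 bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

sent = [0, 1, 0, 0]
check = parity(sent)                        # 1: the redundant parity check bit Alice appends
received, received_check = [0, 1, 1, 0], 1  # the corrupted data bits Bob sees, plus the check bit
print(parity(received) != received_check)   # True: Bob detects a transmission error
```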

12 Decoding a binary code. Max-likelihood decoding: the ML-decoding graphical model combines a noisy channel model (x sent, y received, e.g., x = 0100|1 and y = 0110|1) with parity check nodes enforcing that the transmitted string is a codeword (LDPC codes). This is routinely solved in practice: 10GBase-T Ethernet, Wi-Fi 802.11n, digital TV, … Our more general case: a more complex probabilistic model with parity check nodes, i.e., max w(x) subject to A x = b (mod 2), which is equivalent to MAP inference on the augmented model. This MAP inference problem is NP-hard to approximate within any constant factor [Stern, Arora, …].
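A brute-force sketch of max-likelihood decoding for the toy single-parity-check code above (assumption: a binary symmetric channel, so the most likely codeword is the one closest in Hamming distance to y; real LDPC decoders use message passing or ILP rather than enumeration):

```python
from itertools import product

def is_codeword(x):
    """Valid codewords: 4 data bits followed by their XOR as a parity check bit."""
    return x[4] == (x[0] ^ x[1] ^ x[2] ^ x[3])

def ml_decode(y):
    """Pick a codeword with minimum Hamming distance to the received string y."""
    codewords = [x for x in product([0, 1], repeat=5) if is_codeword(x)]
    return min(codewords, key=lambda x: sum(xi != yi for xi, yi in zip(x, y)))

# One nearest codeword to y = 0110|1; several codewords are equally close,
# so a single parity bit can detect this error but not correct it uniquely.
print(ml_decode((0, 1, 1, 0, 1)))
```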

13 Decoding via Integer Programming. MAP inference subject to parity constraints is encoded as an Integer Linear Program (ILP): the standard MAP encoding plus the compact (polynomial-size) encoding of the parity polytope due to Yannakakis for the parity constraints. The LP relaxation (relax the integrality constraints) gives polynomial-time upper bounds. ILP solving strategy: cuts + branching + LP relaxations, i.e., solve a sequence of LP relaxations, yielding upper and lower bounds that improve over time.
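A sketch of the idea on a toy instance with a linear objective, using the simpler integer-slack encoding of parity (each row A_i x ≡ b_i (mod 2) becomes A_i x − 2 k_i = b_i with k_i a non-negative integer) rather than the tighter Yannakakis parity-polytope encoding used in the paper; the `pulp` modeling layer and the numbers are assumptions for illustration only.

```python
import pulp

# Toy instance: maximize a linear objective over x in {0,1}^n subject to A x = b (mod 2).
c = [3.0, 1.0, 2.0, 4.0]
A = [[1, 1, 0, 1],
     [0, 1, 1, 1]]
b = [0, 1]
n, m = len(c), len(A)

prob = pulp.LpProblem("map_with_parity", pulp.LpMaximize)
x = [pulp.LpVariable(f"x{j}", cat="Binary") for j in range(n)]
k = [pulp.LpVariable(f"k{i}", lowBound=0, cat="Integer") for i in range(m)]

prob += pulp.lpSum(c[j] * x[j] for j in range(n))
for i in range(m):
    # A_i x - 2 k_i = b_i  <=>  A_i x = b_i (mod 2)
    prob += pulp.lpSum(A[i][j] * x[j] for j in range(n)) - 2 * k[i] == b[i]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([int(v.value()) for v in x], pulp.value(prob.objective))  # optimal objective here is 7.0
```

Branch-and-cut solvers such as CPLEX work through a sequence of LP relaxations of this kind, which is exactly what yields the anytime upper and lower bounds discussed next.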

14 Iterative bound tightening. Polynomial-time upper and lower bounds on MAP that are iteratively tightened over time. Recall: bounds on the optimization (MAP) translate to (probabilistic) bounds on the partition function Z, giving a new family of bounds. When MAP is solved to optimality (LowerBound = UpperBound), WISH gives a guaranteed constant-factor approximation of Z.

15 SPARSITY OF THE PARITY CONSTRAINTS: improving solution time and bound quality.

16 Inducing sparsity. Observations: problems with sparse A x = b (mod 2) are empirically easier to solve (similar to Low-Density Parity Check codes); the quality of the LP relaxation depends on A and b, not just on the solution space; elementary row operations (e.g., summing two equations) do not change the solution space but do affect the LP relaxation. Sparsification procedure: 1) reduce A x = b (mod 2) to row-echelon form with Gaussian elimination (linear equations over the finite field GF(2)); 2) greedily apply further elementary row operations, producing an equivalent but sparser set of parity check nodes.
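A minimal sketch of step 1, Gaussian elimination over GF(2) on the augmented system [A | b] (a plain-Python illustration written for clarity, not the paper's code; the greedy step 2 is omitted):

```python
def gf2_row_echelon(A, b):
    """Reduce A x = b (mod 2) to row-echelon form with row swaps and XOR row additions.
    Row operations preserve the solution set but change sparsity and the LP relaxation."""
    rows = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    m, n = len(rows), len(A[0])
    pivot_row = 0
    for col in range(n):
        # Find a row with a 1 in this column, at or below the current pivot row.
        pivot = next((r for r in range(pivot_row, m) if rows[r][col] == 1), None)
        if pivot is None:
            continue
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        # Eliminate the 1s below the pivot by XOR-ing (adding mod 2) the pivot row.
        for r in range(pivot_row + 1, m):
            if rows[r][col] == 1:
                rows[r] = [u ^ v for u, v in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return [row[:-1] for row in rows], [row[-1] for row in rows]

# Hypothetical 3x4 system; the returned system has the same solutions in row-echelon form.
A_ech, b_ech = gf2_row_echelon([[1, 1, 1, 0], [1, 1, 0, 1], [0, 1, 1, 1]], [1, 0, 1])
```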

17 Improvements from sparsity. The quality of the LP relaxations significantly improves, and integer solutions are found faster (better lower bounds). [Figure: improvements from sparsification using the IBM CPLEX ILP solver on a 10x10 Ising grid; upper bound improvement; without sparsification, the solver fails to find integer solutions (lower bounds).]

18 Generating sparse constraints. WISH is based on universal hashing: randomly generate A in {0,1}^{i×n} and b in {0,1}^i; then A x + b (mod 2) is uniform over {0,1}^i and pairwise independent (for distinct variable assignments x and y, the events A x = b (mod 2) and A y = b (mod 2) are independent). We optimize over the solutions of A x = b (mod 2) (parity constraints). Suppose instead we generate a sparse matrix A, with at most k variables per parity constraint (up to k ones per row of A). Then A x + b (mod 2) is still uniform, but no longer pairwise independent. E.g., for k = 1, A x = b (mod 2) is equivalent to fixing i variables: lots of correlation (knowing A x = b tells me a lot about A y = b).
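A sketch contrasting the two constructions (pure-Python illustration; the dense rows give the pairwise-independent hash family, while `sparse_parity_constraints` puts exactly k ones in randomly chosen positions of each row, which is one natural way to do it and an assumption rather than the paper's exact scheme):

```python
import random

def dense_parity_constraints(i, n, rng):
    """Universal hashing: each entry of A and b is an independent fair coin flip,
    so A x + b (mod 2) is uniform over {0,1}^i and pairwise independent."""
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(i)]
    b = [rng.randint(0, 1) for _ in range(i)]
    return A, b

def sparse_parity_constraints(i, n, k, rng):
    """Sparse variant: exactly k ones per row of A (so at most k variables per constraint).
    A x + b (mod 2) stays uniform, but pairwise independence is lost
    (e.g., k = 1 simply fixes i of the variables)."""
    A = []
    for _ in range(i):
        row = [0] * n
        for j in rng.sample(range(n), k):
            row[j] = 1
        A.append(row)
    b = [rng.randint(0, 1) for _ in range(i)]
    return A, b

rng = random.Random(0)
A, b = sparse_parity_constraints(i=3, n=10, k=2, rng=rng)
```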

19 Using sparse parity constraints. Theorem: with probability at least 1−δ (e.g., 99.9%), WISH with sparse parity constraints computes an approximate lower bound on the partition function. PRO: "easier" MAP inference queries; for example, random parity constraints of length 1 (on a single variable) are equivalent to MAP with some variables fixed. CON: we lose the upper bound, so the output can underestimate the partition function. CON: no constant-factor approximation anymore.

20 MAP with sparse parity constraints. Evaluation of MAP inference with sparse constraints: ILP and Branch & Bound outperform message passing (BP, MP, and MPLP). [Figures: 10x10 attractive Ising grid; 10x10 mixed Ising grid.]

21 Experimental results. ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW).

22 Experimental results (2). ILP provides probabilistic upper and lower bounds that improve over time and are often tighter than variational methods (BP, MF, TRW).

23 Conclusions. [ICML-13] WISH: discrete integration reduced to a small number of optimization instances (MAP), with strong (probabilistic) accuracy guarantees; MAP inference is still NP-hard. Scalability via approximations and bounds: connection with max-likelihood decoding; ILP formulation plus sparsity (Gaussian sparsification and uniform hashing); a new family of probabilistic, polynomial-time computable upper and lower bounds on the partition function that can be iteratively tightened (eventually reaching a constant-factor approximation). Future work: extension to continuous integrals and variables; sampling from high-dimensional probability distributions.

24 Extra slides.

