
1 Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers
Ozgur Sumer, U. Chicago; Umut Acar, MPI-SWS; Alexander Ihler, UC Irvine; Ramgopal Mettu, UMass Amherst

2–5 Graphical models
Structured (neg) energy function; goal: find the minimum-energy (MAP) configuration.
Examples:
– Stereo depth (stereo image pair → MRF model → depth map)
– Protein design & prediction
– Weighted constraint satisfaction problems
[Figure: a three-variable model (A, B, C) drawn as a Bayesian network, a factor graph, and a Markov random field; pairwise energy form.]

6–9 Dual decomposition methods
Decompose the graph into smaller subproblems
Solve each independently; optimistic bound
Exact if all copies agree
Enforce the lost equality constraints via Lagrange multipliers
Same bound by different names:
– Dual decomposition (Komodakis et al. 2007)
– TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
– Soft arc consistency (Cooper & Schiex 2004)
[Figure: original graph and its decomposition.]

10 Dual decomposition methods
[Figure: energy axis with the MAP energy bounded above by consistent solutions and below by relaxed problems; original graph and its decomposition.]
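To make the bound concrete, here is a minimal sketch in Python; all energy values are hypothetical, chosen only for illustration. A three-variable binary chain is split into its two edges, the shared unary on x2 is divided equally between its two copies, and the sum of the subproblems' minima lower-bounds the true MAP energy.

```python
import itertools

# Assumed energy values, chosen only for illustration.
th12 = [[0.0, 2.0], [1.0, 0.5]]   # pairwise energy on edge (x1, x2)
th23 = [[0.3, 1.0], [2.0, 0.0]]   # pairwise energy on edge (x2, x3)
u1, u2, u3 = [0.0, 1.0], [0.5, 0.0], [1.0, 0.2]   # unary energies

def exact_map():
    # Brute-force minimum of the full chain energy.
    return min(u1[a] + u2[b] + u3[c] + th12[a][b] + th23[b][c]
               for a, b, c in itertools.product([0, 1], repeat=3))

def decomposition_bound():
    # Split the shared unary u2 equally between the two edge
    # subproblems. Each subproblem is minimized independently, so
    # its copy of x2 need not agree with the other's: the sum of
    # minima is an optimistic (lower) bound on the MAP energy.
    sub1 = min(u1[a] + 0.5 * u2[b] + th12[a][b]
               for a, b in itertools.product([0, 1], repeat=2))
    sub2 = min(0.5 * u2[b] + u3[c] + th23[b][c]
               for b, c in itertools.product([0, 1], repeat=2))
    return sub1 + sub2

bound, exact = decomposition_bound(), exact_map()
```

If the two copies of x2 happened to agree at their minima, the bound would be tight; in general it is optimistic, which is what the subgradient updates on the following slides tighten.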

11–13 Optimizing the bound
Subgradient descent:
– Find each subproblem's optimal configuration
– Adjust entries for mis-matched solutions (±1 on the disagreeing values)
[Figure: subproblem cost tables before and after a subgradient step.]
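The mismatch-adjustment step can be sketched as follows, on a toy example with assumed energies (not the talk's numbers): each edge subproblem is minimized with the current multiplier folded in, and when the two copies of the shared variable disagree, the multiplier is shifted to penalize each copy's current choice in its own subproblem.

```python
import itertools

th12 = [[0.0, 2.0], [1.0, 0.5]]   # assumed pairwise energy on (x1, x2)
th23 = [[0.3, 1.0], [2.0, 0.0]]   # assumed pairwise energy on (x2, x3)
lam = [0.0, 0.0]                  # Lagrange multiplier on the two copies of x2
step = 1.0

def argmin_sub1():
    # Minimize edge (x1, x2) with the multiplier added to x2's copy.
    return min(itertools.product([0, 1], repeat=2),
               key=lambda ab: th12[ab[0]][ab[1]] + lam[ab[1]])

def argmin_sub2():
    # Minimize edge (x2, x3) with the multiplier subtracted from x2's copy.
    return min(itertools.product([0, 1], repeat=2),
               key=lambda bc: th23[bc[0]][bc[1]] - lam[bc[0]])

(a, b1), (b2, c) = argmin_sub1(), argmin_sub2()
if b1 != b2:
    # Subgradient step: raise the energy of each copy's disagreeing
    # choice in its own subproblem (the +1 / -1 entries on the slide).
    lam[b1] += step
    lam[b2] -= step
```

One step need not produce agreement; iterating (with a decaying step size) drives the copies toward a common configuration and the bound toward its optimum.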

14 Equivalent decompositions
Any collection of tree-structured parts gives an equivalent bound. Two extreme cases:
– Set of all individual edges
– Single "covering tree" of all edges, with variables duplicated
[Figure: original graph, its "edges" decomposition, and a covering tree.]

15 Speeding up inference
Parallel updates
– Easy to solve subproblems in parallel (e.g. Komodakis et al. 2007)
Adaptive updates

16 Some complications…
Example: Markov chain
– Can pass messages in parallel, but…
– If xn depends on x1, it takes O(n) time anyway
– Slow "convergence rate"
Larger problems are more "efficient"; smaller problems are easily parallel & adaptive.
Similar effects in message passing
– Residual splash (Gonzalez et al. 2009)
[Figure: chain x1 --- x2 --- … --- x10]

17–20 Cluster trees
Alternative means of parallel computation
– Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
Simple chain model
– Normally, eliminate variables "in order" (DP)
– Each calculation depends on all previous results
[Figure: chain x1 --- x2 --- … --- x10]

21–22 Cluster trees
Alternative means of parallel computation
Eliminate variables in an alternative order
– Eliminate some intermediate (degree-2) nodes
– Balanced: depth log(n)
[Figure: chain x1 --- x2 --- … --- x10 and the resulting balanced cluster tree.]
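A rough sketch of the idea, under the simplifying assumption of a chain with identical binary pairwise tables and no unaries: eliminating the shared variable between two adjacent edges merges them into one table, and since all merges within a round are independent, each parallel round halves the chain, giving O(log n) depth instead of the O(n) left-to-right order.

```python
# Balanced elimination on a chain: min-sum out every other internal
# variable in parallel rounds (a sketch, not the paper's implementation).

def eliminate_middle(left, right):
    # Combine two adjacent edge tables by min-summing out the shared
    # middle variable: new[a][c] = min_b left[a][b] + right[b][c].
    return [[min(left[a][b] + right[b][c] for b in (0, 1))
             for c in (0, 1)] for a in (0, 1)]

def chain_min(edges):
    # edges[i] is the pairwise table between x_i and x_{i+1}.
    rounds = 0
    while len(edges) > 1:
        # Every pair of adjacent edges merges independently, so one
        # "round" corresponds to a single parallel step.
        nxt = [eliminate_middle(edges[i], edges[i + 1])
               for i in range(0, len(edges) - 1, 2)]
        if len(edges) % 2:
            nxt.append(edges[-1])
        edges, rounds = nxt, rounds + 1
    final = min(edges[0][a][c] for a in (0, 1) for c in (0, 1))
    return final, rounds

# 10-variable chain (9 edges) with assumed identical pairwise tables.
edges = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(9)]
best, depth = chain_min(edges)
```

For 9 edges the loop runs 4 rounds (9 → 5 → 3 → 2 → 1), i.e. roughly ceil(log2 of the number of edges), matching the log-depth cluster tree on the slide.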

23–25 Adapting to changes
[Figure: chain x1 --- x2 --- … --- x10]

26–27 Adapting to changes
1st pass: update O(log n) cluster functions
2nd pass: mark changed configurations, repeat decoding: O(m log(n/m))
(n = sequence length; m = number of changes)
[Figure: chain x1 --- x2 --- … --- x10]
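The adaptive-update bookkeeping can be illustrated with a simple min-tree over leaf costs. This is only a stand-in for the paper's cluster functions, but it shows the key property: changing one leaf requires recomputing only the O(log n) internal nodes on its path to the root, not the whole structure.

```python
# A stand-in illustration of adaptive updates (not the paper's data
# structure): a binary min-tree over n leaf values, stored as an
# array-backed segment tree with leaves at indices n..2n-1.

class MinTree:
    def __init__(self, vals):
        self.n = len(vals)
        self.t = [0.0] * self.n + list(vals)
        for i in range(self.n - 1, 0, -1):
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def update(self, i, v):
        # Rewrite the leaf, then recompute ancestors up to the root:
        # at most log2(n) + 1 nodes are touched, everything else is reused.
        i += self.n
        self.t[i], touched = v, 1
        while i > 1:
            i //= 2
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])
            touched += 1
        return touched

tree = MinTree([5.0, 2.0, 8.0, 1.0, 9.0, 3.0, 7.0, 4.0])
touched = tree.update(3, 6.0)   # change one leaf's cost
```

With n = 8 leaves the update touches 4 nodes (the leaf plus 3 ancestors), and the root still holds the correct global minimum, mirroring the slide's O(log n) first pass.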

28 Experiments
Random synthetic problems
– Random, irregular but "grid-like" connectivity
Stereo depth images
– Superpixel representation
– Irregular graphs
Compare "edges" and "cover-tree" decompositions
32-core Intel Xeon, Cilk++ implementation

29–31 Synthetic problems
Larger problems improve convergence rate
Adaptivity helps significantly
[Figure: convergence plots, annotated with cluster overhead and available parallelism.]

32 Synthetic models As a function of problem size

33–35 Stereo depth
[Figures only.]
36 Time to convergence for different problems

37 Conclusions
Fast methods for dual decomposition
– Parallel computation
– Adaptive updating
Subproblem choice
– Small problems: highly parallel, easily adaptive
– Large problems: better convergence rates
Cluster trees
– Alternative form for parallel & adaptive updates
– Benefits of both large & small subproblems

