# Algorithms for MAP estimation in Markov Random Fields


Algorithms for MAP estimation in Markov Random Fields. Vladimir Kolmogorov, University College London. Tutorial at GDR (Optimisation Discrète, Graph Cuts et Analyse d'Images), Paris, 29 November 2005. Note: these slides contain animation.

Energy function

E(x | θ) = Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q)

The first sum collects the unary terms (data), the second the pairwise terms (coherence).
- x_p are discrete variables (for example, x_p ∈ {0,1})
- θ_p(·) are unary potentials
- θ_pq(·,·) are pairwise potentials
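To fix the notation, here is a minimal sketch (not from the slides; node names and costs are invented) that evaluates E(x) for binary labels:

```python
# Evaluate E(x) = sum_p theta_p(x_p) + sum_(p,q) theta_pq(x_p, x_q).
# unary[p][i] is the cost of giving node p the label i;
# pairwise[(p, q)][i][j] is the cost of the label pair (i, j) on edge (p, q).

def energy(x, unary, pairwise):
    e = sum(unary[p][x[p]] for p in unary)
    e += sum(pairwise[(p, q)][x[p]][x[q]] for (p, q) in pairwise)
    return e

# A tiny two-node example (costs chosen arbitrarily for illustration):
unary = {'p': [0, 4], 'q': [3, 0]}
pairwise = {('p', 'q'): [[0, 2], [5, 0]]}  # rows: x_p, columns: x_q

print(energy({'p': 0, 'q': 1}, unary, pairwise))  # 0 + 0 + 2 = 2
```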

Minimisation algorithms

Min Cut / Max Flow [Ford & Fulkerson 56]
– [Greig, Porteous, Seheult 89]: non-iterative (binary variables)
– [Boykov, Veksler, Zabih 99]: iterative (alpha-expansion, alpha-beta swap, …; multi-valued variables)
+ If applicable, gives very accurate results
– Can be applied only to a restricted class of functions

BP (max-product belief propagation) [Pearl 86]
+ Can be applied to any energy function
– In vision, results are usually worse than those of graph cuts
– Does not always converge

TRW (max-product tree-reweighted message passing) [Wainwright, Jaakkola, Willsky 02], [Kolmogorov 05]
+ Can be applied to any energy function
+ For stereo, finds lower energy than graph cuts
+ Convergence guarantees for the algorithm in [Kolmogorov 05]

Main idea: LP relaxation

Goal: minimize the energy E(x) under the constraints x_p ∈ {0,1}. In general this is an NP-hard problem!

Relax the discreteness constraints: allow x_p ∈ [0,1]. The result is a linear program, which can be solved in polynomial time!

The LP optimum is a lower bound on the discrete minimum; the relaxation is called tight when the two coincide, and not tight otherwise.

Solving the LP relaxation

The LP is too large for general-purpose LP solvers (e.g. interior point methods). Instead, solve the dual problem:
– formulate a lower bound on the energy
– maximize this bound
– when done, this solves the primal problem (the LP relaxation)

Two different ways to formulate the lower bound:
– via posiforms: leads to the maxflow algorithm
– via a convex combination of trees: leads to tree-reweighted message passing

Notation and Preliminaries

Energy function: visualisation

[Figure: two nodes p and q joined by edge (p,q). Each node carries a cost for label 0 and label 1, and the edge carries a cost for each of the four label pairs. The vector θ of all these parameters specifies the energy.]

Reparameterisation

[Figure: a constant is subtracted from one row of the pairwise costs of edge (p,q) and added to the corresponding unary cost of node p; the cost of every labelling is unchanged.]

Definition: θ' is a reparameterisation of θ if they define the same energy, i.e. E(x | θ') = E(x | θ) for all labellings x.

Maxflow, BP and TRW all perform reparameterisations.
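The definition can be checked mechanically. The sketch below (invented costs) moves a constant δ = 1 from one row of an edge table into a unary term and verifies that all four labellings keep their energy:

```python
from itertools import product

def energy(x, const, unary, pairwise):
    return (const
            + sum(unary[p][x[p]] for p in unary)
            + sum(pairwise[(p, q)][x[p]][x[q]] for (p, q) in pairwise))

# Subtract delta from row x_p = i of edge (p, q) and add it to unary[p][i]:
# every labelling's energy is unchanged, so theta' is a reparameterisation.
def reparam_edge_to_node(unary, pairwise, p, q, i, delta):
    unary = {k: list(v) for k, v in unary.items()}
    pairwise = {k: [list(r) for r in v] for k, v in pairwise.items()}
    for j in range(2):
        pairwise[(p, q)][i][j] -= delta
    unary[p][i] += delta
    return unary, pairwise

unary = {'p': [0, 4], 'q': [1, 0]}
pairwise = {('p', 'q'): [[0, 2], [5, 0]]}
u2, pw2 = reparam_edge_to_node(unary, pairwise, 'p', 'q', 1, 1)

for x in product([0, 1], repeat=2):
    lab = {'p': x[0], 'q': x[1]}
    assert energy(lab, 0, unary, pairwise) == energy(lab, 0, u2, pw2)
print("all four labellings keep the same energy")
```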

Part I: Lower bound via posiforms (→ maxflow algorithm)

Lower bound via posiforms [Hammer, Hansen, Simeone 84]

Write the energy as a constant plus non-negative terms:

E(x | θ) = θ_const + Σ_p θ_p(x_p) + Σ_(p,q) θ_pq(x_p, x_q),  with all θ_p(·) ≥ 0 and θ_pq(·,·) ≥ 0

Then θ_const is a lower bound on the energy. Goal: maximize θ_const over reparameterisations.

Outline of part I

Maximisation algorithm? Consider functions of binary variables only.
– Maximising the lower bound for submodular functions:
  - definition of submodular functions
  - overview of min cut / max flow
  - reduction to max flow
  - global minimum of the energy
– Maximising the lower bound for non-submodular functions:
  - reduction to max flow (more complicated graph)
  - part of an optimal solution

Submodular functions of binary variables

Definition: E is submodular if every pairwise term satisfies

θ_pq(0,0) + θ_pq(1,1) ≤ θ_pq(0,1) + θ_pq(1,0)

A submodular energy can be converted to a canonical form by reparameterisation: all terms become non-negative, and each edge keeps a single nonzero pairwise cost for one disagreeing label pair (zero cost for the other label pairs).
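The submodularity condition is a one-line check per edge; a small sketch with invented tables:

```python
def is_submodular(pairwise):
    # E is submodular iff every pairwise table satisfies
    # theta(0,0) + theta(1,1) <= theta(0,1) + theta(1,0).
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0]
               for t in pairwise.values())

assert is_submodular({('p', 'q'): [[0, 2], [5, 0]]})      # 0 + 0 <= 2 + 5
assert not is_submodular({('p', 'q'): [[3, 0], [0, 3]]})  # 3 + 3 > 0 + 0
```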

Overview of min cut/max flow

Min Cut problem

[Figure: a directed weighted graph with a source and a sink; each arc carries a non-negative capacity.]

A cut is a partition of the nodes into a set S containing the source and a set T containing the sink, e.g. S = {source, node 1}, T = {sink, node 2, node 3}. Its cost is the total capacity of the arcs going from S to T; in the example, Cost(S,T) = 1 + 1 = 2.

Task: compute the cut with minimum cost.

Maxflow algorithm

[Animation: the algorithm repeatedly finds an augmenting path from source to sink, pushes flow along it, and updates the residual capacities; value(flow) grows 0 → 1 → 2 until no augmenting path remains. By max-flow / min-cut duality, the final flow value 2 equals the minimum cut cost.]
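The augmenting-path scheme sketched in the animation can be written down as a generic Edmonds-Karp implementation. The graph below is invented; the slides' exact capacities are not recoverable from the transcript:

```python
from collections import deque

# Edmonds-Karp max-flow: BFS for shortest augmenting paths on a residual graph.
def maxflow(cap, source, sink):
    res, adj = {}, {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)          # reverse arc for residual flow
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        # BFS for an augmenting path with positive residual capacity
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow                    # no augmenting path: done
        # saturate the path by its bottleneck capacity
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= b
            res[(v, u)] += b
        flow += b

# Example graph (capacities invented for illustration):
cap = {('s', 1): 2, ('s', 2): 5, (1, 2): 3, (1, 't'): 1, (2, 't'): 1}
print(maxflow(cap, 's', 't'))  # the arcs into 't' form a cut of cost 2
```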

Maximising lower bound for submodular functions: Reduction to maxflow

Maxflow algorithm and reparameterisation

[Animation: the maxflow run above is replayed on the energy. Each augmentation is a reparameterisation: pushing flow decreases the corresponding unary and pairwise terms and adds the pushed amount to the constant term, so after each step θ_const = value(flow). At termination θ_const = value(flow) = 2, all remaining terms are non-negative, and some labelling has zero residual cost, so the minimum of the energy is 2.]
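The reduction can be sketched end to end. The decomposition below follows the standard construction for submodular binary energies; the slides' own graph construction is not recoverable from the transcript, so this is an assumption-laden illustration with invented costs, checked against brute force:

```python
from collections import deque
from itertools import product

def maxflow(cap, source, sink):
    # Edmonds-Karp: BFS augmenting paths on residual capacities.
    res, adj = {}, {}
    for (u, v), c in cap.items():
        res[(u, v)] = res.get((u, v), 0) + c
        res.setdefault((v, u), 0)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= b
            res[(v, u)] += b
        flow += b

def minimize_submodular(unary, pairwise):
    # Reduce a submodular binary pairwise energy to an s-t min cut.
    # Convention: x_p = 0 <=> p ends up on the source side.
    # (Node names 's' and 't' are reserved for source and sink.)
    const = 0
    u = {p: list(c) for p, c in unary.items()}
    cap = {}
    for (p, q), t in pairwise.items():
        A, B, C, D = t[0][0], t[0][1], t[1][0], t[1][1]
        assert B + C >= A + D, "pairwise term is not submodular"
        # theta_pq = A + (C-A)[x_p=1] + (D-C)[x_q=1] + (B+C-A-D)[x_p=0, x_q=1]
        const += A
        u[p][1] += C - A
        u[q][1] += D - C
        if B + C - A - D > 0:
            cap[(p, q)] = cap.get((p, q), 0) + (B + C - A - D)
    for p, (u0, u1) in u.items():
        m = min(u0, u1)
        const += m
        if u1 > m:
            cap[('s', p)] = u1 - m   # paid when p is on the sink side (x_p = 1)
        if u0 > m:
            cap[(p, 't')] = u0 - m   # paid when p is on the source side (x_p = 0)
    return const + maxflow(cap, 's', 't')

# Check against brute force on a tiny invented instance:
unary = {'p': [0, 4], 'q': [1, 0]}
pairwise = {('p', 'q'): [[0, 2], [5, 0]]}
best = min(unary['p'][a] + unary['q'][b] + pairwise[('p', 'q')][a][b]
           for a, b in product([0, 1], repeat=2))
assert minimize_submodular(unary, pairwise) == best == 1
```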

Maximising lower bound for non-submodular functions

Arbitrary functions of binary variables

The lower bound can still be maximised via maxflow [Boros, Hammer, Sun 91], using a specially constructed graph. This solves the LP relaxation, whose optimum is half-integral: each node gets x_p ∈ {0, 1/2, 1}.

Part of an optimal solution [Hammer, Hansen, Simeone 84]: every node with an integral value x_p ∈ {0, 1} in the relaxed solution keeps that value in some global minimum of the discrete energy (persistency); only the nodes with x_p = 1/2 remain undetermined.

Part II: Lower bound via convex combination of trees (→ tree-reweighted message passing)

Convex combination of trees [Wainwright, Jaakkola, Willsky 02]

Goal: compute the minimum of the energy, min_x E(x | θ). In general, intractable!

Obtaining a lower bound:
– split θ into several components: θ = Σ_T ρ_T θ^T with ρ_T > 0, Σ_T ρ_T = 1
– compute the minimum for each component: Φ(θ^T) = min_x E(x | θ^T)
– combine: Σ_T ρ_T Φ(θ^T) ≤ min_x E(x | θ)

Use trees as components: minimising over a tree is tractable.

Convex combination of trees (cont'd)

[Figure: the graph is covered by trees T, each with weight ρ_T.] The lower bound Φ(θ) = Σ_T ρ_T min_x E(x | θ^T) is to be maximised over reparameterisations of the components θ^T.
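The bound can be reproduced on a small invented example: a 4-cycle covered by two spanning trees (the cycle minus one edge each) with weights 1/2. The weighted sum of tree minima is never above the true minimum:

```python
from itertools import product

def energy(x, unary, pairwise):
    return (sum(unary[p][x[p]] for p in unary)
            + sum(pairwise[e][x[e[0]]][x[e[1]]] for e in pairwise))

def brute_min(unary, pairwise):
    nodes = sorted(unary)
    return min(energy(dict(zip(nodes, lab)), unary, pairwise)
               for lab in product([0, 1], repeat=len(nodes)))

# A 4-cycle energy (costs invented for illustration):
unary = {0: [0, 1], 1: [2, 0], 2: [0, 3], 3: [1, 0]}
pw = {(0, 1): [[0, 2], [2, 0]], (1, 2): [[0, 2], [2, 0]],
      (2, 3): [[0, 2], [2, 0]], (3, 0): [[0, 2], [2, 0]]}

# Two spanning trees: the cycle minus edge (3,0) and minus edge (1,2),
# each with weight rho = 1/2. Unaries are shared; an edge used by only
# one tree is doubled there, so that (1/2)*theta^1 + (1/2)*theta^2 = theta.
def tree_theta(drop, double):
    t = {}
    for e, tab in pw.items():
        if e == drop:
            continue
        scale = 2 if e == double else 1
        t[e] = [[scale * c for c in row] for row in tab]
    return t

t1 = tree_theta(drop=(3, 0), double=(1, 2))
t2 = tree_theta(drop=(1, 2), double=(3, 0))

bound = 0.5 * brute_min(unary, t1) + 0.5 * brute_min(unary, t2)
true_min = brute_min(unary, pw)
assert bound <= true_min
print(bound, true_min)
```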

TRW algorithms

Goal: find the reparameterisation maximizing the lower bound. Apply a sequence of different reparameterisation operations:
– node averaging
– ordinary BP on trees

Order of operations? It affects performance dramatically.

Algorithms:
– [Wainwright et al. 02]: parallel schedule; may not converge
– [Kolmogorov 05]: specific sequential schedule; the lower bound does not decrease, with convergence guarantees

Node averaging

[Figure: node p belongs to two trees whose unary vectors are (0, 4) and (1, 0); averaging replaces both with ((0+1)/2, (4+0)/2) = (0.5, 2). The combined parameters are unchanged, so this is a reparameterisation.]
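The averaging step can be replayed on the slide's own numbers, assuming two trees with equal weights 1/2:

```python
# Node averaging for a node shared by two trees (weights 1/2 each),
# using the unary vectors from the slide: (0, 4) and (1, 0).
def average_node(thetas):
    n = len(thetas)
    avg = [sum(t[i] for t in thetas) / n for i in range(len(thetas[0]))]
    return [avg[:] for _ in thetas]

before = [[0, 4], [1, 0]]
after = average_node(before)
assert after == [[0.5, 2.0], [0.5, 2.0]]

# The combined parameters are unchanged (a reparameterisation) ...
combined_before = [0.5 * (a + b) for a, b in zip(*before)]
combined_after = [0.5 * (a + b) for a, b in zip(*after)]
assert combined_before == combined_after

# ... while the lower-bound contribution (weighted sum of per-tree minima)
# can only grow: 0.5*min(0,4) + 0.5*min(1,0) = 0 becomes 0.5*0.5 + 0.5*0.5 = 0.5.
assert 0.5 * min(before[0]) + 0.5 * min(before[1]) == 0
assert 0.5 * min(after[0]) + 0.5 * min(after[1]) == 0.5
```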

Belief propagation (BP) on trees

– Send messages; this is equivalent to reparameterising the node and edge parameters
– Two passes (forward and backward)

Key property [Wainwright et al.]: upon termination, the reparameterised unary θ_p gives the min-marginals for node p:

θ_p(j) = min { E(x | θ) : x_p = j }  (up to an additive constant)
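A sketch of min-sum BP on a chain (costs invented), checking the min-marginal property against brute force:

```python
from itertools import product

# Min-sum BP on a chain: after a forward and a backward pass, the
# reparameterised unary at each node equals its min-marginal
# (the cost of the best labelling with that node's label fixed).
def chain_min_marginals(unary, pairwise):
    n = len(unary)
    fwd = [[0, 0] for _ in range(n)]   # messages arriving from the left
    bwd = [[0, 0] for _ in range(n)]   # messages arriving from the right
    for i in range(1, n):
        for j in range(2):
            fwd[i][j] = min(fwd[i-1][k] + unary[i-1][k] + pairwise[i-1][k][j]
                            for k in range(2))
    for i in range(n - 2, -1, -1):
        for j in range(2):
            bwd[i][j] = min(bwd[i+1][k] + unary[i+1][k] + pairwise[i][j][k]
                            for k in range(2))
    return [[fwd[i][j] + unary[i][j] + bwd[i][j] for j in range(2)]
            for i in range(n)]

unary = [[0, 2], [1, 0], [3, 0]]
pairwise = [[[0, 4], [4, 0]], [[0, 1], [1, 0]]]   # edges (0,1) and (1,2)
mm = chain_min_marginals(unary, pairwise)

# Brute-force check of the min-marginal property:
def E(x):
    return (sum(unary[i][x[i]] for i in range(3))
            + sum(pairwise[i][x[i]][x[i+1]] for i in range(2)))
for i in range(3):
    for j in range(2):
        assert mm[i][j] == min(E(x) for x in product([0, 1], repeat=3)
                               if x[i] == j)
```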

TRW algorithm of Wainwright et al. with tree-based updates (TRW-T)

Repeat: run BP on all trees, then average all nodes.
– If it converges, it gives a (local) maximum of the lower bound
– Not guaranteed to converge; the lower bound may go down

Sequential TRW algorithm (TRW-S) [Kolmogorov 05]

Repeat:
– pick a node p
– run BP on all trees containing p
– average node p

Main property of TRW-S

Theorem: the lower bound never decreases.

Proof sketch (on the node-averaging example): before averaging, the two trees contribute min(0, 4) = 0 and min(1, 0) = 0 to the bound; after averaging, each contributes min(0.5, 2) = 0.5. BP does not change a tree's minimum, and averaging can only increase the sum of tree minima.

TRW-S algorithm

– Particular order of averaging and BP operations
– Lower bound guaranteed not to decrease
– There exists a limit point that satisfies the weak tree agreement condition
– Efficiency?

Efficient implementation

Naively, each step (pick node p; run BP on all trees containing p; average node p) reruns full BP passes over the trees. Inefficient?

Efficient implementation

Key observation: the node-averaging operation preserves the messages oriented towards the averaged node, so previously passed messages can be reused. This requires a special choice of trees:
– pick an ordering of the nodes
– use trees that are monotonic chains with respect to this ordering

[Figure: a 3×3 grid with nodes numbered 1–9 in row order.]

Efficient implementation

[Figure: 3×3 grid, nodes ordered 1–9.]

Algorithm:
– forward pass: process nodes in increasing order, passing messages from lower-ordered neighbours
– backward pass: do the same in reverse order

Linear running time of one iteration.


Memory requirements

An additional advantage of TRW-S: it needs only half as much memory as standard message passing, since one message rather than two must be stored per edge. A similar observation for bipartite graphs with a parallel schedule was made in [Felzenszwalb & Huttenlocher 04].

[Figure: memory layout of standard message passing vs. TRW-S.]

Experimental results: binary segmentation (GrabCut)

[Plot: energy vs. time, averaged over 50 instances.]

Experimental results: stereo

[Images: left input image, ground truth, BP result, TRW-S result.]

Experimental results: stereo

Summary

MAP estimation algorithms are based on the LP relaxation: maximize a lower bound. Two ways to formulate the lower bound:

Via posiforms: leads to the maxflow algorithm
– polynomial-time solution
– but applicable only to restricted energies (e.g. binary variables)
– submodular functions: global minimum
– non-submodular functions: part of an optimal solution

Via a convex combination of trees: leads to the TRW algorithm
– convergence in the limit (for TRW-S)
– applicable to arbitrary energy functions

Graph cuts vs. TRW:
– accuracy: similar
– generality: TRW is more general
– speed: for stereo, TRW is currently 2-5 times slower; but that is 3 vs. 50 years of research! TRW is also more suitable for parallel implementation (GPU? hardware?)

Discrete vs. continuous functionals

Discrete formulation (graph cuts):
– maxflow algorithm: global minimum in polynomial time
– metrication artefacts?

Continuous formulation (geodesic active contours):
– level sets: numerical stability?
– geometrically motivated: invariant under rotation

Geo-cuts [Boykov & Kolmogorov 03], [Kolmogorov & Boykov 05]

Given a continuous functional, construct a graph such that the cut cost approximates the functional for smooth contours C. Class of continuous functionals handled:
– geometric length / area (e.g. Riemannian)
– flux of a given vector field
– regional term

