Tractable Higher Order Models in Computer Vision (Part II). Slides from Carsten Rother, Sebastian Nowozin, Pushmeet Kohli, Microsoft Research Cambridge. Presented by Xiaodan Liang

Part II: submodularity, move-making algorithms, and a higher-order model: the Pⁿ Potts model

Feature selection

Factoring distributions: the problem is inherently combinatorial!

Example: Greedy algorithm for feature selection

Key property: diminishing returns. Example: predicting Y = "Sick" from features X1 = "Fever", X2 = "Rash", X3 = "Male". With the empty selection A = {}, adding X1 helps a lot; with selection B = {X2, X3}, adding X1 doesn't help much. Submodularity: for A ⊆ B and a new feature s, F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) (large improvement vs. small improvement). Theorem [Krause, Guestrin UAI '05]: the information gain F(A) in Naïve Bayes models is submodular!

Why is submodularity useful? Theorem [Nemhauser et al. '78]: for monotone submodular F, the greedy maximization algorithm returns A_greedy with F(A_greedy) ≥ (1 − 1/e) · max_{|A| ≤ k} F(A), i.e. at least ~63% of the optimum. The greedy algorithm gives a near-optimal solution! For information gain, this guarantee is the best possible unless P = NP [Krause, Guestrin UAI '05].
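As a concrete illustration of the greedy algorithm the theorem refers to, here is a minimal Python sketch: it repeatedly adds the element with the largest marginal gain. F, V and k are placeholders for any monotone submodular set function, ground set and budget.

```python
def greedy_maximize(F, V, k):
    """Greedy maximization of a set function F over ground set V with budget k."""
    A = set()
    for _ in range(min(k, len(V))):
        # pick the element with the largest marginal gain F(A + e) - F(A)
        best = max((e for e in V if e not in A), key=lambda e: F(A | {e}) - F(A))
        A.add(best)
    return A
```

With F the information gain of the feature selection example, this is exactly the greedy feature selection from the previous slides, and Nemhauser et al.'s bound guarantees at least (1 − 1/e) of the optimal value.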

Submodularity in Machine Learning. Many ML problems are submodular, i.e. they require, for submodular F: Minimization, A* = argmin F(A): structure learning (A* = argmin I(X_A; X_{V\A})), clustering, MAP inference in Markov Random Fields, ... Maximization, A* = argmax F(A): feature selection, active learning, ranking, ...

Set functions

Submodular set functions. A set function F on V is called submodular if, for all A, B ⊆ V: F(A) + F(B) ≥ F(A ∪ B) + F(A ∩ B). Equivalent diminishing-returns characterization: for all A ⊆ B ⊆ V and s ∉ B, F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) (large improvement when s is added to the small set A, small improvement when added to the large set B).
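The definition can be checked mechanically on small ground sets. A brute-force sketch (exponential in |V|, so only meant for toy examples):

```python
from itertools import chain, combinations

def is_submodular(F, V):
    """Check F(A) + F(B) >= F(A | B) + F(A & B) for all subsets A, B of V."""
    subsets = [frozenset(s) for s in
               chain.from_iterable(combinations(V, r) for r in range(len(V) + 1))]
    return all(F(A) + F(B) >= F(A | B) + F(A & B) for A in subsets for B in subsets)
```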

Submodularity and supermodularity

Example: Mutual information

Closedness properties. Let F_1, …, F_m be submodular functions on V and λ_1, …, λ_m > 0. Then F(A) = Σ_i λ_i F_i(A) is submodular! Submodularity is closed under nonnegative linear combinations, an extremely useful fact: if F_θ(A) is submodular, then Σ_θ P(θ) F_θ(A) is submodular; for multicriterion optimization, F_1, …, F_m submodular and λ_i ≥ 0 imply Σ_i λ_i F_i(A) submodular.

Submodularity and concavity: a set function of the form F(A) = g(|A|) is submodular if g is concave. (Figure: plot of g(|A|) against |A|.)

Maximum of submodular functions. Suppose F_1(A) and F_2(A) are submodular. Is F(A) = max(F_1(A), F_2(A)) submodular? No: max(F_1, F_2) is not submodular in general. (Figure: F_1 and F_2 plotted against |A|, with F = max tracing their upper envelope.)

Minimum of submodular functions. Well, maybe F(A) = min(F_1(A), F_2(A)) instead? Counterexample:

A        F_1(A)   F_2(A)   F(A)
∅        0        0        0
{a}      1        0        0
{b}      0        1        0
{a,b}    1        1        1

Here F({b}) − F(∅) = 0 < F({a,b}) − F({a}) = 1, so min(F_1, F_2) is not submodular in general. But stay tuned.
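Spelling out the table: F_1 and F_2 are modular (hence submodular), but their pointwise minimum violates diminishing returns, as this small check shows.

```python
F1 = lambda A: 1 if 'a' in A else 0   # column F_1(A) of the table
F2 = lambda A: 1 if 'b' in A else 0   # column F_2(A) of the table
F  = lambda A: min(F1(A), F2(A))      # column F(A)

print(F({'b'}) - F(set()))            # 0  (gain of adding 'b' to the empty set)
print(F({'a', 'b'}) - F({'a'}))       # 1  (gain of adding 'b' to {'a'}) -- larger!
```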

Duality. For F submodular on V, let G(A) = F(V) − F(V \ A). G is supermodular and is called the dual of F. Details about its properties in [Fujishige '91]. (Figures: F(A) and G(A) plotted against |A|.)

18 Submodularity and convexity

The submodular polyhedron P_F = {x ∈ R^V : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = Σ_{e ∈ A} x_e. Example with V = {a, b} and

A        F(A)
∅        0
{a}      -1
{b}      2
{a,b}    0

so P_F is given by x({a}) ≤ F({a}), x({b}) ≤ F({b}), x({a}) + x({b}) ≤ F({a,b}). (Figure: P_F drawn in the (x_{a}, x_{b}) plane.)

Lovász extension

Example: Lovász extension, g(w) = max {wᵀx : x ∈ P_F}. For the two-element example above (F(∅) = 0, F({a}) = −1, F({b}) = 2, F({a,b}) = 0), take w = [0, 1] and compute g(w). Greedy ordering: e_1 = b, e_2 = a, since w(e_1) = 1 > w(e_2) = 0. Then x_w(e_1) = F({b}) − F(∅) = 2 and x_w(e_2) = F({b,a}) − F({b}) = −2, so x_w = [−2, 2] and g([0,1]) = [0,1]ᵀ[−2,2] = 2 = F({b}). Similarly g([1,1]) = [1,1]ᵀ[−1,1] = 0 = F({a,b}). (Figure: the corners [−1,1] and [−2,2] of P_F in the (w_{a}, w_{b}) plane.)
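The greedy evaluation used in this example can be written down directly. A minimal sketch, valid for submodular F and nonnegative w; the dictionary-based encoding of the table values (including F({a}) = −1) is just for illustration:

```python
def lovasz_extension(F, w):
    """Evaluate g(w) = max {w.x : x in P_F} by the greedy algorithm (w given as a dict)."""
    order = sorted(w, key=w.get, reverse=True)   # greedy ordering: decreasing w
    g, prefix = 0.0, set()
    for e in order:
        gain = F(prefix | {e}) - F(prefix)       # x_w(e_i) = F(S_{i-1} + e_i) - F(S_{i-1})
        g += w[e] * gain
        prefix.add(e)
    return g

# The table above: F(emptyset) = 0, F({a}) = -1, F({b}) = 2, F({a,b}) = 0
values = {frozenset(): 0, frozenset({'a'}): -1, frozenset({'b'}): 2, frozenset({'a', 'b'}): 0}
F = lambda A: values[frozenset(A)]
print(lovasz_extension(F, {'a': 0, 'b': 1}))   # 2.0 = F({b})
print(lovasz_extension(F, {'a': 1, 'b': 1}))   # 0.0 = F({a,b})
```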

Why is this useful? Theorem [Lovász '83]: g(w) attains its minimum over [0,1]^n at a corner! So if we can minimize g on [0,1]^n, we can minimize F (at corners, g and F take the same values). F(A) submodular ⇒ g(w) convex (and efficient to evaluate). Does the converse also hold? No: consider g(w_1, w_2, w_3) = max(w_1, w_2 + w_3) on ground set {a, b, c}; it is convex, yet the set function it defines at the corners violates diminishing returns, since F({a,b}) − F({a}) = 0 < F({a,b,c}) − F({a,c}) = 1.

Minimizing a submodular function: ellipsoid algorithm, interior-point algorithms.

Example: Image denoising

Example: image denoising with a pairwise Markov Random Field. (Figure: grid of nodes X_1, …, X_9 for the noisy pixels, each connected to Y_1, …, Y_9 for the "true" pixels.) P(x_1, …, x_n, y_1, …, y_n) = ∏_{i,j} ψ_{i,j}(y_i, y_j) ∏_i φ_i(x_i, y_i). We want argmax_y P(y | x) = argmax_y log P(x, y) = argmin_y Σ_{i,j} E_{i,j}(y_i, y_j) + Σ_i E_i(y_i), where E_{i,j}(y_i, y_j) = −log ψ_{i,j}(y_i, y_j). X_i: noisy pixels; Y_i: "true" pixels. When is this MAP inference efficiently solvable (in high-treewidth graphical models)?

MAP inference in Markov Random Fields [Kolmogorov et al, PAMI ’04, see also: Hammer, Ops Res ‘65]
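For binary pairwise MRFs the reference above gives a crisp condition: each pairwise term must satisfy the submodularity (regularity) inequality E_ij(0,0) + E_ij(1,1) ≤ E_ij(0,1) + E_ij(1,0), in which case the energy can be minimized by graph cuts. A tiny checker, with the term given as a 2x2 cost table (the example tables are illustrative):

```python
def pairwise_term_is_submodular(E_ij):
    """E_ij is a 2x2 table of costs: E_ij[a][b] is the cost of (y_i, y_j) = (a, b)."""
    return E_ij[0][0] + E_ij[1][1] <= E_ij[0][1] + E_ij[1][0]

print(pairwise_term_is_submodular([[0, 1], [1, 0]]))   # True: penalizes disagreement
print(pairwise_term_is_submodular([[1, 0], [0, 1]]))   # False: rewards disagreement
```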

Constrained minimization

Part II: submodularity, move-making algorithms, and a higher-order model: the Pⁿ Potts model

Multi-Label problems

Move making: expansion moves and swap moves for this problem.

Metric and semi-metric potential functions

If the pairwise potential functions define a metric, then the energy function in equation (8) can be approximately minimized using alpha-expansions. If the pairwise potential functions define a semi-metric, it can be approximately minimized using alpha-beta swaps.
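A small sketch of what "metric" and "semi-metric" mean for a pairwise potential θ(a, b), given here as a dict over label pairs; the function names and tolerance are illustrative, not from the slides:

```python
def is_semi_metric(theta, labels, tol=1e-9):
    """theta(a,b) = theta(b,a) >= 0 and theta(a,b) = 0 iff a = b."""
    return all(abs(theta[(a, b)] - theta[(b, a)]) <= tol
               and theta[(a, b)] >= -tol
               and ((theta[(a, b)] <= tol) == (a == b))
               for a in labels for b in labels)

def is_metric(theta, labels, tol=1e-9):
    """A semi-metric that also satisfies the triangle inequality."""
    return is_semi_metric(theta, labels, tol) and all(
        theta[(a, c)] <= theta[(a, b)] + theta[(b, c)] + tol
        for a in labels for b in labels for c in labels)
```

For instance, truncated linear costs min(|a − b|, T) form a metric, while truncated quadratic costs min((a − b)², T) are only a semi-metric.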

Move energy. Each move is a binary labelling t ∈ {0,1}^n. A transformation function T(x, t) maps the current labelling x and the move t to a new labelling. The energy of a move t is E_m(t) = E(T(x, t)), and the optimal move is t* = argmin_t E_m(t). Submodular set functions play an important role in energy minimization because they can be minimized in polynomial time.
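For the αβ-swap move, a common concrete choice of the transformation function T(x, t) is: variables currently labelled α or β may switch between the two according to the binary move vector t, and all other variables keep their label. A minimal sketch (the specific encoding t_i = 0 ↦ α is a convention, not fixed by the slide):

```python
def swap_transform(x, t, alpha, beta):
    """T(x, t) for an alpha-beta swap: only variables labelled alpha or beta may change."""
    return [(alpha if t[i] == 0 else beta) if x[i] in (alpha, beta) else x[i]
            for i in range(len(x))]

print(swap_transform(x=[1, 2, 3, 1], t=[1, 0, 0, 0], alpha=1, beta=2))   # [2, 1, 3, 1]
```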

The swap move algorithm

The expansion move algorithm
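The expansion move algorithm in outline: sweep over labels α, solve a binary "keep current label vs. switch to α" subproblem for each, and accept the move if it lowers the energy. The helper expansion_move below is hypothetical; in practice the binary subproblem is solved with an st-mincut.

```python
def alpha_expansion(labels, label_set, energy, expansion_move, max_sweeps=10):
    """Outer loop of alpha-expansion; `expansion_move(labels, alpha)` solves the binary subproblem."""
    best = energy(labels)
    for _ in range(max_sweeps):
        improved = False
        for alpha in label_set:
            proposal = expansion_move(labels, alpha)   # each variable keeps its label or switches to alpha
            if energy(proposal) < best:
                labels, best, improved = proposal, energy(proposal), True
        if not improved:
            break   # no expansion move improves the energy: local optimum reached
    return labels
```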

Higher-order potentials: the class of higher-order clique potentials for which the expansion and swap moves can be computed in polynomial time. The clique potentials take the form shown on the next slide.

The question you should be asking: can my higher-order potential be solved using α-expansions? To answer it, show that the move energy is submodular for all x_c.

Form of the higher-order potentials. Moves for higher-order potentials are defined over a clique c = {x_i, x_j, x_k, x_l, x_m, …} via a clique inconsistency function f(·) and a pairwise potential φ(x_i, x_j). Sum form: ψ_c(x_c) = f( Σ_{i,j ∈ c} φ(x_i, x_j) ). Max form: ψ_c(x_c) = f( max_{i,j ∈ c} φ(x_i, x_j) ).

Theoretical results (swap): the move energy is always submodular if f is a non-decreasing concave function. See the paper for proofs.

Condition for the swap move. Concave function: f(λ x_1 + (1 − λ) x_2) ≥ λ f(x_1) + (1 − λ) f(x_2) for all x_1, x_2 and λ ∈ [0, 1].

Prove that all projections onto two variables of any αβ-swap move energy are submodular. The cost of any configuration:

Substituting gives Constraint 1, Lemma 1, and Constraint 2.

Condition for the alpha-expansion move. Metric: φ(a, b) = 0 if and only if a = b, φ(a, b) = φ(b, a) ≥ 0, and φ(a, c) ≤ φ(a, b) + φ(b, c).

Form of the higher-order potentials (repeated). Moves for higher-order potentials are defined over a clique c = {x_i, x_j, x_k, x_l, x_m, …} via a clique inconsistency function f(·) and a pairwise potential φ(x_i, x_j). Sum form: ψ_c(x_c) = f( Σ_{i,j ∈ c} φ(x_i, x_j) ). Max form: ψ_c(x_c) = f( max_{i,j ∈ c} φ(x_i, x_j) ).

Part II: submodularity, move-making algorithms, and a higher-order model: the Pⁿ Potts model

Image segmentation. E(X) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i − x_j|, with E: {0,1}^n → R, 0 → fg, 1 → bg, and n = number of pixels. [Boykov and Jolly '01] [Blake et al. '04] [Rother et al. '04] (Figure panels: image, unary cost, segmentation.)
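Because the pairwise weights d_ij are nonnegative, this energy is submodular and can be minimized exactly with a single st-mincut. A minimal sketch using networkx as the max-flow solver; the toy costs and the node names 's' and 't' are illustrative, not from the slides:

```python
import networkx as nx

def segment_by_graph_cut(unary, pairwise):
    """Minimize E(x) = sum_i c_i x_i + sum_{ij} d_ij |x_i - x_j| over x in {0,1}^n.

    unary: {i: c_i}; pairwise: {(i, j): d_ij} with d_ij >= 0.
    """
    G = nx.DiGraph()
    for i, c in unary.items():
        if c >= 0:
            G.add_edge('s', i, capacity=c)     # this edge is cut (paid) when x_i = 1
        else:
            G.add_edge(i, 't', capacity=-c)    # this edge is cut (paid) when x_i = 0
    for (i, j), d in pairwise.items():
        G.add_edge(i, j, capacity=d)           # cut when x_i and x_j disagree
        G.add_edge(j, i, capacity=d)
    _, (source_side, _) = nx.minimum_cut(G, 's', 't')
    return {i: 0 if i in source_side else 1 for i in unary}

# Toy 1-D example: negative c_i makes x_i = 1 cheaper, positive c_i makes x_i = 0 cheaper
print(segment_by_graph_cut(unary={0: -2.0, 1: -0.5, 2: 0.5, 3: 2.0},
                           pairwise={(0, 1): 1.0, (1, 2): 1.0, (2, 3): 1.0}))
# -> {0: 1, 1: 1, 2: 0, 3: 0}
```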

Pⁿ Potts potentials. (Figure: patch dictionary (tree), with costs ranging from 0 to C_max.) h(X_p) = 0 if x_i = 0 for all i ∈ p, and C_max otherwise. [slide credits: Kohli]

Pⁿ Potts potentials. E(X) = Σ_i c_i x_i + Σ_{i,j} d_ij |x_i − x_j| + Σ_p h_p(X_p), where h_p(X_p) = 0 if x_i = 0 for all i ∈ p and C_max otherwise; E: {0,1}^n → R, 0 → fg, 1 → bg, n = number of pixels. [slide credits: Kohli]

Theoretical results (expansion): the move energy is always submodular if f is an increasing linear function. See the paper for proofs.

The Pⁿ Potts model on a clique c: the potential is γ_k if all pixels in c take the same label l_k, and γ_max otherwise.

Cost: γ_k, when all pixels in the clique c take the same label l_k.

The Pⁿ Potts model. Cost: γ_max, when the pixels in clique c take more than one label.
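Reading the Pⁿ Potts potential off the last three slides as code; the particular γ values are illustrative placeholders:

```python
def pn_potts(clique_labels, gamma, gamma_max):
    """gamma_k if every pixel in the clique takes label k, gamma_max otherwise."""
    first = clique_labels[0]
    if all(label == first for label in clique_labels):
        return gamma.get(first, gamma_max)   # fall back to gamma_max if no per-label cost given
    return gamma_max

print(pn_potts([2, 2, 2], gamma={2: 0.5}, gamma_max=5.0))   # 0.5 : consistent clique
print(pn_potts([2, 2, 3], gamma={2: 0.5}, gamma_max=5.0))   # 5.0 : inconsistent clique
```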

Optimal moves for Pⁿ Potts: computing the optimal swap move over label 1 (α) and label 2 (β). Case 1: not all variables in the clique c are assigned label α or β (some take, e.g., label 3 or label 4). The move energy is then independent of t_c and can be ignored.

Optimal moves for Pⁿ Potts: computing the optimal swap move. Case 2: all variables in the clique c are assigned label 1 (α) or label 2 (β).

Optimal moves for Pⁿ Potts: computing the optimal swap move. Case 2 (continued): all variables are assigned label α or β, and the move energy can be minimized by solving an st-mincut problem.

Solving the move energy: add a constant. Adding a constant K to all possible values of the clique potential does not affect the solution, i.e. it does not change the optimal move.

Solving the move energy: computing the optimal swap move via an st-mincut. (Figure: graph construction with a source, a sink, nodes v_1, v_2, …, v_n for the clique variables, and auxiliary nodes M_s and M_t.) t_i = 0 if v_i belongs to the source set; t_j = 1 if v_j belongs to the sink set.

Solving the move energy: computing the optimal swap move. Case 1: all x_i = α (every v_i in the source set), with the cost given by the corresponding cut (figure).

Solving the move energy: computing the optimal swap move. Case 2: all x_i = β (every v_i in the sink set), with the cost given by the corresponding cut (figure).

Solving the move energy: computing the optimal swap move. Case 3: the x_i take both labels α and β (the v_i are split between the source and sink sets), with the cost given by the corresponding cut. Recall that the cost of an st-mincut is the sum of the weights of the edges included in the cut that go from the source set to the sink set.

Optimal moves for Pⁿ Potts: the expansion move energy uses a similar graph construction.

Experimental results: texture segmentation. Energy terms: unary (colour), pairwise (smoothness), higher order (texture). (Figure panels: original image, pairwise result, higher-order result.)

Experimental results. (Figure panels: original image; pairwise model: swap (3.2 sec), expansion (2.5 sec); higher-order model: swap (4.2 sec), expansion (3.0 sec).)

Experimental results. (Figure panels: original image; pairwise model: swap (4.7 sec), expansion (3.7 sec); higher-order model: swap (5.0 sec), expansion (4.4 sec).)

More Higher-order models