# Discrete Optimization Lecture 4 – Part 3 M. Pawan Kumar Slides available online


Discrete Optimization, Lecture 4 – Part 3. M. Pawan Kumar (pawan.kumar@ecp.fr). Slides available online: http://cvn.ecp.fr/personnel/pawan/

Recap

Loopy Belief Propagation

- Initialize all messages to 1
- In some order of edges, update messages: M_ab;k = Σ_i ψ_a(l_i) ψ_ab(l_i, l_k) Π_{n ≠ b} M_na;i
- Repeat until convergence: rate of change in messages < threshold
- Convergence is not guaranteed!

Loopy Belief Propagation

- B′_a(i) = ψ_a(l_i) Π_n M_na;i
- B′_ab(i,j) = ψ_a(l_i) ψ_b(l_j) ψ_ab(l_i, l_j) Π_{n ≠ b} M_na;i Π_{n ≠ a} M_nb;j
- Normalize B′ to compute the beliefs B_a(i), B_ab(i,j)
- At convergence: Σ_j B_ab(i,j) = B_a(i)
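The update and belief equations above can be sketched in a few lines of Python. The 3-variable cycle, the random potentials, and the fixed sweep schedule below are illustrative assumptions, not part of the lecture:

```python
import numpy as np

L = 2                                     # labels per variable
edges = [(0, 1), (1, 2), (2, 0)]          # a 3-cycle, so BP is "loopy"
rng = np.random.default_rng(0)
psi = {a: rng.uniform(0.5, 1.5, L) for a in range(3)}        # psi_a(l_i)
psi2 = {e: rng.uniform(0.5, 1.5, (L, L)) for e in edges}     # psi_ab(l_i, l_k)

def pair(a, b):
    """psi_ab as a matrix indexed [label of a, label of b]."""
    return psi2[(a, b)] if (a, b) in psi2 else psi2[(b, a)].T

nbrs = {a: [b for x, y in edges for a2, b in ((x, y), (y, x)) if a2 == a]
        for a in range(3)}
# M[(a, b)][k] is the message from V_a to V_b about label l_k; init to 1
M = {(a, b): np.ones(L) for a in range(3) for b in nbrs[a]}

delta = 1.0
for _ in range(200):
    delta = 0.0
    for (a, b) in list(M):
        # M_ab;k = sum_i psi_a(l_i) psi_ab(l_i, l_k) prod_{n != b} M_na;i
        incoming = np.prod([M[(n, a)] for n in nbrs[a] if n != b], axis=0)
        new = (psi[a] * incoming) @ pair(a, b)
        new /= new.sum()                  # normalize for numerical stability
        delta = max(delta, float(np.abs(new - M[(a, b)]).max()))
        M[(a, b)] = new
    if delta < 1e-9:                      # rate of change below threshold
        break

# Beliefs: B_a(i) proportional to psi_a(l_i) prod_n M_na;i
beliefs = {}
for a in range(3):
    B = psi[a] * np.prod([M[(n, a)] for n in nbrs[a]], axis=0)
    beliefs[a] = B / B.sum()
print(beliefs)
```

Normalizing each message is optional in exact arithmetic but standard practice in implementations, since messages otherwise grow or shrink geometrically around the loop.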

Outline

- Free Energy
- Mean-Field Approximation
- Bethe Approximation
- Kikuchi Approximation

Yedidia, Freeman and Weiss, 2000

Exponential Family

P(v) = exp{−Σ_a Σ_i θ_a;i I_a;i(v_a) − Σ_{a,b} Σ_{i,k} θ_ab;ik I_ab;ik(v_a, v_b) − A(θ)}, where A(θ) = log Z

Probability: P(v) = (1/Z) Π_a ψ_a(v_a) Π_{(a,b)} ψ_ab(v_a, v_b), with ψ_a(l_i) = exp(−θ_a(i)) and ψ_ab(l_i, l_k) = exp(−θ_ab(i,k))

Energy: Q(v) = Σ_a θ_a(v_a) + Σ_{a,b} θ_ab(v_a, v_b), so P(v) = exp(−Q(v)) / Z

Exponential Family

P(v) = (1/Z) Π_a ψ_a(v_a) Π_{(a,b)} ψ_ab(v_a, v_b) = exp(−Q(v)) / Z

Approximate P(v) with a distribution B(v) that has a simpler form, by minimizing the KL divergence between B(v) and P(v).

Kullback-Leibler Divergence

D = Σ_v B(v) log (B(v) / P(v))

Kullback-Leibler Divergence

D = Σ_v B(v) log B(v) − Σ_v B(v) log P(v)

Kullback-Leibler Divergence

D = Σ_v B(v) log B(v) + Σ_v B(v) Q(v) − (−log Z)

−log Z is the Helmholtz free energy, constant with respect to B.

Kullback-Leibler Divergence

Σ_v B(v) log B(v) + Σ_v B(v) Q(v)

The first term is the negative entropy −S(B); the second term is the average energy U(B). Their sum U(B) − S(B) is the Gibbs free energy, so minimizing D over B amounts to minimizing the Gibbs free energy.
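The decomposition D = U(B) − S(B) + log Z can be checked numerically on a tiny two-variable model. The potentials and the trial distribution B below are arbitrary assumptions chosen only to exercise the identity:

```python
import numpy as np

rng = np.random.default_rng(1)
theta1, theta2 = rng.normal(size=2), rng.normal(size=2)   # theta_a(i)
theta12 = rng.normal(size=(2, 2))                         # theta_ab(i,k)

# Energy Q(v) and exact distribution P(v) = exp(-Q(v)) / Z over all 4 states
Q = theta1[:, None] + theta2[None, :] + theta12
Z = np.exp(-Q).sum()
P = np.exp(-Q) / Z

B = rng.uniform(size=(2, 2))
B /= B.sum()                                              # an arbitrary trial B(v)

D = (B * np.log(B / P)).sum()                             # KL divergence
U = (B * Q).sum()                                         # average energy U(B)
negS = (B * np.log(B)).sum()                              # negative entropy -S(B)
gibbs = U + negS                                          # Gibbs free energy
print(D, gibbs + np.log(Z))                               # the two agree
```

Since log Z does not depend on B, minimizing D and minimizing the Gibbs free energy select the same B.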

Outline

- Free Energy
- Mean-Field Approximation
- Bethe Approximation
- Kikuchi Approximation

Simpler Distribution

One-node marginals B_a(i). Joint probability: B(v) = Π_a B_a(v_a)

Average Energy

Σ_v B(v) Q(v) = Σ_v B(v) (Σ_a θ_a(v_a) + Σ_{a,b} θ_ab(v_a, v_b))   (simplification done on the board)

= Σ_a Σ_i B_a(i) θ_a(i) + Σ_{a,b} Σ_{i,k} B_a(i) B_b(k) θ_ab(i,k)

Negative Entropy

Σ_v B(v) log B(v) = Σ_a Σ_i B_a(i) log B_a(i)

Mean-Field Free Energy

Σ_a Σ_i B_a(i) θ_a(i) + Σ_{a,b} Σ_{i,k} B_a(i) B_b(k) θ_ab(i,k) + Σ_a Σ_i B_a(i) log B_a(i)

Optimization Problem

min_B  Σ_a Σ_i B_a(i) θ_a(i) + Σ_{a,b} Σ_{i,k} B_a(i) B_b(k) θ_ab(i,k) + Σ_a Σ_i B_a(i) log B_a(i)

s.t.  Σ_i B_a(i) = 1 for all a

KKT Condition

log B_a(i) = −θ_a(i) − Σ_b Σ_k B_b(k) θ_ab(i,k) + λ_a − 1

Hence B_a(i) = exp(−θ_a(i) − Σ_b Σ_k B_b(k) θ_ab(i,k)) / Z_a

Optimization

- Initialize B_a (random, uniform, or from domain knowledge); set all random variables to unprocessed
- Pick an unprocessed random variable V_a and update B_a(i) = exp(−θ_a(i) − Σ_b Σ_k B_b(k) θ_ab(i,k)) / Z_a
- If B_a changes, set its neighbors to unprocessed
- Repeat until convergence: guaranteed!

Tutorial: Jaakkola, 2000 (one of several)
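The coordinate update above is easy to implement. A minimal sketch, assuming a 4-cycle with random Gaussian potentials and a plain sweep schedule (all illustrative choices, not from the lecture):

```python
import numpy as np

L, n = 2, 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]                  # 4-cycle
rng = np.random.default_rng(2)
theta = rng.normal(size=(n, L))                           # theta_a(i)
theta2 = {e: rng.normal(size=(L, L)) for e in edges}      # theta_ab(i,k)

def pair(a, b):
    """theta_ab as a matrix indexed [label of a, label of b]."""
    return theta2[(a, b)] if (a, b) in theta2 else theta2[(b, a)].T

nbrs = {a: [b for e in edges for a2, b in (e, e[::-1]) if a2 == a]
        for a in range(n)}
B = np.full((n, L), 1.0 / L)                              # uniform init
old = B.copy()

for _ in range(500):
    old = B.copy()
    for a in range(n):
        # field_i = theta_a(i) + sum_{b in nbrs(a)} sum_k B_b(k) theta_ab(i,k)
        field = theta[a] + sum(pair(a, b) @ B[b] for b in nbrs[a])
        e = np.exp(-field)
        B[a] = e / e.sum()                                # divide by Z_a
    if np.abs(B - old).max() < 1e-9:
        break                                             # converged

print(B)
```

Each update minimizes the mean-field free energy over one B_a with the others held fixed, which is why the iteration is guaranteed to converge (the free energy decreases monotonically and is bounded below).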

Outline

- Free Energy
- Mean-Field Approximation
- Bethe Approximation
- Kikuchi Approximation

Simpler Distribution

One-node marginals B_a(i), two-node marginals B_ab(i,k). The joint probability is hard to write down in general, but not for trees.

Simpler Distribution

One-node marginals B_a(i), two-node marginals B_ab(i,k)

B(v) = Π_{a,b} B_ab(v_a, v_b) / Π_a B_a(v_a)^{n(a)−1}, where n(a) = number of neighbors of V_a (Pearl, 1988)
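Pearl's factorization can be verified on a small tree: using the exact marginals of a 3-node chain, the product of edge marginals divided by node marginals raised to n(a)−1 reconstructs the joint exactly. The chain and its potentials are illustrative assumptions:

```python
import numpy as np
from itertools import product

L = 2
rng = np.random.default_rng(5)
theta = rng.normal(size=(3, L))                       # theta_a(i)
theta2 = {(0, 1): rng.normal(size=(L, L)),            # theta_ab(i,k)
          (1, 2): rng.normal(size=(L, L))}

# Exact joint over the chain V1 - V2 - V3 by brute-force enumeration
P = np.empty((L, L, L))
for i, j, k in product(range(L), repeat=3):
    Q = (theta[0][i] + theta[1][j] + theta[2][k]
         + theta2[(0, 1)][i, j] + theta2[(1, 2)][j, k])
    P[i, j, k] = np.exp(-Q)
P /= P.sum()

# Exact one- and two-node marginals
B1 = [P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))]
B01, B12 = P.sum(axis=2), P.sum(axis=0)
n_nbr = [1, 2, 1]                                     # n(a) on the chain

# Pearl: B(v) = prod_ab B_ab(v_a, v_b) / prod_a B_a(v_a)^{n(a)-1}
recon = np.empty_like(P)
for i, j, k in product(range(L), repeat=3):
    num = B01[i, j] * B12[j, k]
    den = np.prod([B1[a][v] ** (n_nbr[a] - 1) for a, v in enumerate((i, j, k))])
    recon[i, j, k] = num / den

print(np.abs(recon - P).max())
```

On a graph with cycles the same formula no longer reproduces the joint, which is exactly why the Bethe free energy built from it is only an approximation.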

Average Energy

Σ_v B(v) Q(v) = Σ_v B(v) (Σ_a θ_a(v_a) + Σ_{a,b} θ_ab(v_a, v_b))

= Σ_a Σ_i B_a(i) θ_a(i) + Σ_{a,b} Σ_{i,k} B_ab(i,k) θ_ab(i,k)

= −Σ_a (n(a)−1) Σ_i B_a(i) θ_a(i) + Σ_{a,b} Σ_{i,k} B_ab(i,k) (θ_a(i) + θ_b(k) + θ_ab(i,k)), where n(a) = number of neighbors of V_a

Negative Entropy

Σ_v B(v) log B(v) ≈ −Σ_a (n(a)−1) Σ_i B_a(i) log B_a(i) + Σ_{a,b} Σ_{i,k} B_ab(i,k) log B_ab(i,k)

Exact for trees, approximate for general MRFs.

Bethe Free Energy

−Σ_a (n(a)−1) Σ_i B_a(i) (θ_a(i) + log B_a(i)) + Σ_{a,b} Σ_{i,k} B_ab(i,k) (θ_a(i) + θ_b(k) + θ_ab(i,k) + log B_ab(i,k))

Exact for trees, approximate for general MRFs.
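"Exact for trees" can be checked directly: on a tree, the Bethe free energy evaluated at the exact marginals equals the Gibbs free energy at B = P, which is −log Z. A sketch on a 3-node chain with arbitrary assumed potentials:

```python
import numpy as np
from itertools import product

L = 2
edges = [(0, 1), (1, 2)]                              # a 3-node chain (a tree)
rng = np.random.default_rng(4)
theta = rng.normal(size=(3, L))
theta2 = {e: rng.normal(size=(L, L)) for e in edges}

# Exact Z and P(v) = exp(-Q(v)) / Z by enumeration
unnorm = np.empty((L, L, L))
for i, j, k in product(range(L), repeat=3):
    Q = (theta[0][i] + theta[1][j] + theta[2][k]
         + theta2[(0, 1)][i, j] + theta2[(1, 2)][j, k])
    unnorm[i, j, k] = np.exp(-Q)
Z = unnorm.sum()
P = unnorm / Z

B1 = [P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))]
B2 = {(0, 1): P.sum(axis=2), (1, 2): P.sum(axis=0)}
n_nbr = [1, 2, 1]                                     # n(a) on the chain

# Bethe free energy from the slide's formula, at the exact marginals
bethe = -sum((n_nbr[a] - 1) * (B1[a] * (theta[a] + np.log(B1[a]))).sum()
             for a in range(3))
for a, b in edges:
    inner = (theta[a][:, None] + theta[b][None, :]
             + theta2[(a, b)] + np.log(B2[(a, b)]))
    bethe += (B2[(a, b)] * inner).sum()

print(bethe, -np.log(Z))
```

On a loopy graph the same computation generally differs from −log Z, which is the sense in which the Bethe free energy is an approximation.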

Optimization Problem

min_B  −Σ_a (n(a)−1) Σ_i B_a(i) (θ_a(i) + log B_a(i)) + Σ_{a,b} Σ_{i,k} B_ab(i,k) (θ_a(i) + θ_b(k) + θ_ab(i,k) + log B_ab(i,k))

s.t.  Σ_k B_ab(i,k) = B_a(i),  Σ_{i,k} B_ab(i,k) = 1,  Σ_i B_a(i) = 1

KKT Condition

log B_ab(i,k) = −(θ_a(i) + θ_b(k) + θ_ab(i,k)) + λ_ab(k) + λ_ba(i) + μ_ab − 1

The Lagrange multipliers correspond to log-messages: λ_ab(k) = log M_ab;k

Optimization

BP tries to optimize the Bethe free energy, but it may not converge. Convergent alternatives exist (Yuille and Rangarajan, 2003).

Outline

- Free Energy
- Mean-Field Approximation
- Bethe Approximation
- Kikuchi Approximation

Local Free Energy

[Figure: a 2×2 grid of variables V1, V2, V3, V4]

For a cluster of variables c: G_c = Σ_{v_c} B_c(v_c) (log B_c(v_c) + Σ_{d ⊆ c} θ_d(v_d))

Example: G_12 = Σ_{v_1,v_2} B_12(v_1, v_2) (log B_12(v_1, v_2) + θ_1(v_1) + θ_2(v_2) + θ_12(v_1, v_2))

Local Free Energy

Example: G_1 = Σ_{v_1} B_1(v_1) (log B_1(v_1) + θ_1(v_1))

Local Free Energy

Example: G_1234 = Σ_{v_1,v_2,v_3,v_4} B_1234(v_1, v_2, v_3, v_4) (log B_1234(v_1, v_2, v_3, v_4) + θ_1(v_1) + θ_2(v_2) + θ_3(v_3) + θ_4(v_4) + θ_12(v_1, v_2) + θ_13(v_1, v_3) + θ_24(v_2, v_4) + θ_34(v_3, v_4))

Sum of Local Free Energies

Sum of the free energies of all pairwise clusters: G_12 + G_13 + G_24 + G_34. This overcounts each of G_1, G_2, G_3, G_4 once, since on the 4-cycle every node belongs to n(a) = 2 pairwise clusters.

Correcting for the overcounting gives G_12 + G_13 + G_24 + G_34 − G_1 − G_2 − G_3 − G_4: the Bethe approximation!

Kikuchi Approximations

Use bigger clusters, e.g. G_1234 for the whole 2×2 grid.

Kikuchi Approximations / Generalized Belief Propagation

[Figure: a 2×3 grid of variables V1, …, V6]

Use bigger clusters: G_1245 + G_2356 − G_25 (subtract the free energy of the overlap {V2, V5}, which is counted twice).

Derive message passing using the KKT conditions!
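The subtraction of G_25 can be justified by a counting-number check: on the 2×3 grid, regions {1,2,4,5} and {2,3,5,6} each get counting number +1 and the overlap {2,5} gets −1, so every node and every grid edge ends up counted exactly once. A small sketch of that bookkeeping (the region choice follows the slide; the grid-edge list is an assumption about the figure):

```python
# Counting numbers for the Kikuchi regions on the 2x3 grid
regions = {frozenset({1, 2, 4, 5}): 1,
           frozenset({2, 3, 5, 6}): 1,
           frozenset({2, 5}): -1}          # overlap, subtracted once
grid_edges = [{1, 2}, {2, 3}, {4, 5}, {5, 6}, {1, 4}, {2, 5}, {3, 6}]

def count(subset):
    """Total counting number of all regions containing `subset`."""
    return sum(c for r, c in regions.items() if subset <= r)

for a in range(1, 7):
    assert count({a}) == 1                 # each node potential counted once
for e in grid_edges:
    assert count(set(e)) == 1              # each edge potential counted once
print("all potentials counted exactly once")
```

The same check generalizes: a valid Kikuchi region set assigns counting numbers so that every potential θ_d is counted exactly once, and generalized BP passes messages between these regions.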
