Discrete Optimization Lecture 4 – Part 3 M. Pawan Kumar Slides available online

1 Discrete Optimization Lecture 4 – Part 3 M. Pawan Kumar Slides available online

2 Recap

3 Loopy Belief Propagation
Initialize all messages to 1. In some order of edges, update the messages
$M_{ab;k} = \sum_i \psi_a(l_i)\,\psi_{ab}(l_i,l_k) \prod_{n \neq b} M_{na;i}$
until convergence (rate of change in messages < threshold). Convergence is not guaranteed!!

4 Loopy Belief Propagation
Unnormalized beliefs:
$B'_a(i) = \psi_a(l_i) \prod_n M_{na;i}$
$B'_{ab}(i,j) = \psi_a(l_i)\,\psi_b(l_j)\,\psi_{ab}(l_i,l_j) \prod_{n \neq b} M_{na;i} \prod_{n \neq a} M_{nb;j}$
Normalize to compute the beliefs $B_a(i)$ and $B_{ab}(i,j)$. At convergence, $\sum_j B_{ab}(i,j) = B_a(i)$.
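The message update and belief computation on slides 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's implementation: the function name `loopy_bp`, the dictionary layout of the potentials, and the example chain are all assumptions made here. Messages are also rescaled to sum to 1 after each update, a numerical convenience that the slides do not mention and that does not change the normalized beliefs.

```python
import math

def loopy_bp(unary, pairwise, max_iters=100, tol=1e-9):
    """Sum-product loopy BP on a pairwise MRF (hypothetical data layout).

    unary[a][i]            : psi_a(l_i)
    pairwise[(a, b)][i][k] : psi_ab(l_i, l_k), stored once per undirected edge
    Returns (one-node beliefs, converged flag).
    """
    n_labels = {a: len(p) for a, p in unary.items()}
    nbrs = {a: [] for a in unary}
    for a, b in pairwise:
        nbrs[a].append(b)
        nbrs[b].append(a)

    def psi_ab(a, b, i, k):
        # Look the potential up in whichever orientation it was stored.
        if (a, b) in pairwise:
            return pairwise[(a, b)][i][k]
        return pairwise[(b, a)][k][i]

    # M[(a, b)][k]: message from a to b about label k of b, initialized to 1.
    M = {(a, b): [1.0] * n_labels[b] for a in unary for b in nbrs[a]}

    converged = False
    for _ in range(max_iters):
        change = 0.0
        for (a, b) in list(M):
            new = []
            for k in range(n_labels[b]):
                total = 0.0
                for i in range(n_labels[a]):
                    incoming = math.prod(M[(n, a)][i] for n in nbrs[a] if n != b)
                    total += unary[a][i] * psi_ab(a, b, i, k) * incoming
                new.append(total)
            s = sum(new)
            new = [x / s for x in new]  # rescaling only, an added assumption
            change = max(change, max(abs(x - y) for x, y in zip(new, M[(a, b)])))
            M[(a, b)] = new
        if change < tol:  # messages stopped changing
            converged = True
            break

    beliefs = {}
    for a in unary:
        b_a = [unary[a][i] * math.prod(M[(n, a)][i] for n in nbrs[a])
               for i in range(n_labels[a])]
        z = sum(b_a)
        beliefs[a] = [x / z for x in b_a]
    return beliefs, converged

# Hypothetical chain V0 - V1 - V2 (a tree), two labels per variable.
unary = {0: [1.0, 2.0], 1: [1.0, 1.0], 2: [3.0, 1.0]}
pairwise = {(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): [[2.0, 1.0], [1.0, 2.0]]}
beliefs, converged = loopy_bp(unary, pairwise)
```

On this tree the beliefs coincide with the exact marginals; on a graph with loops the `converged` flag may stay `False`, which is the "not guaranteed" caveat of slide 3.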

5 Outline
Free Energy; Mean-Field Approximation; Bethe Approximation; Kikuchi Approximation. Reference: Yedidia, Freeman and Weiss, 2000.

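6 Exponential Family
$P(v) = \exp\{-\sum_a \sum_i \theta_{a;i} I_{a;i}(v_a) - \sum_{a,b} \sum_{i,k} \theta_{ab;ik} I_{ab;ik}(v_a,v_b) - A(\theta)\}$, where $A(\theta) = \log Z$.
Probability: $P(v) = \frac{1}{Z} \prod_a \psi_a(v_a) \prod_{(a,b)} \psi_{ab}(v_a,v_b)$, with $\psi_a(l_i) = \exp(-\theta_a(i))$ and $\psi_{ab}(l_i,l_k) = \exp(-\theta_{ab}(i,k))$.

7 Exponential Family
$P(v) = \exp\{-\sum_a \sum_i \theta_{a;i} I_{a;i}(v_a) - \sum_{a,b} \sum_{i,k} \theta_{ab;ik} I_{ab;ik}(v_a,v_b) - A(\theta)\}$, where $A(\theta) = \log Z$.
Probability: $P(v) = \frac{1}{Z} \prod_a \psi_a(v_a) \prod_{(a,b)} \psi_{ab}(v_a,v_b)$, with $\psi_a(l_i) = \exp(-\theta_a(i))$ and $\psi_{ab}(l_i,l_k) = \exp(-\theta_{ab}(i,k))$.
Energy: $Q(v) = \sum_a \theta_a(v_a) + \sum_{a,b} \theta_{ab}(v_a,v_b)$, so that $P(v) = \frac{\exp(-Q(v))}{Z}$.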
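For a model small enough to enumerate, the relation between the energy $Q(v)$, the partition function $Z$ and the probability $P(v)$ can be checked directly. The sketch below assumes a hypothetical two-variable model; the function names and the dictionary layout are illustrative choices, not part of the lecture.

```python
import itertools
import math

def energy(v, theta_a, theta_ab):
    """Q(v) = sum_a theta_a(v_a) + sum_(a,b) theta_ab(v_a, v_b)."""
    q = sum(theta_a[a][v[a]] for a in theta_a)
    q += sum(t[v[a]][v[b]] for (a, b), t in theta_ab.items())
    return q

def exact_distribution(theta_a, theta_ab):
    """Enumerate every labeling; only feasible for tiny models."""
    names = sorted(theta_a)
    ranges = [range(len(theta_a[a])) for a in names]
    weights = {}
    for labels in itertools.product(*ranges):
        v = dict(zip(names, labels))
        weights[labels] = math.exp(-energy(v, theta_a, theta_ab))
    Z = sum(weights.values())
    return {labels: w / Z for labels, w in weights.items()}, Z

# Hypothetical model: two variables, two labels each.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
P, Z = exact_distribution(theta_a, theta_ab)
```

The enumeration grows exponentially with the number of variables, which is why the rest of the lecture looks for a simpler distribution $B(v)$.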

8 Exponential Family
Probability: $P(v) = \frac{1}{Z} \prod_a \psi_a(v_a) \prod_{(a,b)} \psi_{ab}(v_a,v_b) = \frac{\exp(-Q(v))}{Z}$.
Approximate probability distribution $B(v)$: minimize the KL divergence between $B(v)$ and $P(v)$, where $B(v)$ has a simpler form than $P(v)$.

9 Kullback-Leibler Divergence
$D = \sum_v B(v) \log \frac{B(v)}{P(v)}$

10 Kullback-Leibler Divergence
$D = \sum_v B(v) \log B(v) - \sum_v B(v) \log P(v)$

11 Kullback-Leibler Divergence
$D = \sum_v B(v) \log B(v) + \sum_v B(v) Q(v) - (-\log Z)$
Here $-\log Z$ is the Helmholtz free energy, which is constant with respect to $B$.

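12 Kullback-Leibler Divergence
$\sum_v B(v) \log B(v) + \sum_v B(v) Q(v)$
The first term, $\sum_v B(v) \log B(v)$, is the negative entropy $-S(B)$.

13 Kullback-Leibler Divergence
$\sum_v B(v) \log B(v) + \sum_v B(v) Q(v)$
The second term, $\sum_v B(v) Q(v)$, is the average energy $U(B)$.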

14 Kullback-Leibler Divergence
$\sum_v B(v) \log B(v) + \sum_v B(v) Q(v)$ is the Gibbs free energy.
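The decomposition on slides 11 to 14 says that the KL divergence equals the Gibbs free energy minus the Helmholtz free energy. A small numerical check, assuming a hypothetical two-variable model and an arbitrary distribution `B` chosen only for illustration, might look like this:

```python
import itertools
import math

# Hypothetical model: two variables, two labels each.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}

def Q(v):
    """Energy of the labeling v, given as a tuple (v_0, v_1)."""
    return (sum(theta_a[a][v[a]] for a in theta_a)
            + sum(t[v[a]][v[b]] for (a, b), t in theta_ab.items()))

states = list(itertools.product(range(2), repeat=2))
Z = sum(math.exp(-Q(v)) for v in states)
P = {v: math.exp(-Q(v)) / Z for v in states}

# Any valid distribution can play the role of B; this one is arbitrary.
B = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.2}

kl = sum(B[v] * math.log(B[v] / P[v]) for v in states)
neg_entropy = sum(B[v] * math.log(B[v]) for v in states)
avg_energy = sum(B[v] * Q(v) for v in states)
gibbs = neg_entropy + avg_energy      # Gibbs free energy of B
helmholtz = -math.log(Z)              # constant with respect to B
```

Because `kl` is non-negative, `gibbs` never drops below `helmholtz`, and minimizing the Gibbs free energy over `B` is the same as minimizing the KL divergence.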

15 Outline
Free Energy; Mean-Field Approximation; Bethe Approximation; Kikuchi Approximation

16 Simpler Distribution
One-node marginals $B_a(i)$. Joint probability: $B(v) = \prod_a B_a(v_a)$.

17 Average Energy
$\sum_v B(v) Q(v)$

18 Average Energy
$\sum_v B(v) \big(\sum_a \theta_a(v_a) + \sum_{a,b} \theta_{ab}(v_a,v_b)\big)$
Simplify on board!!!

19 Average Energy
$\sum_a \sum_i B_a(i)\,\theta_a(i) + \sum_{a,b} \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k)$
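The simplification from slide 18 to slide 19 was done on the board. One way to reconstruct it, assuming only the factorized form $B(v) = \prod_a B_a(v_a)$ and that every $B_c$ sums to 1, is to sum out all variables that a given term does not depend on:

```latex
\begin{align*}
\sum_v B(v)\,\theta_a(v_a)
  &= \sum_i B_a(i)\,\theta_a(i) \prod_{c \neq a} \sum_j B_c(j)
   = \sum_i B_a(i)\,\theta_a(i), \\
\sum_v B(v)\,\theta_{ab}(v_a,v_b)
  &= \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k) \prod_{c \neq a,b} \sum_j B_c(j)
   = \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k).
\end{align*}
```

Adding the first line over all variables $a$ and the second over all edges $(a,b)$ gives the average energy of slide 19.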

20 Negative Entropy
$\sum_v B(v) \log B(v)$

21 Negative Entropy
$\sum_a \sum_i B_a(i) \log B_a(i)$

22 Mean-Field Free Energy
$\sum_a \sum_i B_a(i)\,\theta_a(i) + \sum_{a,b} \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k) + \sum_a \sum_i B_a(i) \log B_a(i)$
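The factored expression for the mean-field free energy can be compared against the defining sum over all labelings. The sketch below is only a sanity check under stated assumptions: the three-variable chain, the marginals `B`, and both function names are hypothetical, and variables are assumed to be numbered 0, 1, 2 so that a labeling can be indexed as a tuple.

```python
import itertools
import math

# Hypothetical chain V0 - V1 - V2, two labels per variable.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.2, 0.7]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]], (1, 2): [[0.0, 0.5], [0.5, 0.0]]}
# Arbitrary one-node marginals B_a(i); each row sums to 1.
B = {0: [0.6, 0.4], 1: [0.3, 0.7], 2: [0.5, 0.5]}

def mean_field_free_energy(B, theta_a, theta_ab):
    """Factored form: unary energy + pairwise energy + negative entropy."""
    unary = sum(B[a][i] * theta_a[a][i] for a in B for i in range(len(B[a])))
    pair = sum(B[a][i] * B[b][k] * t[i][k]
               for (a, b), t in theta_ab.items()
               for i in range(len(B[a])) for k in range(len(B[b])))
    neg_entropy = sum(p * math.log(p) for a in B for p in B[a])
    return unary + pair + neg_entropy

def brute_force_free_energy(B, theta_a, theta_ab):
    """Defining form: sum_v B(v) (log B(v) + Q(v)) with B(v) = prod_a B_a(v_a)."""
    total = 0.0
    for v in itertools.product(*(range(len(B[a])) for a in sorted(B))):
        b_v = math.prod(B[a][v[a]] for a in B)
        q_v = (sum(theta_a[a][v[a]] for a in B)
               + sum(t[v[a]][v[b]] for (a, b), t in theta_ab.items()))
        total += b_v * (math.log(b_v) + q_v)
    return total

factored = mean_field_free_energy(B, theta_a, theta_ab)
enumerated = brute_force_free_energy(B, theta_a, theta_ab)
```

The two values agree up to floating-point error, while the factored form needs only sums over single variables and edges.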

23 Optimization Problem
$\min_B \; \sum_a \sum_i B_a(i)\,\theta_a(i) + \sum_{a,b} \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k) + \sum_a \sum_i B_a(i) \log B_a(i)$
s.t. $\sum_i B_a(i) = 1$

24 KKT Condition
$\log B_a(i) = -\theta_a(i) - \sum_b \sum_k B_b(k)\,\theta_{ab}(i,k) + \lambda_a - 1$
$B_a(i) = \exp\big(-\theta_a(i) - \sum_b \sum_k B_b(k)\,\theta_{ab}(i,k)\big) / Z_a$
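The KKT condition follows from setting the gradient of the Lagrangian to zero. A sketch of that step is given below; the sign convention for the multiplier $\lambda_a$ is an assumption chosen to match the slide, and $\sum_b$ is read as a sum over the neighbors of $V_a$.

```latex
\begin{align*}
L(B,\lambda) &= \sum_a \sum_i B_a(i)\,\theta_a(i)
  + \sum_{a,b} \sum_{i,k} B_a(i)\,B_b(k)\,\theta_{ab}(i,k)
  + \sum_a \sum_i B_a(i)\log B_a(i)
  - \sum_a \lambda_a \Big(\sum_i B_a(i) - 1\Big), \\
\frac{\partial L}{\partial B_a(i)} &= \theta_a(i)
  + \sum_b \sum_k B_b(k)\,\theta_{ab}(i,k) + \log B_a(i) + 1 - \lambda_a = 0, \\
Z_a &= \exp(1 - \lambda_a)
  = \sum_i \exp\Big(-\theta_a(i) - \sum_b \sum_k B_b(k)\,\theta_{ab}(i,k)\Big).
\end{align*}
```

Solving the second line for $\log B_a(i)$ gives the condition on the slide, and the third line fixes $\lambda_a$ so that $B_a$ sums to 1.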

25 Optimization
Initialize $B_a$ (random, uniform, or from domain knowledge) and set all random variables to unprocessed. Repeat until convergence: pick an unprocessed random variable $V_a$ and update
$B_a(i) = \exp\big(-\theta_a(i) - \sum_b \sum_k B_b(k)\,\theta_{ab}(i,k)\big) / Z_a$;
if $B_a$ changes, set its neighbors to unprocessed. Convergence is guaranteed!!
Tutorial: Jaakkola, 2000 (one of several).
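The update schedule on this slide can be sketched as a work-list loop. The following is a minimal illustration under assumptions made here: the name `mean_field`, the uniform initialization, the tolerance used to decide whether $B_a$ "changed", and the two-variable example are all hypothetical choices rather than details from the lecture.

```python
import math

def mean_field(theta_a, theta_ab, tol=1e-10, max_updates=10_000):
    """Coordinate-wise mean-field updates with an 'unprocessed' work list."""
    nbrs = {a: [] for a in theta_a}
    for a, b in theta_ab:
        nbrs[a].append(b)
        nbrs[b].append(a)

    def pair(a, b, i, k):
        # theta_ab is stored once per undirected edge; handle both orientations.
        if (a, b) in theta_ab:
            return theta_ab[(a, b)][i][k]
        return theta_ab[(b, a)][k][i]

    # Uniform initialization (random or domain knowledge would also work).
    B = {a: [1.0 / len(t)] * len(t) for a, t in theta_a.items()}
    unprocessed = list(theta_a)
    updates = 0
    while unprocessed and updates < max_updates:
        a = unprocessed.pop()
        scores = []
        for i in range(len(theta_a[a])):
            expected = sum(B[b][k] * pair(a, b, i, k)
                           for b in nbrs[a] for k in range(len(B[b])))
            scores.append(math.exp(-theta_a[a][i] - expected))
        Z_a = sum(scores)
        new = [s / Z_a for s in scores]
        # If B_a changed noticeably, its neighbors must be revisited.
        if max(abs(x - y) for x, y in zip(new, B[a])) > tol:
            for b in nbrs[a]:
                if b not in unprocessed:
                    unprocessed.append(b)
        B[a] = new
        updates += 1
    return B

# Hypothetical model: two variables, two labels each.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]]}
B = mean_field(theta_a, theta_ab)
```

Each update minimizes the mean-field free energy over one $B_a$ with the others held fixed, so the objective never increases; the fixed point that is reached can still be a local minimum.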

26 Outline
Free Energy; Mean-Field Approximation; Bethe Approximation; Kikuchi Approximation

27 Simpler Distribution
One-node marginals $B_a(i)$ and two-node marginals $B_{ab}(i,k)$. The joint probability is hard to write down, but not for trees.

28 Simpler Distribution
One-node marginals $B_a(i)$ and two-node marginals $B_{ab}(i,k)$.
$B(v) = \frac{\prod_{a,b} B_{ab}(v_a,v_b)}{\prod_a B_a(v_a)^{n(a)-1}}$ (Pearl, 1988), where $n(a)$ is the number of neighbors of $V_a$.
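On a tree, the factorization of the joint into one-node and two-node marginals is exact, and this can be confirmed numerically. The sketch below assumes a hypothetical three-variable chain and computes the exact marginals by enumeration; the helper names are illustrative only.

```python
import itertools
import math

# Hypothetical chain V0 - V1 - V2 (a tree), two labels per variable.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.2, 0.7]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]], (1, 2): [[0.0, 0.5], [0.5, 0.0]]}

def Q(v):
    return (sum(theta_a[a][v[a]] for a in theta_a)
            + sum(t[v[a]][v[b]] for (a, b), t in theta_ab.items()))

states = list(itertools.product(range(2), repeat=3))
Z = sum(math.exp(-Q(v)) for v in states)
P = {v: math.exp(-Q(v)) / Z for v in states}

def marginal(keep):
    """Exact marginal of P over the variables listed in `keep`."""
    out = {}
    for v, p in P.items():
        key = tuple(v[a] for a in keep)
        out[key] = out.get(key, 0.0) + p
    return out

B_a = {a: marginal((a,)) for a in theta_a}
B_ab = {e: marginal(e) for e in theta_ab}
n = {a: sum(a in e for e in theta_ab) for a in theta_a}  # n(a): neighbor count

def tree_joint(v):
    """prod_(a,b) B_ab(v_a, v_b) / prod_a B_a(v_a)^(n(a) - 1)."""
    num = math.prod(B_ab[(a, b)][(v[a], v[b])] for (a, b) in theta_ab)
    den = math.prod(B_a[a][(v[a],)] ** (n[a] - 1) for a in theta_a)
    return num / den

max_error = max(abs(tree_joint(v) - P[v]) for v in states)
```

Here `max_error` is at the level of floating-point rounding. Adding an edge between V0 and V2 would create a loop, and the same formula would then only approximate the joint.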

29 Average Energy
$\sum_v B(v) Q(v)$

30 Average Energy
$\sum_v B(v) \big(\sum_a \theta_a(v_a) + \sum_{a,b} \theta_{ab}(v_a,v_b)\big)$

31 Average Energy
$\sum_a \sum_i B_a(i)\,\theta_a(i) + \sum_{a,b} \sum_{i,k} B_{ab}(i,k)\,\theta_{ab}(i,k)$

32 Average Energy
$-\sum_a (n(a)-1) \sum_i B_a(i)\,\theta_a(i) + \sum_{a,b} \sum_{i,k} B_{ab}(i,k)\big(\theta_a(i)+\theta_b(k)+\theta_{ab}(i,k)\big)$, where $n(a)$ is the number of neighbors of $V_a$.

33 Negative Entropy
$\sum_v B(v) \log B(v)$

34 Negative Entropy
$-\sum_a (n(a)-1) \sum_i B_a(i) \log B_a(i) + \sum_{a,b} \sum_{i,k} B_{ab}(i,k) \log B_{ab}(i,k)$
Exact for a tree, approximate for a general MRF.

35 Bethe Free Energy
$-\sum_a (n(a)-1) \sum_i B_a(i)\big(\theta_a(i)+\log B_a(i)\big) + \sum_{a,b} \sum_{i,k} B_{ab}(i,k)\big(\theta_a(i)+\theta_b(k)+\theta_{ab}(i,k)+\log B_{ab}(i,k)\big)$
Exact for a tree, approximate for a general MRF.
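The claim "exact for a tree" can be illustrated numerically: with the exact marginals of a tree-structured model plugged in, the Bethe free energy should equal the Helmholtz free energy $-\log Z$. The sketch below assumes a hypothetical three-variable chain and obtains the marginals by enumeration; all names are illustrative.

```python
import itertools
import math

# Hypothetical chain V0 - V1 - V2 (a tree), two labels per variable.
theta_a = {0: [0.0, 1.0], 1: [0.5, 0.0], 2: [0.2, 0.7]}
theta_ab = {(0, 1): [[0.0, 1.0], [1.0, 0.0]], (1, 2): [[0.0, 0.5], [0.5, 0.0]]}

def Q(v):
    return (sum(theta_a[a][v[a]] for a in theta_a)
            + sum(t[v[a]][v[b]] for (a, b), t in theta_ab.items()))

states = list(itertools.product(range(2), repeat=3))
Z = sum(math.exp(-Q(v)) for v in states)
P = {v: math.exp(-Q(v)) / Z for v in states}

def marginal(keep):
    """Exact marginal of P over the variables listed in `keep`."""
    out = {}
    for v, p in P.items():
        key = tuple(v[a] for a in keep)
        out[key] = out.get(key, 0.0) + p
    return out

B_a = {a: marginal((a,)) for a in theta_a}
B_ab = {e: marginal(e) for e in theta_ab}
n = {a: sum(a in e for e in theta_ab) for a in theta_a}  # n(a): neighbor count

def bethe_free_energy(B_a, B_ab):
    node_term = sum((n[a] - 1) * p * (theta_a[a][i] + math.log(p))
                    for a in B_a for (i,), p in B_a[a].items())
    edge_term = sum(p * (theta_a[a][i] + theta_a[b][k]
                         + theta_ab[(a, b)][i][k] + math.log(p))
                    for (a, b) in B_ab for (i, k), p in B_ab[(a, b)].items())
    return -node_term + edge_term

bethe = bethe_free_energy(B_a, B_ab)
helmholtz = -math.log(Z)
```

On this chain `bethe` and `helmholtz` agree up to rounding. On a graph with loops the same function still evaluates, but its value is only an approximation of the Gibbs free energy.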

36 Optimization Problem
$\min_B \; -\sum_a (n(a)-1) \sum_i B_a(i)\big(\theta_a(i)+\log B_a(i)\big) + \sum_{a,b} \sum_{i,k} B_{ab}(i,k)\big(\theta_a(i)+\theta_b(k)+\theta_{ab}(i,k)+\log B_{ab}(i,k)\big)$
s.t. $\sum_k B_{ab}(i,k) = B_a(i)$, $\sum_{i,k} B_{ab}(i,k) = 1$, $\sum_i B_a(i) = 1$

37 KKT Condition
$\log B_{ab}(i,k) = -\big(\theta_a(i)+\theta_b(k)+\theta_{ab}(i,k)\big) + \lambda_{ab}(k) + \lambda_{ba}(i) + \mu_{ab} - 1$
$\lambda_{ab}(k) = \log M_{ab;k}$

38 Optimization
BP tries to optimize the Bethe free energy, but it may not converge. Convergent alternatives exist (Yuille and Rangarajan, 2003).

39 Outline
Free Energy; Mean-Field Approximation; Bethe Approximation; Kikuchi Approximation

40 Local Free Energy
[Figure: MRF over variables V1, V2, V3, V4]
Cluster of variables $c$: $G_c = \sum_{v_c} B_c(v_c)\big(\log B_c(v_c) + \sum_{d \subseteq c} \theta_d(v_d)\big)$
$G_{12} = \sum_{v_1,v_2} B_{12}(v_1,v_2)\big(\log B_{12}(v_1,v_2) + \theta_1(v_1) + \theta_2(v_2) + \theta_{12}(v_1,v_2)\big)$

41 Local Free Energy
[Figure: MRF over variables V1, V2, V3, V4]
Cluster of variables $c$: $G_c = \sum_{v_c} B_c(v_c)\big(\log B_c(v_c) + \sum_{d \subseteq c} \theta_d(v_d)\big)$
$G_1 = \sum_{v_1} B_1(v_1)\big(\log B_1(v_1) + \theta_1(v_1)\big)$

42 Local Free Energy
[Figure: MRF over variables V1, V2, V3, V4]
Cluster of variables $c$: $G_c = \sum_{v_c} B_c(v_c)\big(\log B_c(v_c) + \sum_{d \subseteq c} \theta_d(v_d)\big)$
$G_{1234} = \sum_{v_1,v_2,v_3,v_4} B_{1234}(v_1,v_2,v_3,v_4)\big(\log B_{1234}(v_1,v_2,v_3,v_4) + \theta_1(v_1) + \theta_2(v_2) + \theta_3(v_3) + \theta_4(v_4) + \theta_{12}(v_1,v_2) + \theta_{13}(v_1,v_3) + \theta_{24}(v_2,v_4) + \theta_{34}(v_3,v_4)\big)$

43 Sum of Local Free Energies
[Figure: MRF over variables V1, V2, V3, V4]
Sum of free energies of all pairwise clusters: $G_{12} + G_{13} + G_{24} + G_{34}$
Overcounts $G_1$, $G_2$, $G_3$, $G_4$ once!!!

44 Sum of Local Free Energies
[Figure: MRF over variables V1, V2, V3, V4]
Sum of free energies of all pairwise clusters, corrected for overcounting: $G_{12} + G_{13} + G_{24} + G_{34} - G_1 - G_2 - G_3 - G_4$

45 Sum of Local Free Energies
[Figure: MRF over variables V1, V2, V3, V4]
Sum of free energies of all pairwise clusters, corrected for overcounting: $G_{12} + G_{13} + G_{24} + G_{34} - G_1 - G_2 - G_3 - G_4$
Bethe Approximation!!!
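To see why this corrected sum is the Bethe approximation, it may help to write the general version of the identity. In the four-variable example every variable has $n(a) = 2$ neighbors, so each $G_a$ is subtracted exactly once; for a general graph, a variable appears in $n(a)$ pairwise clusters and is therefore overcounted $n(a) - 1$ times.

```latex
\begin{align*}
G_a &= \sum_i B_a(i)\big(\log B_a(i) + \theta_a(i)\big), \\
G_{ab} &= \sum_{i,k} B_{ab}(i,k)\big(\log B_{ab}(i,k)
          + \theta_a(i) + \theta_b(k) + \theta_{ab}(i,k)\big), \\
\sum_{(a,b)} G_{ab} - \sum_a (n(a)-1)\,G_a
  &= -\sum_a (n(a)-1) \sum_i B_a(i)\big(\theta_a(i) + \log B_a(i)\big) \\
  &\quad + \sum_{a,b} \sum_{i,k} B_{ab}(i,k)\big(\theta_a(i) + \theta_b(k)
          + \theta_{ab}(i,k) + \log B_{ab}(i,k)\big).
\end{align*}
```

The right-hand side is term by term the Bethe free energy of slide 35.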

46 Kikuchi Approximations
[Figure: MRF over variables V1, V2, V3, V4]
Use bigger clusters: $G_{1234}$

47 Kikuchi Approximations
[Figure: MRF over variables V1, V2, V3, V4, V5, V6]
Use bigger clusters: $G_{1245} + G_{2356} - G_{25}$
Derive message passing using KKT conditions!

48 Generalized Belief Propagation
[Figure: MRF over variables V1, V2, V3, V4, V5, V6]
Use bigger clusters: $G_{1245} + G_{2356} - G_{25}$
Derive message passing using KKT conditions!
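The signs in front of the cluster free energies follow a counting rule from the cluster variation method: each region receives the weight 1 minus the total weight of all regions that strictly contain it. The sketch below applies that rule; the function name is hypothetical, and the first cluster of the six-variable example, {1, 2, 4, 5}, is an assumption, since only $G_{2356}$ and $G_{25}$ are legible on the slide.

```python
def counting_numbers(regions):
    """c_r = 1 - sum of c_s over all regions s that strictly contain r."""
    regions = [frozenset(r) for r in regions]
    # Largest regions first, so every strict superset is already weighted.
    order = sorted(regions, key=len, reverse=True)
    c = {}
    for r in order:
        c[r] = 1 - sum(c[s] for s in c if r < s)  # r < s: proper subset
    return c

# Bethe regions for the four-variable example: all edges and all single variables.
bethe = counting_numbers([{1, 2}, {1, 3}, {2, 4}, {3, 4}, {1}, {2}, {3}, {4}])

# Kikuchi regions for the six-variable example; {1, 2, 4, 5} is assumed.
kikuchi = counting_numbers([{1, 2, 4, 5}, {2, 3, 5, 6}, {2, 5}])
```

For the Bethe regions this gives +1 for every edge and -1 for every variable, matching slide 44; for the two overlapping four-variable clusters it gives +1, +1 and -1 for their intersection.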

