Variational Inference and Variational Message Passing


1 Variational Inference and Variational Message Passing
John Winn, Microsoft Research, Cambridge. 12th November, Robotics Research Group, University of Oxford

2 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

3 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

4 Bayesian networks
A directed graph over the object class C, lighting colour L, surface colour S and image colour I. Nodes represent variables, links show dependencies, and there is a conditional distribution at each node. Together these define the joint distribution: P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
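As a concrete illustration of how the factorisation defines the joint, here is a minimal Python sketch; the binary variables and conditional probability tables are made up for illustration, not taken from the talk.

```python
# Toy joint P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S) with binary variables.
# All numbers below are hypothetical, purely for illustration.
P_L = {0: 0.7, 1: 0.3}
P_C = {0: 0.6, 1: 0.4}
P_S_given_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}            # P(S|C)
P_I_given_LS = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.3, 1: 0.7},
                (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.1, 1: 0.9}}  # P(I|L,S)

def joint(c, l, s, i):
    """Joint probability as the product of the node conditionals."""
    return P_L[l] * P_C[c] * P_S_given_C[c][s] * P_I_given_LS[(l, s)][i]

# Sanity check: the joint sums to 1 over all configurations.
total = sum(joint(c, l, s, i) for c in (0, 1) for l in (0, 1)
            for s in (0, 1) for i in (0, 1))
assert abs(total - 1.0) < 1e-9
```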

5 Bayesian inference
Variables divide into observed variables V and hidden variables H; hidden variables include both parameters and latent variables. In the example, the image colour I is observed, while the object class C, lighting colour L and surface colour S are hidden. Learning/inference involves finding the posterior P(H1, H2, …| V).

6 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Maximum likelihood/MAP finds point estimates of hidden variables and is vulnerable to over-fitting. Variational inference finds posterior distributions over hidden variables and allows direct model comparison. How should we represent this posterior distribution?

7 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Maximum likelihood/MAP takes the point estimate θMAP at the maximum of P(V|θ) P(θ), rather than a posterior distribution.

8 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. The mode θMAP sits where P(V|θ) P(θ) has high probability density, but most of the probability mass can lie elsewhere.

9 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. One Bayesian option is to represent P(V|θ) P(θ) by samples, rather than by a single point estimate θML.

10 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Alternatively, P(V|θ) P(θ) can be represented by a variational approximation, again in contrast to the single point estimate θML.

11 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

12 Variational Inference
(in three easy steps…) 1. Choose a family of variational distributions Q(H). 2. Use the Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|V) and Q(H). 3. Find the Q which minimises this divergence.

13 KL Divergence
Minimising KL(Q||P) gives an ‘exclusive’ approximation; minimising KL(P||Q) gives an ‘inclusive’ approximation.
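For reference, the divergence in question is the standard KL divergence between the approximation Q(H) and the true posterior P(H|V) (a sum for discrete H, an integral for continuous H):

```latex
\mathrm{KL}(Q\,\|\,P) \;=\; \sum_{H} Q(H)\,\ln\frac{Q(H)}{P(H\mid V)}
```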

14 Minimising the KL divergence
For arbitrary Q(H), ln P(V) = L(Q) + KL(Q||P). Since ln P(V) is fixed, maximising the lower bound L(Q) is equivalent to minimising the KL divergence. We choose a family of Q distributions for which L(Q) is tractable to compute.
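Written out in full, this is the standard variational decomposition of the log evidence:

```latex
\ln P(V)
\;=\; \underbrace{\sum_{H} Q(H)\,\ln\frac{P(H,V)}{Q(H)}}_{L(Q)\ \text{(maximise)}}
\;+\; \underbrace{\sum_{H} Q(H)\,\ln\frac{Q(H)}{P(H\mid V)}}_{\mathrm{KL}(Q\,\|\,P)\ \text{(minimise)}}
```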

15–19 Minimising the KL divergence (animation)
ln P(V) stays fixed while the lower bound L(Q) is pushed up; as L(Q) increases, the gap KL(Q||P) shrinks.

20 Factorised Approximation
Assume Q factorises: Q(H) = ∏i Qi(Hi). No further assumptions are required! The optimal solution for one factor, holding the others fixed, is given by ln Q*j(Hj) = ⟨ln P(H,V)⟩ + const, where the expectation is taken over all factors except Qj. Each such update is guaranteed to increase the lower bound – unless it is already at a maximum.

21 Example: Univariate Gaussian
Likelihood function: Gaussian over the data with mean μ and precision γ. Conjugate priors over μ and γ. Factorised variational distribution Q(μ, γ) = Q(μ) Q(γ).
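The resulting updates can be sketched in a few lines of Python. This is a minimal illustration assuming independent priors μ ~ N(m0, 1/β0) and γ ~ Gamma(a0, b0); the parameter names and default values are mine, not taken from the slides.

```python
import numpy as np

def variational_gaussian(x, m0=0.0, beta0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    """Coordinate-ascent VI for a univariate Gaussian with unknown mean and precision."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    E_gamma = a0 / b0                      # initial guess for <gamma>
    for _ in range(iters):
        # Update Q(mu): Gaussian with precision beta_N and mean m_N.
        beta_N = beta0 + N * E_gamma
        m_N = (beta0 * m0 + E_gamma * np.sum(x)) / beta_N
        E_mu, E_mu2 = m_N, m_N**2 + 1.0 / beta_N
        # Update Q(gamma): Gamma with shape a_N and rate b_N, using <(x_n - mu)^2>.
        a_N = a0 + N / 2.0
        b_N = b0 + 0.5 * np.sum(x**2 - 2.0 * x * E_mu + E_mu2)
        E_gamma = a_N / b_N
    return (m_N, beta_N), (a_N, b_N)       # parameters of Q(mu) and Q(gamma)
```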

22 Initial Configuration
(contour plot of Q(μ) Q(γ) in the μ–γ plane)

23 After Updating Q(μ)

24 After Updating Q(γ)

25 Converged Solution

26 Lower Bound for GMM

27 Variational Equations for GMM

28 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

29 Variational Message Passing
VMP makes it easier and quicker to apply factorised variational inference. VMP carries out variational inference using local computations and message passing on the graphical model. Modular algorithm allows modifying, extending or combining models.

30 Local Updates
For factorised Q, the update for each factor depends only on variables in its Markov blanket, so updates can be carried out locally at each node.

31 VMP I: The Exponential Family
Conditional distributions are expressed in exponential family form: ln P(X | θ) = θ^T u(X) + g(θ) + f(X), where θ is the ‘natural’ parameter vector and u(X) is the sufficient statistics vector. For example, the Gaussian distribution: ln P(X | μ, γ) = [μγ, −γ/2]^T [X, X²] + ½(ln γ − γμ² − ln 2π).

32 VMP II: Conjugacy
Parents and children are chosen to be conjugate, i.e. to have the same functional form: ln P(X | Y) = θ(Y)^T u(X) + f(X) + g(Y) and ln P(Z | X, Y′) = φ(X, Y′)^T u′(Z) + f′(Z) + g′(X, Y′), where the latter, viewed as a function of X, can be written in terms of the same sufficient statistics u(X). Examples: Gaussian for the mean of a Gaussian; Gamma for the precision of a Gaussian; Dirichlet for the parameters of a discrete distribution.

33 VMP III: Messages
With the conditionals written in this conjugate exponential form, the messages are: parent to child (X→Z), the expected sufficient statistics ⟨u(X)⟩ under Q(X); child to parent (Z→X), a function of ⟨u′(Z)⟩ and of the messages received from Z’s other parents (the co-parents).

34 VMP IV: Update
The optimal Q(X) has the same form as P(X|θ), but with an updated parameter vector θ* computed from the messages arriving from X’s parents and children. These messages can also be used to calculate the bound on the evidence L(Q) – see Winn & Bishop, 2004.
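Schematically, and following the general VMP recipe rather than a formula shown on the slide, the update combines the expected prior parameters (from the parent messages) with the sum of the child messages:

```latex
\theta^{*}_{X} \;=\; \tilde{\theta}_{X}\big(\{m_{Y \to X}\}_{Y \in \mathrm{pa}(X)}\big)
\;+\; \sum_{Z \in \mathrm{ch}(X)} m_{Z \to X}
```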

35 VMP Example
Learning the parameters of a Gaussian from N data points x: a mean μ and a precision γ (inverse variance), with the data x in a plate of size N.

36 VMP Example
Message from γ to all x (an initial Q(γ) is needed).

37 VMP Example
Messages from each xn to μ; update Q(μ)’s parameter vector.

38 VMP Example
Message from the updated μ to all x.

39 VMP Example
Messages from each xn to γ; update Q(γ)’s parameter vector.
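For this model the messages take the following concrete form, derived from the general recipe above using the sufficient statistics u(μ) = [μ, μ²] and u(γ) = [γ, ln γ] (worked out here, not copied from the slides):

```latex
m_{\mu \to x_n} = \begin{bmatrix} \langle \mu \rangle \\ \langle \mu^2 \rangle \end{bmatrix},\quad
m_{\gamma \to x_n} = \begin{bmatrix} \langle \gamma \rangle \\ \langle \ln \gamma \rangle \end{bmatrix},\quad
m_{x_n \to \mu} = \begin{bmatrix} \langle \gamma \rangle\, x_n \\ -\tfrac{1}{2}\langle \gamma \rangle \end{bmatrix},\quad
m_{x_n \to \gamma} = \begin{bmatrix} -\tfrac{1}{2}\big(x_n^2 - 2 x_n \langle \mu \rangle + \langle \mu^2 \rangle\big) \\ \tfrac{1}{2} \end{bmatrix}
```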

40 Features of VMP
The graph does not need to be a tree – it can contain loops (but not directed cycles). Flexible message-passing schedule – factors can be updated in any order. Distributions can be discrete or continuous, multivariate, or truncated (e.g. rectified Gaussian). Deterministic relationships (A = B + C) are allowed, as are point estimates, e.g. of hyper-parameters.

41 VMP Software: VIBES Free download from vibes.sourceforge.net

42 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

43 Flexible sprite model
Proposed by Jojic & Frey (2001). The data are a set of N images x, e.g. frames from a video.

44 Flexible sprite model
Sprite appearance and shape, f and π.

45 Flexible sprite model
Per-image variables: the sprite transform T for this image (discretised) and the mask m for this image.

46 Flexible sprite model
The full model adds the background b and noise β.

47 VMP
VMP applied to the full model (b, f, π, T, m, β, x) – Winn & Blake (NIPS 2004).

48 Results of VMP on hand video
Original video; learned appearance and mask; learned transforms (i.e. motion).

49 Conclusions
Variational Message Passing allows approximate Bayesian inference for a wide range of models. VMP simplifies the construction, testing, extension and comparison of models. You can try VMP for yourself at vibes.sourceforge.net.

50 That’s all folks!

