Variational Inference and Variational Message Passing


1 Variational Inference and Variational Message Passing
John Winn, Microsoft Research, Cambridge. 12th November, Robotics Research Group, University of Oxford

2 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

3 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

4 Bayesian networks
A directed graph over the object class C, lighting colour L, surface colour S and image colour I. Nodes represent variables, links show dependencies, and there is a conditional distribution at each node. Together these define the joint distribution: P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S)
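As a concrete illustration of how the factorisation defines the joint, here is a minimal Python sketch; the binary variables and conditional probability tables are made up for illustration, not taken from the talk.

```python
# Toy joint P(C,L,S,I) = P(L) P(C) P(S|C) P(I|L,S) with binary variables.
# All numbers below are hypothetical, purely for illustration.
P_L = {0: 0.7, 1: 0.3}
P_C = {0: 0.6, 1: 0.4}
P_S_given_C = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}            # P(S|C)
P_I_given_LS = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.3, 1: 0.7},
                (1, 0): {0: 0.6, 1: 0.4}, (1, 1): {0: 0.1, 1: 0.9}}  # P(I|L,S)

def joint(c, l, s, i):
    """Joint probability as the product of the node conditionals."""
    return P_L[l] * P_C[c] * P_S_given_C[c][s] * P_I_given_LS[(l, s)][i]

# Sanity check: the joint sums to 1 over all configurations.
total = sum(joint(c, l, s, i) for c in (0, 1) for l in (0, 1)
            for s in (0, 1) for i in (0, 1))
assert abs(total - 1.0) < 1e-9
```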

5 Bayesian inference
Variables divide into observed variables V and hidden variables H; hidden variables include both parameters and latent variables. In the example, the image colour I is observed, while the object class C, lighting colour L and surface colour S are hidden. Learning/inference involves finding the posterior P(H1, H2, …| V).

6 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Maximum likelihood/MAP finds point estimates of hidden variables and is vulnerable to over-fitting. Variational inference finds posterior distributions over hidden variables and allows direct model comparison. How should we represent this posterior distribution?

7 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Maximum likelihood/MAP takes the point estimate θMAP at the maximum of P(V|θ) P(θ), rather than a posterior distribution.

8 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. The mode θMAP sits where P(V|θ) P(θ) has high probability density, but most of the probability mass can lie elsewhere.

9 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. One Bayesian option is to represent P(V|θ) P(θ) by samples, rather than by a single point estimate θML.

10 Bayesian inference vs. ML/MAP
Consider learning one parameter θ. Alternatively, P(V|θ) P(θ) can be represented by a variational approximation, again in contrast to the single point estimate θML.

11 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

12 Variational Inference
(in three easy steps…) 1. Choose a family of variational distributions Q(H). 2. Use the Kullback-Leibler divergence KL(Q||P) as a measure of ‘distance’ between P(H|V) and Q(H). 3. Find the Q which minimises this divergence.

13 KL Divergence
Minimising KL(Q||P) gives an ‘exclusive’ approximation; minimising KL(P||Q) gives an ‘inclusive’ approximation.
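For reference, the divergence in question is the standard KL divergence between the approximation Q(H) and the true posterior P(H|V) (a sum for discrete H, an integral for continuous H):

```latex
\mathrm{KL}(Q\,\|\,P) \;=\; \sum_{H} Q(H)\,\ln\frac{Q(H)}{P(H\mid V)}
```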

14 Minimising the KL divergence
For arbitrary Q(H), ln P(V) = L(Q) + KL(Q||P). Since ln P(V) is fixed, maximising the lower bound L(Q) is equivalent to minimising the KL divergence. We choose a family of Q distributions for which L(Q) is tractable to compute.
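Written out in full, this is the standard variational decomposition of the log evidence:

```latex
\ln P(V)
\;=\; \underbrace{\sum_{H} Q(H)\,\ln\frac{P(H,V)}{Q(H)}}_{L(Q)\ \text{(maximise)}}
\;+\; \underbrace{\sum_{H} Q(H)\,\ln\frac{Q(H)}{P(H\mid V)}}_{\mathrm{KL}(Q\,\|\,P)\ \text{(minimise)}}
```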

15–19 Minimising the KL divergence (animation)
ln P(V) stays fixed while the lower bound L(Q) is pushed up; as L(Q) increases, the gap KL(Q||P) shrinks.

20 Factorised Approximation
Assume Q factorises: Q(H) = ∏i Qi(Hi). No further assumptions are required! The optimal solution for one factor, holding the others fixed, is given by ln Q*j(Hj) = ⟨ln P(H,V)⟩ + const, where the expectation is taken over all factors except Qj. Each such update is guaranteed to increase the lower bound – unless it is already at a maximum.

21 Example: Univariate Gaussian
Likelihood function: Gaussian over the data with mean μ and precision γ. Conjugate priors over μ and γ. Factorised variational distribution Q(μ, γ) = Q(μ) Q(γ).
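The resulting updates can be sketched in a few lines of Python. This is a minimal illustration assuming independent priors μ ~ N(m0, 1/β0) and γ ~ Gamma(a0, b0); the parameter names and default values are mine, not taken from the slides.

```python
import numpy as np

def variational_gaussian(x, m0=0.0, beta0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    """Coordinate-ascent VI for a univariate Gaussian with unknown mean and precision."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    E_gamma = a0 / b0                      # initial guess for <gamma>
    for _ in range(iters):
        # Update Q(mu): Gaussian with precision beta_N and mean m_N.
        beta_N = beta0 + N * E_gamma
        m_N = (beta0 * m0 + E_gamma * np.sum(x)) / beta_N
        E_mu, E_mu2 = m_N, m_N**2 + 1.0 / beta_N
        # Update Q(gamma): Gamma with shape a_N and rate b_N, using <(x_n - mu)^2>.
        a_N = a0 + N / 2.0
        b_N = b0 + 0.5 * np.sum(x**2 - 2.0 * x * E_mu + E_mu2)
        E_gamma = a_N / b_N
    return (m_N, beta_N), (a_N, b_N)       # parameters of Q(mu) and Q(gamma)
```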

22 Initial Configuration
(contour plot of Q(μ) Q(γ) in the μ–γ plane)

23 After Updating Q(μ)

24 After Updating Q(γ)

25 Converged Solution

26 Lower Bound for GMM

27 Variational Equations for GMM

28 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

29 Variational Message Passing
VMP makes it easier and quicker to apply factorised variational inference. VMP carries out variational inference using local computations and message passing on the graphical model. Modular algorithm allows modifying, extending or combining models.

30 Local Updates
For factorised Q, the update for each factor depends only on variables in its Markov blanket, so updates can be carried out locally at each node.

31 VMP I: The Exponential Family
Conditional distributions are expressed in exponential family form: ln P(X | θ) = θ^T u(X) + g(θ) + f(X), where θ is the ‘natural’ parameter vector and u(X) is the sufficient statistics vector. For example, the Gaussian distribution: ln P(X | μ, γ) = [μγ, −γ/2]^T [X, X²] + ½(ln γ − γμ² − ln 2π).

32 VMP II: Conjugacy
Parents and children are chosen to be conjugate, i.e. to have the same functional form: ln P(X | Y) = θ(Y)^T u(X) + f(X) + g(Y) and ln P(Z | X, Y′) = φ(X, Y′)^T u′(Z) + f′(Z) + g′(X, Y′), where the latter, viewed as a function of X, can be written in terms of the same sufficient statistics u(X). Examples: Gaussian for the mean of a Gaussian; Gamma for the precision of a Gaussian; Dirichlet for the parameters of a discrete distribution.

33 VMP III: Messages
With the conditionals written in this conjugate exponential form, the messages are: parent to child (X→Z), the expected sufficient statistics ⟨u(X)⟩ under Q(X); child to parent (Z→X), a function of ⟨u′(Z)⟩ and of the messages received from Z’s other parents (the co-parents).

34 VMP IV: Update
The optimal Q(X) has the same form as P(X|θ), but with an updated parameter vector θ* computed from the messages arriving from X’s parents and children. These messages can also be used to calculate the bound on the evidence L(Q) – see Winn & Bishop, 2004.
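Schematically, and following the general VMP recipe rather than a formula shown on the slide, the update combines the expected prior parameters (from the parent messages) with the sum of the child messages:

```latex
\theta^{*}_{X} \;=\; \tilde{\theta}_{X}\big(\{m_{Y \to X}\}_{Y \in \mathrm{pa}(X)}\big)
\;+\; \sum_{Z \in \mathrm{ch}(X)} m_{Z \to X}
```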

35 VMP Example
Learning the parameters of a Gaussian from N data points x: a mean μ and a precision γ (inverse variance), with the data x in a plate of size N.

36 VMP Example
Message from γ to all x (an initial Q(γ) is needed).

37 VMP Example
Messages from each xn to μ; update Q(μ)’s parameter vector.

38 VMP Example
Message from the updated μ to all x.

39 VMP Example
Messages from each xn to γ; update Q(γ)’s parameter vector.
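For this model the messages take the following concrete form, derived from the general recipe above using the sufficient statistics u(μ) = [μ, μ²] and u(γ) = [γ, ln γ] (worked out here, not copied from the slides):

```latex
m_{\mu \to x_n} = \begin{bmatrix} \langle \mu \rangle \\ \langle \mu^2 \rangle \end{bmatrix},\quad
m_{\gamma \to x_n} = \begin{bmatrix} \langle \gamma \rangle \\ \langle \ln \gamma \rangle \end{bmatrix},\quad
m_{x_n \to \mu} = \begin{bmatrix} \langle \gamma \rangle\, x_n \\ -\tfrac{1}{2}\langle \gamma \rangle \end{bmatrix},\quad
m_{x_n \to \gamma} = \begin{bmatrix} -\tfrac{1}{2}\big(x_n^2 - 2 x_n \langle \mu \rangle + \langle \mu^2 \rangle\big) \\ \tfrac{1}{2} \end{bmatrix}
```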

40 Features of VMP
The graph does not need to be a tree – it can contain loops (but not directed cycles). Flexible message-passing schedule – factors can be updated in any order. Distributions can be discrete or continuous, multivariate, or truncated (e.g. rectified Gaussian). Deterministic relationships (A = B + C) are allowed, as are point estimates, e.g. of hyper-parameters.

41 VMP Software: VIBES Free download from vibes.sourceforge.net

42 Overview Probabilistic models & Bayesian inference
Variational Inference Variational Message Passing Vision example

43 Flexible sprite model
Proposed by Jojic & Frey (2001). The data are a set of N images x, e.g. frames from a video.

44 Flexible sprite model
Sprite appearance and shape, f and π.

45 Flexible sprite model
Per-image variables: the sprite transform T for this image (discretised) and the mask m for this image.

46 Flexible sprite model
The full model adds the background b and noise β.

47 VMP
VMP applied to the full model (b, f, π, T, m, β, x) – Winn & Blake (NIPS 2004).

48 Results of VMP on hand video
Original video; learned appearance and mask; learned transforms (i.e. motion).

49 Conclusions
Variational Message Passing allows approximate Bayesian inference for a wide range of models. VMP simplifies the construction, testing, extension and comparison of models. You can try VMP for yourself at vibes.sourceforge.net.

50 That’s all folks!

