Variational Bayes 101. Informatics and Mathematical Modelling / Lars Kai Hansen, Adv. Signal Proc. 2006.

Presentation transcript:

Slide 1: Variational Bayes 101.

Slide 2: The Bayes scene. Exact averaging is possible in discrete/small models (Bayes networks); approximate averaging: Monte Carlo methods, ensemble/mean field, variational Bayes methods. Resources: Variational-Bayes.org, MLpedia, Wikipedia. ISP Bayes activities: ICA (mean field, Kalman, dynamical systems); neuroimaging (optimal signal detector); approximate inference; machine learning methods.

Slide 3: Bayes' methodology. The minimal error rate is obtained when the detector is based on the posterior probability (Bayes decision theory). The likelihood may contain unknown parameters.
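As a concrete illustration of this decision rule, a minimal sketch in Python; the two class-conditional Gaussians and the prior class probabilities below are made-up values for the example, not quantities from the lecture:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical two-class detection problem: class-conditional likelihoods
# p(x|k) are Gaussians, and P(k) are prior class probabilities (assumed values).
priors = np.array([0.7, 0.3])                    # P(k=0), P(k=1)
likelihoods = [norm(0.0, 1.0), norm(2.0, 1.0)]   # p(x|k=0), p(x|k=1)

def bayes_detect(x):
    """Return the class with maximal posterior probability P(k|x)."""
    joint = np.array([p.pdf(x) * w for p, w in zip(likelihoods, priors)])
    posterior = joint / joint.sum()              # Bayes' rule: P(k|x) ∝ p(x|k) P(k)
    return posterior.argmax(), posterior

k, post = bayes_detect(1.2)
print(f"decide class {k}, posteriors = {post.round(3)}")
```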

Slide 4: Bayes' methodology. The conventional approach is to plug in the most probable parameters. However, the averaged (Bayesian) model is generalization optimal (Hansen, 1999).
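The contrast between the plug-in predictive (most probable parameters) and the Bayesian-averaged predictive can be sketched for a Gaussian with unknown mean and known variance; this is a standard conjugate example, and the prior variance, sample sizes and seed below are assumptions of the illustration, not the lecture's setup:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sigma2 = 1.0                                   # known observation variance
theta_true = 0.5
x = rng.normal(theta_true, np.sqrt(sigma2), size=5)   # small training set

# Prior theta ~ N(0, tau2); the posterior over the mean is Gaussian (conjugate update).
tau2 = 10.0
post_var = 1.0 / (1.0 / tau2 + len(x) / sigma2)
post_mean = post_var * x.sum() / sigma2

# Plug-in predictive: N(theta_ML, sigma2).
# Averaged predictive: integrating N(x*; theta, sigma2) over the posterior
# gives N(post_mean, sigma2 + post_var), a wider density.
theta_ml = x.mean()
x_test = rng.normal(theta_true, np.sqrt(sigma2), size=100000)
plugin_loss = -norm(theta_ml, np.sqrt(sigma2)).logpdf(x_test).mean()
averaged_loss = -norm(post_mean, np.sqrt(sigma2 + post_var)).logpdf(x_test).mean()
print(f"plug-in test log-loss  : {plugin_loss:.4f}")
print(f"averaged test log-loss : {averaged_loss:.4f}")
```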

Slide 5: The hidden agenda of learning. Typically learning proceeds by generalization from a limited set of samples... but we would also like to identify the model that generated the data: choose the least complex model compatible with the data. "That I figured out in 1386."

Slide 6: Generalization! Generalizability is defined as the expected performance on a random new sample; the mean performance of a model on a "fresh" data set is an unbiased estimate of generalization. Typical loss functions: ..., etc. Results can be presented as "bias-variance trade-off curves" or "learning curves".
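A minimal sketch of the held-out estimate; the choice of log loss and of a Gaussian model are assumptions for the illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, size=20)
fresh = rng.normal(0.0, 1.0, size=20)   # "fresh" data from the same source

# Fit a simple model (Gaussian, ML parameters) on the training set.
mu_hat, sd_hat = train.mean(), train.std()

# The average loss on the fresh set is an unbiased estimate of the model's
# generalization error (here: expected negative log-likelihood).
gen_estimate = -norm(mu_hat, sd_hat).logpdf(fresh).mean()
print(f"estimated generalization error: {gen_estimate:.3f}")
```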

Slide 7: Generalization optimal predictive distribution ("the game of guessing a pdf"). Assume a random teacher θ drawn from P(θ) and a random data set D drawn from P(x|θ). The prediction/generalization error measures how well the predictive distribution of model A, p(x|D, A), matches the test sample distribution P(x|θ).

Slide 8: Generalization optimal predictive distribution. We define the "generalization functional" (Hansen, NIPS 1999); it is minimized by the "Bayesian averaging" predictive distribution.
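Written out, and in the spirit of Hansen (NIPS 1999), the generalization functional for a candidate predictive density q(x|D) can be expressed as the teacher-and-data-averaged log loss:

$$
\Gamma[q] \;=\; -\int P(\theta)\int P(D\mid\theta)\int P(x\mid\theta)\,\log q(x\mid D)\;dx\,dD\,d\theta ,
$$

and, for each D, it is minimized by the Bayesian-averaging predictive distribution

$$
q^{*}(x\mid D) \;=\; \int p(x\mid\theta)\,p(\theta\mid D)\,d\theta ,
\qquad p(\theta\mid D)\propto P(D\mid\theta)\,P(\theta).
$$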

Slide 9: Bias-variance trade-off and averaging. Averaging is good, but can we average "too much"? Define the family of tempered posterior distributions. Case: univariate normal distribution with unknown mean parameter. High temperature: widened posterior average; low temperature: narrow average.
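A small numerical sketch of tempering for the univariate normal with unknown mean; the flat prior and the definition of the tempered posterior as the posterior raised to the power 1/T (renormalized) are assumptions for the illustration:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
sigma2 = 1.0
x = rng.normal(1.0, np.sqrt(sigma2), size=10)

# Flat prior for simplicity: the posterior over the mean is N(xbar, sigma2/N).
m, v = x.mean(), sigma2 / len(x)

# Tempered posterior: p_T(theta|D) ∝ p(theta|D)**(1/T).
# For a Gaussian posterior this is again Gaussian, N(m, T*v).
for T in [0.2, 1.0, 5.0]:
    post_T = norm(m, np.sqrt(T * v))
    # Averaged predictive under the tempered posterior: N(m, sigma2 + T*v).
    pred_T = norm(m, np.sqrt(sigma2 + T * v))
    print(f"T={T:>4}: posterior sd={post_T.std():.3f}, predictive sd={pred_T.std():.3f}")
```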

Slide 10: Bayes' model selection, example. Let three models A, B, C be given: (A) x is normal N(0,1); (B) x is normal N(0, σ²), with σ² uniform U(0,∞); (C) x is normal N(μ, σ²), with μ and σ² uniform U(0,∞).

Slide 11: Model A. The likelihood of the N samples is $p(D\mid A) = (2\pi)^{-N/2}\exp\!\big(-\tfrac12\sum_{i=1}^{N} x_i^2\big)$.

Slide 12: Model B. The likelihood of the N samples is obtained by integrating over σ² under its prior: $p(D\mid B) = \int_0^\infty (2\pi\sigma^2)^{-N/2}\exp\!\big(-\tfrac{1}{2\sigma^2}\sum_i x_i^2\big)\,p(\sigma^2)\,d\sigma^2$.

Slide 13: Model C. The likelihood of the N samples is obtained by integrating over both μ and σ²: $p(D\mid C) = \int\!\!\int (2\pi\sigma^2)^{-N/2}\exp\!\big(-\tfrac{1}{2\sigma^2}\sum_i (x_i-\mu)^2\big)\,p(\mu)\,p(\sigma^2)\,d\mu\,d\sigma^2$.

Slide 14: Model A - maximum likelihood. The likelihood of the N samples is given on the slide.

Slide 15: Model B. The likelihood of the N samples is given on the slide.

Slide 16: Model C. The likelihood of the N samples is given on the slide.
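Assuming slides 14-16 contrast the maximum-likelihood fits with the Bayesian evidences above, the standard ML results for these models are

$$
\hat\sigma^2_B = \frac{1}{N}\sum_i x_i^2,
\qquad
\hat\mu_C = \bar x,\quad
\hat\sigma^2_C = \frac{1}{N}\sum_i (x_i-\bar x)^2,
$$

and in both cases the maximized likelihood takes the value

$$
p(D\mid\hat\theta) = (2\pi\hat\sigma^2)^{-N/2}\,e^{-N/2}.
$$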

Slide 17: Bayesian model selection. C (green) is the correct model; what happens if only A (red) and B (blue) are known?
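A numerical sketch of this kind of comparison; the data-generating parameters and, to keep the evidence integral proper, a bounded uniform prior U(0, 10) on σ² are assumptions of the illustration, not the slides' exact setup:

```python
import numpy as np
from scipy.integrate import quad

rng = np.random.default_rng(3)
# Data actually generated from model C: N(mu, sigma2) with mu != 0.
x = rng.normal(0.8, 1.5, size=20)
N, S = len(x), np.sum(x**2)

# Model A: N(0,1), no parameters, so the evidence is available in closed form.
log_ev_A = -0.5 * N * np.log(2 * np.pi) - 0.5 * S

# Model B: N(0, sigma2), sigma2 ~ U(0, 10) (a proper, bounded version of the prior).
def lik_B(s2):
    return (2 * np.pi * s2) ** (-N / 2) * np.exp(-S / (2 * s2)) / 10.0

ev_B, _ = quad(lik_B, 1e-6, 10.0)
log_ev_B = np.log(ev_B)

print(f"log evidence A: {log_ev_A:.2f}")
print(f"log evidence B: {log_ev_B:.2f}")
print(f"log Bayes factor B vs A: {log_ev_B - log_ev_A:.2f}")
```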

Slide 18: Bayesian model selection. A (red) is the correct model.

Slide 19: Bayesian inference, Bayesian averaging. Caveats: Bayes can rarely be implemented exactly; it is not optimal if the model family is incorrect ("Bayes cannot detect bias"); however, it is still asymptotically optimal if the observation model is correct and the prior is "weak" (Hansen, 1999).

Slide 20: Hierarchical Bayes models. Multi-level models in Bayesian averaging. References: C.P. Robert, The Bayesian Choice: A Decision-Theoretic Motivation, Springer Texts in Statistics, Springer Verlag, New York (1994). G. Golub, M. Heath and G. Wahba, "Generalized cross-validation as a method for choosing a good ridge parameter," Technometrics 21, pp. 215-223 (1979). K. Friston, "A theory of cortical responses," Phil. Trans. R. Soc. B 360:815-836 (2005).
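A generic way to write the multi-level averaging referred to here, with hyperparameters α on top of parameters θ (a schematic form, not necessarily the slide's exact equation):

$$
p(x\mid D) = \int\!\!\int p(x\mid\theta)\,p(\theta\mid D,\alpha)\,p(\alpha\mid D)\;d\theta\,d\alpha ,
\qquad
p(\alpha\mid D)\propto p(D\mid\alpha)\,p(\alpha),\quad
p(D\mid\alpha)=\int p(D\mid\theta)\,p(\theta\mid\alpha)\,d\theta .
$$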

Slide 21: Hierarchical Bayes models. "Learning hyperparameters by adjusting prior expectations": empirical Bayes (MacKay, 1992; Hansen et al., EUSIPCO 2006); cf. Boltzmann learning (Hinton et al., 1983). The slide's decomposition involves posterior, "evidence" and prior terms, with the hyperparameters targeted at maximal evidence.
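A minimal empirical-Bayes sketch in this spirit: for observations y_i = θ + noise, with a zero-mean Gaussian prior on θ whose precision α is the hyperparameter, the evidence p(y|α) is Gaussian after integrating θ out and can be maximized over a grid of α values. The model, noise level and grid are assumptions for the illustration:

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
sigma2 = 1.0                        # noise variance (assumed known)
theta_true = 0.3
y = rng.normal(theta_true, np.sqrt(sigma2), size=25)
N = len(y)

def log_evidence(alpha):
    # Marginally, y ~ N(0, sigma2*I + (1/alpha) * 11^T): theta integrated out.
    cov = sigma2 * np.eye(N) + (1.0 / alpha) * np.ones((N, N))
    return multivariate_normal(mean=np.zeros(N), cov=cov).logpdf(y)

alphas = np.logspace(-3, 3, 200)
log_ev = np.array([log_evidence(a) for a in alphas])
alpha_hat = alphas[log_ev.argmax()]          # "target at maximal evidence"

# Posterior mean of theta under the selected hyperparameter (shrinkage estimate).
post_mean = y.sum() / (N + sigma2 * alpha_hat)
print(f"alpha maximizing the evidence: {alpha_hat:.3f}")
print(f"posterior mean of theta      : {post_mean:.3f}  (ML mean: {y.mean():.3f})")
```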

Slide 22: Hyperparameter dynamics. Gaussian prior with an adaptive hyperparameter; a discontinuity appears: the parameter is pruned at low signal-to-noise ratio (Hansen & Rasmussen, Neural Computation 1994; Tipping, "Relevance vector machine", 1999). Here θ_ML is the maximum-likelihood optimum, and θ_ML² relative to the noise level acts as a signal-to-noise measure.
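One standard way to see the discontinuity (a sketch under the assumption of a single parameter θ with Gaussian prior N(0, 1/α) and a Gaussian likelihood summarized by θ_ML ~ N(θ, s), where s is the noise-limited variance of the estimate): the evidence depends on α only through v = s + 1/α,

$$
\log p(\theta_{\mathrm{ML}}\mid\alpha)
= -\tfrac12\log(2\pi v)-\frac{\theta_{\mathrm{ML}}^2}{2v},
\qquad v = s + \tfrac1\alpha \in (s,\infty),
$$

which is maximized at v = θ_ML² when θ_ML² > s, and at the boundary v → s (i.e. α → ∞, the parameter is pruned to zero) when θ_ML² ≤ s. The ratio θ_ML²/s is thus the signal-to-noise measure that decides pruning.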

Slide 23: Hyperparameter dynamics. Dynamically updated hyperparameters imply pruning; pruning decisions are based on the SNR. A possible mechanism for cognitive selection, attention?

Slide 24: Hansen & Rasmussen, Neural Computation (1994).

Slide 35: Approximations needed for posteriors. Approximations using asymptotic expansions (Laplace, etc.) (JL); approximation of posteriors using tractable (factorized) pdfs by KL fitting; approximation of products using EP (AH, Wednesday); approximation by MCMC (OWI, Thursday).

Slide 36: Illustration of approximation by a Gaussian pdf (P. Højen-Sørensen, PhD thesis, 2001).
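A minimal sketch of the first item on slide 35, the Laplace (asymptotic) approximation: approximate a skewed posterior by a Gaussian centred at its mode with variance set by the curvature of the log-density. The Gamma posterior below is an assumed example, not the thesis figure:

```python
import numpy as np
from scipy.stats import gamma, norm

# Example posterior: Gamma(a, rate=b), e.g. for a Poisson rate parameter.
a, b = 5.0, 2.0                        # assumed shape and rate
post = gamma(a, scale=1.0 / b)

# Laplace approximation: Gaussian at the mode with variance
# -1 / (d^2/dlam^2 log p(lam)) evaluated at the mode.
# log p(lam) = (a-1) log(lam) - b*lam + const  ->  mode = (a-1)/b,
# second derivative = -(a-1)/lam^2, so var = mode^2 / (a-1).
mode = (a - 1.0) / b
var = mode**2 / (a - 1.0)
laplace = norm(mode, np.sqrt(var))

for lam in [0.5, 1.5, 2.5, 4.0]:
    print(f"lam={lam:>3}: exact={post.pdf(lam):.4f}  laplace={laplace.pdf(lam):.4f}")
```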

Slide 38: Variational Bayes. Notation: the data are the observables and the mixture labels and parameters are the hidden variables; we analyse the log likelihood of a mixture model.

Slide 39: Variational Bayes.
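The standard identity this step of the derivation presumably builds on, for observables x, hidden variables z and any distribution q(z), is

$$
\log p(x)
= \underbrace{\int q(z)\,\log\frac{p(x,z)}{q(z)}\,dz}_{\mathcal F(q)\ \text{(lower bound)}}
\;+\;
\underbrace{\int q(z)\,\log\frac{q(z)}{p(z\mid x)}\,dz}_{\mathrm{KL}\left(q\,\|\,p(z\mid x)\right)\;\ge\;0},
$$

so maximizing the bound F(q) over q is equivalent to minimizing the KL divergence between q and the true posterior.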

Slide 40: Variational Bayes.
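As a runnable illustration of the mean-field recipe (factorize q and iterate coordinate-wise updates), here is the classic VB treatment of a Gaussian with unknown mean μ and precision τ under a conjugate Normal-Gamma prior. This is a standard textbook example, not code from the lecture, and the data and prior values are assumed:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(2.0, 1.5, size=50)
N, xbar = len(x), x.mean()

# Normal-Gamma prior: mu | tau ~ N(mu0, 1/(lam0*tau)), tau ~ Gamma(a0, rate=b0).
mu0, lam0, a0, b0 = 0.0, 1e-3, 1e-3, 1e-3

# Mean-field factorization q(mu, tau) = q(mu) q(tau):
#   q(mu)  = N(mu_n, 1/lam_n)   with lam_n = (lam0 + N) * E[tau]
#   q(tau) = Gamma(a_n, rate=b_n)
a_n = a0 + (N + 1) / 2.0
E_tau = 1.0                                  # initial guess for E[tau]

for _ in range(50):                          # coordinate-ascent iterations
    # Update q(mu) given the current E[tau].
    mu_n = (lam0 * mu0 + N * xbar) / (lam0 + N)
    lam_n = (lam0 + N) * E_tau
    # Update q(tau) given q(mu); uses E[(x_i - mu)^2] and E[(mu - mu0)^2].
    E_mu, E_mu2 = mu_n, mu_n**2 + 1.0 / lam_n
    b_n = b0 + 0.5 * (np.sum(x**2) - 2 * E_mu * np.sum(x) + N * E_mu2
                      + lam0 * (E_mu2 - 2 * mu0 * E_mu + mu0**2))
    E_tau = a_n / b_n

print(f"E[mu]  = {mu_n:.3f}   (sample mean:   {xbar:.3f})")
print(f"E[tau] = {E_tau:.3f}   (1/sample var:  {1.0/x.var():.3f})")
```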

Slide 41: Conjugate exponential families.
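For reference, the standard definitions (not transcribed from the slide): a model is in the exponential family if

$$
p(x\mid\theta) = h(x)\,\exp\!\bigl(\eta(\theta)^{\top} T(x) - A(\theta)\bigr),
$$

with natural parameters η(θ) and sufficient statistics T(x); a prior is conjugate if it has the matching form

$$
p(\theta\mid\chi,\nu) \propto \exp\!\bigl(\eta(\theta)^{\top}\chi - \nu\,A(\theta)\bigr),
$$

so that the posterior stays in the same family with updated (χ, ν). In conjugate-exponential models the VB updates amount to updating these natural parameters with expected sufficient statistics.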

Slide 42: Mini exercise. What are the natural parameters for a Gaussian? What are the natural parameters for a MoG?
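As a hint for the first question (a standard result, not from the slide): writing the univariate Gaussian in the exponential-family form above gives natural parameters η = (μ/σ², −1/(2σ²)) with sufficient statistics (x, x²),

$$
\mathcal N(x\mid\mu,\sigma^2)
= \frac{1}{\sqrt{2\pi}}\exp\!\Bigl(\tfrac{\mu}{\sigma^2}\,x - \tfrac{1}{2\sigma^2}\,x^2
- \bigl(\tfrac{\mu^2}{2\sigma^2}+\tfrac12\log\sigma^2\bigr)\Bigr).
$$

A mixture of Gaussians is not itself an exponential-family model, but the complete-data model (with the component indicators treated as hidden variables) is conjugate-exponential, which is what the VB treatment exploits.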

Slide 44: Observation model and "Bayes factor".

Slide 45: "Normal inverse gamma" prior - the conjugate prior for the GLM observation model.
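In one common parameterization (an assumed reconstruction; the slide's own notation is not shown here), for the GLM observation model y = Xw + ε with ε ~ N(0, σ²I), the normal inverse gamma prior is

$$
p(w,\sigma^2) \;=\; \mathcal N\!\bigl(w\mid m_0,\ \sigma^2 V_0\bigr)\,
\mathrm{IG}\!\bigl(\sigma^2\mid a_0, b_0\bigr),
$$

and conjugacy gives a posterior of the same form with

$$
V_n = (V_0^{-1}+X^{\top}X)^{-1},\quad
m_n = V_n\bigl(V_0^{-1}m_0 + X^{\top}y\bigr),\quad
a_n = a_0+\tfrac N2,\quad
b_n = b_0+\tfrac12\bigl(y^{\top}y + m_0^{\top}V_0^{-1}m_0 - m_n^{\top}V_n^{-1}m_n\bigr).
$$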

Slide 46: "Normal inverse gamma" prior - the conjugate prior for the GLM observation model (continued).

Slide 47: The Bayes factor is the ratio between the normalization constants of the NIGs.
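A sketch of that computation for the Bayesian linear model with the NIG prior above; the marginal likelihood is the ratio of the posterior and prior NIG normalization constants times the Gaussian data constant, written here in the standard closed form. The two competing design matrices and all numbers are made up for the example:

```python
import numpy as np
from scipy.special import gammaln

def log_evidence(y, X, m0=None, V0=None, a0=1.0, b0=1.0):
    """Log marginal likelihood of y under the GLM with an NIG(m0, V0, a0, b0) prior."""
    N, d = X.shape
    m0 = np.zeros(d) if m0 is None else m0
    V0 = 10.0 * np.eye(d) if V0 is None else V0
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X
    Vn = np.linalg.inv(Vn_inv)
    mn = Vn @ (V0_inv @ m0 + X.T @ y)
    an = a0 + N / 2.0
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)
    # Ratio of NIG normalization constants (posterior vs prior) plus the
    # (2*pi)^(-N/2) Gaussian data factor, all on the log scale.
    return (-0.5 * N * np.log(2 * np.pi)
            + 0.5 * (np.linalg.slogdet(Vn)[1] - np.linalg.slogdet(V0)[1])
            + a0 * np.log(b0) - an * np.log(bn)
            + gammaln(an) - gammaln(a0))

rng = np.random.default_rng(6)
N = 40
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=N)   # data with a real slope

X1 = np.column_stack([np.ones(N)])       # model 1: intercept only
X2 = np.column_stack([np.ones(N), x])    # model 2: intercept + slope

log_bf = log_evidence(y, X2) - log_evidence(y, X1)
print(f"log Bayes factor (model 2 vs model 1): {log_bf:.2f}")
```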

Slide 51: Exercises. Matthew Beal's Mixture of Factor Analyzers code (code available at variational-bayes.org). Code a VB version of the BGML for signal detection (code available for the exact posterior).

