Mean Field Variational Bayesian Data Assimilation EGU 2012, Vienna Michail Vrettas 1, Dan Cornford 1, Manfred Opper 2 1 NCRG, Computer Science, Aston University, UK 2 Technical University of Berlin, Germany

Why do data assimilation? The aim of data assimilation is to estimate the posterior distribution of the state of a dynamical model (X) given observations (Y) – we may also be interested in parameter (θ) inference

Framing the problem Need to consider: – model equations and model error – observation and representativity error Take a Bayesian approach: this is an inference problem – exact solutions are hard; analytical solutions are impossible – approximations: MCMC, SMC, sequential methods (EnKF)
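The ingredients listed above can be sketched in a standard discrete-time state-space form (the equations on the original slide were lost in transcription; the symbols f, h, Q, R below are illustrative notation, not taken from the slides):

```latex
% State evolution with additive model error
x_{k+1} = f(x_k) + \eta_k, \qquad \eta_k \sim \mathcal{N}(0, Q)
% Observation with observation / representativity error
y_k = h(x_k) + \epsilon_k, \qquad \epsilon_k \sim \mathcal{N}(0, R)
% Bayesian posterior over the state path given all observations
p(X \mid Y) \;\propto\; p(Y \mid X)\, p(X)
```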

Our solution Assume the model can be represented by an SDE (diffusion process). We adopt a variational Bayesian approach – replace the inference problem with an optimisation problem: find the best approximating distribution – best in the sense of minimising the relative entropy (KL divergence) between the true and approximating distributions The question now is the choice of approximating distribution
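A sketch of the two statements above, assuming a drift f and constant diffusion Σ (this notation is my reconstruction, not the slide's):

```latex
% Diffusion-process model of the dynamics
dX_t = f(X_t, \theta)\, dt + \Sigma^{1/2}\, dW_t
% Variational Bayes: pick q to minimise relative entropy to the posterior
q^\ast = \arg\min_{q} \; \operatorname{KL}\!\left[\, q(X) \,\big\|\, p(X \mid Y) \,\right]
```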

On the choice of approximations We look for a solution in the family of non-stationary Gaussian processes – equivalent to a time-varying linear dynamical system – add a mean field assumption, i.e. the posterior factorises – parameterise the posterior continuous-time linear dynamical system using low-order polynomials between observation times This gives us an analytic expression for the cost function (free energy) and its gradients – no need for forward / backward integration
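The approximating family described above can be written as a time-varying linear SDE with a mean-field factorisation (A(t), b(t) and the product form are my reconstruction of the slide's description):

```latex
% Time-varying linear (Gaussian) approximating process
dX_t = \left(-A(t)\, X_t + b(t)\right) dt + \Sigma^{1/2}\, dW_t
% Mean-field assumption: the approximate posterior factorises over dimensions
q(X) = \prod_{i=1}^{d} q_i\!\left(X^{(i)}\right)
% A(t), b(t) are low-order polynomials between observation times
```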

Examples – not another Lorenz!
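For context, a minimal Euler–Maruyama simulation of a stochastic Lorenz 63 system, the standard test case alluded to above (the noise amplitude and step size are illustrative choices, not the experiment settings from the talk):

```python
import numpy as np

def lorenz63_drift(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Deterministic drift f(x) of the classic Lorenz 63 system."""
    return np.array([
        sigma * (x[1] - x[0]),
        x[0] * (rho - x[2]) - x[1],
        x[0] * x[1] - beta * x[2],
    ])

def simulate_sde(x0, dt=0.01, n_steps=1000, noise_std=2.0, seed=0):
    """Euler-Maruyama integration of dX = f(X) dt + noise_std dW."""
    rng = np.random.default_rng(seed)
    path = np.empty((n_steps + 1, 3))
    path[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt), size=3)
        path[k + 1] = path[k] + lorenz63_drift(path[k]) * dt + noise_std * dw
    return path

path = simulate_sde(np.array([1.0, 1.0, 1.0]))
```

Observations for assimilation would then be noisy partial views of `path` at a subset of time indices.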

Examples: on larger systems

On algorithm complexity Derivation of the cost function equations is challenging – but much can be automated Memory complexity < full weak-constraint 4D-VAR – need only store the parameterised means and variances Time complexity > weak-constraint 4D-VAR – the integral equations are complex, but can be solved analytically: no adjoint integration is needed Long time windows are possible – provides a bound on the marginal likelihood, thus approximate parameter inference is possible
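The marginal-likelihood bound mentioned above is the standard variational one (written here in my notation):

```latex
% The free energy F decomposes as KL plus negative log evidence, so
\mathcal{F}(q, \theta) = \operatorname{KL}\!\left[ q \,\|\, p(X \mid Y, \theta) \right] - \log p(Y \mid \theta)
% and since KL >= 0, minimising F over theta maximises a lower bound:
\log p(Y \mid \theta) \;\geq\; -\mathcal{F}(q, \theta)
```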

Relation to other methods This is a probabilistic method – it targets the posterior, and is thus mean-optimal – 4D-VAR (except Eyink’s work) targets the mode Works on the full time window; not sequential like EnKF and PF approaches – no sampling error, but still approximation error Probably most suited to parameter estimation

Limitations Mean field => no correlations! – note correlations in the model and reality are still there; the approximation only estimates the marginals Variational => no guarantees on the approximation error – empirically we see this is not a problem, comparing with very expensive HMC on low-dimensional systems Parameterisation => must be chosen – Gaussian assumption on the approximation, not the true posterior – quality of the approximation depends on closeness to Gaussian Implementation => model specific – need to re-derive the equations if the model changes, but automation is possible

Conclusions Mean field methods make variational Bayesian approaches to data assimilation possible for large models – the main benefit is parameter inference with long time windows – next steps: test inference of observation errors Acknowledgement: This work was funded by the EPSRC (EP/C005848/1) and the EC FP7 GeoViQua project (ENV ; )