# SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models.

## Presentation on theme: "SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models."— Presentation transcript:

SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models

http://mucm.group.shef.ac.ukSlide 2 Outline Uncertainty in models Quantifying uncertainty Example: dynamic global vegetation model Reducing uncertainty Example: atmospheric deposition model Methods Research directions Dynamic models MUCM

http://mucm.group.shef.ac.ukSlide 3 Computer models In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real- world processes For understanding, prediction, control There is a growing realisation of the importance of uncertainty in model predictions Can we trust them? Without any quantification of output uncertainty, it’s easy to dismiss them

http://mucm.group.shef.ac.ukSlide 4 Examples Climate prediction Molecular dynamics Nuclear waste disposal Oil fields Engineering design Hydrology

http://mucm.group.shef.ac.ukSlide 5 Sources of uncertainty A computer model takes inputs x and produces outputs y = f(x) How might y differ from the true real-world value z that the model is supposed to predict? Error in inputs x Initial values, forcing inputs, model parameters Error in model structure or solution Wrong, inaccurate or incomplete science Bugs, solution errors

http://mucm.group.shef.ac.ukSlide 6 Quantifying uncertainty The ideal is to provide a probability distribution p(z) for the true real-world value The centre of the distribution is a best estimate Its spread shows how much uncertainty about z is induced by uncertainties on the last slide How do we get this? Input uncertainty: characterise p(x), propagate through to p(y) Structural uncertainty: characterise p(z-y)

http://mucm.group.shef.ac.ukSlide 7 Example: UK carbon flux in 2000 Vegetation model predicts carbon exchange from each of 700 pixels over England & Wales Principal output is Net Biosphere Production Accounting for uncertainty in inputs Soil properties Properties of different types of vegetation Aggregated to England & Wales total Allowing for correlations Estimate 7.61 Mt C Std deviation 0.61 Mt C

http://mucm.group.shef.ac.ukSlide 8 Maps

http://mucm.group.shef.ac.ukSlide 9 Sensitivity analysis Map shows proportion of overall uncertainty in each pixel that is due to uncertainty in the vegetation parameters As opposed to soil parameters Contribution of vegetation uncertainty is largest in grasslands/moorlands

http://mucm.group.shef.ac.ukSlide 10 England & Wales aggregate PFT Plug-in estimate (Mt C) Mean (Mt C) Variance (Mt C 2 ) Grass5.284.650.323 Crop0.850.500.038 Deciduous2.131.690.009 Evergreen0.800.780.001 Covariances0.001 Total9.067.610.372

http://mucm.group.shef.ac.ukSlide 11 Reducing uncertainty To reduce uncertainty, get more information! Informal – more/better science Tighten p(x) through improved understanding Tighten p(z-y) through improved modelling or programming Formal – using real-world data Calibration – learn about model parameters Data assimilation – learn about the state variables Learn about structural error z-y Validation

http://mucm.group.shef.ac.ukSlide 12 Example: Nuclear accident Radiation was released after an accident at the Tomsk-7 chemical plant in 1993 Data comprise measurements of the deposition of ruthenium 106 at 695 locations obtained by aerial survey after the release The computer code is a simple Gaussian plume model for atmospheric dispersion Two calibration parameters Total release of 106 Ru (source term) Deposition velocity

http://mucm.group.shef.ac.ukSlide 13 Data

http://mucm.group.shef.ac.ukSlide 14 A small sample (N=10 to 25) of the 695 data points was used to calibrate the model Then the remaining observations were predicted and RMS prediction error computed On a log scale, error of 0.7 corresponds to a factor of 2 Calibration Sample size N10152025 Best fit calibration0.820.790.760.66 Bayesian calibration0.490.410.370.38

http://mucm.group.shef.ac.ukSlide 15 So far, so good, but In principle, all this is straightforward In practice, there are many technical difficulties Formulating uncertainty on inputs Elicitation of expert judgements Propagating input uncertainty Modelling structural error Anything involving observational data! The last two are intricately linked And computation

http://mucm.group.shef.ac.ukSlide 16 The problem of big models Tasks like uncertainty propagation and calibration require us to run the model many times Uncertainty propagation Implicitly, we need to run f(x) at all possible x Monte Carlo works by taking a sample of x from p(x) Typically needs thousands of model runs Calibration Traditionally this is done by searching the x space for good fits to the data This is impractical if the model takes more than a few seconds to run We need a more efficient technique

http://mucm.group.shef.ac.ukSlide 17 Gaussian process representation More efficient approach First work in early 1980s (DACE) Consider the code as an unknown function f(.) becomes a random process We represent it as a Gaussian process (GP) Training runs Run model for sample of x values Condition GP on observed data Typically requires many fewer runs than MC And x values don’t need to be chosen randomly

http://mucm.group.shef.ac.ukSlide 18 Emulation Analysis is completed by prior distributions for, and posterior estimation of, hyperparameters The posterior distribution is known as an emulator of the computer code Posterior mean estimates what the code would produce for any untried x (prediction) With uncertainty about that prediction given by posterior variance Correctly reproduces training data

http://mucm.group.shef.ac.ukSlide 19 2 code runs Consider one input and one output Emulator estimate interpolates data Emulator uncertainty grows between data points

http://mucm.group.shef.ac.ukSlide 20 3 code runs Adding another point changes estimate and reduces uncertainty

http://mucm.group.shef.ac.ukSlide 21 5 code runs And so on

http://mucm.group.shef.ac.ukSlide 22 Then what? Given enough training data points we can emulate any model accurately So that posterior variance is small “everywhere” Typically, this can be done with orders of magnitude fewer model runs than traditional methods Use the emulator to make inference about other things of interest E.g. uncertainty analysis, calibration Conceptually very straightforward in the Bayesian framework But of course can be computationally hard

http://mucm.group.shef.ac.ukSlide 23 DACE and BACCO This has led to a wide ranging body of tools for inference about all kinds of uncertainties in computer models All based on building the GP emulator of the model from a set of training runs Early work concentrated on prediction – DACE Design and Analysis of Computer Experiments More recently, wider scope and emphasis on the Bayesian framework – BACCO Bayesian Analysis of Computer Code Output

http://mucm.group.shef.ac.ukSlide 24 BACCO includes Uncertainty analysis Sensitivity analysis Calibration Data assimilation Model validation Optimisation Etc… All within a single coherent framework

http://mucm.group.shef.ac.ukSlide 25 Research directions Models with heterogeneous local behaviour Regions of input space with rapid response, jumps High dimensional models Many inputs, outputs, data points Dynamic models Data assimilation Stochastic models Relationship between models and reality Model/emulator validation Multiple models Design of experiments Sequential design

http://mucm.group.shef.ac.ukSlide 26 Dynamic models Initial state x 0 is updated recursively by the one-step model f 1 (x,c) Forcing inputs c t Interested in sequence x 1, x 2, … x T At least 4 approaches to emulating this XTXT X0X0 c1c1 f1f1 X1X1 c2c2 f1f1 X2X2 c3c3 f1f1 X T-2 c T-1 f1f1 X T-1 cTcT f1f1

http://mucm.group.shef.ac.ukSlide 27 1. Treat time as input Emulate x t as the function f(x 0,t) = f t (x 0,c t ) = f 1 (x t-1,c t ) = f 1 (… f 1 (f 1 (x 0,c 1 ),c 2 )…,c t ) Easy to do Hard to get the temporal correlation structure right XTXT X0X0 c1c1 f1f1 X1X1 c2c2 f1f1 X2X2 c3c3 f1f1 X T-2 c T-1 f1f1 X T-1 cTcT f1f1

http://mucm.group.shef.ac.ukSlide 28 2. Multivariate emulation Emulate the vector x = (x 1, x 2, … x T ) as the multi-output function f T (x 0 ) = (f 1 (x 0,c 1 ), f 2 (x 0,c 2 ), …, f T (x 0,c T )) Simple extension of univariate theory Restrictive covariance structure XTXT X0X0 c1c1 f1f1 X1X1 c2c2 f1f1 X2X2 c3c3 f1f1 X T-2 c T-1 f1f1 X T-1 cTcT f1f1

http://mucm.group.shef.ac.ukSlide 29 3. Functional output emulation Treat x series as a functional output Identify features (e.g. principal components) to summarise function Same theory as for multivariate emulation, but lower dimension Loss of information, no longer reproduces training data XTXT X0X0 c1c1 f1f1 X1X1 c2c2 f1f1 X2X2 c3c3 f1f1 X T-2 c T-1 f1f1 X T-1 cTcT f1f1

http://mucm.group.shef.ac.ukSlide 30 4. Recursive emulation Emulate the single step function f 1 (x,c) Iterate the emulator numerically May take longer than original model! Or approximate filtering algorithm May be inaccurate for large T XTXT X0X0 c1c1 f1f1 X1X1 c2c2 f1f1 X2X2 c3c3 f1f1 X T-2 c T-1 f1f1 X T-1 cTcT f1f1

http://mucm.group.shef.ac.ukSlide 31 Some comparisons Multivariate emulation (approach 2) is better than treating time as an input (approach 1) Validates better Able to work with partial series Recursive emulation still under development Looking promising Only recursive emulation can realistically treat uncertainty in forcing inputs Only recursive emulation can extend T beyond training data

http://mucm.group.shef.ac.ukSlide 32 MUCM Managing Uncertainty in Complex Models Large 4-year research grant 7 postdoctoral research assistants 4 PhD studentships Started in June 2006 Based in Sheffield and 4 other UK universities Objective: To develop BACCO methods into a robust technology … toolkit that is widely applicable across the spectrum of modelling applications case studies

http://mucm.group.shef.ac.ukSlide 33 Thank you

Download ppt "SAMSI Distinguished, October 2006Slide 1 Tony O’Hagan, University of Sheffield Managing Uncertainty in Complex Models."

Similar presentations