
Emulators and MUCM

Outline: Background; Simulators; Uncertainty in model inputs; Uncertainty analysis; Case study – dynamic vegetation simulator; Emulators; Computation; MUCM

Background

Simulators. In almost all fields of science, technology, industry and policy making, people use mechanistic models for understanding, prediction and control. A model simulates a real-world, usually complex, phenomenon as a set of mathematical equations. Models are usually implemented as computer programs, and we will refer to a computer implementation of a model as a simulator.

Examples: climate prediction, molecular dynamics, nuclear waste disposal, oil fields, engineering design, hydrology.

The simulator as a function. In computing terms, a simulator takes a number of inputs and produces a number of outputs. We can represent any output y as a function y = f(x) of a vector x of inputs.

Why worry about uncertainty? How accurate are model predictions? There is increasing concern about uncertainty in model outputs, particularly where simulator predictions are used to inform scientific debate or environmental policy. Are their predictions robust enough for high-stakes decision-making?

For instance, models for climate change produce different predictions for the extent of global warming and its other consequences. Which ones should we believe? What error bounds should we put around them? Are the differences between simulators consistent with those error bounds? Until we can answer such questions convincingly, why should anyone have faith in the science?

Where is the uncertainty? How might the simulator output y = f(x) differ from the true real-world value z that the simulator is supposed to predict? Error in the inputs x: initial values, forcing inputs, model parameters. Error in model structure or solution: wrong, inaccurate or incomplete science; bugs and solution errors.

Quantifying uncertainty. The ideal is to provide a probability distribution p(z) for the true real-world value. The centre of the distribution is a best estimate, and its spread shows how much uncertainty about z is induced by the uncertainties on the previous slide. How do we get this? Input uncertainty: characterise p(x) and propagate it through to p(y). Structural uncertainty: characterise p(z - y).

Uncertainty in model inputs

Inputs. We will interpret "inputs" widely: initial conditions; other data defining the particular context being simulated; exogenous ("forcing") data; and parameters in the model equations, which are often hard-wired (a problem!).

The CENTURY model. For example, consider CENTURY, a model of soil carbon processes. Initial conditions: 8 carbon pools. Other contextual data: 3 soil texture inputs (sand, clay, silt). Exogenous data: 3 climate inputs for each monthly time step. Parameters: coded in the differential equations.

Input uncertainty. We are typically uncertain about the values of many of the inputs, through measurement error or lack of knowledge – in CENTURY, for example, texture, initial carbon pools and (future) climate. Input uncertainty should be expressed as a probability distribution across all uncertain inputs. Model users are often reluctant to specify more than plausible bounds, but bounds alone are inadequate to characterise output uncertainty.

Output uncertainty. Input uncertainty induces uncertainty in the output y, which therefore also has a probability distribution. In theory, this is completely determined by the probability distribution on x and the model f. In practice, finding this distribution and its properties is not straightforward.

A trivial model. Suppose we have just two inputs and a simple linear model y = x1 + 3x2, and that x1 and x2 have independent uniform distributions over [0, 1], i.e. they define a point that is equally likely to be anywhere in the unit square. Then we can determine the distribution of y exactly.

Trivial model – output distribution. The distribution of y has a trapezium form, as derived below.
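This shape can be checked directly (a short derivation, not on the original slide): with x1 ~ U(0, 1) and 3x2 ~ U(0, 3) independent, convolving the two densities gives

```latex
f_Y(y) =
\begin{cases}
  y/3     & 0 \le y \le 1 \\
  1/3     & 1 \le y \le 3 \\
  (4-y)/3 & 3 \le y \le 4
\end{cases}
```

so the density rises linearly on [0, 1], is flat at 1/3 on [1, 3], and falls linearly on [3, 4], with mean 2.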

Trivial model – normal inputs. If instead x1 and x2 have independent normal distributions with mean 0.5, we get a normal output, since a linear combination of independent normals is itself normal: y ~ N(2, σ1² + 9σ2²) when xi ~ N(0.5, σi²).

A slightly less trivial model. Now consider the simple nonlinear model y = sin(x1)/{1 + exp(x1 + x2)}. We still have only two inputs and quite a simple equation, but even for nice input distributions we cannot get the output distribution exactly. The simplest way to compute it is by Monte Carlo.

Monte Carlo output distribution. This is for the normal inputs: 10,000 random normal pairs were generated and y calculated for each pair.
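A minimal sketch of this Monte Carlo propagation in Python (not from the original slides). The model and the input mean of 0.5 are as stated above; the input standard deviation of 0.25 is an illustrative assumption, and all names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    """The slightly nonlinear toy model from the slides."""
    return np.sin(x1) / (1.0 + np.exp(x1 + x2))

# Propagate input uncertainty by Monte Carlo: 10,000 random normal pairs.
# The input standard deviation (0.25 here) is an assumption for illustration.
n = 10_000
x1 = rng.normal(0.5, 0.25, n)
x2 = rng.normal(0.5, 0.25, n)
y = f(x1, x2)

# Summaries of the output distribution (cf. the UA slides that follow)
print(f"mean = {y.mean():.3f}, median = {np.median(y):.3f}, sd = {y.std():.3f}")
print("50% range:", np.round(np.percentile(y, [25, 75]), 3))
print("95% range:", np.round(np.percentile(y, [2.5, 97.5]), 3))
```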

Uncertainty analysis (UA). The process of characterising the distribution of the output y is called uncertainty analysis. Plotting the distribution is a good graphical way to characterise it, but quantitative summaries are often more important: mean and median; standard deviation and quartiles; probability intervals.

UA of the slightly nonlinear model. Mean = 0.117; 50% range (quartiles) = [0.092, 0.148]; 95% range = [0.004, 0.200].

UA versus plug-in. Even if we just want to estimate y, UA does better than the "plug-in" approach of running the model at estimated values of x. For the simple nonlinear model, the central estimates of x1 and x2 are 0.5, but sin(0.5)/(1 + exp(1)) ≈ 0.129 is a slightly too high estimate of y compared with the mean of 0.117. The difference can be much more marked for highly nonlinear models, as serious simulators often are.
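A quick check of this comparison, under the same illustrative input assumptions as the sketch above:

```python
import numpy as np

rng = np.random.default_rng(0)
x1, x2 = rng.normal(0.5, 0.25, (2, 10_000))   # sd 0.25 is an assumption

y = np.sin(x1) / (1.0 + np.exp(x1 + x2))       # Monte Carlo sample of outputs
plug_in = np.sin(0.5) / (1.0 + np.exp(1.0))    # single run at central inputs

print(f"plug-in = {plug_in:.3f}")              # about 0.129
print(f"MC mean = {y.mean():.3f}")             # lower, cf. the slide's 0.117
```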

Summary. Why UA? Proper quantification of output uncertainty, which needs a proper probabilistic expression of input uncertainty; and an improved central estimate of the output, better than the usual plug-in approach.

A real case study

Example: UK carbon flux in 2000. A vegetation model predicts carbon exchange from each of 700 pixels over England & Wales in 2000; the principal output is Net Biosphere Production. The analysis accounts for uncertainty in the inputs (soil properties, properties of the different vegetation types, land usage), though not structural uncertainty. Aggregated to an England & Wales total, allowing for correlations: estimate 7.46 Mt C, standard deviation 0.54 Mt C.

Maps: mean NBP and its standard deviation over England & Wales.

England & Wales aggregate: a table of plug-in estimate (Mt C), mean (Mt C) and variance (Mt C²) by plant functional type – grass, crop, deciduous and evergreen – plus covariances and the total.

Reducing uncertainty. To reduce uncertainty, get more information! Informal – more/better science: tighten p(x) through improved understanding; tighten p(z - y) through improved modelling or programming. Formal – using real-world data: calibration (learn about model parameters); data assimilation (learn about the state variables); learning about the structural error z - y; validation.

Emulators

So far, so good, but... In principle, all this is straightforward; in practice, there are many technical difficulties: formulating uncertainty on inputs (elicitation of expert judgements); propagating input uncertainty; modelling structural error; anything involving observational data (the last two are intricately linked); and computation.

The problem of big models. Tasks like uncertainty propagation and calibration require us to run the simulator many times. Uncertainty propagation: implicitly, we need to run f(x) at all possible x; Monte Carlo works by taking a sample of x from p(x), but typically needs thousands of simulator runs. Calibration: traditionally done by searching the x space for good fits to the data. Both become impractical if the simulator takes more than a few seconds to run – 10,000 runs at 1 minute each take a week of computer time. We need a more efficient technique.

Gaussian process representation. A more efficient approach, with the first work in the early 1980s (DACE). Represent the code as an unknown function: f(.) becomes a random process, which we generally represent as a Gaussian process (GP), or its second-order moment version. Training runs: run the simulator for a sample of x values and condition the GP on the observed data. This typically requires many fewer runs than Monte Carlo, and the x values don't need to be chosen randomly.

Emulation. The analysis is completed by prior distributions for, and posterior estimation of, the hyperparameters. The posterior distribution is known as an emulator of the computer simulator. Its posterior mean estimates what the simulator would produce for any untried x (prediction), with uncertainty about that prediction given by the posterior variance, and it correctly reproduces the training data.
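To make this concrete, here is a minimal one-input GP emulator in Python, using numpy only. It is a sketch, not the MUCM toolkit's procedure: it assumes a zero prior mean and fixed hyperparameters, whereas a full emulator would include a mean function and estimate the hyperparameters from the training runs; all names and the stand-in simulator are ours.

```python
import numpy as np

def cov(a, b, var=1.0, length=0.3):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    return var * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

# A handful of training runs of the (expensive) simulator
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(3.0 * x_train)        # stand-in for real simulator output

x_new = np.linspace(0.0, 1.0, 101)     # untried inputs to predict at

K = cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter for stability
Ks = cov(x_new, x_train)

L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
mean = Ks @ alpha                      # posterior mean: the emulator prediction
v = np.linalg.solve(L, Ks.T)
var = cov(x_new, x_new).diagonal() - (v * v).sum(axis=0)  # posterior variance

# The variance is (numerically) zero at the training inputs, so the emulator
# reproduces the training data exactly, and it grows between and beyond them.
```

Plotting mean ± 2√var against the training points reproduces the behaviour on the next few slides: the estimate interpolates the data, and adding runs shrinks the uncertainty.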

2 code runs. Consider one input and one output: the emulator estimate interpolates the data, and the emulator uncertainty grows between the data points.

3 code runs. Adding another point changes the estimate and reduces the uncertainty.

5 code runs. And so on.

Then what? Given enough training data points we can in principle emulate any simulator output accurately, so that the posterior variance is small "everywhere". Typically this can be done with orders of magnitude fewer model runs than traditional methods, at least in relatively low-dimensional problems. We then use the emulator to make inferences about other things of interest, e.g. uncertainty analysis or calibration. This is conceptually very straightforward in the Bayesian framework, but of course can be computationally hard.

BACCO. This has led to a wide-ranging body of tools for all kinds of tasks involving uncertainties in computer models – uncertainty analysis, sensitivity analysis, calibration, data assimilation, validation, optimisation etc. – all in a single coherent framework, and all based on building an emulator of the simulator from a set of training runs. This area is sometimes known as BACCO: Bayesian Analysis of Computer Code Output.

MUCM: Managing Uncertainty in Complex Models. A large 4-year research grant, June 2006 to September 2010, with a team of postdoctoral research associates and 4 project PhD students. Objective: to develop BACCO methods into a basic technology, usable and widely applicable. MUCM2: New directions for MUCM – a smaller 2-year grant to September 2012, scoping and developing research proposals.

Primary MUCM deliverables. Methodology and papers moving the technology forward, in both statistics and application-area journals. The MUCM toolkit: web-based documentation of the methods and how to use them, with emphasis on what is found to work reliably across a range of modelling areas. Case studies: three substantial case studies showcasing methods and best practice, linked to the toolkit. Events: workshops (conceptual and hands-on), short courses, and conferences (UCM 2010).

Focus on the toolkit. The toolkit is a 'recipe book' – the good sort, that encourages you to experiment. There are recipes (procedures), but also plenty of explanation of concepts and discussion of choices. It is not a software package. Software packages are great if they are in your favourite language – but it probably wouldn't be! And packages are dangerous without basic understanding. The purpose of the toolkit is to build that understanding, and it enables you to develop your own code easily.