Building an Emulator

Outline

- Recipe for building an emulator – the MUCM toolkit
- Screening – which simulator inputs matter
- Design – where to run the simulator
- Model structure – mean and covariance functions
- Estimation / inference – building the emulator
- Validation – making sure the emulator is OK
- Possible extensions: multiple outputs, dynamic simulators, Bayes linear methods
- Summary of emulation

The ‘standard’ problem: the MUCM toolkit recipe (www.mucm.ac.uk)

Step 0: Know your simulator

Before attempting to create an emulator, it is important that you understand your simulator:
- What are the plausible input ranges?
- What constraints are there on input combinations?
- What is the output behaviour like?

Ideally you should elicit beliefs about the distributions of the inputs; if these are not known, at least ranges are needed for all inputs.

Step 1: Screening – active inputs

All serious simulators take more than one input: the norm is anything from a few to thousands, and all of the basic emulation theory in the toolkit assumes multiple inputs. Large numbers of inputs pose computational problems, so dimension reduction techniques have been developed; the output typically depends principally on only a few inputs.

Screening seeks to identify the most important inputs for a given output. Most often the Morris method is used, a cheap approximation to a full sensitivity analysis.

Screening: the Morris method

Basic idea: use a design that changes one input at a time while filling space. Morris designs are based on a series of repeated trajectories, each step changing only one input. For each input, compute (see the sketch below):
- the average of the elementary effects (μ) – large μ indicates an important effect
- the variance of the elementary effects (σ) – large σ indicates a non-linear effect

[Figure: inputs plotted in the (μ, σ) plane, with regions labelled ‘no effect’, ‘linear effect’, and ‘non-linear effect’.]
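A minimal sketch of computing Morris elementary effects in Python. The simulator `f`, the step size `delta`, and the unit input cube are illustrative assumptions, and this simplified version uses random trajectories rather than the full Morris grid design.

```python
import numpy as np

def morris_effects(f, n_inputs, n_trajectories=20, delta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    effects = [[] for _ in range(n_inputs)]
    for _ in range(n_trajectories):
        x = rng.uniform(0, 1 - delta, size=n_inputs)   # base point in [0,1]^d
        y = f(x)
        for i in rng.permutation(n_inputs):            # change one input at a time
            x_new = x.copy()
            x_new[i] += delta
            y_new = f(x_new)
            effects[i].append((y_new - y) / delta)     # elementary effect for input i
            x, y = x_new, y_new
    # mean of |effects| is the revised importance measure mu*; sigma flags non-linearity
    mu = np.array([np.mean(np.abs(e)) for e in effects])
    sigma = np.array([np.std(e) for e in effects])
    return mu, sigma

# Example: input 0 linear, input 1 non-linear, input 2 inactive
mu, sigma = morris_effects(lambda x: 3 * x[0] + np.sin(6 * x[1]), n_inputs=3)
```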

Step 2: Design

Design is all about choosing where to run the simulator in order to learn a good emulator; we also need to consider validation and calibration. There are many options for design, and many issues. In the absence of additional information, space-filling designs are used; grids are infeasible for all but trivial simulators.

Training sample design

To build an emulator, we use a set of simulator runs. Our training data are y1 = f(x1), ..., yn = f(xn), where x1, x2, ..., xn are n different points in the space of inputs; this set of n points is a design. A good design provides maximum information about the simulator, and hence an emulator that is as good as possible.

Latin hypercube designs

LHC designs use n values for each input, combined randomly.
Advantages:
- doesn’t necessarily require a large number of points
- nothing is lost if some inputs are inactive
Disadvantages:
- a random choice may not produce an even spread of points
- need to generate many LHC designs and pick the best (see the sketch below)
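A minimal sketch of the ‘generate many LHCs and pick the best’ approach, using scipy’s qmc module; scoring by the maximin (largest minimum inter-point distance) criterion is an assumption, one common choice.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import qmc

def best_lhc(n_points, n_inputs, n_candidates=100, seed=0):
    sampler = qmc.LatinHypercube(d=n_inputs, seed=seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        design = sampler.random(n_points)   # one random LHC in [0,1]^d
        score = pdist(design).min()         # maximin criterion: spread of points
        if score > best_score:
            best, best_score = design, score
    return best

X = best_lhc(n_points=30, n_inputs=4)
```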

Some more design choices

Various formulae and algorithms exist to generate space-filling designs for any number of inputs. The Sobol sequence is often used: quick and convenient, but not always good when some inputs are inactive (see the sketch below). Optimal designs maximise or minimise some criterion, e.g. maximum entropy designs; these can be hard to compute, often without large gains. Hybrid designs try to satisfy two criteria: space-filling, but also having a few points closer together in order to estimate correlation lengths well.
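A minimal sketch of a Sobol design using scipy, then rescaling from the unit cube to each input’s range; the ranges shown are illustrative assumptions.

```python
from scipy.stats import qmc

sampler = qmc.Sobol(d=3, scramble=True, seed=0)
unit_design = sampler.random_base2(m=6)    # 2^6 = 64 points in [0,1]^3
design = qmc.scale(unit_design,
                   l_bounds=[0.0, -5.0, 1.0],   # assumed lower bounds per input
                   u_bounds=[1.0, 5.0, 10.0])   # assumed upper bounds per input
```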

Step 3: Building the emulator

In deciding on the structure of the emulator we have many choices to make:
- the mean function
- the covariance function
- the prior specifications
There are no universal solutions here, so judgement and validation play an important role.

The technical part (overview!)

The emulator is a Gaussian process with:
- mean function m(x) = h(x)ᵀβ, with h(x) typically [1, x]
- covariance function σ²c(x, x′) = σ² exp{−(x − x′)ᵀ C (x − x′)}, where C is a diagonal matrix of inverse squared correlation lengths 1/δ²

Thus the distribution of the simulator output y, conditional on the input x and the parameters (β, σ², δ), is multivariate normal. The choices we make can be important.

[Figure: the mean (black) and variance (grey) of a posterior GP (with a Matérn covariance) based on the red training data, and four realisations from this posterior.]
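A minimal sketch of this mean and covariance structure in Python; the function and variable names are mine, not the toolkit’s.

```python
import numpy as np

def cov_matrix(X1, X2, sigma2, delta):
    # Squared-exponential covariance: sigma2 * exp{-(x-x')^T C (x-x')},
    # with C = diag(1/delta^2). X1: (n1, d), X2: (n2, d), delta: (d,).
    diff = X1[:, None, :] - X2[None, :, :]     # all pairwise differences
    sq = np.sum((diff / delta) ** 2, axis=-1)
    return sigma2 * np.exp(-sq)

def h(X):
    # Linear mean basis h(x) = [1, x], so m(x) = h(x)^T beta
    return np.hstack([np.ones((X.shape[0], 1)), X])
```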

The GP mean function

We can use the mean function to say what kind of shape we expect the output to take as a function of the inputs. Most simulator outputs exhibit some overall trend in response to varying a single input, so we usually specify a linear mean function: the slopes (positive or negative) are estimated from the training data, and the emulator mean smoothes the residuals left after fitting the linear terms. We can generalise to other kinds of mean function if we have a clear idea of how the simulator will behave; the better the mean function, the less the GP has to do.

Example

[Figure: the simulator is the solid line, the dashed line is a linear fit, and blue arrows indicate the fitted residuals.] Without the linear mean function, we would have a horizontal (constant) fit and larger residuals, leading to larger emulator uncertainty.

The GP covariance function

The covariance function determines how ‘wiggly’ the response is to each input. There is a lot of flexibility here, but standard covariance functions have one parameter per input. These ‘correlation length’ parameters are also estimated from the training data, but some care is needed. For predicting the output at an untried x, the correlation lengths are important: they determine how much information comes from nearby training points, and hence the emulator’s accuracy.

Prior distributions

Prior information enters through the form of the mean function and, to a lesser extent, the covariance function. But we can also supply prior information through the prior distributions for the slope/regression parameters and the correlation lengths, as well as the overall variance parameter. Putting in genuine prior information here generally improves emulator performance compared with standard ‘non-informative’ priors.

Step 4: Learning the emulator

We normally proceed using Bayesian inference; just how Bayesian depends on the size of the problem. Ideally we would ‘integrate out’ all unknown parameters, but this can be difficult, requiring MCMC. Details are in the toolkit, but in summary: typically one can integrate out the regression coefficients (β) and the variance parameter (σ²), and optimise (maximum likelihood, or MAP) the covariance length scales (δ), as in the sketch below. Ignoring uncertainty in the length scales can be a problem if they are not well identified, but typically the mean function does most of the work.
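A minimal sketch of this strategy, reusing `cov_matrix` and `h` from the earlier sketch: β and σ² are profiled out analytically and the length scales δ are optimised by maximum likelihood. The small diagonal ‘nugget’ is an assumption, added for numerical stability.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(log_delta, X, y):
    n = len(y)
    A = cov_matrix(X, X, 1.0, np.exp(log_delta)) + 1e-8 * np.eye(n)  # correlation matrix
    H = h(X)
    A_inv = np.linalg.inv(A)
    beta = np.linalg.solve(H.T @ A_inv @ H, H.T @ A_inv @ y)  # GLS estimate of beta
    resid = y - H @ beta
    sigma2 = (resid @ A_inv @ resid) / n                      # profiled variance estimate
    _, logdet = np.linalg.slogdet(A)
    return 0.5 * (n * np.log(sigma2) + logdet)                # concentrated -log L, up to a constant

def fit_lengthscales(X, y):
    res = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1]), args=(X, y))
    return np.exp(res.x)   # optimise on the log scale to keep delta positive
```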

Prediction with the emulator

Once the (hyper-)parameters of the emulator have been learnt (or integrated out), we can use the emulator to predict, at a new input, what the simulator output would have been. This is always a predictive distribution, as in the sketch below.
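A minimal sketch of the GP predictive mean and variance at new inputs `Xs`, again reusing the earlier helpers. For brevity it plugs in point estimates and ignores the extra variance from estimating β, so it slightly understates the predictive uncertainty.

```python
import numpy as np

def predict(Xs, X, y, beta, sigma2, delta):
    A = cov_matrix(X, X, sigma2, delta) + 1e-8 * np.eye(len(y))
    T = cov_matrix(Xs, X, sigma2, delta)        # cross-covariance t(x) with training points
    A_inv_resid = np.linalg.solve(A, y - h(X) @ beta)
    mean = h(Xs) @ beta + T @ A_inv_resid       # linear mean plus GP correction
    var = sigma2 - np.sum(T * np.linalg.solve(A, T.T).T, axis=1)  # diag(T A^-1 T^T)
    return mean, np.maximum(var, 0.0)           # clip tiny negative values from round-off
```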

Step 5: Validating the emulator

Validating the emulator is essential: a full probabilistic assessment of fitness for purpose. First examine the standardised residuals, with ±2 standard deviation intervals; visual assessment is often very helpful and provides diagnostic information.

What is validation?

What does it mean to validate an emulator? We compare the emulator’s predictions with the simulator output, using a validation sample of runs at new input configurations. The emulator mean is the best prediction, and it is always wrong, but the emulator also predicts the uncertainty around that mean. The emulator is valid if its expressions of uncertainty are correct: actual outputs should fall in the 95% intervals 95% of the time, no less and no more, and the standardised residuals should have zero mean and unit variance. See the Bastos and O’Hagan preprint on the MUCM website.

Measures for validation

The Mahalanobis distance on a test set accounts for the predictive covariance over the test set; it follows a scaled F-distribution, so we can check the observed value is close to the theoretical one for the given test set size. A useful diagnostic is the pivoted Cholesky decomposition of the predictive covariance: the resulting decorrelated residuals should follow a t distribution. Large residuals early in the pivot order suggest non-stationarity or poor predictive variance; large residuals late in the pivot order suggest a poor length scale or covariance function. A sketch of these diagnostics follows.
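A minimal sketch of these diagnostics, given a held-out test set with emulator predictive mean `mu` and full predictive covariance matrix `V` (assumed available). For simplicity it uses an unpivoted Cholesky factor to decorrelate the residuals, whereas the diagnostic described above pivots on the largest remaining predictive variance.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def validation_diagnostics(y_test, mu, V):
    # Standardised residuals: should look ~ N(0,1); flag values beyond +/- 2
    std_resid = (y_test - mu) / np.sqrt(np.diag(V))
    # Decorrelated (Cholesky) residuals: should look like independent draws
    L = cholesky(V, lower=True)
    chol_resid = solve_triangular(L, y_test - mu, lower=True)
    # Mahalanobis distance: expected value is about len(y_test) for a valid
    # emulator (formally a scaled F reference when sigma^2 is estimated)
    mahalanobis = np.sum(chol_resid ** 2)
    return std_resid, chol_resid, mahalanobis
```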

Extensions: what types of simulator are amenable to emulation?

Many outputs

Most simulators produce multiple outputs; for instance, a climate simulator may predict temperature on a grid. Usually, for any given use of the simulator, we are interested in just one output, so we can emulate that one alone, particularly if it is some combination of the others, e.g. mean global surface temperature. But some problems require multi-output emulation; again, there are dimension reduction techniques. All are described in the MUCM toolkit.

Multi-output emulators

When we need to emulate several simulator outputs, there are a number of available approaches (one is sketched below):
- a single-output GP with added input(s) indexing the outputs; for temperature outputs on a grid, make the grid coordinates two additional inputs
- independent GPs
- a multivariate GP
- independent GPs on a linear transformation, e.g. principal components, with the possibility of dimension reduction
These are all documented in the MUCM toolkit.
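A minimal sketch of the principal-components option, using scikit-learn’s GP regressor rather than the hand-rolled emulator above; the number of retained components is an assumption to be tuned for the problem.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def fit_pca_emulators(X, Y, n_components=5):
    # Project the output vectors onto their leading principal components
    pca = PCA(n_components=n_components)
    Z = pca.fit_transform(Y)                    # (n_runs, n_components)
    # One independent GP emulator per retained component
    kernel = ConstantKernel() * RBF(length_scale=np.ones(X.shape[1]))
    gps = [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, Z[:, j])
           for j in range(n_components)]
    return pca, gps

def predict_outputs(Xs, pca, gps):
    Z_pred = np.column_stack([gp.predict(Xs) for gp in gps])
    return pca.inverse_transform(Z_pred)        # back to the original output space
```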

Dynamic emulation

Many simulators predict a process evolving in time: at each time-step the simulator updates the system state, often driven by external forcing variables. Climate models are usually dynamic in this sense. We are interested in emulating the simulator’s time series of outputs. The various forms of multi-output emulation can be used, or a dynamic emulator: emulate the single time-step and then iterate the emulator, as sketched below. Also documented in the MUCM toolkit.
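A minimal sketch of iterating a single-step emulator; `emulator` is a hypothetical function mapping the current state and forcing to the next state, and this simple loop ignores the propagation of emulator uncertainty through time.

```python
import numpy as np

def roll_out(emulator, state0, forcings):
    # Iterate the one-step emulator: state_{t+1} = emulator([state_t, forcing_t])
    states = [np.asarray(state0)]
    for forcing in forcings:                    # one emulator call per time-step
        x = np.concatenate([states[-1], np.atleast_1d(forcing)])
        states.append(emulator(x))
    return np.array(states)
```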

Stochastic emulation

Other simulators produce non-deterministic outputs: running a stochastic simulator twice with the same input x produces randomly different outputs. Different emulation strategies arise depending on which aspect of the output is of interest:
- if interest focuses on the mean, the output has added noise, which we allow for when building the emulator (see the sketch below)
- if interest focuses on the risk of exceeding a threshold, either emulate the distribution and derive the risk, or emulate the risk directly
This is not yet covered in the MUCM toolkit.
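A minimal sketch of the ‘mean output’ strategy: add a noise (‘nugget’) variance τ² to the covariance among observed runs, so the emulator targets the underlying mean response. Here τ² is assumed known for simplicity; in practice it would be estimated alongside the other parameters.

```python
import numpy as np

def cov_with_nugget(X1, X2, sigma2, delta, tau2, same=False):
    # Reuses cov_matrix from the earlier sketch; set same=True when X1 and X2
    # are the training runs, so simulator noise appears on the diagonal.
    K = cov_matrix(X1, X2, sigma2, delta)
    if same:
        K = K + tau2 * np.eye(X1.shape[0])
    return K
```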

Bayes linear methods

So far we have assumed a fully Bayesian framework, but there is an alternative: Bayes linear methods. These are based only on first- and second-order moments (means, variances, covariances), avoiding assumptions about distributions. Their predictions are likewise first- and second-order moments – means, variances, covariances – but no distributions. The toolkit contains theory and procedures for Bayes linear emulators.

Bayes linear emulators

Much of the mathematics is very similar. A Bayes linear emulator is not a GP, but it gives the same mean and variance predictions for given correlation lengths and mean function parameters (although these are handled differently); the emulator predictions no longer have distributions. Compared with GP emulators:
- advantages – simpler, and may be feasible for more complex problems
- disadvantages – the absence of distributions limits many of the uses of emulators; compromises are made

Summary and limitations: why emulation is not always a silver bullet

Some caveats on emulation

Not all simulators are suitable for emulation:
- very large numbers of outputs (>50) need specific emulators and large training sets; for the problem you are solving, are all outputs needed?
- for dynamic simulators with high-dimensional state spaces, there remain computational issues
- Gaussian processes are not well suited to discrete inputs and outputs
But these issues are being actively addressed in research projects across the world, including MUCM.

Typical sequence of emulation

1. Define the problem you want to solve and identify the simulator
2. Identify the inputs, define their ranges, and screen to select the active ones
3. Design the training set and run the simulator
4. Choose the emulator structure (mean and covariance) and define priors
5. Train the emulator using the training set and an inference method
6. Validate the emulator and, if necessary, refine it
7. Use the emulator and, if necessary, refine it
8. Modify or refine the simulator, perhaps using observations

Summary

Before you emulate, know your simulator! Think carefully about the problem you really want to solve: emulation is a tool for solving interesting problems, not an aim in itself. The more prior knowledge you bring – choosing the mean and covariance functions, eliciting priors – the easier the task will be. Spend time on validation and refinement. Building an emulator will help you understand your simulators… not replace them!