Introduction to emulators
Tony O'Hagan, University of Sheffield

Emulation
- A computer model encodes a function that takes inputs and produces outputs
- An emulator is a statistical approximation of that function
- It estimates what outputs would be obtained from given inputs, with a statistical measure of the estimation error
- Given enough training data, the estimation error variance can be made small

So what?
- A good emulator estimates the code output accurately, with small uncertainty, and runs "instantly"
- So we can do uncertainty analysis (UA), sensitivity analysis (SA) and all sorts of other things fast and efficiently
- Conceptually, we use model runs to learn about the function, then derive any desired properties of the model

Gaussian process
- Simple regression models can be thought of as emulators
- But their error estimates are invalid
- We use Gaussian process (GP) emulation
- Nonparametric, so it can fit any function
- Its error measures can be validated
- Analytically tractable, so we can often do UA/SA etc. analytically
- Highly efficient when there are many inputs

2 code runs
- Consider one input and one output
- The emulator estimate interpolates the data
- Emulator uncertainty grows between data points

3 code runs
- Adding another point changes the estimate and reduces the uncertainty

5 code runs
- And so on
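The interpolation and shrinking-uncertainty behaviour in these slides can be reproduced with a minimal GP emulator. This is an illustrative sketch rather than code from the talk: the toy simulator, the squared-exponential kernel and the fixed lengthscale of 0.3 are all assumptions.

```python
import numpy as np

def gp_posterior(x_train, y_train, x_test, lengthscale=0.3, variance=1.0, jitter=1e-9):
    """Posterior mean and variance of a zero-mean GP with a squared-exponential kernel."""
    def k(a, b):
        return variance * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / lengthscale**2)
    K = k(x_train, x_train) + jitter * np.eye(len(x_train))
    Ks = k(x_test, x_train)
    Kss = k(x_test, x_test)
    mean = Ks @ np.linalg.solve(K, y_train)
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

simulator = lambda x: np.sin(2 * np.pi * x)   # stand-in for an expensive code

x2 = np.array([0.2, 0.8])                     # 2 code runs
x3 = np.array([0.2, 0.5, 0.8])                # 3 code runs
x_mid = np.array([0.5])                       # an untried input between the runs

m2, v2 = gp_posterior(x2, simulator(x2), x_mid)
m3, v3 = gp_posterior(x3, simulator(x3), x_mid)

# The emulator interpolates: at a training input the variance is (numerically) zero.
m_at, v_at = gp_posterior(x3, simulator(x3), np.array([0.5]))
print(v_at[0])            # ~0
# Adding a third run reduces the uncertainty between the original two points.
print(v2[0] > v3[0])      # True
```

The same calculation with more runs reproduces the "and so on" slide: each extra point pins the curve down further and the between-point variance shrinks.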

Smoothness
- It is the basic assumption of a (homogeneously) smooth, continuous function that gives the GP its computational advantages
- The actual degree of smoothness concerns how rapidly the function "wiggles"
- A rough function responds strongly to quite small changes in its inputs
- We need many more data points to emulate a rough function accurately over a given range

Effect of smoothness
- Smoothness determines how fast the uncertainty increases between data points

Estimating smoothness
- We can estimate the smoothness from the data
- This is obviously a key Gaussian process parameter to estimate
- We need a robust estimate
- Validate it by predicting left-out data points
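One common way to estimate the smoothness is to maximise the log marginal likelihood over the smoothness parameter, then check the fit by predicting left-out points. The sketch below assumes a squared-exponential kernel with a single lengthscale, a simple grid search, and a wiggly toy simulator; none of these specifics come from the talk.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 12)      # 12 training runs of a toy simulator
y = np.sin(4 * np.pi * x)          # a fairly "wiggly" output

def log_marginal_likelihood(x, y, ell, jitter=1e-8):
    """Log marginal likelihood of a zero-mean GP with SE kernel, lengthscale ell."""
    K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2) + jitter * np.eye(len(x))
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + len(x) * np.log(2 * np.pi))

grid = np.linspace(0.02, 1.0, 50)
ell_hat = grid[np.argmax([log_marginal_likelihood(x, y, ell) for ell in grid])]

def loo_errors(x, y, ell):
    """Leave-one-out validation: predict each held-out run from the rest."""
    errs = []
    for i in range(len(x)):
        xt, yt = np.delete(x, i), np.delete(y, i)
        K = np.exp(-0.5 * (xt[:, None] - xt[None, :])**2 / ell**2) + 1e-8 * np.eye(len(xt))
        ks = np.exp(-0.5 * (x[i] - xt)**2 / ell**2)
        errs.append(y[i] - ks @ np.linalg.solve(K, yt))
    return np.array(errs)

print(ell_hat)   # the wiggly function should get a short lengthscale
```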

Higher dimensions
- With 2 inputs, we are fitting a surface through the data
- With many inputs, the principles are the same
- Smoothness parameters (usually one for each dimension) are crucial
- Regression terms become even more useful
- In many dimensions there is much more "space" between data points
- But we also get more smoothness

Automatic screening
- Models never respond strongly to all of their inputs
- Pragmatically, we get a high level of smoothness except in a few dimensions
- By estimating smoothness, the Gaussian process automatically adjusts
- It effectively projects the points down through the smooth dimensions
- 200 points in 25 dimensions look sparse
- But in 5 dimensions they are pretty good
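A sketch of this screening effect under assumed choices (squared-exponential kernel with one lengthscale per input, fitted by maximising the marginal likelihood): an input the model ignores gets a long fitted lengthscale, which effectively projects it out.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(30, 2))   # 30 runs, 2 inputs
y = np.sin(10 * X[:, 0])                  # the output ignores input 2 entirely

def neg_lml(log_ells, X, y, jitter=1e-6):
    """Negative log marginal likelihood with one SE lengthscale per input."""
    ells = np.exp(log_ells)
    d2 = sum(((X[:, None, j] - X[None, :, j]) / ells[j])**2 for j in range(X.shape[1]))
    K = np.exp(-0.5 * d2) + jitter * np.eye(len(X))
    sign, logdet = np.linalg.slogdet(K)
    return 0.5 * (y @ np.linalg.solve(K, y) + logdet)

res = minimize(neg_lml, x0=np.zeros(2), args=(X, y), method="Nelder-Mead")
ell1, ell2 = np.exp(res.x)
print(ell1, ell2)   # the inert input's lengthscale should come out much larger
```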

Many outputs
- A model with many outputs effectively has many functions of the inputs
- E.g. a time series or spatial grid
- We can emulate multiple outputs in different ways:
  - Emulate each output separately
  - Build a multivariate emulator
  - Treat the output index as another input
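The "output index as another input" option amounts to reshaping the training data: every (run, output-index) pair becomes one training point with the index appended as an extra input. The run counts and variable names below are hypothetical.

```python
import numpy as np

# Suppose each of 10 runs of the simulator returns a time series of 5 outputs.
n_runs, n_times = 10, 5
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(n_runs, 3))   # 3 "real" inputs per run
Y = rng.normal(size=(n_runs, n_times))        # placeholder simulator outputs

times = np.linspace(0.0, 1.0, n_times)        # the output index (here, time)

# Augment: every (run, time) pair becomes one training point with 4 inputs.
X_aug = np.column_stack([np.repeat(X, n_times, axis=0),
                         np.tile(times, n_runs)])
y_aug = Y.ravel()                             # row-major order matches repeat/tile

print(X_aug.shape, y_aug.shape)   # (50, 4) (50,)
```

A single-output emulator fitted to `(X_aug, y_aug)` then predicts any output of any run; the cost is that the GP must also model smoothness along the index dimension.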

Design
- We need to choose the input configurations at which to run the model to get training data
- These don't need to be random
- The objective is to learn about the function
- Well-spaced points that cover the region of interest work well
- E.g. generate 100 random Latin hypercube (LHC) samples and choose the one having the largest minimum distance between points
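The maximin Latin hypercube recipe on this slide can be sketched with SciPy's quasi-Monte Carlo module (the point and dimension counts below are illustrative):

```python
import numpy as np
from scipy.stats import qmc
from scipy.spatial.distance import pdist

def maximin_lhc(n_points, n_dims, n_candidates=100, seed=0):
    """Generate candidate Latin hypercube designs and keep the one whose
    closest pair of points is furthest apart (the maximin criterion)."""
    sampler = qmc.LatinHypercube(d=n_dims, seed=seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        design = sampler.random(n_points)
        score = pdist(design).min()      # smallest pairwise distance
        if score > best_score:
            best, best_score = design, score
    return best, best_score

design, score = maximin_lhc(n_points=20, n_dims=2)
print(design.shape, round(score, 3))
```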

Comparison with Monte Carlo
- Monte Carlo (and other sampling-based methods)
  - needs many thousands of model runs
  - needs new samples for each computation
- Bayesian methods using GP emulation
  - usually need only a few hundred runs
  - after which all computations use the same set of runs
- The difference is crucial for large, complex models

Bayesian UA/SA
- Plenty of experience now
- Analytic results for normal and uniform input distributions (working on others)
- The big efficiency saving over MC allows more systematic SA:
  - Main effect and interaction terms
  - Nonlinearity assessment
  - "What if" analyses with different input distributions
- GEM-SA software

Oakley and O'Hagan (2004)
- Example with 15 inputs, 250 runs
- MC/FAST needed over 15,360 runs to compute SA variance components with comparable accuracy

Bayesian calibration
- Theory in Kennedy and O'Hagan (2001), but less experience with applications
- Introduces a second GP to describe the discrepancy between the model and reality
  - A model inadequacy function
  - Smoothness is again important
- See Robin Hankin's presentation
- A beta of GEM-Cal is available
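A deliberately simplified illustration in the spirit of Kennedy and O'Hagan (2001): a cheap linear "simulator" is calibrated to synthetic observations whose residual contains a model-inadequacy term, which is accounted for through a GP covariance. All numbers, the grid search, and the fixed hyperparameters are assumptions; the real method also emulates the simulator and handles the hyperparameters and priors much more carefully.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 3.0, 25)
theta_true = 1.2
reality = theta_true * x + 0.3 * np.sin(2 * x)     # simulator + model inadequacy
z = reality + rng.normal(0.0, 0.05, size=x.shape)  # noisy field observations

simulator = lambda theta, x: theta * x             # cheap enough to skip the emulator

# Discrepancy GP: squared-exponential covariance plus observation noise.
ell, sig_d, sig_e = 0.8, 0.3, 0.05
K = sig_d**2 * np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2) \
    + sig_e**2 * np.eye(len(x))
Kinv = np.linalg.inv(K)
sign, logdet = np.linalg.slogdet(K)

def log_lik(theta):
    r = z - simulator(theta, x)                    # residual = discrepancy + noise
    return -0.5 * (r @ Kinv @ r + logdet)

grid = np.linspace(0.5, 2.0, 301)
theta_hat = grid[np.argmax([log_lik(t) for t in grid])]
print(round(theta_hat, 2))   # should land near theta_true
```

Omitting the discrepancy covariance (treating all the residual as white noise) would force the calibration to over-fit the systematic sin(2x) error, which is the point the slide makes.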

Challenges
- Dimensionality
- Multiple outputs
- Dynamic models
- Model inadequacy
- Data assimilation
- Smoothness/discontinuities

MUCM
- 'Managing Uncertainty in Complex Models'
- A new project to take the technology forward
- Will work with a variety of models
  - Identify a robust toolkit
  - Provide UML specifications and case studies
- Climate models are provisionally planned for a case study
  - GCMs will probably stretch the technology to its limit
  - Getting convincing results may need a dedicated initiative
  - As with CTCD or the flooding proposal (Jim Hall, FREE)

Conclusions
- Bayesian GP emulation is powerful and efficient
- It encompasses all kinds of techniques in one coherent framework
- Well established for UA/SA
  - Software available
- Theory in place for calibration and other techniques
  - More experience and software needed