Validating uncertain predictions
Tony O'Hagan, Leo Bastos, Jeremy Oakley, University of Sheffield

Why am I here?
- I probably know less about finite element modelling than anyone else at this meeting
- But I have been working with mechanistic models of all kinds for almost 20 years
  - Models of climate, oil reservoirs, rainfall runoff, aero-engines, sewer systems, vegetation growth, disease progression, ...
- What I do know about is uncertainty
  - I'm a statistician; my field is Bayesian statistics
  - One of my principal research areas is to understand, quantify and reduce uncertainty in the predictions made by models
- I bring a different perspective on model validation

Some background
- Models are often highly computer intensive
  - Long run times
  - FE models on a fine grid
  - Oil reservoir simulator runs can take days
- Things we want to do with them may require many runs
  - Uncertainty analysis: exploring output uncertainty induced by uncertainty in model inputs
  - Calibration: searching for parameter values to match observational data
  - Optimisation: searching for input settings to optimise output
- We need efficient methods requiring minimal run sets (the brute-force alternative is sketched below)
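To see why efficiency matters, here is a minimal sketch (not from the talk, and not MUCM code) of a brute-force Monte Carlo uncertainty analysis. The simulator function and the input distribution are hypothetical toys; the point is only that direct propagation of input uncertainty needs one model run per sample.

```python
# Illustrative sketch only: brute-force Monte Carlo uncertainty analysis
# of a hypothetical toy "simulator".
import numpy as np

rng = np.random.default_rng(0)

def simulator(x):
    # Toy stand-in for an expensive computer model with a single input.
    return np.sin(3 * x) + 0.5 * x ** 2

# Uncertainty about the true input, expressed as a probability distribution.
x_samples = rng.normal(loc=0.5, scale=0.2, size=10_000)

# Direct Monte Carlo needs one model run per sample: 10,000 runs here,
# which is exactly what long run times make infeasible.
y_samples = simulator(x_samples)
print("output mean:", y_samples.mean(), " output sd:", y_samples.std())
```

With runs that take hours or days, this many direct evaluations is out of reach, which is the motivation for emulation.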

Emulation
- We use Bayesian statistics
- Based on a training sample of model runs, we estimate what the model output would be at all untried input configurations
- The result is a statistical representation of the model
  - In the form of a stochastic process over the input space
  - The process mean is our best estimate of what the output would be at any input configuration
  - Uncertainty is captured by variances and covariances
- It correctly returns what we know
  - At any training sample point, the mean is the observed value, with zero variance (illustrated in the sketch below)
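A minimal emulator sketch, using scikit-learn's Gaussian process regressor as a stand-in for the Bayesian emulators described above (the MUCM work uses its own formulation; this only illustrates the behaviour). The toy simulator is the same hypothetical one as in the earlier sketch.

```python
# Minimal emulator sketch: a Gaussian process fitted to a small training
# sample of runs of a hypothetical toy simulator.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):
    return np.sin(3 * x) + 0.5 * x ** 2

# Training sample of model runs (5 runs over the input range of interest).
X_train = np.linspace(0.0, 2.0, 5).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# alpha is only a tiny numerical jitter, so the emulator interpolates the
# training runs: (numerically) zero variance at the design points.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              alpha=1e-10, normalize_y=True)
gp.fit(X_train, y_train)

# Predict at untried inputs: mean = best estimate, sd = emulator uncertainty.
X_new = np.linspace(0.0, 2.0, 9).reshape(-1, 1)
mean, sd = gp.predict(X_new, return_std=True)
print(np.column_stack([X_new.ravel(), mean, sd]))
```

At a training input the predictive mean equals the observed run and the predictive standard deviation is essentially zero, exactly the property listed on the slide.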

2 code runs
- Consider one input and one output
- The emulator estimate interpolates the data
- Emulator uncertainty grows between data points

3 code runs
- Adding another point changes the estimate and reduces the uncertainty

5 code runs
- And so on (the shrinking uncertainty is sketched below)
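A sketch of the behaviour shown on these three slides, under the same toy assumptions as before: refitting the emulator with 2, 3 and then 5 runs, the largest predictive standard deviation between the design points shrinks as runs are added. The kernel is held fixed so the comparison is purely about the number of runs.

```python
# Each additional run shrinks the emulator's predictive uncertainty
# between the design points (toy illustration).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(x):
    return np.sin(3 * x) + 0.5 * x ** 2

X_plot = np.linspace(0.0, 2.0, 201).reshape(-1, 1)
for n_runs in (2, 3, 5):
    X_train = np.linspace(0.0, 2.0, n_runs).reshape(-1, 1)
    # Kernel held fixed (optimizer=None) so only the number of runs changes.
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                                  alpha=1e-10, normalize_y=True,
                                  optimizer=None)
    gp.fit(X_train, simulator(X_train).ravel())
    _, sd = gp.predict(X_plot, return_std=True)
    print(f"{n_runs} runs: largest predictive sd = {sd.max():.3f}")
```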

MUCM
- The emulator is a fast meta-model, but with a full statistical representation of uncertainty
- We can build the emulator and use it for tasks such as calibration with far fewer model runs than other methods
  - Typically 10 or 100 times fewer
- The RCUK Basic Technology grant "Managing Uncertainty in Complex Models" is developing this approach
  - See in particular the MUCM toolkit

Validation
- What does it mean to validate a simulation model?
  - Compare model predictions with reality
- But the model is always wrong
  - How can something which is always wrong ever be called valid?
- Conventionally, a model is said to be valid if its predictions are close enough to reality
  - How close is close enough? It depends on the purpose
- Conventional approaches to validation confuse the absolute (valid) with the relative (fit for this purpose)
- Let's look at an analogous validation problem

Validating an emulator
- What does it mean to validate an emulator?
  - Compare the emulator's predictions with the reality of model output
  - Make a validation sample of runs at new input configurations
- The emulator mean is the best prediction and is always wrong
  - But the emulator predicts uncertainty around that mean
- The emulator is valid if its expressions of uncertainty are correct
  - Actual outputs should fall in 95% intervals 95% of the time, no less and no more
  - Standardised residuals should have zero mean and unit variance (computed in the sketch below)
- See the Bastos and O'Hagan preprint on the MUCM website
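A sketch of simple validation diagnostics in the spirit of the slide, using the same hypothetical toy simulator and scikit-learn emulator as before: individual standardised residuals and empirical 95% interval coverage on a held-out validation sample. Bastos and O'Hagan propose more refined diagnostics (for example, accounting for correlation between validation points), which are not reproduced here.

```python
# Simple emulator validation diagnostics on a validation sample of new runs.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)

def simulator(x):
    return np.sin(3 * x) + 0.5 * x ** 2

# Emulator built from a training sample of runs.
X_train = np.linspace(0.0, 2.0, 8).reshape(-1, 1)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5),
                              alpha=1e-10, normalize_y=True)
gp.fit(X_train, simulator(X_train).ravel())

# Validation sample at new, untried input configurations.
X_val = rng.uniform(0.0, 2.0, size=(25, 1))
y_val = simulator(X_val).ravel()
mean, sd = gp.predict(X_val, return_std=True)

std_resid = (y_val - mean) / sd     # should look roughly like N(0, 1) draws
covered = np.abs(std_resid) < 1.96  # roughly 95% should be True
print("mean of standardised residuals:    ", std_resid.mean())
print("variance of standardised residuals:", std_resid.var())
print("95% interval coverage:             ", covered.mean())
```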

Validation diagnostics

Validating the model
- Let's accept that there is uncertainty around model predictions
  - We need to be able to make statistical predictions
  - Then, if we compare with observations, we can see whether reality falls within the prediction bounds correctly
- The difference between model output and reality is called model discrepancy
  - It is also a function of the inputs
  - Like the model output, it is typically a smooth function
  - Like the model output, we can emulate this function
- We can validate this (one common formulation is written out below)
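One common way to write this down, following the Kennedy and O'Hagan calibration framework that the MUCM work builds on (the notation here is mine, not taken from the slides):

```latex
% Observation z_i of reality \zeta at input x_i; simulator \eta; discrepancy \delta
z_i = \zeta(x_i) + \varepsilon_i, \qquad
\zeta(x) = \eta(x, \theta) + \delta(x)
```

Here $\zeta(x)$ is the real process, $\eta(x, \theta)$ is the simulator output at the best-input setting $\theta$, $\delta(x)$ is the model discrepancy and $\varepsilon_i$ is observation error. Both $\eta$ and $\delta$ are smooth functions of the inputs and are given Gaussian process priors, so both can be emulated and the resulting statistical prediction of reality can be validated against further observations.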

Model discrepancy
- Model discrepancy was first introduced within the MUCM framework in the context of model calibration
  - Ignoring discrepancy leads to over-fitting and over-confidence in the calibrated parameters
  - Understanding that it is a smooth error term, rather than just noise, is also crucial
- To learn about discrepancy we need a training sample of observations of the real process
  - Then we can validate our emulation of reality using further observations
- This is one ongoing strand of the MUCM project

Beyond validation
- An emulator (of a model or of reality) can be valid and yet useless in practice
- Given a sample of real-process observations, we can predict the output at any input to be the sample mean plus or minus two sample standard deviations
  - This will validate OK, assuming the sample is representative
  - But it ignores the model and makes poor use of the sample!
- Two valid emulators can be compared on the basis of the variance of their predictions
  - And declared fit for purpose if the variance is small enough (a toy numerical illustration follows below)
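A toy numerical illustration of the point above, with hypothetical data not taken from the talk: a predictor that ignores the inputs and always quotes the sample mean plus or minus two sample standard deviations achieves roughly the right coverage, but its intervals are so wide that it is not fit for purpose.

```python
# Toy illustration: an input-ignoring predictor can pass a coverage check
# yet quote intervals far too wide to be useful.
import numpy as np

rng = np.random.default_rng(2)

def real_process(x):
    # Hypothetical stand-in for observations of reality (smooth signal + noise).
    return np.sin(3 * x) + 0.5 * x ** 2 + rng.normal(0.0, 0.05, size=x.shape)

x_obs = rng.uniform(0.0, 2.0, size=30)   # training observations of reality
y_obs = real_process(x_obs)

trivial_mean = y_obs.mean()
trivial_sd = y_obs.std(ddof=1)

x_new = rng.uniform(0.0, 2.0, size=200)  # further observations
y_new = real_process(x_new)
coverage = np.mean(np.abs(y_new - trivial_mean) < 2 * trivial_sd)
print(f"coverage of mean +/- 2 sd intervals: {coverage:.2f}")
print(f"half-width of those intervals:       {2 * trivial_sd:.2f}")
# A valid emulator of reality would achieve similar coverage with a much
# smaller predictive sd, which is what "fit for purpose" is about.
```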

In conclusion
- I think it is useful to separate the absolute property of validity from the relative property of fitness for purpose
- Model predictions alone are useless without some idea of how accurate they are
  - Quantifying uncertainty in the predictions by building an emulator allows us to talk about validity
  - Only valid statistical predictions of reality should be accepted
  - Model predictions with a false measure of their accuracy are also useless!
- We can choose between valid predictions on the basis of how accurate they are
  - And ask if they are sufficiently accurate for the purpose

Advertisement
- Workshop on emulators and MUCM methods: "Uncertainty in Simulation Models"
- Friday 10th July 2009, 10.30am - 4pm
- National Oceanography Centre, Southampton
- Please register with Katherine Jeays-Ward by 3rd July 2009
- Registration is free, and lunch/refreshments will be provided