8 Sept 2006, DEMA 2006, Slide 1
An Introduction to Computer Experiments and their Design Problems
Tony O'Hagan, University of Sheffield

2 Outline
1. Computer codes and their problems
2. Gaussian process representation
3. Design
4. Conclusions

3 Models and uncertainty
In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes
For understanding, prediction, control
Growing realisation of the importance of uncertainty in model predictions
Can we trust them?
Without any quantification of output uncertainty, it's easy to dismiss them

4 Computer codes
A computer code is a software implementation of a mathematical model for some real process
Given suitable inputs x that define a particular instance, the code output y = f(x) predicts the true value of that real process
A single run of the model can take an appreciable amount of time
In some cases, months!
Even a few seconds can be too long for tasks that require many thousands of runs

5 What are models for?
Prediction and optimisation
What will the model output be for these inputs?
What inputs will optimise the output?
Uncertainty analysis
Given uncertainty in model inputs, how uncertain are the outputs?
Which input uncertainties are most influential?
Calibration and data assimilation
How can we use data to improve the model?
Many of these tasks implicitly require many model runs

6 Computation
Consider uncertainty analysis
Given an uncertain input X, what can we say about the distribution of the output Y = f(X)?
Monte Carlo is the simplest method
Sample x1, x2, …, xN from the distribution of X
Run the model to get outputs y1, y2, …, yN
Use these as a sample from the output distribution
Easy to implement, but impractical if the model takes more than a few seconds to run
10,000 minutes is a week
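The Monte Carlo scheme on this slide can be sketched in a few lines. The simulator f and the Gaussian input distribution below are illustrative assumptions, standing in for an expensive computer code and whatever input uncertainty applies in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Stand-in "simulator": in reality each evaluation may take minutes or hours.
    return np.sin(3 * x) + 0.5 * x**2

# Uncertain input X ~ N(0, 1); propagate it through the model by Monte Carlo.
N = 10_000
x_samples = rng.normal(0.0, 1.0, size=N)   # sample x1, ..., xN from X
y_samples = f(x_samples)                    # one model run per sample: the costly step

# y_samples is now a sample from the distribution of Y = f(X)
print(y_samples.mean(), y_samples.std())
```

The cost is one model run per sample, which is exactly why a few seconds per run makes plain Monte Carlo impractical.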

7 Gaussian process representation
A more efficient approach
First work in the early 1980s: DACE
Represent the code as an unknown function
f(.) becomes a random process
We represent it as a Gaussian process
Training runs
Run the model for a sample of x values
Condition the GP on the observed data
Typically requires many fewer runs than MC
And the x values don't need to be chosen randomly

8 Bayesian formulation
Prior beliefs about the function, conditional on hyperparameters
combined with data (the training runs)
give posterior beliefs about the function, conditional on hyperparameters

9 Emulation
The analysis is completed by prior distributions for, and posterior estimation of, the hyperparameters
The roughness parameters in B are crucial
The posterior distribution is known as an emulator of the computer code
The posterior mean estimates what the code would produce for any untried x (prediction)
With uncertainty about that prediction given by the posterior variance
It correctly reproduces the training data
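A minimal sketch of such an emulator: a zero-mean GP with a squared-exponential covariance, conditioned on a handful of training runs. The toy simulator, the length-scale (roughness) value, and the jitter term are assumptions for illustration, not part of the slides.

```python
import numpy as np

def k(a, b, length=0.5, var=1.0):
    # Squared-exponential covariance; 'length' plays the role of a roughness parameter.
    return var * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

f = lambda x: np.sin(4 * x)                 # toy stand-in for the computer code
x_train = np.array([0.0, 0.3, 0.6, 1.0])    # a few training runs
y_train = f(x_train)

x_star = np.linspace(0.0, 1.0, 101)         # untried inputs
K = k(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter for stability
K_s = k(x_star, x_train)

mean = K_s @ np.linalg.solve(K, y_train)                    # posterior mean
cov = k(x_star, x_star) - K_s @ np.linalg.solve(K, K_s.T)   # posterior covariance
sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))              # posterior sd

# The emulator interpolates: the mean equals the data at the training inputs,
# and the posterior sd is (numerically) zero there, growing between data points.
```

In a full analysis the length-scale and variance would themselves get priors and be estimated from the training data, as the slide notes.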

10 [Figure: emulator fitted to a few code runs]
Consider one input and one output
The emulator estimate interpolates the data
Emulator uncertainty grows between data points

11 [Figure: the same emulator after one more code run]
Adding another point changes the estimate and reduces the uncertainty

12 [Figure: the emulator updated with further code runs]
And so on

13 Frequentist formulation
Pretend the function is actually sampled from a Gaussian process population of functions
Absurd, really!
But the properties of the inferences depend on it
The best linear unbiased predictor is the same as the Bayesian posterior mean
With weak prior distributions
Similarly for the variances

14 Then what?
Use the emulator to make inferences about other things of interest
E.g. uncertainty analysis, calibration
Conceptually very straightforward in the Bayesian framework
But of course it can be computationally hard
The frequentist approach has not generally been extended to the more complex analyses

15 Design
The design problem is to choose x1, x2, …, xN
The design space is usually rectangular
Often rather arbitrary
May be high dimensional
The objective is to build an accurate emulator across the design space
Formally optimising for some specific analysis is generally inappropriate (and too hard)
The usual approach is to aim for a design that fills the space uniformly
Minimises uncertainty between design points

16 Latin hypercubes
LH designs
Divide the range of each variable into N equal segments
Choose a value in each segment (uniformly)
Permute each coordinate randomly
Covers each coordinate evenly
Maximin LH
Generate many LH designs
Choose the one for which the minimum distance between points is greatest
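The two constructions on this slide can be sketched directly. The sizes N = 20, d = 2 and the 200 candidate designs are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def latin_hypercube(N, d, rng):
    # One point per equal-width segment in each dimension; the assignment of
    # segments to points is a random permutation, independently per coordinate.
    u = rng.uniform(size=(N, d))            # uniform position within each segment
    design = np.empty((N, d))
    for j in range(d):
        design[:, j] = (rng.permutation(N) + u[:, j]) / N
    return design

def min_distance(X):
    # Smallest pairwise Euclidean distance in the design.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff**2).sum(-1))
    np.fill_diagonal(dist, np.inf)
    return dist.min()

# Maximin LH: generate many LH designs, keep the one whose minimum
# inter-point distance is greatest.
best = max((latin_hypercube(20, 2, rng) for _ in range(200)), key=min_distance)
```

Each column of `best`, when scaled back to segment indices, hits every one of the N segments exactly once, which is the defining LH property.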

17 [figure only]

18 [figure only]

19 Projection
Projections of LH designs onto lower-dimensional spaces are also LH designs
Not necessarily maximin, but usually quite even
Important because typically only a few inputs are influential
There are other ways of generating space-filling designs
Low discrepancy sequences
These don't necessarily have good projections
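The projection property is easy to check numerically: dropping coordinates from an LH design leaves every remaining coordinate stratified, because each coordinate is stratified independently by construction. The generator below repeats the LH construction from the previous slide; the sizes and the choice of retained inputs are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def latin_hypercube(N, d, rng):
    # One point per equal-width segment in each dimension, segments permuted.
    u = rng.uniform(size=(N, d))
    design = np.empty((N, d))
    for j in range(d):
        design[:, j] = (rng.permutation(N) + u[:, j]) / N
    return design

X = latin_hypercube(10, 5, rng)
proj = X[:, [0, 3]]                 # project onto two of the five inputs

# Each projected coordinate still occupies all N segments exactly once,
# so the projection is itself an LH design (though not necessarily maximin).
for j in range(proj.shape[1]):
    segments = np.sort(np.floor(proj[:, j] * 10).astype(int))
    assert np.array_equal(segments, np.arange(10))
```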

20 Other considerations
Maximin LH designs don't have points close together
By definition!
But such pairs help to identify hyperparameters
Particularly the roughness parameters
Maybe add extra points differing from existing ones only by a small amount in one dimension
Sequential designs would be very helpful
Low discrepancy sequences
Adaptive designs for partitioned emulators

21 Some design challenges
Space-filling designs that are good in all projections
Understanding the value of low-distance pairs
Designs for non-rectangular or unbounded design spaces
Sequential/adaptive design
E.g. a good 150-point design with a good 100-point subset
Adaptation to roughnesses and heterogeneity
Design of real-world experiments for calibration

22 MUCM
This is a substantial and topical research area
MUCM (Managing Uncertainty in Complex Models) is a new £2M research project
Funded by the RCUK Basic Technology scheme
4-year grant, 7 RAs + 4 PhDs in 5 centres
Henry Wynn (LSE) leading the design work
But enough problems for lots of people to work on!
mucm.group.shef.ac.uk
A year-long programme at SAMSI (USA)