8 Sept 2006, DEMA2006Slide 1 An Introduction to Computer Experiments and their Design Problems Tony O’Hagan University of Sheffield.

1 8 Sept 2006, DEMA2006. Slide 1: An Introduction to Computer Experiments and their Design Problems. Tony O’Hagan, University of Sheffield.

2 www.mucm.group.shef.ac.uk Slide 2: Outline. 1. Computer codes and their problems. 2. Gaussian process representation. 3. Design. 4. Conclusions.

3 Slide 3: Models and uncertainty. In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes, for understanding, prediction and control. There is a growing realisation of the importance of uncertainty in model predictions: can we trust them? Without any quantification of output uncertainty, it’s easy to dismiss them.

4 Slide 4: Computer codes. A computer code is a software implementation of a mathematical model for some real process. Given suitable inputs x that define a particular instance, the code output y = f(x) predicts the true value of that real process. A single run of the model can take an appreciable amount of time: in some cases, months! Even a few seconds can be too long for tasks that require many thousands of runs.

5 Slide 5: What are models for? Prediction and optimisation: What will the model output be for these inputs? What inputs will optimise the output? Uncertainty analysis: Given uncertainty in model inputs, how uncertain are outputs? Which input uncertainties are most influential? Calibration and data assimilation: How can we use data to improve the model? Many of these tasks implicitly require many model runs.

6 Slide 6: Computation. Consider uncertainty analysis: given uncertain input X, what can we say about the distribution of output Y = f(X)? Monte Carlo is the simplest method: sample x_1, x_2, …, x_N from the distribution of X; run the model to get outputs y_1, y_2, …, y_N; use these as a sample from the output distribution. This is easy to implement but impractical if the model takes more than a few seconds to run: 10,000 minutes is about a week.
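
The Monte Carlo recipe on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `simulator` is a hypothetical cheap stand-in for a real computer code, and the Normal(0.5, 0.2^2) input distribution is invented for the example. The last lines work through the cost arithmetic for 10,000 runs at an assumed one minute per run.

```python
import math
import random
import statistics


def simulator(x):
    """Hypothetical stand-in for an expensive computer code y = f(x)."""
    return math.sin(3.0 * x) + 0.5 * x * x


def sample_input(rng):
    """Assumed input distribution for illustration: X ~ Normal(0.5, 0.2^2)."""
    return rng.gauss(0.5, 0.2)


def monte_carlo_outputs(f, sampler, n_runs, seed=0):
    """Plain Monte Carlo: sample inputs, run the code once per sample."""
    rng = random.Random(seed)
    xs = [sampler(rng) for _ in range(n_runs)]  # x_1, ..., x_N drawn from X
    return [f(x) for x in xs]                   # y_1, ..., y_N, one run each


ys = monte_carlo_outputs(simulator, sample_input, n_runs=10_000)
print("estimated mean of Y:", statistics.fmean(ys))
print("estimated sd of Y:  ", statistics.stdev(ys))

# Cost of the same study if each run took one minute (an assumed run time).
total_minutes = 10_000 * 1
days = total_minutes / (60 * 24)
print("computing time in days:", round(days, 2))
```

With a cheap stand-in the whole study finishes instantly; the point of the slide is that the same loop around a code taking a minute per run needs roughly a week.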

7 Slide 7: Gaussian process representation. A more efficient approach; first work in the early 1980s (DACE). Represent the code as an unknown function: f(.) becomes a random process, and we represent it as a Gaussian process. Training runs: run the model for a sample of x values, then condition the GP on the observed data. This typically requires many fewer runs than MC, and the x values don’t need to be chosen randomly.

8 Slide 8: Bayesian formulation. Prior beliefs about the function, conditional on hyperparameters. Data. Posterior beliefs about the function, conditional on hyperparameters.
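
The formulas that accompanied this slide are not present in the transcript. As a reading aid, the block below writes out one common form of a Gaussian process prior and its posterior given training runs. It is an assumption about the general shape, not a reproduction of the slide: the regression functions h, the coefficients β, the variance σ², and the symbols A, t, H, m*, c* are notation introduced here, while B stands for the roughness parameters mentioned on the following slide.

```latex
% Prior beliefs about the function, conditional on hyperparameters (\beta, \sigma^2, B)
f(\cdot) \mid \beta, \sigma^2, B \sim \mathrm{GP}\bigl(h(\cdot)^{\mathsf T}\beta,\ \sigma^2 c(\cdot,\cdot)\bigr),
\qquad
c(x, x') = \exp\bigl\{-(x - x')^{\mathsf T} B\,(x - x')\bigr\}

% Data: outputs of the N training runs
y = \bigl(f(x_1), \dots, f(x_N)\bigr)^{\mathsf T}

% Posterior beliefs about the function, conditional on the same hyperparameters
f(\cdot) \mid y, \beta, \sigma^2, B \sim \mathrm{GP}\bigl(m^*(\cdot),\ \sigma^2 c^*(\cdot,\cdot)\bigr)

m^*(x) = h(x)^{\mathsf T}\beta + t(x)^{\mathsf T} A^{-1}\,(y - H\beta)

c^*(x, x') = c(x, x') - t(x)^{\mathsf T} A^{-1}\, t(x')

% where A_{ij} = c(x_i, x_j), \quad t(x)_i = c(x, x_i), \quad \text{row } i \text{ of } H \text{ is } h(x_i)^{\mathsf T}
```

At a training input x_i the vector t(x_i) is the i-th column of A, so m*(x_i) = y_i and c*(x_i, x_i) = 0: the posterior passes exactly through the observed runs.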

9 Slide 9: Emulation. Analysis is completed by prior distributions for, and posterior estimation of, hyperparameters; the roughness parameters in B are crucial. The posterior distribution is known as an emulator of the computer code. The posterior mean estimates what the code would produce for any untried x (prediction), with uncertainty about that prediction given by the posterior variance. It correctly reproduces the training data.
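
A minimal one-input emulator can be sketched as follows. It simplifies the slide's setting considerably, and every simplification is an assumption of this example: the prior mean is zero, the roughness and variance are fixed by hand rather than estimated, and `code` is a hypothetical cheap stand-in for a real simulator. The linear algebra is written out with the standard library so the sketch stays self-contained.

```python
import math


def corr(x1, x2, roughness):
    """Squared-exponential correlation c(x, x') = exp(-b (x - x')^2)."""
    return math.exp(-roughness * (x1 - x2) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def solve_spd(low, b):
    """Solve (L L^T) z = b by forward then back substitution."""
    n = len(b)
    w = [0.0] * n
    for i in range(n):
        w[i] = (b[i] - sum(low[i][k] * w[k] for k in range(i))) / low[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (w[i] - sum(low[k][i] * z[k] for k in range(i + 1, n))) / low[i][i]
    return z


def emulate(xs, ys, x_new, roughness=4.0, sigma2=1.0, jitter=1e-10):
    """Posterior mean and variance at x_new (zero-mean GP, fixed hyperparameters)."""
    n = len(xs)
    # Correlation matrix of the training inputs; jitter only aids numerical stability.
    a = [[corr(xs[i], xs[j], roughness) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    low = cholesky(a)
    t = [corr(x_new, xi, roughness) for xi in xs]
    mean = sum(ti * wi for ti, wi in zip(t, solve_spd(low, ys)))
    var = sigma2 * (1.0 - sum(ti * vi for ti, vi in zip(t, solve_spd(low, t))))
    return mean, max(var, 0.0)


def code(x):
    """Hypothetical cheap stand-in for the computer code."""
    return math.sin(2.0 * x)


xs2 = [0.0, 1.0]                      # two training runs
ys2 = [code(x) for x in xs2]
xs3 = [0.0, 0.5, 1.0]                 # three training runs
ys3 = [code(x) for x in xs3]

mean_at_run, var_at_run = emulate(xs2, ys2, 0.0)    # at a training point
mean_mid2, var_mid2 = emulate(xs2, ys2, 0.25)       # between points, 2 runs
mean_mid3, var_mid3 = emulate(xs3, ys3, 0.25)       # between points, 3 runs
print(mean_at_run, var_at_run)
print(var_mid2, var_mid3)
```

The printed values illustrate the three properties the slides describe: the mean matches the code at a training point with essentially zero variance, the variance is larger between points, and adding a run reduces it.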

10 Slide 10: 2 code runs. Consider one input and one output. The emulator estimate interpolates the data. Emulator uncertainty grows between data points.

11 Slide 11: 3 code runs. Adding another point changes the estimate and reduces uncertainty.

12 Slide 12: 5 code runs. And so on.

13 Slide 13: Frequentist formulation. Pretend the function is actually sampled from a Gaussian process population of functions. Absurd, really! But properties of inferences depend on it. The best linear unbiased predictor is the same as the Bayesian posterior mean with weak prior distributions, and similarly for variances.

14 Slide 14: Then what? Use the emulator to make inference about other things of interest, e.g. uncertainty analysis, calibration. This is conceptually very straightforward in the Bayesian framework, but of course it can be computationally hard. The frequentist approach has not generally been extended to some of the more complex analyses.

15 Slide 15: Design. The design problem is to choose x_1, x_2, …, x_N. The design space is usually rectangular, often rather arbitrary, and may be high dimensional. The objective is to build an accurate emulator across the design space; formally optimising for some specific analysis is generally inappropriate (and too hard). The usual approach is to aim for a design that fills the design space uniformly, which minimises uncertainty between design points.

16 Slide 16: Latin hypercubes. LH designs: divide the range of each variable into N equal segments; choose a value in each segment (uniformly); permute each coordinate randomly. This covers each coordinate evenly. Maximin LH: generate many LH designs and choose one for which the minimum distance between points is greatest.
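
The two constructions on this slide can be sketched directly. The unit cube as design space, the Euclidean distance, and the number of candidate designs are assumptions made for the example; the function names are illustrative only.

```python
import math
import random


def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube design on the unit cube [0, 1]^d."""
    columns = []
    for _ in range(n_dims):
        # One value drawn uniformly inside each of the N equal segments ...
        column = [(k + rng.random()) / n_points for k in range(n_points)]
        # ... followed by a random permutation of that coordinate.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-coordinate columns into a list of design points.
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(math.dist(p, q)
               for i, p in enumerate(design)
               for q in design[i + 1:])


def maximin_latin_hypercube(n_points, n_dims, n_candidates=200, seed=0):
    """Generate many LH designs and keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = [latin_hypercube(n_points, n_dims, rng)
                  for _ in range(n_candidates)]
    return max(candidates, key=min_distance)


design = maximin_latin_hypercube(n_points=10, n_dims=2)
for point in design:
    print(point)
print("minimum distance:", min_distance(design))
```

Random search over candidates is the simplest way to approximate a maximin design; it matches the "generate many, choose one" wording of the slide but makes no claim to find the true optimum.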

19 Slide 19: Projection. Projections of LH designs onto lower dimensional spaces are also LH designs: not necessarily maximin, but usually quite even. This is important because typically only a few inputs are influential. There are other ways of generating space-filling designs, such as low discrepancy sequences, but these don’t necessarily have good projections.
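
The projection property is easy to check numerically. The sketch below builds a random Latin hypercube on the unit cube (an assumed design space), drops all but two coordinates, and verifies that each retained coordinate still has exactly one point per segment. The helper names are illustrative.

```python
import random


def latin_hypercube(n_points, n_dims, rng):
    """One random Latin hypercube design on the unit cube [0, 1]^d."""
    columns = []
    for _ in range(n_dims):
        column = [(k + rng.random()) / n_points for k in range(n_points)]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def is_latin_hypercube(design):
    """True if every coordinate has exactly one point in each of the N segments."""
    n = len(design)
    n_dims = len(design[0])
    return all(sorted(int(p[d] * n) for p in design) == list(range(n))
               for d in range(n_dims))


def project(design, keep):
    """Keep only the listed coordinates of every design point."""
    return [tuple(p[d] for d in keep) for p in design]


rng = random.Random(1)
full = latin_hypercube(n_points=8, n_dims=5, rng=rng)
sub = project(full, keep=(0, 3))  # suppose only inputs 0 and 3 are influential
print(is_latin_hypercube(full), is_latin_hypercube(sub))
```

The check passes for any choice of retained coordinates, because the one-point-per-segment property is imposed on each coordinate separately.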

20 Slide 20: Other considerations. Maximin LH designs don’t have points close together (by definition!), but such pairs help to identify hyperparameters, particularly roughness parameters. Maybe add extra points differing from existing ones only by a small amount in one dimension. Sequential designs would be very helpful: low discrepancy sequences; adaptive designs for partitioned emulators.
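
To illustrate why low discrepancy sequences are attractive for sequential design, here is a sketch using the Halton sequence. The slide does not name a particular sequence, so Halton (with bases 2 and 3 for two inputs) is a choice made for this example. Because the points are generated one after another, a longer design always contains the shorter one as its first points.

```python
def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    value = 0.0
    fraction = 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        value += digit * fraction
        fraction /= base
    return value


def halton(n_points, bases=(2, 3)):
    """First n_points of the Halton sequence, one prime base per input dimension."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]


design_100 = halton(100)
design_150 = halton(150)

# Sequential property: extending the design never moves the existing points.
print(design_150[:100] == design_100)
print(design_150[:3])
```

This nesting is exactly what a one-shot maximin Latin hypercube lacks. The caveat from the earlier slide still applies: such sequences do not necessarily project well onto subsets of the inputs.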

21 Slide 21: Some design challenges. Space-filling designs that are good in all projections. Understanding the value of low-distance pairs. Designs for non-rectangular or unbounded design spaces. Sequential/adaptive design, e.g. a good 150-point design with a good 100-point subset. Adaptation to roughnesses and heterogeneity. Design of real-world experiments for calibration.

22 Slide 22: MUCM. This is a substantial and topical research area. MUCM (Managing Uncertainty in Complex Models) is a new £2M research project, funded by the RCUK Basic Technology scheme: a 4-year grant, with 7 RAs + 4 PhDs in 5 centres. Henry Wynn (LSE) is leading the design work, but there are enough problems for lots of people to work on! mucm.group.shef.ac.uk. Year-long programme at SAMSI (USA).

