Published by Amara Meek. Modified over 2 years ago.

Slide 1: Uncertainty and Sensitivity Analysis of Complex Computer Codes
John Paul Gosling, University of Sheffield

mucm.group.shef.ac.uk
Slide 2: Outline
- Uncertainty and computer models
- Uncertainty analysis (UA)
- Sensitivity analysis (SA)

Slide 3: Why worry about uncertainty?
- How accurate are model predictions?
- There is increasing concern about uncertainty in model outputs
  - Particularly where model predictions are used to inform scientific debate or environmental policy
- Are model predictions robust enough for high-stakes decision-making?

Slide 4: For instance…
- Models for climate change produce different predictions for the extent of global warming or other consequences
  - Which ones should we believe?
  - What error bounds should we put around these?
  - Are model differences consistent with the error bounds?
- Until we can answer such questions convincingly, decision makers can continue to dismiss our results

Slide 5: Where is the uncertainty?
- Several principal sources of uncertainty
  - Accuracy of parameters in model equations
  - Accuracy of data inputs
  - Accuracy of the model in representing the real phenomenon, even with accurate values for parameters and data
- In this section, we will be concerned with the first two

Slide 6: Inputs
- We will interpret “inputs” widely
  - Initial conditions
  - Other data defining the particular context being simulated
  - Forcing data (e.g. rainfall in hydrology models)
  - Parameters in model equations
    - These are often hard-wired (which is a problem!)

Slide 7: Input uncertainty
- We are typically uncertain about the values of many of the inputs
  - Measurement error
  - Lack of knowledge
  - Parameters with no real physical meaning
- However, we must have beliefs about the parameters. These beliefs must be elicited carefully, as there will often be no data to contradict or support them.

Slide 8: Output uncertainty
- Input uncertainty induces uncertainty in the output y
  - The output y therefore also has a probability distribution
- In theory, this distribution is completely determined by the probability distribution on the inputs x and the model f, where y = f(x)
- In practice, finding this distribution and its properties is not straightforward

Slide 9: A trivial model
- Suppose we have just two inputs and a simple linear model y = x1 + 3*x2
- Suppose that x1 and x2 have independent uniform distributions over [0, 1]
  - i.e. they define a point that is equally likely to be anywhere in the unit square
- Then we can determine the distribution of y exactly

Slide 10: A trivial model – y’s distribution
- The distribution of y has this trapezium form
[Figure: probability density of y]
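The trapezium can be written down and checked by simulation. The sketch below is an illustration rather than anything taken from the slides: the density follows from convolving the two uniform inputs of y = x1 + 3*x2, and the sample size, seed and bin width are arbitrary choices.

```python
import random


def trapezium_pdf(y):
    """Exact density of y = x1 + 3*x2 for independent x1, x2 ~ Uniform(0, 1)."""
    if 0.0 <= y < 1.0:
        return y / 3.0           # rising edge
    if 1.0 <= y <= 3.0:
        return 1.0 / 3.0         # flat top
    if 3.0 < y <= 4.0:
        return (4.0 - y) / 3.0   # falling edge
    return 0.0


rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
n = 100_000
samples = [rng.random() + 3.0 * rng.random() for _ in range(n)]

# Compare a histogram estimate of the density with the exact value at bin centres.
bin_width = 0.5
for k in range(8):
    lo = k * bin_width
    centre = lo + bin_width / 2
    count = sum(lo <= s < lo + bin_width for s in samples)
    empirical = count / (n * bin_width)
    print(f"y={centre:4.2f}  empirical={empirical:.3f}  exact={trapezium_pdf(centre):.3f}")
```

The density rises linearly on [0, 1], is flat at 1/3 on [1, 3], and falls linearly on [3, 4], which is the trapezium shape.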

Slide 11: A trivial model – y’s distribution
- If x1 and x2 have normal distributions (x1, x2 ~ N(0.5, [variance missing])), we get a normal output
[Figure: probability density of y]
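The normal case can be checked by hand. The derivation below assumes that x1 and x2 are independent and share a variance, written here as the symbol σ² because the value on the slide is not preserved; it uses the linear model y = x1 + 3*x2 from Slide 9.

```latex
\begin{aligned}
x_1, x_2 &\overset{\text{iid}}{\sim} N(0.5,\ \sigma^2), \qquad y = x_1 + 3x_2,\\
\mathrm{E}[y] &= 0.5 + 3(0.5) = 2,\\
\mathrm{Var}[y] &= \sigma^2 + 3^2\sigma^2 = 10\sigma^2,\\
y &\sim N(2,\ 10\sigma^2).
\end{aligned}
```

The last line holds because any linear combination of independent normal variables is itself normal.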

Slide 12: A slightly less trivial model
- Now consider the simple nonlinear model y = sin(x1)/{1 + exp(x1 + x2)}
- We still have only 2 inputs and quite a simple equation
- But even for nice input distributions, we cannot get the output distribution exactly
- The simplest way to compute it would be by Monte Carlo

Slide 13: Monte Carlo output distribution
- This is for the normal inputs
- 10,000 random normal pairs were generated and y calculated for each pair
[Figure: Monte Carlo distribution of y]
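A Monte Carlo run of this kind can be sketched in a few lines. The input mean of 0.5 and the 10,000 pairs come from the slides; the standard deviation of 0.25 is an assumption, since that value is missing from the transcript, and the seed and text histogram are purely illustrative.

```python
import math
import random


def model(x1, x2):
    """The nonlinear toy model from Slide 12."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


MEAN = 0.5
SD = 0.25        # ASSUMPTION: the input standard deviation is not given in the transcript
N_RUNS = 10_000  # number of random normal pairs, as on the slide

rng = random.Random(1)  # fixed seed, chosen arbitrarily
ys = [model(rng.gauss(MEAN, SD), rng.gauss(MEAN, SD)) for _ in range(N_RUNS)]

# Crude text histogram of the Monte Carlo output distribution.
lo, hi, n_bins = min(ys), max(ys), 15
width = (hi - lo) / n_bins
counts = [0] * n_bins
for y in ys:
    counts[min(int((y - lo) / width), n_bins - 1)] += 1
for k, c in enumerate(counts):
    print(f"{lo + k * width:7.3f}  {'#' * (c // 40)}")
```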

Slide 14: Uncertainty analysis (UA)
- The process of characterising the distribution of the output y is called uncertainty analysis
- Plotting the distribution is a good graphical way to characterise it
- Quantitative summaries are often more important
  - Mean, median
  - Standard deviation, quartiles
  - Probability intervals

Slide 15: UA of slightly nonlinear model
- Mean = 0.117, median = [value missing]
- Std. dev. = [value missing]
- 50% range (quartiles) = [0.093, 0.148]
- 95% range = [0.002, 0.200]
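Summaries like these can be read straight off a Monte Carlo sample. The following is a minimal sketch, assuming normal inputs with mean 0.5 and a standard deviation of 0.25 (the latter is an assumption, not a value from the transcript); the `quantile` helper is a hypothetical name introduced for illustration.

```python
import math
import random
import statistics


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


SD = 0.25  # ASSUMPTION: input standard deviation, not given in the transcript
rng = random.Random(2)
ys = sorted(model(rng.gauss(0.5, SD), rng.gauss(0.5, SD)) for _ in range(10_000))


def quantile(sorted_values, p):
    """Empirical quantile by linear interpolation between order statistics."""
    pos = p * (len(sorted_values) - 1)
    i = int(math.floor(pos))
    j = min(i + 1, len(sorted_values) - 1)
    return sorted_values[i] + (pos - i) * (sorted_values[j] - sorted_values[i])


summary = {
    "mean": statistics.fmean(ys),
    "median": quantile(ys, 0.5),
    "std_dev": statistics.stdev(ys),
    "quartiles": (quantile(ys, 0.25), quantile(ys, 0.75)),
    "interval_95": (quantile(ys, 0.025), quantile(ys, 0.975)),
}
for name, value in summary.items():
    print(name, value)
```

Because the standard deviation and the random draws differ from whatever produced the slide, the printed figures should be close to, but not identical with, the values quoted there.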

Slide 16: UA versus plug-in
- Even if we just want to estimate y, UA does better than the “plug-in” approach of running the model for estimated values of x
- For the simple nonlinear model, the central estimates of x1 and x2 are 0.5, but sin(0.5)/(1 + exp(1)) = 0.129 is slightly too high an estimate of y compared with the mean of 0.117 or the median of [value missing]
- The difference can be much more marked for highly nonlinear models
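The gap between the two estimates is easy to reproduce. This sketch runs the model once at the central inputs and compares the result with a Monte Carlo mean; the input standard deviation of 0.25 is an assumption, as it is missing from the transcript.

```python
import math
import random


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


# Plug-in: one run at the central estimates of the inputs.
plug_in = model(0.5, 0.5)

# UA: average the output over the input uncertainty.
SD = 0.25  # ASSUMPTION: input standard deviation, not given in the transcript
rng = random.Random(3)
n = 10_000
ua_mean = sum(model(rng.gauss(0.5, SD), rng.gauss(0.5, SD)) for _ in range(n)) / n

print(f"plug-in estimate: {plug_in:.3f}")
print(f"Monte Carlo mean: {ua_mean:.3f}")
```

The plug-in value ignores the curvature of the model, so it differs from the mean of the output distribution whenever the model is nonlinear.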

Slide 17: Summary
- Why UA?
  - Proper quantification of output uncertainty
    - Need proper probabilistic expression of input uncertainty
  - Improved central estimate of output
    - Better than the usual plug-in approach

Slide 18: Which inputs affect output most?
- This is a common question
- Sensitivity analysis (SA) attempts to address it
- There are various forms of SA
- The methods most frequently used are not the most helpful!

Slide 19: Local sensitivity analysis
- To measure the sensitivity of y to input x_i, compute the derivative of y with respect to x_i
- Nonlinear model:
  - At x1 = x2 = 0.5, the derivatives are
    - wrt x1, 0.142; wrt x2, –0.094
- What does this tell us?
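The two derivatives can be recovered numerically. The sketch below uses central finite differences, which is one convenient choice and not necessarily how the slide's values were obtained; the step size `h` is an arbitrary small number.

```python
import math


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


def local_derivatives(x1, x2, h=1e-6):
    """Central finite-difference estimates of dy/dx1 and dy/dx2 at (x1, x2)."""
    d1 = (model(x1 + h, x2) - model(x1 - h, x2)) / (2 * h)
    d2 = (model(x1, x2 + h) - model(x1, x2 - h)) / (2 * h)
    return d1, d2


d1, d2 = local_derivatives(0.5, 0.5)
print(f"dy/dx1 = {d1:.3f}, dy/dx2 = {d2:.3f}")
```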

Slide 20: Local SA – deficiencies
- Derivatives evaluated at the central estimate
  - Could be quite different at other points nearby
- Doesn’t capture interactions between inputs
  - E.g. sensitivity of y to increasing both x1 and x2 could be greater or less than the sum of their individual sensitivities
- Not invariant to change of units
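Two of these deficiencies can be seen directly on the toy model. This is an illustrative sketch: the nearby points 0.25 and 0.75 and the factor-of-100 change of units are hypothetical choices, and `model_rescaled` is a helper introduced only for this example.

```python
import math


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


def d_dx1(f, x1, x2, h=1e-5):
    """Central finite-difference derivative of f with respect to its first input."""
    return (f(x1 + h, x2) - f(x1 - h, x2)) / (2 * h)


# 1. The derivative with respect to x1 changes noticeably at nearby points.
nearby = {x1: d_dx1(model, x1, 0.5) for x1 in (0.25, 0.5, 0.75)}
for x1, d in nearby.items():
    print(f"x1 = {x1}: dy/dx1 = {d:.3f}")


# 2. It is not invariant to units: express x1 in hypothetical units 100 times smaller.
def model_rescaled(x1_small_units, x2):
    return model(x1_small_units / 100.0, x2)


d_original = d_dx1(model, 0.5, 0.5)
d_rescaled = d_dx1(model_rescaled, 50.0, 0.5, h=1e-3)
print(f"original units: {d_original:.5f}, rescaled units: {d_rescaled:.5f}")
```

The same physical sensitivity appears 100 times smaller after the change of units, so raw derivatives cannot be compared across inputs measured on different scales.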

Slide 21: One-way SA
- Vary inputs one at a time from central estimate
- Nonlinear model:
  - Vary x1 to 0.25, 0.75: output is 0.079, 0.152
  - Vary x2 to 0.25, 0.75: output is 0.154, 0.107
- Is this really a good idea?
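The one-at-a-time runs can be regenerated from the model equation. The sketch below holds each input at the central value 0.5 while moving the other to 0.25 and 0.75, and reproduces the two outputs that survive on the slide (0.079 and 0.154); the dictionary layout is just one convenient way to organise the runs.

```python
import math


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


CENTRE = {"x1": 0.5, "x2": 0.5}
LEVELS = (0.25, 0.75)

one_way = {}
for name in CENTRE:
    for level in LEVELS:
        point = dict(CENTRE, **{name: level})  # move one input, keep the other at 0.5
        one_way[(name, level)] = model(point["x1"], point["x2"])

for (name, level), y in one_way.items():
    print(f"{name} = {level}: y = {y:.3f}")
```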

Slide 22: One-way SA – deficiencies
- Depends on how far we vary each input
  - Relative sensitivities of different inputs change if we change the ranges
- Also fails to capture interactions
- Statisticians have known for decades that varying factors one at a time is bad experimental design!

Slide 23: Multi-way SA
- Vary factors two or more at a time
- Maybe statistical factorial design
  - Full factorial designs require very many runs
- Can find interactions but hard to interpret
  - Often just look for the biggest change of output among all runs
- Still dependent on how far we vary each input
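A full factorial design on the toy model shows both the idea and the cost. In this sketch the three levels per input (0.25, 0.5, 0.75) are an assumption made for illustration; the run counts at the end simply evaluate levels raised to the power of the number of inputs.

```python
import itertools
import math


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


LEVELS = (0.25, 0.5, 0.75)  # ASSUMPTION: three levels per input, chosen for illustration

# Full factorial design: every combination of levels for the two inputs.
design = list(itertools.product(LEVELS, repeat=2))
runs = {point: model(*point) for point in design}

# "Biggest change of output among all runs", measured from the central run.
centre = runs[(0.5, 0.5)]
biggest = max(runs, key=lambda p: abs(runs[p] - centre))
print(f"{len(design)} runs; largest change from centre at {biggest}: "
      f"{runs[biggest] - centre:+.3f}")

# The number of runs grows as levels ** inputs.
for n_inputs in (2, 5, 10, 20):
    print(f"{n_inputs} inputs -> {len(LEVELS) ** n_inputs} runs")
```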

Slide 24: Probabilistic SA (PSA)
- Inputs varied according to their probability distributions
  - As in UA
- Gives an overall picture and can identify interactions

Slide 25: Variance decomposition
- One way to characterise the sensitivity of the output to individual inputs is to compute how much of the UA variance is due to each input
- For the simple nonlinear model, we have

Input                Contribution
X1                   [value missing] %
X2                   [value missing] %
X1.X2 interaction    2.93 %
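A variance decomposition of this kind can be estimated by brute-force Monte Carlo. The sketch below uses a standard "pick-freeze" estimator of Var(E[y | x_i]), which is a swapped-in technique and not the Bayesian method of the referenced papers; the input standard deviation of 0.25, the sample size and the seed are all assumptions.

```python
import math
import random
import statistics


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


SD = 0.25  # ASSUMPTION: input standard deviation, not given in the transcript
N = 100_000
rng = random.Random(4)

# Two independent input samples, A and B.
A = [(rng.gauss(0.5, SD), rng.gauss(0.5, SD)) for _ in range(N)]
B = [(rng.gauss(0.5, SD), rng.gauss(0.5, SD)) for _ in range(N)]
f_A = [model(*a) for a in A]
f_B = [model(*b) for b in B]
total_var = statistics.pvariance(f_A + f_B)


def first_order_share(i):
    """Estimated share of output variance due to input i alone: Var(E[y|x_i]) / Var(y)."""
    acc = 0.0
    for a, b, fa, fb in zip(A, B, f_A, f_B):
        mixed = list(a)
        mixed[i] = b[i]                   # input i taken from B, the rest from A
        acc += fb * (model(*mixed) - fa)  # pick-freeze product
    return acc / N / total_var


s1, s2 = first_order_share(0), first_order_share(1)
interaction = 1.0 - s1 - s2  # with two inputs, the remainder is the interaction
print(f"X1: {100 * s1:.1f} %  X2: {100 * s2:.1f} %  interaction: {100 * interaction:.1f} %")
```

These are noisy estimates under assumed inputs, so they should not be expected to match the slide's table digit for digit.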

Slide 26: Main effects
- We can also plot the effect of varying one input averaged over the others
- Nonlinear model
  - Averaging y = sin(x1)/{1 + exp(x1 + x2)} with respect to the uncertainty in x2, we can plot it as a function of x1
  - Similarly, we can plot it as a function of x2 averaged over uncertainty in x1
- We can also plot interaction effects

Slide 27: Nonlinear example – main effects
- Red is main effect of x1 (averaged over x2)
- Blue is main effect of x2 (averaged over x1)
[Figure: main effects of x1 and x2]
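The curves in such a plot can be tabulated by averaging the model over one input while fixing the other. This sketch computes the conditional means E[y | x1] and E[y | x2] on a grid; the input standard deviation of 0.25, the grid and the number of draws are assumptions, and subtracting the overall mean (which some definitions of "main effect" do) is left out.

```python
import math
import random


def model(x1, x2):
    """The nonlinear toy model y = sin(x1) / (1 + exp(x1 + x2))."""
    return math.sin(x1) / (1.0 + math.exp(x1 + x2))


SD = 0.25  # ASSUMPTION: input standard deviation, not given in the transcript
rng = random.Random(6)
other = [rng.gauss(0.5, SD) for _ in range(5_000)]  # draws of the input being averaged out


def main_effect_x1(x1):
    """Average of y over the uncertainty in x2, with x1 held fixed."""
    return sum(model(x1, x2) for x2 in other) / len(other)


def main_effect_x2(x2):
    """Average of y over the uncertainty in x1, with x2 held fixed."""
    return sum(model(x1, x2) for x1 in other) / len(other)


grid = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
effect_x1 = [main_effect_x1(v) for v in grid]
effect_x2 = [main_effect_x2(v) for v in grid]
for v, e1, e2 in zip(grid, effect_x1, effect_x2):
    print(f"{v:.1f}  E[y|x1]={e1:.3f}  E[y|x2]={e2:.3f}")
```

Reusing the same draws for every grid point keeps the two curves smooth, since the Monte Carlo noise is shared along each curve.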

Slide 28: Summary
- Why SA?
  - For the model user: identifies which inputs it would be most useful to reduce uncertainty about
  - For the model builder: main effect and interaction plots demonstrate how the model is behaving
    - Sometimes surprisingly!

Slide 29: What’s this got to do with emulators?
- Computation of UA and (particularly) SA by conventional methods (like Monte Carlo) can be an enormous task for complex environmental models
  - Typically at least 10,000 model runs needed
  - Not very practical when each run takes 1 minute (a week of computing)
  - And out of the question if a run takes 30 minutes
- Emulators need only a fraction of these model runs, and their probabilistic framework helps keep track of all the uncertainty.
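The computing times follow from simple arithmetic. The sketch below assumes the 10,000 runs are executed one after another on a single machine; `compute_days` is a hypothetical helper name.

```python
RUNS = 10_000
MINUTES_PER_DAY = 60 * 24


def compute_days(minutes_per_run, runs=RUNS):
    """Total wall-clock days if the runs are executed serially."""
    return runs * minutes_per_run / MINUTES_PER_DAY


print(f"1 minute per run:   {compute_days(1):.1f} days")
print(f"30 minutes per run: {compute_days(30):.1f} days")
```

At one minute per run the total is just under seven days, matching the "week of computing" on the slide; at thirty minutes per run it is about 208 days.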

Slide 30: What’s to come?
- GEM-SA is the first stage of the GEM project
  - GEM = “Gaussian Emulation Machine”
- It uses highly efficient emulation methods based on Bayesian statistics
  - The fundamental idea is that of “emulating” the physical model by a statistical representation called a Gaussian process
- GEM-SA does UA and SA
  - Future stages of GEM will add more functionality
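The emulation idea can be illustrated on the toy model from Slide 12. The sketch below is not GEM-SA and does not reproduce its methods: it is a bare-bones zero-mean Gaussian process with a squared-exponential kernel and hand-picked hyperparameters, using NumPy. The 6 x 6 training grid, the length scale of 0.5, the jitter and the input standard deviation of 0.25 are all assumptions.

```python
import numpy as np


def model(x):
    """Toy simulator y = sin(x1) / (1 + exp(x1 + x2)); x has shape (n, 2)."""
    return np.sin(x[:, 0]) / (1.0 + np.exp(x[:, 0] + x[:, 1]))


def sq_exp_kernel(a, b, length_scale=0.5):
    """Squared-exponential covariance between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / length_scale**2)


# Training design: 36 simulator runs on a 6 x 6 grid (an illustrative choice).
axis = np.linspace(-0.25, 1.25, 6)
X_train = np.array([(a, b) for a in axis for b in axis])
y_train = model(X_train)

# Zero-mean GP with fixed hyperparameters; the small jitter keeps the solve stable.
K = sq_exp_kernel(X_train, X_train) + 1e-6 * np.eye(len(X_train))
weights = np.linalg.solve(K, y_train)


def emulate(x_new):
    """Posterior mean of the emulator at new inputs of shape (n, 2)."""
    return sq_exp_kernel(x_new, X_train) @ weights


# Uncertainty analysis on the emulator: 10,000 cheap evaluations from 36 real runs.
rng = np.random.default_rng(5)
X_mc = rng.normal(0.5, 0.25, size=(10_000, 2))  # ASSUMPTION: input sd = 0.25
print("mean via emulator: ", emulate(X_mc).mean())
print("mean via simulator:", model(X_mc).mean())
```

A real emulator would also estimate the hyperparameters and report the posterior variance, which is how the remaining uncertainty from using so few runs is tracked.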

Slide 31: References
There are two papers that cover the material in these slides:
- Oakley and O’Hagan (2002). Bayesian inference for the uncertainty distribution of computer model outputs, Biometrika, 89, 769–784.
- Oakley and O’Hagan (2004). Probabilistic sensitivity analysis of complex models: a Bayesian approach, J. R. Statist. Soc. B, 66, 751–769.
