Presentation on theme: "Uncertain models and modelling uncertainty"— Presentation transcript:
1 Uncertain models and modelling uncertainty. Marian Scott, Dept of Statistics, University of Glasgow. EMS workshop, Nottingham, April 2004.
2 Outline of presentation: Model building and testing: is the environment special? Statistical models vs physical/process based models. What is sensitivity/uncertainty analysis? Quantifying and apportioning variation in model and data. General comments: relevance and implementation.
3 All models are wrong but some are useful (and some are more useful than others). (All data are useful, but some are more varied than others.)
4 Questions we ask about models: Is the model valid? Are the assumptions reasonable? Does the model make sense based on best scientific knowledge? Is the model credible? Do the model predictions match the observed data? How uncertain are the results? What is a good model? Simple, realistic, efficient, useful, reliable, valid etc.
5 Statistical models: Always include an ε term to describe random variation. Empirical. Descriptive and predictive. Model building goal: the simplest model which is adequate. Used for inference.
6 Physical/process based models: Use best scientific knowledge. May not explicitly include ε, or any random variation. Descriptive and predictive. Goal may not be the simplest model. Not used for inference.
7 Models: Mathematical (deterministic/process based) models tend to be complex and to ignore important sources of uncertainty. Statistical models tend to be empirical and to ignore much of the biological/physical/chemical knowledge.
8 Stages in modelling: Design and conceptualisation: visualisation of structure; identification of processes (variable selection); choice of parameterisation. Fitting and assessment: parameter estimation (calibration); goodness of fit.
9 Model evaluation tools: Graphical procedures. % variation explained in response. Statistical model comparisons (F-tests, ANOVA, GLRT): well designed for statistical models, but what of the physical, process-driven models? Comparability to measurements.
10 The story of randomness and uncertainty: Randomness as the source of variability: a source of variation, different animals range over different territory, eat different sources of …. The effect is that we cannot be certain. Uncertainty due to lack of knowledge: conflicting evidence; ignorance; effects of scale; lack of observations. Uncertainty due to variability: natural randomness; behavioural variability.
11 Effect of uncertainties: Uncertainty in model quantities/parameters/inputs. Uncertainty about model form. Uncertainty about model completeness. Lack of observations contributes to: uncertainties in input data; parameter uncertainties. Conflicting evidence contributes to: uncertainty about model form; uncertainty about validity of assumptions. All of this makes it difficult to judge how good a model is!
12 Modelling tools - SA/UA: Sensitivity analysis: determining the amount and kind of change produced in the model predictions by a change in a model parameter. Uncertainty analysis: an assessment/quantification of the uncertainties associated with the parameters, the data and the model structure.
13 Modellers conduct SA to determine: (a) if a model resembles the system or processes under study, (b) the factors that mostly contribute to the output variability, (c) the model parameters (or parts of the model itself) that are insignificant, (d) if there is some region in the space of input factors for which the model variation is maximum, and (e) if and which (group of) factors interact with each other.
15 Design of the SA experiment: Simple factorial designs (one at a time). Factorial designs (including potential interaction terms). Fractional factorial designs. Important difference: in the context of computer code experiments, random variation due to variation in experimental units does not exist.
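As a rough sketch of the difference between the designs on this slide, the Python below builds a one-at-a-time design and a full two-level factorial design for three factors. The function `toy_model` and its coefficients are hypothetical, chosen only so that factors 0 and 1 interact; nothing here comes from the original slides.

```python
from itertools import product

def full_factorial(levels):
    """All combinations of the supplied levels (one tuple of levels per factor)."""
    return [list(run) for run in product(*levels)]

def one_at_a_time(baseline, alternatives):
    """Baseline run plus one run per factor in which only that factor is changed."""
    runs = [list(baseline)]
    for j, alt in enumerate(alternatives):
        run = list(baseline)
        run[j] = alt
        runs.append(run)
    return runs

def toy_model(x):
    # Hypothetical deterministic computer code with an x0*x1 interaction.
    return x[0] + 2.0 * x[1] + 3.0 * x[0] * x[1] + 0.5 * x[2]

low_high = [(0.0, 1.0)] * 3
ff_runs = full_factorial(low_high)                           # 2**3 = 8 runs
oat_runs = one_at_a_time([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])   # 1 + 3 = 4 runs

ff_out = [toy_model(r) for r in ff_runs]
oat_out = [toy_model(r) for r in oat_runs]

# The code is deterministic: repeating a run gives the same output,
# so replicates carry no information about random variation.
repeat_is_identical = toy_model(ff_runs[-1]) == toy_model(ff_runs[-1])
```

In this toy case the one-at-a-time runs never set factors 0 and 1 high together, so the interaction term is invisible to them, while the factorial design includes that run.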
16 SA techniques: Screening techniques: O(ne) A(t) a T(ime), factorial and fractional factorial designs used to isolate a set of important factors. Local/differential analysis. Sampling-based (Monte Carlo) methods. Variance based methods: variance decomposition of output to compute sensitivity indices.
17 Screening: screening experiments can be used to identify, with low computational effort, the parameter subset that controls most of the output variability.
18 Screening methods: Vary one factor at a time (NOT particularly recommended). Morris OAT design (global): estimate the main effect of a factor by computing a number r of local measures at different points x1,…,xr in the input space and then averaging them. Order the input factors.
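The averaging idea behind the Morris design can be sketched as follows. This is a simplified illustration, not the full Morris trajectory design: each of r random base points is perturbed one factor at a time by a fixed step. The names `elementary_effects`, `toy_model`, `delta` and the unit-cube input space are assumptions made for the example.

```python
import numpy as np

def elementary_effects(model, n_factors, r=20, delta=0.1, seed=0):
    """Local effect of each factor at r random base points in the unit cube."""
    rng = np.random.default_rng(seed)
    effects = np.empty((r, n_factors))
    for k in range(r):
        # Keep base points in [0, 1 - delta] so the perturbed point stays in the cube.
        x = rng.uniform(0.0, 1.0 - delta, size=n_factors)
        y0 = model(x)
        for j in range(n_factors):
            x_step = x.copy()
            x_step[j] += delta
            effects[k, j] = (model(x_step) - y0) / delta
    return effects

def toy_model(x):
    # Hypothetical model: strong linear factor, nonlinear factor, weak factor.
    return 5.0 * x[0] + x[1] ** 2 + 0.1 * x[2]

ee = elementary_effects(toy_model, n_factors=3)
mu_star = np.abs(ee).mean(axis=0)      # average of the r absolute local measures
ranking = np.argsort(mu_star)[::-1]    # factors ordered from most to least important
```

The spread of the local measures across base points (for example `ee.std(axis=0)`) is often read as a hint of nonlinearity or interaction; here only factor 1 would show any spread.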
19 Local SA: Local SA concentrates on the local impact of the factors on the model. Local SA is usually carried out by computing partial derivatives of the output functions with respect to the input variables. The input parameters are varied in a small interval around a nominal value. The interval is usually the same for all of the variables and is not related to the degree of knowledge of the variables.
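A minimal sketch of local SA, assuming the partial derivatives are approximated by central finite differences with the same small step for every input. The model `toy_model`, the nominal values and the step `h` are hypothetical.

```python
import numpy as np

def local_sensitivities(model, nominal, h=1e-4):
    """Central-difference estimate of d(output)/d(input_j) at the nominal point."""
    nominal = np.asarray(nominal, dtype=float)
    grads = np.empty_like(nominal)
    for j in range(nominal.size):
        up = nominal.copy()
        down = nominal.copy()
        up[j] += h        # same interval for every variable, as on the slide
        down[j] -= h
        grads[j] = (model(up) - model(down)) / (2.0 * h)
    return grads

def toy_model(x):
    # Hypothetical model with a quadratic term and an interaction.
    return x[0] ** 2 + 3.0 * x[1] + x[0] * x[2]

grads = local_sensitivities(toy_model, [1.0, 2.0, 0.5])
```

The result describes the model only near the nominal point; moving the nominal values changes the derivatives of factors 0 and 2 in this toy model.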
20 Global SA: Global SA apportions the output uncertainty to the uncertainty in the input factors, covering their entire range space. A global method evaluates the effect of xj while all other xi, i ≠ j, are varied as well.
21 How is a sampling (global) based SA implemented? Step 1: define model, input factors and outputs. Step 2: assign p.d.f.s to input parameters/factors and, if necessary, a covariance structure. DIFFICULT. Step 3: simulate realisations from the parameter p.d.f.s to generate a set of model runs giving the set of output values.
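The three steps might look like this in a small Monte Carlo experiment. The decay model, the lognormal and normal distributions and all their parameter values are assumptions for illustration only; in practice Step 2 is the hard part and these choices would need justification.

```python
import numpy as np

rng = np.random.default_rng(42)
n_runs = 1000

# Step 1: define the model, its input factors (k, c0) and its output.
def toy_model(k, c0, t):
    # Hypothetical first-order decay: concentration at time t.
    return c0 * np.exp(-k * t)

# Step 2: assign p.d.f.s to the input factors (assumed, independent here).
k = rng.lognormal(mean=np.log(0.1), sigma=0.3, size=n_runs)
c0 = rng.normal(loc=100.0, scale=10.0, size=n_runs)
t = 5.0

# Step 3: simulate realisations and run the model once per realisation.
y = toy_model(k, c0, t)

summary = {
    "mean": y.mean(),
    "sd": y.std(ddof=1),
    "p05": np.percentile(y, 5),
    "p95": np.percentile(y, 95),
}
```

A covariance structure between `k` and `c0`, if needed, could be introduced by sampling from a joint distribution instead of the two independent marginals used here.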
22 Choice of sampling method: S(imple) or Stratified R(andom) S(ampling): each input factor is sampled independently many times from its marginal distribution to create the set of input values (or randomly sampled from the joint distribution). Relatively expensive in computational effort if the model has many input factors; may not give good coverage of the entire range space. L(atin) H(ypercube) S(ampling): the range of each input factor is divided into N equal-probability intervals, and one observation of each input factor is made in each interval.
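A hand-rolled sketch of Latin hypercube sampling on the unit cube, written with NumPy only. The function name `latin_hypercube` is an assumption for this example; library implementations exist but are not used here.

```python
import numpy as np

def latin_hypercube(n_samples, n_factors, seed=0):
    """One point in each of n_samples equal-probability intervals, per factor."""
    rng = np.random.default_rng(seed)
    # Row i starts in interval [i/n, (i+1)/n) for every factor.
    offsets = np.arange(n_samples)[:, None]
    u = (offsets + rng.uniform(size=(n_samples, n_factors))) / n_samples
    # Shuffle the interval order independently for each factor.
    for j in range(n_factors):
        u[:, j] = rng.permutation(u[:, j])
    return u

sample = latin_hypercube(10, 3)
# Interval index occupied by each sampled value, per factor.
strata = np.floor(sample * 10).astype(int)
```

The values are uniform on (0, 1); to follow a chosen marginal distribution each column would be passed through that distribution's inverse c.d.f.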
23 SA analysis: At the end of the computer experiment, the data are of the form (yij, x1i, x2i, …, xni), where x1, …, xn are the realisations of the input factors. Analysis includes regression analysis (on raw and ranked values), standard hypothesis tests of distribution (mean and variance) for sub-samples corresponding to given percentiles of x, and Analysis of Variance.
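One common way to carry out the regression step is with standardised regression coefficients on the raw values and on the ranks. The sketch below assumes that approach; the model, sample size and function names are hypothetical.

```python
import numpy as np

def standardised_coefficients(x, y):
    """Least-squares coefficients after scaling inputs and output to unit variance."""
    xs = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    ys = (y - y.mean()) / y.std(ddof=1)
    coef, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return coef

def to_ranks(a):
    """Ranks 0..n-1 down each column (ties are not expected for continuous samples)."""
    return np.argsort(np.argsort(a, axis=0), axis=0).astype(float)

rng = np.random.default_rng(1)
x = rng.uniform(size=(500, 3))
# Hypothetical model: linear in factor 0, strongly nonlinear in factor 1,
# and factor 2 has no effect at all.
y = 4.0 * x[:, 0] + np.exp(3.0 * x[:, 1])

src = standardised_coefficients(x, y)                        # raw values
srrc = standardised_coefficients(to_ranks(x), to_ranks(y))   # ranked values
```

Working on ranks makes a monotone but nonlinear relationship look closer to linear, which is why the two sets of coefficients can differ.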
24 Some ‘new’ methods of analysis: Measures of importance: $\mathrm{Var}_{X_j}(E(Y \mid X_j = x_j)) / \mathrm{Var}(Y)$ and $\mathrm{HIM}(X_j) = \sum_i y_i y_i' / N$. Sobol sensitivity indices. Fourier Amplitude Sensitivity Test (FAST).
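The first measure of importance, Var(E(Y|Xj))/Var(Y), can be approximated from a plain Monte Carlo sample by binning Xj and taking the variance of the bin means. This binning estimator is a simple stand-in chosen for the example, not the Sobol or FAST estimator; the model and sample size are hypothetical.

```python
import numpy as np

def first_order_index(xj, y, n_bins=20):
    """Estimate Var(E(Y|Xj)) / Var(Y) from equal-probability bins of Xj."""
    edges = np.quantile(xj, np.linspace(0.0, 1.0, n_bins + 1))
    which = np.clip(np.searchsorted(edges, xj, side="right") - 1, 0, n_bins - 1)
    cond_means = np.array([y[which == b].mean() for b in range(n_bins)])
    weights = np.array([(which == b).mean() for b in range(n_bins)])
    var_of_cond_mean = np.sum(weights * (cond_means - y.mean()) ** 2)
    return var_of_cond_mean / y.var()

rng = np.random.default_rng(0)
x = rng.uniform(size=(20000, 2))
# Hypothetical additive model: the exact indices are 0.2 and 0.8.
y = x[:, 0] + 2.0 * x[:, 1]

indices = [first_order_index(x[:, j], y) for j in range(2)]
```

For an additive model the indices sum to about one; a sum well below one would point to interactions between factors.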
25 So far so good, but how useful are these techniques in some real-life problems? Are there other complicating factors? Do statisticians have too simple/complex a view of the world?
26 Common features of environmental modelling and observations: Knowledge of the processes creating the observational record may be incomplete. The observational records may: be incomplete (often observed irregularly in space and time); involve extreme events; involve quantification of risk.
27 Issues and purpose of analysis: Global and local pollutant mapping from Chernobyl. Global carbon cycle: greenhouse gases, CO2 levels and global warming. Ocean modelling. Air pollution modelling (local and regional scale). Chronologies for past environment studies. Decision making: Which areas should be restricted? Prediction: What is the trend in temperature? Predict its level in 2050? Decision making: Is it safe to eat fish? Regulatory: Have emission control agreements reduced air pollutants? Understanding: When did things happen in the past?
28 Questions we ask about observations: Do they result from observational or designed; laboratory or field experiments? What scale are they collected over (time and space)? Are they representative? Are they qualitative or quantitative? How are they connected to processes, and how well understood are these connections? How varied are they?
29 Example 1: are atmospheric SO2 concentrations declining? Measurements made at a monitoring station over a 20 year period. Processes involve meteorology (local and long-range), source distribution, chemistry of sulphur. Complex statistical model developed to describe the pattern; the model apportions the variation to ‘trend’, seasonality and residual variation. Main objective
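To illustrate what apportioning variation to trend, seasonality and residual can mean, here is a deliberately simple sketch: a linear trend plus one annual harmonic fitted by least squares to a synthetic monthly series. The data are simulated and the model is far simpler than the one referred to on the slide; every number below is made up.

```python
import numpy as np

rng = np.random.default_rng(3)
months = np.arange(240)            # 20 years of monthly values
t_years = months / 12.0

# Synthetic series: declining trend + annual cycle + noise (all values invented).
so2 = (30.0 - 0.8 * t_years
       + 5.0 * np.cos(2 * np.pi * t_years)
       + rng.normal(0.0, 1.0, months.size))

# Design matrix: intercept, linear trend, annual cosine and sine terms.
design = np.column_stack([
    np.ones_like(t_years),
    t_years,
    np.cos(2 * np.pi * t_years),
    np.sin(2 * np.pi * t_years),
])
coef, *_ = np.linalg.lstsq(design, so2, rcond=None)

trend = design[:, :2] @ coef[:2]
seasonal = design[:, 2:] @ coef[2:]
residual = so2 - trend - seasonal

slope_per_year = coef[1]           # negative if the series is declining
```

Whether a decline is real would then be judged by comparing the slope with its uncertainty, which this sketch does not compute.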
33 Example 2: Discovery of radioactive particles on the foreshore of a nuclear facility since 1983. Is the rate of finds falling off? Are the particle characteristics changing with time? Processes: transport in the marine environment, chemistry of the particles in the sea, interaction with source. What can we infer about the size of the source and its distribution?
37 Example 3: how well should models agree? 6 ocean models (process based: transport, sedimentary processes, numerical solution scheme, grid size) used to predict the dispersal of a pollutant. Results to be used to determine a remediation policy. The models differ in their detail and also in their spatial scale.
38 Model agreement: Three different sites (local, regional and global relative to a source). 6 different models. Level of agreement (high values are poor).
39 Predictions of levels of cobalt-60: Different models, same input data. Predictions vary by considerable margins. Magnitude of variation is a function of the spatial distribution of sites.
40 Environmental modelling: Modelling may involve: understanding and handling variation; dealing with unusual observations; dealing with missing observations; evaluating uncertainties.
41 How well should the model reproduce the data? Anecdotal comments: ‘agreement between model and measurement better than 1 (2) orders of magnitude is acceptable’. But this needs to be moderated by the measurement variation and uncertainties. It also depends on the purpose (model fit for purpose).
42 How can SA/UA help? SA/UA have a role to play in all modelling stages: We learn about model behaviour and ‘robustness’ to change. We can generate an envelope of ‘outcomes’ and see whether the observations fall within the envelope. We can ‘tune’ the model and identify reasons/causes for differences between model and observations.
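The envelope idea can be sketched by running a model over sampled inputs, taking percentiles of the outputs at each time point, and checking which observations lie inside. The decay model, the uniform input ranges, the 2.5% and 97.5% limits and the "observations" are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
times = np.linspace(0.0, 10.0, 11)

def toy_model(k, c0, t):
    # Hypothetical first-order decay model.
    return c0 * np.exp(-k * t)

# Sample the uncertain inputs (assumed uniform ranges) and run the model.
k = rng.uniform(0.05, 0.15, size=2000)
c0 = rng.uniform(80.0, 120.0, size=2000)
runs = toy_model(k[:, None], c0[:, None], times[None, :])   # shape (2000, 11)

# Envelope of outcomes: pointwise 2.5% and 97.5% percentiles over the runs.
lower, upper = np.percentile(runs, [2.5, 97.5], axis=0)

# Made-up observations, generated from the central parameter values.
observed = 100.0 * np.exp(-0.1 * times)
inside = (observed >= lower) & (observed <= upper)
coverage = inside.mean()       # fraction of observations inside the envelope
```

Observations falling outside the envelope would prompt a look at the input distributions, the model structure, or the measurement uncertainty, none of which this sketch models.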
43 On the other hand - Uncertainty analysis: Parameter uncertainty: usually quantified in the form of a distribution. Model structural uncertainty: more than one model may be fit, expressed as a prior on model structure. Scenario uncertainty: uncertainty on future conditions.
44 Tools for handling uncertainty: Parameter uncertainty: probability distributions and sensitivity analysis. Structural uncertainty: Bayesian framework; one possibility is to define a discrete set of models, another is to use a Gaussian process.
45 Conclusions: The world is rich and varied in its complexity. Modelling is an uncertain activity. Model assessment is a difficult process. SA/UA are important tools in model assessment. Setting the problem in a unified Bayesian framework allows all the sources of uncertainty to be quantified, and so a fuller assessment to be performed.
46 Challenges: Some challenges: Different terminologies in different subject areas. Need for more sophisticated tools to deal with the multivariate nature of the problem. Challenges in describing the distribution of input parameters. Challenges in dealing with the Bayesian formulation of structural uncertainty for complex models. Computational challenges in simulations for large and complex computer models with many factors.