Getting started with GEM-SA Marc Kennedy Central Science Laboratory, York Tony O’Hagan, Jeremy Oakley University of Sheffield.


1 Getting started with GEM-SA Marc Kennedy Central Science Laboratory, York Tony O’Hagan, Jeremy Oakley University of Sheffield

2 Part 1: Getting started
• Starting the GEM-SA program
• Creating input and output files
• Explanation of the menus, toolbars, etc.
• Description of the project window

3 Starting GEM-SA
• Double-click the GEM-SA icon to start
• The main window appears, with
  – Menu
  – Toolbar
  – Sensitivity analysis output grid
    • Tab windows for other types of output
  – Log window

4 [Screenshot of the main window, labelled: menu, toolbar, sensitivity analysis output grid, log window]

5 Toolbar icons
• New project
• Open project
• Save project
• Print output report
• Edit project
• Generate input design points
• Rescale an input
• Standardise design
• Copy input design to clipboard
• Convert input to integer
• Run the analysis
• Help

6 Sensitivity analysis output grid
• This reports the sensitivity results after the analysis is complete
  – One line for each input parameter
  – One line for each pair of inputs, if joint effects are selected

7 Log window output
• Tells us
  – Which training data are being loaded/saved
  – Transformations applied to the data
  – Fitted Gaussian process parameters
  – Summary of uncertainty analysis results

8 Creating a GEM project
• To build the emulator we first need 3 files:
  – Data file of code inputs
  – Data file of code outputs
  – GEM-SA project file

9 Restrictions on input/output data
• Single output
  – Multiple outputs must be treated individually
  – GEM can read a file containing multiple outputs, but a single column is specified within a project
• Max 30 input parameters
• Max 400 training points
• The data files should be plain text files
  – One line for each point
  – Input file can be space or tab delimited
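
The restrictions above can be checked before a project is created. The following is a minimal sketch, not part of GEM-SA: the function name `check_training_data` is hypothetical, it takes the file contents as strings, and it assumes every field should parse as a number.

```python
MAX_INPUTS = 30   # limit quoted on the slide
MAX_POINTS = 400  # limit quoted on the slide


def check_training_data(inputs_text, outputs_text):
    """Return a list of problems; an empty list means no problem was found."""
    # str.split() with no argument splits on any run of spaces or tabs.
    input_rows = [ln.split() for ln in inputs_text.splitlines() if ln.strip()]
    output_rows = [ln.split() for ln in outputs_text.splitlines() if ln.strip()]
    problems = []

    if not input_rows:
        return ["inputs file has no data rows"]
    n_inputs = len(input_rows[0])
    if any(len(row) != n_inputs for row in input_rows):
        problems.append("input rows have differing numbers of columns")
    if n_inputs > MAX_INPUTS:
        problems.append(f"{n_inputs} inputs exceeds the limit of {MAX_INPUTS}")
    if len(input_rows) > MAX_POINTS:
        problems.append(f"{len(input_rows)} points exceeds the limit of {MAX_POINTS}")
    if len(output_rows) != len(input_rows):
        problems.append("inputs and outputs have different numbers of rows")

    # Every field in both files should be numeric.
    for name, rows in (("inputs", input_rows), ("outputs", output_rows)):
        for row_no, row in enumerate(rows, start=1):
            for field in row:
                try:
                    float(field)
                except ValueError:
                    problems.append(f"{name} row {row_no}: {field!r} is not a number")
    return problems
```

In practice the two strings would come from reading the inputs and outputs files, for example with `open(path).read()`.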

10 Generating a new input design
• Designs can be generated using the toolbar icon or the menu: Input -> Generate…
• The design dialog appears

11 Generating a new input design
• Click OK and fill in the required range for each input
• Click OK again

12 Editing input designs
• If you select a column, you can rescale values of that input or round values to be integers
• Designs can be loaded into or saved from this window using the Inputs menu
• Use the “Copy input design to clipboard” toolbar icon to copy the points to the clipboard for use in other programs

13 Types of design
• GEM-SA can generate 2 types of design
  – LP-τ
  – Maximin Latin Hypercube (MmLH) designs
• Both have good space-filling properties
  – Ensure all regions of the input space are well represented
• LP-τ is quick to generate, good for increasing the input design sequentially
• MmLH can be better in high dimensions
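
To illustrate what a maximin Latin hypercube is, here is a small sketch that draws several random Latin hypercubes on the unit cube and keeps the one whose closest pair of points is furthest apart. This random-search approach is an assumption made for illustration; the slides do not say which algorithm GEM-SA uses, and all function names are hypothetical.

```python
import random
from itertools import combinations


def latin_hypercube(n_points, n_inputs, rng):
    """One random Latin hypercube on [0, 1]^n_inputs: each input's range is
    cut into n_points equal strata and every stratum is used exactly once."""
    columns = []
    for _ in range(n_inputs):
        strata = list(range(n_points))
        rng.shuffle(strata)
        columns.append([(s + rng.random()) / n_points for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_distance(design):
    """Smallest Euclidean distance between any two design points."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for p, q in combinations(design, 2)
    )


def maximin_latin_hypercube(n_points, n_inputs, n_tries=200, seed=0):
    """Keep the candidate design with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (latin_hypercube(n_points, n_inputs, rng) for _ in range(n_tries))
    return max(candidates, key=min_distance)
```

The points lie in [0, 1] for every input, so each column would still need rescaling to the range required for that input.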

14 Creating output data from these inputs
• Each row from the input design must be used to generate a single output, e.g. using
  – Spreadsheet
    • Simple, but requires functional form
  – Script
    • Only need executable code
    • Loop through inputs, modify code input file
  – Modify code to loop through the points
    • Can be difficult, need source code

15 Example: using a spreadsheet
• Copy the input design to the clipboard using the “Copy input design to clipboard” toolbar icon
• Open Excel and paste inputs
• Create formula in final column
• Copy formula for all rows of the design
• Cut and paste special (values) in a new sheet
• Save as text file

16 Example: using a script
• Read base input file (read by executable code)
• Loop through lines of input design file
  – Replace selected inputs in base input file
  – Run executable code with new input file
  – Calculate single output and add to training output file
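
The loop described on this slide can be sketched as follows. This is an illustration under stated assumptions: the base input file is represented as a dictionary, and `run_model` is a hypothetical callable standing in for "write the modified input file, run the executable, and extract one output value".

```python
def build_training_outputs(design_lines, base_inputs, input_names, run_model):
    """Return one output per design row.

    design_lines : lines of the input design file (space or tab delimited)
    base_inputs  : dict of all model inputs at their base values
    input_names  : names of the inputs varied in the design, in column order
    run_model    : hypothetical callable that runs the code for one input set
    """
    outputs = []
    for line in design_lines:
        if not line.strip():
            continue  # skip blank lines
        values = [float(v) for v in line.split()]
        model_inputs = dict(base_inputs)               # start from the base file
        model_inputs.update(zip(input_names, values))  # replace selected inputs
        outputs.append(run_model(model_inputs))        # one output per design row
    return outputs
```

Writing each element of the returned list on its own line gives a training output file with one line per design point.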

17 The project window
• Appears whenever you
  – Load a project
  – Edit a project
  – Create new project
• This window has 3 tabs
  – Files
  – Options
  – Simulations

18 (Screenshot callouts)
• Names for the input files
• Names for the output files

19 (Screenshot callouts)
• How many inputs?
• What are the input names?
• Which column from output file?

20 (Screenshot callouts)
• Which joint effects should be calculated?
• What should be calculated, and how?

21 (Screenshot callouts)
• Are the inputs uncertain?
• What prior mean for the output?

22 (Screenshot callouts)
• What kind of prediction?
• What kind of cross validation?

23 (Screenshot callouts)
• MCMC control parameters
• How many points used to calculate main effects, joint effects
• How many realisations of predictions, main and joint effects to generate

24 Input parameter names
• This window appears if you press the Names… button
  – Giving names is optional, but useful later when looking at GEM-SA output
  – Ordering can be changed using the arrows

25 Selecting joint effects
• If you select “calculate joint effects”, individual items in the joint effects window can be highlighted for inclusion in joint effect calculations
• Need to unselect the default “all inputs” first
  – Unless you want to consider all pairs

26 Other checkboxes
• Sum effects
  – Use this if you want main effects of the 2 inputs to be included in the realisations of the joint effect of a pair

27 Other checkboxes
• Code has numerical error
  – Use this if your code has numerical errors or a stochastic component which you want to smooth out
  – The variance of the error will be estimated as part of the fitting process
  – Can make the fitting process quite unstable, so avoid if possible!

28 Other checkboxes
• Use MCMC for emulator parameters
  – For serious Bayesians only!
  – Takes into account uncertainty in the fitting of the emulator
  – Slows down the computation substantially, often with minimal effect on the results
• Auto-tune Metropolis algorithm
  – Use only with MCMC

29 Input uncertainty options
• All unknown, product normal
  – Inputs are independent, normally distributed
• All unknown, uniform
  – Inputs are independent, distributed uniformly between the min and max values of the training data
• All known
  – No uncertainty analysis required

30 Input uncertainty options
• Some known, rest product normal
  – Some input values will be fixed (in the dialog window or in a prediction file)
  – Others will be given normal input parameters
• Some known, rest uniform
  – As above, but unknown inputs have uniform distributions

31 Prior mean options
• If you believe the output is roughly a linear function of its inputs, select ‘linear term for each input’
  – Otherwise a single value will be used to represent the prior overall level of the output

32 Input uniform parameters
• Window appears if you click OK having selected uniform inputs
• Select ‘Defaults from input ranges’ to use ranges from input file, if already specified

33 Input fixed and normal parameters
• Window appears if you click OK having selected some fixed inputs, rest normal
• For fixed inputs, tick the box and enter the fixed value in the first text box

34 Selecting prediction type
• Predictions can be
  – Correlated realisations of outputs at the prediction inputs
    • Similar to main effect outputs
  – Marginal means and variances of outputs at the prediction inputs
    • Faster to compute, especially with many prediction points
    • Easy to interpret

35 Part 2 Uncertainty Analysis Using GEM-SA

36 Part 2: Outline
• Setting up the project
• Running a simple analysis
• More complex analyses

37 Setting up the project

38 Create a new project
• Select Project -> New, or click the “New project” toolbar icon
• Project dialog appears
• We’ll specify the data files first

39 Files
• The “Inputs” file contains one column for each parameter and one row for each model training run (the design)
• The “Outputs” file contains the outputs from those runs (one column, in this example)
• Using “Browse” buttons, select input and output files

40 Our example
• We’ll use the example “model1” in the GEM-SA DEMO DATA directory
• This example is based on a vegetation model with 7 inputs
  – RESAEREO, DEFLECT, FACTOR, MO, COVER, TREEHT, LAI
• The model has 16 outputs, but for the present we will consider output 4
  – June monthly GPP

41 Number of inputs
• Click on Options tab
• Set the number of inputs directly, or click “From Inputs File”

42 Define input names
• Click on “Names …”
• The “Input parameter names” dialog opens
• Enter parameter names
• Click “OK”

43 Complete the project
• We will leave all other settings at their default values for now
• Click “OK”
• The Input Parameter Ranges window appears

44 Close and save project
• Click “Defaults from input ranges” button
• Click “OK”
• Select Project -> Save
  – Or click the “Save project” toolbar icon
• Choose a name and click “Save”

45 Running a simple analysis

46 Build the emulator
• Click the “Run the analysis” toolbar icon to build the emulator
• A lot of things now start to happen!
  – The log window at the bottom starts to record various bits of information
  – A little window appears showing progress of minimisation of the roughness parameter estimation criterion
  – A new window appears in the “Main Effects” tab and several graphs appear
• Progress bar at the bottom

47 Focus on the log window
• Ignore the outputs in the “Main Effects” and “Sensitivity Analysis” windows for now
  – These will be explained later
• Focus on the log window
• This reports two key things
  – Diagnostics of the emulator build
  – The basic uncertainty analysis results
• These also appear in the “Output Summary” window and can be printed using the “Print output report” toolbar icon

48 Emulation diagnostics
• Note where the log window reports:
    Estimating emulator parameters by maximising probability distribution...
    maximised posterior for emulator parameters: sigma-squared = 0.342826, roughness = 0.217456 0.0699709 0.191557 16.9933 0.599439 0.459675 1.01559
• The first line says roughness parameters have been estimated by the simplest method
• The values of these indicate how non-linear the effect of each input parameter is
  – Note the high value for input 4 (MO)
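
To give a feel for what a roughness value means, the sketch below assumes the common Gaussian correlation form exp(-Σ b_i (x_i - x'_i)²), where b_i is the roughness for input i. The slides do not state the exact form or input scaling used by GEM-SA, so both are assumptions, and the function name is hypothetical.

```python
import math


def gaussian_correlation(x, x_other, roughness):
    """Correlation between outputs at two input points under the assumed
    Gaussian form; a larger roughness makes the correlation decay faster."""
    exponent = sum(b * (a - c) ** 2 for b, a, c in zip(roughness, x, x_other))
    return math.exp(-exponent)


# Roughness values quoted in the log for input 4 (MO) and input 2 (DEFLECT).
rough_mo, rough_deflect = 16.9933, 0.0699709

# Correlation after moving one input by half a unit (on whatever scale the
# roughness refers to), all other inputs unchanged.
step = 0.5
corr_mo = gaussian_correlation([0.0], [step], [rough_mo])
corr_deflect = gaussian_correlation([0.0], [step], [rough_deflect])
```

Under this assumed form, the same step leaves the outputs almost perfectly correlated for DEFLECT but nearly uncorrelated for MO, which is the sense in which a high roughness indicates a strongly non-linear effect.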

49 Uncertainty analysis – mean
• Below this, the log reports
    Estimate of mean output is 24.145, with variance 0.00388252
• So the best estimate of the output (June GPP) is 24.1 (mol C/m²)
  – This is averaged over the uncertainty in the 7 inputs
    • Better than just fixing inputs at best estimates
  – There is an emulation standard error of 0.062 in this figure

50 Uncertainty analysis – variance
• The final line of the log is
    Estimate of total output variance = 73.9033
• This shows the uncertainty in the model output that is induced by input uncertainties
  – The variance is 73.9
  – Equal to a standard deviation of 8.6
  – So although the best estimate of the output is 24.1, the uncertainty in inputs means it could easily be as low as 16 or as high as 33
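
The figures quoted on slides 49 and 50 follow from the logged values by taking square roots. A short worked check, with illustrative variable names:

```python
import math

mean_estimate = 24.145            # "Estimate of mean output"
emulation_variance = 0.00388252   # variance reported alongside the mean
total_output_variance = 73.9033   # "Estimate of total output variance"

# Emulation standard error of the mean estimate.
emulation_se = math.sqrt(emulation_variance)

# Standard deviation of the output induced by input uncertainty.
output_sd = math.sqrt(total_output_variance)

# One standard deviation either side of the mean.
low = mean_estimate - output_sd
high = mean_estimate + output_sd
```

The one-standard-deviation interval is roughly 15.5 to 32.7, which the slide rounds to "as low as 16 or as high as 33".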

51 More complex analyses

52 Input distributions
• Default is to assume the uncertainty in each input is represented by a uniform distribution
  – Range determined by the range of values found in the input file, or input manually
• A normal (Gaussian) distribution is generally a more realistic representation of uncertainty
  – Range unbounded
  – More probability in the middle

53 Changing input distributions
• In Project dialog, Options tab, click the button for “All unknown, product normal”
• Then OK
• A new dialog opens to specify means and variances

54 Model 1 example
• Uniform distributions from input ranges
• Normal distributions to match
  – Range is 4 std devs
• Except for MO
  – Narrower distribution

              Uniform            Normal
Parameter     Lower    Upper     Mean    Variance
RESAEREO      80       200       140     900
DEFLECT       0.6      1         0.8     0.01
FACTOR        0.1      0.5       0.3     0.01
MO            30       100       60      100
COVER         0.6      0.99      0.8     0.01
TREEHT        10       40        25      100
LAI           3.75     9         6.5     1
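
The "range is 4 std devs" rule can be written as a small helper. This is a sketch of the rule as stated on the slide, with a hypothetical function name; it places the mean at the midpoint of the range, which is an assumption.

```python
def normal_from_range(lower, upper, n_std_devs=4.0):
    """Return (mean, variance) of a normal distribution whose central
    n_std_devs standard deviations span the interval [lower, upper]."""
    mean = (lower + upper) / 2.0
    std_dev = (upper - lower) / n_std_devs
    return mean, std_dev ** 2


resaereo = normal_from_range(80, 200)   # mean 140, variance 900
deflect = normal_from_range(0.6, 1.0)   # mean 0.8, variance about 0.01
```

Applied to the table, the rule reproduces the RESAEREO, DEFLECT and FACTOR rows; the other rows appear to use rounded or hand-chosen values, and MO is deliberately narrower.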

55 Effect on UA
• After running the revised model, we see:
  – It runs faster, with no need to rebuild the emulator
    • The emulator fit is unchanged
  – The mean is changed a little and variance is halved
    Estimate of mean output is 26.2698, with variance 0.00784475
    Estimate of total output variance = 38.1319

56 Reducing the MO uncertainty further
• If we reduce the variance of MO even more, to 49:
  – UA mean changes a little more and variance reduces again
  – Notice also how the emulation uncertainty has increased (0.004 for uniform)
  – This is because the design points cover the new ranges less thoroughly
    Estimate of mean output is 26.3899, with variance 0.0108792
    Estimate of total output variance = 27.1335

57 Cross-validation
• In the Project dialog, look at the bottom menu box, labelled “Cross-validation”
• There are 3 options
  – None
  – Leave-one-out
  – Leave final 20% out
• CV is a way of checking the emulator fit
  – Default is None because CV takes time

58 Leave-one-out CV
• After estimating roughness and other parameters, GEM predicts each training run point using only the remaining n-1 points
• Results appear in log window
    Cross Validation Root Mean-Squared Error = 0.907869
    Cross Validation Root Mean-Squared Relative Error = 4.34773 percent
    Cross Validation Root Mean-Squared Standardised Error = 1.15273
    Largest standardised error is 4.32425 for data point 61
    Cross Validation variances range from 0.18814 to 3.92191
    Written cross-validation means to file cvpredmeans.txt
    Written cross-validation variances to file cvpredvars.txt
  – Slide annotation: the root mean-squared standardised error is close to 1

59 Leave out final 20% CV
• This is an even better check, because it tests the emulator on data that have not been used in any way to build it
• Emulator is built on first 80% of data and used to predict last 20%
    Cross Validation Root Mean-Squared Error = 1.46954
    Cross Validation Root Mean-Squared Relative Error = 7.4922 percent
    Cross Validation Root Mean-Squared Standardised Error = 1.73675
    Largest standardised error is 5.05527 for data point 22
    Cross Validation variances range from 0.277304 to 4.88653
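
The cross-validation summaries in the log can be understood through the sketch below. The formulas are the usual definitions of these quantities and are assumed here; the slides do not confirm exactly how GEM-SA computes them. The function name and dictionary keys are hypothetical.

```python
import math


def cv_summary(observed, predicted_means, predicted_variances):
    """Summarise held-out predictions against the true code outputs."""
    n = len(observed)
    errors = [p - y for y, p in zip(observed, predicted_means)]

    # Root mean-squared error, in the units of the output.
    rmse = math.sqrt(sum(e ** 2 for e in errors) / n)

    # Root mean-squared relative error, as a percentage of the true output.
    rel = 100.0 * math.sqrt(sum((e / y) ** 2 for e, y in zip(errors, observed)) / n)

    # Standardised errors: each error divided by its predictive std deviation.
    standardised = [e / math.sqrt(v) for e, v in zip(errors, predicted_variances)]
    rms_std = math.sqrt(sum(s ** 2 for s in standardised) / n)

    worst = max(range(n), key=lambda i: abs(standardised[i]))
    return {
        "rmse": rmse,
        "relative_rmse_percent": rel,
        "standardised_rmse": rms_std,
        "largest_standardised_error": abs(standardised[worst]),
        "worst_point": worst + 1,  # 1-based numbering, assumed to match the log
    }
```

If the emulator's predictive variances are well calibrated, the standardised value should be near 1; values well above 1 suggest the emulator is over-confident.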

60 Other options
• There are various other options associated with the emulator building that we have not dealt with
• But we’ve done the main things that should be considered in practice
• And it’s enough to be going on with!

61 When it all goes wrong
• How do we know when the emulator is not working?
  – Large roughness parameters
    • Especially ones hitting the limit of 99
  – Large emulation variance on UA mean
  – Poor CV standardised prediction error
    • Especially when some are extremely large
• In such cases, see if a larger training set helps
  – Other ideas like transforming output scale

62 Part 3 Sensitivity Analysis in GEM-SA

63 Example
• Again we use the ForestETP vegetation model
  – 7 input parameters
  – 120 model runs
• Objective: conduct a variance-based sensitivity analysis to identify which uncertain inputs are driving the output uncertainty.

64 Exploratory scatter plots

65 Sensitivity Analysis Walkthrough
1. Project -> New
2. Click “Browse” for the Inputs File
  – From the GEM-SA Demo Data/Model1/ folder, select “emulator7x120inputs.txt”
3. Click “Browse” for the Outputs File
  – From the GEM-SA Demo Data/Model1/ folder, select “out11.txt”
4. Select the Options tab

66 Sensitivity Analysis Walkthrough
5. Change the Number of Inputs to 7.
6. Leave the other options unchanged
  – Input uncertainty options: All unknown, uniform
  – Prior mean options: Linear term for each input
  – Generate predictions as: function realisations (correlated points)

67 Sensitivity Analysis Walkthrough

68 Sensitivity Analysis Walkthrough
7. Click OK
8. Select “Defaults from input ranges” then OK
9. Project -> Run, or use the “Run the analysis” toolbar icon

69 Main effect plots

70 [Main effect plot for X6]
• Fixing X6 = 18, this point shows the expected value of the output (obtained by averaging over all other inputs)
• Simply fixing all the other inputs at their central values and comparing X6 = 10 with X6 = 40 would underestimate the influence of this input
• The thickness of the band shows emulator uncertainty

71 Variance of main effects
• Main effects for each input. Input 6 has the greatest individual contribution to the variance
• Main effects sum to 66.8% of the total variance

72 Interactions and total effects
• Main effects explain 2/3 of the variance
  – Model must contain interactions
• Any input can have small main effect, but large interaction effect, so overall this input is still ‘important’
• Can ask GEM-SA to compute all pair-wise interaction effects
  – 435 in total for a 30 input model – can take some time!
• Useful to know what to look for
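
The count of 435 is the number of ways of choosing 2 inputs from 30. A short check, with a hypothetical short-list of four inputs to show how much the work shrinks when only selected pairs are requested:

```python
from itertools import combinations
from math import comb


def n_joint_effects(n_inputs):
    """Number of distinct pairs of inputs, i.e. 'n choose 2'."""
    return comb(n_inputs, 2)


all_pairs_30 = n_joint_effects(30)  # the 30-input case quoted on the slide
all_pairs_7 = n_joint_effects(7)    # the 7-input example model

# Restricting attention to a short-list of inputs (names are illustrative).
short_list = ["X4", "X5", "X6", "X7"]
selected_pairs = list(combinations(short_list, 2))
```

Here `all_pairs_30` is 435, `all_pairs_7` is 21, and the four-input short-list gives 6 pairs.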

73 Interactions and total effects
• For each input Xi:
    Total effect of Xi = main effect for Xi + all interactions involving Xi
• Total effect >> main effect implies interactions in the model
• So for any input with large total effect relative to the main effect
  – investigate possible interactions involving that input

74 Interactions and total effects
• Total effects for inputs 4 and 7 are much larger than their main effects
• Implies presence of interactions
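
The relationship between main and total effects can be seen on a toy function. The sketch below is not GEM-SA's method (GEM-SA works through the emulator): it evaluates a hypothetical three-input model on a regular grid over [0, 1]³, treats the grid points as equally likely, and uses the standard variance-based definitions. All names are illustrative.

```python
from itertools import product
from statistics import fmean, pvariance


def toy_model(x1, x2, x3):
    """Hypothetical model: x1 acts additively, x2 and x3 interact."""
    return x1 + 4.0 * x2 * x3


def sensitivity_indices(model, n_inputs, grid_size=10):
    """Return (main, total) effects as fractions of the output variance."""
    grid = [(k + 0.5) / grid_size for k in range(grid_size)]
    outputs = {p: model(*p) for p in product(grid, repeat=n_inputs)}
    total_variance = pvariance(list(outputs.values()))

    main, total = [], []
    for i in range(n_inputs):
        # Main effect: variance of the output mean when only Xi is fixed.
        cond_means = [fmean(y for p, y in outputs.items() if p[i] == g) for g in grid]
        main.append(pvariance(cond_means) / total_variance)

        # Total effect: average variance left when all inputs except Xi are fixed.
        groups = {}
        for p, y in outputs.items():
            groups.setdefault(p[:i] + p[i + 1:], []).append(y)
        total.append(fmean(pvariance(ys) for ys in groups.values()) / total_variance)
    return main, total


main, total = sensitivity_indices(toy_model, n_inputs=3)
```

For this toy model the first input has equal main and total effects because it is involved in no interaction, while the second and third have total effects larger than their main effects, and the main effects sum to less than 100% of the variance.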

75 Interaction effects
10. Project -> Edit, or use the “Edit project” toolbar icon
11. In Options tab, tick “calculate joint effects”
12. De-select all inputs under “Inputs to include in joint effects”, then select X4, X5, X6, X7

76 Interaction effects
13. Click OK, then OK again
14. Project -> Run, or use the “Run the analysis” toolbar icon

77 Interaction effects
• Note interactions involving inputs 4 and 7
• Main effects and selected interactions now sum to almost 92% of the total variance

78 Exercise
1. Set up a new project using SAex1_inputs.txt for the inputs and SAex1_outputs.txt for the output
  – 8 input parameters (uniform on [0,1])
  – 100 model runs
2. Estimate the main effects only for this model and identify the influential input variables
3. By comparing main effects with total effects, can you spot any interactions?
4. Estimate any suspected interactions to test your intuition!

