
Bayesian methods for calibrating and comparing process-based vegetation models Marcel van Oijen (CEH-Edinburgh)


1 Bayesian methods for calibrating and comparing process-based vegetation models Marcel van Oijen (CEH-Edinburgh)

2 Contents
1. Process-based modelling of forests and uncertainties
2. Bayes Theorem (BT)
3. Bayesian Calibration (BC) of process-based models
4. Bayesian Model Comparison (BMC)
5. BC & BMC in NitroEurope
6. Examples of BC & BMC in other sciences
7. BC & BMC as tools to develop theory
8. References, Summary, Discussion

3 1. Introduction: Process-based modelling of forests and uncertainties

4 1.1 Forest growth in Europe
Previous observations: forests across Europe have started to grow faster in the 20th century. Causes? Future trend?
Project RECOGNITION (FAIRCT98-4124): 15 partner countries across Europe, 22 sites. Empirical methods + process-based modelling. Modelling groups in UK, Sweden and Finland (2), coordinated by CEH-Edinburgh.

5 1.2 Forest growth in Europe
Conclusion for the 20th century: growth accelerated by N-deposition. [Figure: NPP before the growth-rate increase (1920); contributions of N-deposition, CO2 and temperature]
[Figure: environmental change 2000–2080, effects on NPP: % change in NPP (-10 to 25%) at the 22 sites plotted against latitude, for CO2, climate, N-deposition and their cumulative effects; model EFM]
Conclusion for the 21st century: growth likely to be accelerated by climate change and increasing [CO2].

6 1.3 Reality check!
How reliable is the European forest study?
- Sufficient data for model parameterization?
- Sufficient data for model input?
- Would another model have given different results?
In every study using systems analysis and simulation, model parameters, inputs and structure are uncertain. How can we deal with these uncertainties optimally?

7 1.4 Forest models and uncertainty [Figure: Levy et al., 2004]

8 1.4 Forest models and uncertainty [Figure: N dep UE (kg C kg⁻¹ N) for the models bgc, century and hybrid; Levy et al., 2004]

9 1.5 Model-data fusion
Uncertainties are everywhere: in models (environmental inputs, parameters, structure) and in data.
Uncertainties can be expressed as probability distributions (pdfs).
We need methods that:
- quantify all uncertainties
- show how to reduce them
- efficiently transfer information: data → models → model application
Calculating with uncertainties (pdfs) = probability theory

10

11 2. Bayes Theorem

14 2.1 Dealing with uncertainty: Medical diagnostics
A flu epidemic occurs: one percent of people are ill. A diagnostic test is 99% reliable. Your test result is positive (bad news!). What is P(diseased | test positive)? (a) 0.50 (b) 0.98 (c) 0.99
P(dis) = 0.01, P(pos|hlth) = 0.01, P(pos|dis) = 0.99
Bayes Theorem:
$$P(\text{dis} \mid \text{pos}) = \frac{P(\text{pos} \mid \text{dis})\,P(\text{dis})}{P(\text{pos})} = \frac{P(\text{pos} \mid \text{dis})\,P(\text{dis})}{P(\text{pos} \mid \text{dis})\,P(\text{dis}) + P(\text{pos} \mid \text{hlth})\,P(\text{hlth})} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = 0.50$$
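The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch; the function and argument names are our own.

```python
def posterior_disease(p_dis: float, p_pos_given_dis: float, p_pos_given_hlth: float) -> float:
    """Return P(dis|pos) via Bayes Theorem for a two-state (diseased/healthy) problem."""
    p_hlth = 1.0 - p_dis
    # Evidence P(pos): total probability of a positive test over both states
    p_pos = p_pos_given_dis * p_dis + p_pos_given_hlth * p_hlth
    return p_pos_given_dis * p_dis / p_pos


if __name__ == "__main__":
    # Values from the slide: 1% of people ill, test 99% reliable
    print(round(posterior_disease(0.01, 0.99, 0.01), 2))  # 0.5
```

The low prior probability of disease is what pulls the answer down to 0.50: with a prevalence of 10% instead of 1%, the same test gives P(dis|pos) of about 0.92.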

15 2.2 Bayesian updating of probabilities
Bayes Theorem: prior probability → posterior probability
- Medical diagnostics: P(disease) → P(disease|test result)
- Model parameterization: P(params) → P(params|data)
- Model selection: P(models) → P(model|data)
- SPAM-killer: P(SPAM) → P(SPAM|E-mail header)
- Weather forecasting: …
- Climate change prediction: …
- Oil field discovery: …
- GHG-emission estimation: …
- Jurisprudence: …

16 2.2 Bayesian updating of probabilities
Bayes Theorem: prior probability → posterior probability
- Model parameterization: P(params) → P(params|data)
- Model selection: P(models) → P(model|data)
Application of Bayes Theorem to process-based models (not analytically solvable): Markov Chain Monte Carlo (Metropolis algorithm)

17

18 2.3 What and why?
We want to use data and models to explain and predict ecosystem behaviour.
Data as well as model inputs, parameters and outputs are uncertain.
No prediction is complete without quantifying the uncertainty. No explanation is complete without analysing the uncertainty.
Uncertainties can be expressed as probability density functions (pdfs).
Probability theory tells us how to work with pdfs: Bayes Theorem (BT) tells us how a pdf changes when new information arrives.
BT: prior pdf → posterior pdf
BT: Posterior = Prior × Likelihood / Evidence
BT: $P(\theta \mid D) = P(\theta)\,P(D \mid \theta) / P(D)$
BT: $P(\theta \mid D) \propto P(\theta)\,P(D \mid \theta)$
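For a single parameter, Posterior = Prior × Likelihood / Evidence can be evaluated directly on a grid, which makes each term visible. The sketch below assumes a hypothetical one-parameter problem in which the model output is θ itself, with a normal prior and normal measurement errors; all numbers are illustrative.

```python
import math


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(grid, prior, likelihood):
    """Return posterior densities on an evenly spaced grid, plus the evidence P(D)."""
    step = grid[1] - grid[0]
    unnorm = [prior(t) * likelihood(t) for t in grid]  # P(theta) * P(D|theta)
    evidence = sum(unnorm) * step                       # P(D): Riemann sum over theta
    return [u / evidence for u in unnorm], evidence


if __name__ == "__main__":
    data, sigma = [2.1, 1.8, 2.4], 0.5                 # hypothetical data and error sd
    grid = [i * 0.001 for i in range(-2000, 6001)]     # theta from -2 to 6

    def prior(t):
        return normal_pdf(t, 0.0, 2.0)                 # hypothetical prior N(0, 2^2)

    def likelihood(t):
        return math.prod(normal_pdf(d, t, sigma) for d in data)

    post, evidence = grid_posterior(grid, prior, likelihood)
    mean = sum(t * p for t, p in zip(grid, post)) * (grid[1] - grid[0])
    print(f"posterior mean = {mean:.3f}, evidence P(D) = {evidence:.3e}")
```

Grids only work for one or two parameters; with the 39 parameters of BASFOR the number of grid points explodes, which is why sampling methods such as MCMC are used instead.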

19 3. Bayesian Calibration (BC) of process-based models

20 3.1 Process-based forest models [Diagram: environmental scenarios, initial values and parameters → model → outputs such as soil C, NPP and height]

21 3.2 Process-based forest model BASFOR: 40+ parameters, 12+ output variables

22 3.3 BASFOR: outputs [Figure panels: volume (standing); carbon in trees (standing + thinned); carbon in soil]

23 3.4 BASFOR: parameter uncertainty

24 3.5 BASFOR: prior output uncertainty [Figure panels: volume (standing); carbon in trees (standing + thinned); carbon in soil]

25 3.6 Data: Dodd Wood (R. Matthews, Forest Research) [Figure panels: volume (standing); carbon in trees (standing + thinned); carbon in soil]

26 3.7 Using data in Bayesian calibration of BASFOR [Diagram: prior pdf + data → Bayesian calibration → posterior pdf]

27 3.8 Bayesian calibration: posterior uncertainty [Figure panels: volume (standing); carbon in trees (standing + thinned); carbon in soil]

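28 3.9 How does BC work again?
$$P(\theta \mid D) = \frac{P(\theta)\,P(D \mid \theta)}{P(D)} \propto P(\theta)\,P(D \mid f(\theta))$$
- $P(\theta \mid D)$: posterior distribution of the parameters θ
- $P(\theta)$: prior distribution of the parameters
- $P(D \mid f(\theta))$: likelihood of the data, given the mismatch with model output
- f = the model, e.g. BASFOR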
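The likelihood P(D|f(θ)) scores the mismatch between data and model output. A common choice, assumed here, is independent normal measurement errors with known standard deviations σ_j. In this sketch `toy_model` is a hypothetical stand-in for a process-based model such as BASFOR; working with log-likelihoods avoids numerical underflow when there are many data points.

```python
import math
from typing import Callable, Sequence


def log_likelihood(theta: Sequence[float],
                   model: Callable[[Sequence[float]], Sequence[float]],
                   data: Sequence[float],
                   sigma: Sequence[float]) -> float:
    """log P(D|f(theta)) assuming independent N(0, sigma_j) errors on each datum."""
    outputs = model(theta)
    total = 0.0
    for y_model, y_obs, s in zip(outputs, data, sigma):
        total += -0.5 * ((y_obs - y_model) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
    return total


def toy_model(theta: Sequence[float]) -> list[float]:
    """Hypothetical model: linear growth y(t) = theta0 + theta1 * t at t = 0..4."""
    return [theta[0] + theta[1] * t for t in range(5)]
```

A parameter vector whose outputs lie closer to the data receives a higher log-likelihood; multiplying by the prior (adding log-prior) gives the unnormalised log-posterior used in MCMC.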

29 Bayesian calibration in action! [Animation: parameter prob. distr. → model output, compared with data] Bayes Theorem: $P(\theta \mid D) \propto P(\theta)\,P(D \mid f(\theta))$

30 3.10 Calculating the posterior using MCMC
Aim: a sample of $10^4$–$10^5$ parameter vectors from the posterior distribution $P(\theta \mid D) \propto P(\theta)\,P(D \mid f(\theta))$.
1. Start anywhere in parameter space: $p(0) = (p_1, \ldots, p_{39})$, i = 0
2. Randomly choose $p(i+1) = p(i) + \delta$
3. IF $\dfrac{P(p(i+1))\,P(D \mid f(p(i+1)))}{P(p(i))\,P(D \mid f(p(i)))} > \mathrm{Random}[0,1]$ THEN accept p(i+1) ELSE reject p(i+1) and set p(i+1) = p(i); i = i + 1
4. IF i < $10^4$ GOTO 2
Metropolis et al. (1953). [Figure: MCMC trace plots]
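A compact Python version of the Metropolis steps above. It is a sketch, not the MATLAB/R code referred to in the NitroEurope guidelines: it works on the log scale (so the ratio test in step 3 becomes a difference), assumes a symmetric Gaussian random-walk proposal δ with a user-chosen step size, and repeats the current vector in the chain whenever a candidate is rejected. The target in the usage lines is a hypothetical one-parameter posterior.

```python
import math
import random
from typing import Callable, List, Sequence


def metropolis(log_post: Callable[[Sequence[float]], float],
               start: Sequence[float],
               step_sd: Sequence[float],
               n_iter: int = 10_000,
               seed: int = 1) -> List[List[float]]:
    """Random-walk Metropolis; log_post returns log[P(p) P(D|f(p))] up to a constant."""
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain = [current[:]]
    for _ in range(n_iter):
        # Step 2: propose p(i+1) = p(i) + delta, with delta ~ N(0, step_sd)
        candidate = [p + rng.gauss(0.0, s) for p, s in zip(current, step_sd)]
        cand_lp = log_post(candidate)
        # Step 3: accept if posterior ratio > Random[0,1], compared on the log scale
        if math.log(1.0 - rng.random()) < cand_lp - current_lp:
            current, current_lp = candidate, cand_lp
        chain.append(current[:])  # after a rejection, p(i) is repeated
    return chain


if __name__ == "__main__":
    # Hypothetical target: posterior N(2, 0.5^2) for a single parameter
    chain = metropolis(lambda p: -0.5 * ((p[0] - 2.0) / 0.5) ** 2, [0.0], [0.5], n_iter=20_000)
    xs = [c[0] for c in chain[1000:]]  # discard a burn-in period
    print(f"mean {sum(xs) / len(xs):.2f}")
```

The step size matters in practice: steps that are too small make the chain crawl, while steps that are too large cause most candidates to be rejected; the trace plots mentioned on the slide are the usual way to check this.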

31 3.11 MCMC in action [Animation: BC3D.AVI]

32 3.12 Using data in Bayesian calibration of BASFOR [Diagram: prior pdf + data → Bayesian calibration → posterior pdf]

33 3.13 Parameter correlations [Figure: correlations among the 39 parameters]

34 3.14 Continued calibration when new data become available [Diagram: prior pdf → Bayesian calibration → posterior pdf; this posterior then serves as the prior pdf for a new Bayesian calibration with the new data]

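That the posterior from one calibration can serve as the prior for the next is easy to check in the simplest conjugate case, assumed here: a single parameter with a normal prior and data with known normal measurement error. Updating with two data batches one after the other gives the same posterior as a single calibration with all the data. The data values are hypothetical.

```python
from typing import Sequence, Tuple


def normal_update(prior_mean: float, prior_sd: float,
                  data: Sequence[float], sigma: float) -> Tuple[float, float]:
    """Conjugate normal update for theta when each datum ~ N(theta, sigma^2)."""
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = len(data) / sigma ** 2          # precision contributed by the data
    post_prec = prior_prec + data_prec          # precisions (information) add up
    post_mean = (prior_prec * prior_mean + sum(data) / sigma ** 2) / post_prec
    return post_mean, post_prec ** -0.5


if __name__ == "__main__":
    batch1, batch2, sigma = [2.1, 1.8], [2.4, 2.2, 1.9], 0.5
    # Two steps: the posterior from batch1 is the prior for batch2
    m1, s1 = normal_update(0.0, 2.0, batch1, sigma)
    m12, s12 = normal_update(m1, s1, batch2, sigma)
    # One step: all data at once
    m_all, s_all = normal_update(0.0, 2.0, batch1 + batch2, sigma)
    print(f"sequential: {m12:.4f} ± {s12:.4f}; all at once: {m_all:.4f} ± {s_all:.4f}")
```

The line `post_prec = prior_prec + data_prec` is one way to read the later statement that the information content of data is additive: in this conjugate case the precisions simply add.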

36 3.15 Bayesian projects at CEH-Edinburgh
- Selection of forest models
- Data assimilation of forest EC data (David Cameron, Mat Williams, M. v. Oijen)
- Risk of frost damage in grassland
- Uncertainty in UK C-sequestration (Marcel van Oijen, Jonathan Rougier, Ron Smith, Tommy Brown, Amanda Thomson)
- Uncertainty in earth system resilience (Clare Britton & David Cameron)
- Parameterization and uncertainty quantification of the 3-PG model of forest growth & C-stock (Genevieve Patenaude, Ronnie Milne, M. v. Oijen)
[Figure: [CO2] against time]

37 3.16 BASFOR: forest C-sequestration 2005–2076 [Map panels: change in annual mean temperature (UKCIP); change in potential C-sequestration; uncertainty in change of potential C-sequestration]
Uncertainty due to model parameters only, NOT uncertainty in inputs / upscaling.

38 3.17 Integrating RS data (Patenaude et al.) [Diagram: RS data (hyperspectral, LiDAR, SAR) → BC of the 3-PG model]

39 3.18 What kind of measurements would have reduced uncertainty the most?

40 3.19 Prior predictive uncertainty & height data [Figure: height and biomass; prior predictive uncertainty; height data, Skogaby]

41 3.20 Prior & posterior uncertainty: use of height data [Figure: height and biomass; prior predictive uncertainty; posterior uncertainty (using height data); height data, Skogaby]

42 3.20 Prior & posterior uncertainty: use of height data [Figure: height and biomass; prior predictive uncertainty; posterior uncertainty (using height data); height data (hypothetical)]

43 3.20 Prior & posterior uncertainty: use of height data [Figure: height and biomass; prior predictive uncertainty; posterior uncertainty (using height data); posterior uncertainty (using precision height data)]

44 3.21 Summary of the BC procedure
Inputs: data D ± σ, model f, prior P(θ), and an error function, e.g. N(0, σ), which defines the likelihood P(D|f(θ)).
MCMC → samples of θ ($10^4$–$10^5$) from the posterior P(θ|D) → calibrated parameters, with covariances
Samples of f(θ) ($10^4$–$10^5$) → uncertainty of model output
PCC → sensitivity analysis of model parameters
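Once MCMC has produced posterior parameter samples, output uncertainty is obtained by running the model for each sample and summarising the resulting outputs (the "samples of f(θ)" branch of the summary). A minimal sketch; `toy_model` and the parameter sample are hypothetical stand-ins, and in practice the samples would come from an MCMC chain after burn-in. Quantiles are taken simply by position in the sorted outputs.

```python
import random
import statistics
from typing import Callable, Sequence


def output_uncertainty(samples: Sequence[Sequence[float]],
                       model: Callable[[Sequence[float]], float]) -> tuple[float, float, float]:
    """Run the model for every parameter sample; return mean and approx. 2.5%/97.5% quantiles."""
    outputs = sorted(model(theta) for theta in samples)
    n = len(outputs)
    lower = outputs[int(0.025 * (n - 1))]   # approximate quantiles by sorted position
    upper = outputs[int(0.975 * (n - 1))]
    return statistics.fmean(outputs), lower, upper


def toy_model(theta: Sequence[float]) -> float:
    """Hypothetical model output, e.g. a 'volume' after 50 years of linear growth."""
    return theta[0] + 50.0 * theta[1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical posterior sample (stand-in for MCMC output)
    samples = [[rng.gauss(10.0, 1.0), rng.gauss(2.0, 0.1)] for _ in range(10_000)]
    mean, lo, hi = output_uncertainty(samples, toy_model)
    print(f"output mean {mean:.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

Running the model on whole parameter vectors, rather than on each parameter's marginal distribution separately, preserves the parameter correlations found by the calibration.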

45 3.22 Summary of BC vs tuning
Model tuning:
1. Define parameter ranges (permitted values)
2. Select parameter values that give model output closest (r², RMSE, …) to data
3. Do the model study with the tuned parameters (i.e. no model output uncertainty)
Bayesian calibration:
1. Define parameter pdfs
2. Define data pdfs (probable measurement errors)
3. Use Bayes Theorem to calculate the posterior parameter pdf
4. Do all future model runs with samples from the parameter pdf (i.e. quantify uncertainty of model results)
BC can use data to reduce parameter uncertainty for any process-based model.

46 4. Bayesian Model Comparison (BMC)

47 4.1 RECOGNITION revisited: model uncertainty [Figure: site results plotted against latitude, model EFM]

48 4.1 RECOGNITION revisited: model uncertainty [Figure: results at the 22 sites plotted against latitude for the models EFM, EFIMOD, FinnFor and Q]

49 4.2 Bayesian comparison of two models (Model 1 and Model 2)
Bayes Theorem for model probabilities: $P(M \mid D) = P(M)\,P(D \mid M) / P(D)$
With $P(M_1) = P(M_2) = \tfrac{1}{2}$: $\dfrac{P(M_2 \mid D)}{P(M_1 \mid D)} = \dfrac{P(D \mid M_2)}{P(D \mid M_1)}$
The Bayes Factor $P(D \mid M_2) / P(D \mid M_1)$ quantifies how the data D change the odds of $M_2$ over $M_1$.
The integrated likelihood $P(D \mid M_i)$ can be approximated from the MCMC sample of outputs for model $M_i$ (*).
(*) Harmonic mean of likelihoods in the MCMC sample (Kass & Raftery, 1995)
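The harmonic-mean approximation in the footnote estimates P(D|M_i) as the reciprocal of the average of 1/P(D|θ) over the posterior samples of θ (Kass & Raftery, 1995). The sketch below works on the log scale to avoid numerical underflow and takes as input the log-likelihoods recorded along an MCMC chain; the function names are our own. The estimator is known to have high variance, so it is worth checking it by repeating the chain.

```python
import math
from typing import Sequence


def log_mean_exp(values: Sequence[float]) -> float:
    """log(mean(exp(values))), computed stably by factoring out the maximum."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def log_evidence_harmonic(log_liks: Sequence[float]) -> float:
    """Harmonic-mean estimate of log P(D|M) from log P(D|theta) of posterior samples."""
    # P(D|M) ~ 1 / mean(1 / L_i)  =>  log P(D|M) ~ -log(mean(exp(-log L_i)))
    return -log_mean_exp([-ll for ll in log_liks])


def bayes_factor(log_ev_m2: float, log_ev_m1: float) -> float:
    """Bayes factor P(D|M2) / P(D|M1) from log evidences."""
    return math.exp(log_ev_m2 - log_ev_m1)
```

With equal prior model probabilities, the Bayes factor returned by `bayes_factor` equals the posterior odds of M2 over M1.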

50 4.3 BMC: Tuomi et al. 2007

52 4.5 Bayes Factor for two big forest models
MCMC 5000 steps: calculation of P(D|BASFOR) and P(D|BASFOR+). Data Rajec: Emil Klimo.
P(D|M1) = 7.2e-16, P(D|M2) = 5.8e-15
Bayes Factor = 7.8, so BASFOR+ is supported by the data.

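53 4.6 Summary of the BMC procedure
Data D are used for both models:
- Model 1: prior $P(\theta_1)$ → MCMC → samples of $\theta_1$ ($10^4$–$10^5$) from the posterior $P(\theta_1 \mid D)$ → updated parameters and $P(D \mid M_1)$
- Model 2: prior $P(\theta_2)$ → MCMC → samples of $\theta_2$ ($10^4$–$10^5$) from the posterior $P(\theta_2 \mid D)$ → updated parameters and $P(D \mid M_2)$
$P(D \mid M_1)$ and $P(D \mid M_2)$ → Bayes factor → updated model odds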

54 5. BC & BMC in NitroEurope

55 5.1 NitroEurope & uncertainty
What is the effect of reactive nitrogen supply on the direction and magnitude of net greenhouse gas budgets for Europe? This CEH-coordinated IP builds on CEH's involvement in previous and current European GHG projects such as GREENGRASS, CarboMont and CarboEurope IP.

56 5.2 NitroEurope & uncertainty [Figure: modellers NEU (2006)]
NitroEurope (NEU): non-CO2 GHGs in Europe
- experiments at plot scale, observations at regional scale
- models at plot and regional scale
- protocols for good modelling practice and for uncertainty quantification and analysis (collab. with CEU in JUTF)

57 5.3 Uncertainty assessment of NEU models [Table: NEU models and use of BC] Plot-scale forest model added in 2007: DAYCENT (BC: yes)

58 5.4 NEU – Forest model comparison 2007–8
- 4 models (DNDC, BASFOR, COUP, DayCENT); models frozen 30-11-2007
- Calibration of models using data from Höglwald (D) {mainly N2O & NO emission rates}: Bayesian Calibration (BC)
- Comparison of models using data from AU & DK: Bayesian Model Comparison (BMC)

59 Bayesian Calibration (BC) and Bayesian Model Comparison (BMC) of process- based models in NitroEurope: Theory, implementation and guidelines

60 Theory of BC and BMC
Methods for doing BC: MCMC and Accept-Reject
3.1 Standard Metropolis algorithm
3.2 Metropolis with a modified proposal-generating mechanism (Reflection method)
3.3 Accept-Reject algorithm
FAQ – Bayesian Calibration
References
Appendix 1: MCMC code in MATLAB: the Metropolis algorithm
Appendix 2: MCMC code in MATLAB: Metropolis-with-Reflection
Appendix 3: ACCEPT-REJECT code in MATLAB
Appendix 4: MCMC code in R: the Metropolis algorithm

61 5.6 BASFOR changes for NEU
1. Soil temperature calculated
2. Mineralisation of litter and SOM = $f(T_{soil})$: Gaussian curve (Tuomi et al. 2007): $f = \exp\left[\dfrac{(T-10)(2T_m - T - 10)}{2\sigma^2}\right]$
3. N emission split into N2O and NO: Hole-In-the-Pipe (HIP) approach (Davidson & Verchot, 2000): $f_{N_2O} = \dfrac{1}{1 + \exp[-r(\mathrm{WFPS} - \mathrm{WFPS}_{50})]}$ [Figure: $f_{N_2O}$ (-) against Water-Filled Pore Space, WFPS (-)]
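The two response functions above, written as Python functions. The parameter values in the usage lines (Tm, σ, r, WFPS50) are hypothetical placeholders, not the calibrated BASFOR values, and T is assumed to be in °C. Expanding the exponent shows that the temperature function equals 1 at T = 10 and peaks at T = Tm, and that f_N2O = 0.5 when WFPS = WFPS50.

```python
import math


def f_temperature(t_soil: float, t_opt: float, sigma: float) -> float:
    """Gaussian temperature modifier for mineralisation (form from the slide).

    f = exp[(T - 10)(2*Tm - T - 10) / (2*sigma^2)]; equals 1 at T = 10, maximum at T = Tm.
    """
    return math.exp((t_soil - 10.0) * (2.0 * t_opt - t_soil - 10.0) / (2.0 * sigma ** 2))


def f_n2o(wfps: float, r: float, wfps50: float) -> float:
    """Hole-In-the-Pipe fraction of N emission emitted as N2O (logistic in WFPS)."""
    return 1.0 / (1.0 + math.exp(-r * (wfps - wfps50)))


if __name__ == "__main__":
    # Hypothetical parameter values, for illustration only
    for t in (0.0, 10.0, 20.0, 30.0):
        print(t, round(f_temperature(t, t_opt=25.0, sigma=10.0), 3))
    for w in (0.2, 0.6, 0.9):
        print(w, round(f_n2o(w, r=10.0, wfps50=0.6), 3))
```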

62 5.12 BC results: Prior & Posterior

63 5.13 BC results: simulation uncertainty & data

64 5.15 Data have information content, which is additive

65 5.16 BMC: BASFOR vs BASFOR with T-sensitivity

| Data | BASFOR | BASFOR with T-sensitivity | Bayes factor |
|---|---|---|---|
| 1983–1997 | log P(D) = -614.4 | log P(D) = -607.4 | BF = 1131.0 |
| 1998–2003 | log P(D) = -427.6 | log P(D) = -428.7 | BF = 0.33 |
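Because the Bayes factor is a ratio of integrated likelihoods, it follows directly from the log P(D) values. As a check with the rounded values (assuming M1 = BASFOR and M2 = BASFOR with T-sensitivity):

$$\mathrm{BF}_{1983\text{–}1997} = \exp(-607.4 + 614.4) = e^{7.0} \approx 1.1 \times 10^{3}, \qquad \mathrm{BF}_{1998\text{–}2003} = \exp(-428.7 + 427.6) = e^{-1.1} \approx 0.33$$

The small difference from the reported 1131.0 is consistent with rounding of the log values to one decimal. The temperature-sensitive version is strongly favoured by the 1983–1997 data, while the 1998–2003 data slightly favour the original BASFOR.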

66 6. Examples of BC & BMC in other sciences

67 6.1 Bayes in other disguises
- Linear regression using least squares = BC with model: straight line; prior: uniform; likelihood: Gaussian (iid). Note: realising that LS regression is a special case of BC opens up possibilities to improve on it, e.g. by having more information in the prior or likelihood (Sivia 2005).
- BC can be used, e.g., for spatiotemporal stochastic modelling, with spatial correlations included in the prior.
- All Maximum Likelihood estimation methods can be seen as limited forms of BC where the prior is ignored (uniform) and only the maximum value of the likelihood is identified (ignoring uncertainty).
- Hierarchical modelling = BC, except that uncertainty is ignored.
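To make the first point concrete: with a uniform prior and iid Gaussian errors, the log-posterior of a straight line is, up to a constant, −SSR/(2σ²), so its maximum is exactly the least-squares fit. The sketch computes the least-squares line in closed form and checks that nudging either coefficient lowers the Gaussian log-likelihood. The data and σ are hypothetical.

```python
import math
from typing import Sequence, Tuple


def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form ordinary least-squares intercept a and slope b for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def gaussian_log_lik(a: float, b: float, x, y, sigma: float) -> float:
    """log P(D|a,b) for iid N(0, sigma^2) errors (= log-posterior + const under a uniform prior)."""
    ssr = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return -ssr / (2.0 * sigma ** 2) - len(x) * math.log(sigma * math.sqrt(2.0 * math.pi))


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0, 4.0]
    y = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical observations
    a, b = least_squares_line(x, y)
    best = gaussian_log_lik(a, b, x, y, sigma=0.3)
    for da, db in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)]:
        assert gaussian_log_lik(a + da, b + db, x, y, sigma=0.3) < best
    print(f"a = {a:.3f}, b = {b:.3f}")
```

What least squares leaves out is everything except the peak: a full BC would also report the spread of the posterior around (a, b) and could use a non-uniform prior.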

68 6.2 Bayes in other disguises (cont.)
- Inverse modelling (e.g. to estimate emission rates from concentrations)
- Geostatistics, e.g. Bayesian kriging
- Data assimilation (KF, EnKF etc.)

69 6.3 Regional application of plot-scale models

| Upscaling method | Model structure | Modelling uncertainty |
|---|---|---|
| 1. Stratify into homogeneous subregions & apply | Unchanged | P(θ) unchanged; upscaling uncertainty |
| 2. Apply to selected points (plots) & interpolate | Unchanged (but extend with geostatistical model) | P(θ) unchanged (Bayesian kriging only); interpolation uncertainty |
| 3. Reinterpret the model as a regional one & apply | Unchanged | New BC using regional I-O data |
| 4. Summarise model behaviour & apply exhaustively (deterministic metamodel) | E.g. multivariate regression model or simple mechanistic model | New BC of the metamodel needed, using plot data |
| 5. As 4. (stochastic emulator) | E.g. Gaussian process emulator | Code uncertainty (Kennedy & O'Hagan) |
| 6. Summarise model behaviour & embed in regional model | Unrelated new model | New BC using regional I-O data |

70 7. References, Summary, Discussion

71 7.1 Bayesian methods: References
- Bayes, T. (1763): Bayes Theorem
- Metropolis, N. (1953): MCMC
- Kass & Raftery (1995): BMC
- Green, E.J. / MacFarlane, D.W. / Valentine, H.T., Strawderman, W.E. (1996, 1998, 1999, 2000): forest models
- Jansen, M. (1997): crop models
- Jaynes, E.T. (2003): probability theory
- Van Oijen et al. (2005): complex process-based models, MCMC

73 7.2 Discussion statements / Conclusions
Uncertainty (= incomplete information) is described by pdfs.
1. Plausible reasoning implies probability theory (PT) (Cox, Jaynes)
2. Main tool from PT for updating pdfs: Bayes Theorem
3. Parameter estimation = quantifying the joint parameter pdf → BC
4. Model evaluation = quantifying the pdf in model space; requires at least two models → BMC
Practicalities:
1. When new data arrive: MCMC provides a universal method for calculating posterior pdfs.
2. Quantifying the prior: not a key issue in environmental science, because (1) there are many data and (2) the prior is the posterior from a previous calibration. MaxEnt can be used (Jaynes).
3. Defining the likelihood: a normal pdf for measurement error usually describes our prior state of knowledge adequately (Jaynes).
4. The Bayes Factor shows how new data change the odds of models, and is a by-product of Bayesian calibration (Kass & Raftery).
Overall: uncertainty quantification often shows that our models are not very reliable.

74

75 App2.1 How to do BC
The problem: you have (1) a prior pdf P(θ) for your model's parameters and (2) new data. You also know how to calculate the likelihood P(D|θ). How do you now use BT to calculate the posterior P(θ|D)?
Methods of using BT to calculate P(θ|D):
1. Analytical. Only works when the prior and likelihood are conjugate (family-related). For example, if the prior and likelihood are normal pdfs, then the posterior is normal too.
2. Numerical. Uses sampling. Three main methods:
   2.1 MCMC (e.g. Metropolis, Gibbs): sample directly from the posterior. Best for high-dimensional problems.
   2.2 Accept-Reject: sample from the prior, then reject some samples using the likelihood. Best for low-dimensional problems.
   2.3 Model emulation followed by MCMC or A-R.
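A minimal sketch of the Accept-Reject method listed above: draw θ from the prior, then keep each draw with probability P(D|θ)/L_max, where L_max is the maximum of the likelihood. The usage lines assume a hypothetical one-parameter problem in which the model output is θ itself with normal measurement error, so L_max is attained at the mean of the data and the exact posterior mean, 25.2/12.25 ≈ 2.057, is known for comparison.

```python
import math
import random
from typing import Callable, List


def accept_reject(prior_draw: Callable[[random.Random], float],
                  log_lik: Callable[[float], float],
                  log_lik_max: float,
                  n_draws: int,
                  seed: int = 0) -> List[float]:
    """Sample the prior; keep each draw with probability exp(log_lik - log_lik_max)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_draws):
        theta = prior_draw(rng)
        if math.log(1.0 - rng.random()) < log_lik(theta) - log_lik_max:
            kept.append(theta)
    return kept


if __name__ == "__main__":
    data, sigma = [2.1, 1.8, 2.4], 0.5  # hypothetical data and error sd

    def log_lik(t: float) -> float:
        return sum(-0.5 * ((d - t) / sigma) ** 2 for d in data)

    theta_hat = sum(data) / len(data)  # where the likelihood is maximal
    post = accept_reject(lambda r: r.gauss(0.0, 2.0),  # hypothetical prior N(0, 2^2)
                         log_lik, log_lik(theta_hat), n_draws=200_000)
    print(f"accepted {len(post)}, posterior mean ~ {sum(post) / len(post):.3f}")
```

Only a small fraction of prior draws survive when the data are much more informative than the prior, and that fraction shrinks rapidly with the number of parameters, which is why Accept-Reject suits low-dimensional problems.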

76 Should we measure the sensitive parameters?
Yes, because the sensitive parameters:
- are obviously important for prediction?
No, because model parameters:
- are correlated with each other, which we do not measure
- cannot really be measured at all
So it may be better to measure output variables, because they:
- are what we are interested in
- are better defined, in models and measurements
- help determine parameter correlations if used in Bayesian calibration
Key question: what data are most informative?

