Bayesian calibration and uncertainty analysis of dynamic forest models
Marcel van Oijen, CEH-Edinburgh
Input to forest models and output
[Diagram: inputs (environmental scenarios, initial values, parameters; imperfect input data) → Model → outputs (soil C, NPP, height)]
Input to forest models and output Model [Levy et al, 2004]
Input to forest models and output
[Figure: N dep UE (kg C kg^-1 N) for the models bgc, century and hybrid] [Levy et al, 2004]
Simpler models?
Goal: robust models, predicting forest growth over 100 years, with low uncertainty
Effects that must be accounted for:
- N-deposition
- CO2
- Temperature
- Rain
- Radiation
- Soil fertility
- Management, e.g. thinning
- ...
Are simple, robust models possible?
Typical model size: parameters
Simple (semi-)empirical relationships
1. Lieth (1972, Miami-model): NPP = f(Temperature, Rain)
2. Monteith (1977): NPP = LUE * Intercepted light
3. Gifford (1980): NPP = NPP_0 (1 + β Log([CO2]/[CO2]_0))
4. Gifford (1994): NPP = 0.5 GPP
5. Temperature ~ Light intensity
6. Roberts & Zimmermann (1999): LAI_max ~ Rain
7. Beer's Law: Fractional light interception = (1 - e^(-k LAI))
8. West, Brown, Enquist, Niklas ( ): Height ~ Mass^¼ ~ {f_leaf, f_stem, f_root}
9. Brouwer (1983): Root-shoot ratio = f(N)
10. Goudriaan (1990): Turn-over rates, SOM, litter
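A minimal sketch of how three of the relationships listed above (Monteith, Gifford 1980, Beer's Law) could be written as functions. The function names, the default extinction coefficient `k=0.5`, and the reading of "Log" as the natural logarithm are assumptions made for illustration; they are not taken from the slides or from BASFOR.

```python
import math

def fractional_interception(lai, k=0.5):
    """Beer's Law: fraction of incoming light intercepted by a canopy.

    k is the light extinction coefficient (0.5 is an illustrative default).
    """
    return 1.0 - math.exp(-k * lai)

def npp_monteith(lue, incoming_light, lai, k=0.5):
    """Monteith (1977): NPP = LUE * intercepted light."""
    return lue * incoming_light * fractional_interception(lai, k)

def npp_gifford_co2(npp0, co2, co2_ref, beta):
    """Gifford (1980): NPP = NPP_0 * (1 + beta * log(co2 / co2_ref)).

    The natural logarithm is assumed here.
    """
    return npp0 * (1.0 + beta * math.log(co2 / co2_ref))
```

Each function is a single algebraic relationship, which is what makes a model built from such pieces small in terms of parameter count.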
BASic FORest model (BASFOR)
[Diagram: 39 parameters → BASFOR → 24 output variables]
BASFOR: Inputs BASFOR 24 output variables Weather & soil: Skogaby (Sweden)
Forest data from Skogaby (Sweden)
Planted: 1966 (2300 trees ha^-1)
Weather data:
Soil data: C, N, mineralisation rate
Tree data: biomass, NPP, height, [N], LAI
BASFOR: Inputs
[Diagram: weather & soil data from Skogaby (Sweden), plus 39 parameters each with a prior distribution P(p_i) between p_i,min and p_i,max (i = 1, ..., 39) → BASFOR → 24 output variables]
BASFOR: Prior predictive uncertainty
[Figure panels: Wood C, Height, NPP; Skogaby, not calibrated (m ± σ)]
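Prior predictive uncertainty of this kind can be obtained by sampling parameter sets from their prior distributions, running the model for each, and summarising the outputs as mean ± standard deviation. The sketch below assumes uniform priors between a minimum and a maximum; `toy_model`, its two parameters, and the bounds are hypothetical stand-ins for illustration and have nothing to do with the actual BASFOR equations.

```python
import math
import random
import statistics

def toy_model(max_wood_c, rate, year):
    """Hypothetical stand-in for a forest model: saturating wood-C growth."""
    return max_wood_c * (1.0 - math.exp(-rate * year))

def prior_predictive(bounds, year, n_samples=1000, seed=0):
    """Sample parameters from uniform priors and summarise the model output.

    bounds: list of (min, max) pairs, one per parameter.
    Returns (mean, standard deviation) of the output across samples.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        params = [rng.uniform(lower, upper) for lower, upper in bounds]
        outputs.append(toy_model(*params, year))
    return statistics.mean(outputs), statistics.stdev(outputs)

# Illustrative bounds only: (max wood C, growth rate)
mean_c, sd_c = prior_predictive([(50.0, 150.0), (0.01, 0.05)], year=40)
```

Wide prior ranges translate directly into a wide band around the mean output, which is the uncalibrated uncertainty that calibration is meant to reduce.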
BASFOR: Predictive uncertainty
[Diagram: 39 parameters (high input uncertainty) → BASFOR → 24 output variables (high output uncertainty)]
Data: measurements of output variables → calibration of parameters
Calibration
Calibration = find P(p|D)
Bayesian calibration: P(p|D) = P(p) P(D|p) / P(D) ∝ P(p) L(f(p)|D)
- P(p|D): posterior distribution
- P(p): prior distribution
- L(f(p)|D): likelihood given the mismatch between model output and data
[Diagram: P(p) → f → P(f(p)), compared with data D]
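For a single parameter, Bayes' rule can be evaluated directly on a grid, which makes the roles of prior, likelihood and normalising constant P(D) concrete. The sketch below is a worked illustration under stated assumptions: the linear model `p * t`, the three data points, the flat prior and the Gaussian error with `sigma=0.3` are all hypothetical.

```python
import math

def grid_posterior(grid, prior, model, data, sigma):
    """Posterior over a 1-D parameter grid: P(p|D) = P(p) L(f(p)|D) / P(D)."""
    unnormalised = []
    for p, prior_p in zip(grid, prior):
        # Gaussian log-likelihood of the model-data mismatch (constants dropped)
        log_l = sum(-0.5 * ((model(p, t) - y) / sigma) ** 2 for t, y in data)
        unnormalised.append(prior_p * math.exp(log_l))
    evidence = sum(unnormalised)  # plays the role of P(D) on the grid
    return [u / evidence for u in unnormalised]

# Hypothetical example: y = p * t, flat prior on p over [0, 2]
grid = [i * 0.1 for i in range(21)]
prior = [1.0 / len(grid)] * len(grid)
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]  # (time, observation) pairs
posterior = grid_posterior(grid, prior, lambda p, t: p * t, data, sigma=0.3)
best = grid[posterior.index(max(posterior))]
```

A grid is only feasible for one or two parameters; with 39 parameters the number of grid points is prohibitive, which is why sampling methods are needed.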
Calibration
[Diagram: P(p) → f → P(f(p)), with data D; Bayesian calibration updates P(p) and P(f(p))]
Data: Skogaby (S)
[Figure panels: Wood C, Height, NPP]
Calculating the posterior distribution
Bayesian calibration: P(p|D) ∝ P(p) L(f(p)|D)
Calculating P(p|D) costs much time:
1. Sample parameter-space representatively
2. For each sampled set of parameter-values:
   a. Calculate P(p)
   b. Run the model
   c. Calculate errors (model vs data), and their likelihood
Solutions:
- Sampling problem: Markov Chain Monte Carlo (MCMC) methods
- Computing problem: computer power, numerical software
Markov Chain Monte Carlo (MCMC)
Metropolis algorithm (~ 30 lines of code), applied to BASFOR
MCMC: walk through parameter-space, such that the set of visited points approaches the posterior parameter distribution P(p|D)
1. Start anywhere in parameter-space: p(i=0)
   E.g. {SLA=5, k=0.4, ...}
2. Randomly choose p(i+1) = p(i) + δ
   Use a multivariate normal distribution for [δ_1, ..., δ_39]
3. IF [ P(p(i+1)) L(f(p(i+1))) ] / [ P(p(i)) L(f(p(i))) ] > Random[0,1]
   THEN: accept p(i+1) & i = i+1
   ELSE: reject p(i+1)
   Run BASFOR. Assume normally distributed errors: L(output_j - data_j) ~ N(0, σ_j), with a different σ_j for each datapoint
4. IF i < 10^4 GOTO 2
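The steps above can be sketched in a few lines. This is a minimal illustration, not the code used for BASFOR: the one-parameter linear model, the data, the uniform prior on [0, 2] and the step size are hypothetical; candidate steps are drawn independently per parameter (a multivariate normal with diagonal covariance); and, following the standard Metropolis convention, the current point is recorded again whenever a candidate is rejected.

```python
import math
import random

def metropolis(log_post, start, step_sd, n_steps=10_000, seed=0):
    """Random-walk Metropolis sampler.

    log_post: function returning log[P(p) L(f(p)|D)] up to a constant.
    start:    starting parameter vector (should lie inside the prior).
    step_sd:  standard deviation of the normal proposal, per parameter.
    """
    rng = random.Random(seed)
    current = list(start)
    current_lp = log_post(current)
    chain = []
    for _ in range(n_steps):
        candidate = [p + rng.gauss(0.0, s) for p, s in zip(current, step_sd)]
        candidate_lp = log_post(candidate)
        # Accept if posterior ratio > Random[0,1]; ratio computed in log space
        ratio = math.exp(min(0.0, candidate_lp - current_lp))
        if rng.random() < ratio:
            current, current_lp = candidate, candidate_lp
        chain.append(list(current))
    return chain

# Hypothetical calibration problem: y = slope * t with Gaussian errors
DATA = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2)]  # (time, observation) pairs
SIGMA = 0.3

def log_post(params):
    (slope,) = params
    if not 0.0 <= slope <= 2.0:  # uniform prior: zero probability outside
        return -math.inf
    return sum(-0.5 * ((slope * t - y) / SIGMA) ** 2 for t, y in DATA)

chain = metropolis(log_post, start=[0.5], step_sd=[0.1])
kept = chain[2000:]  # discard an initial burn-in stretch
posterior_mean = sum(p[0] for p in kept) / len(kept)
```

Working with log-probabilities avoids numerical underflow when many data points are multiplied together, which matters more as the number of calibration data grows.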
MCMC parameter trace plots: steps
[Figure: parameter value against steps in MCMC]
Marginal distributions of parameters
Parameter correlations (PCC)
[Figure: 39 parameters]
Posterior predictive uncertainty
[Figure panels: Wood C, Height, NPP; Skogaby, calibrated (m ± σ)]
Posterior predictive uncertainty vs prior
[Figure panels: Wood C, Height, NPP; Skogaby, calibrated (m ± σ) and Skogaby, not calibrated (m ± σ)]
Partial correlations parameters – output variables
[Figure: 24 output variables against 39 parameters; Wood C highlighted]
Wood C vs parameter-values
Partial correlations parameters – wood C
[Figure: partial correlation of wood C with each parameter p_x; labelled parameters: Allocation to wood, Senescence stem+br., SOM turnover, Max N leaf, N root]
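A partial correlation measures how strongly an output such as wood C varies with one parameter once the influence of another variable has been removed. The sketch below uses the first-order formula (one controlling variable) on made-up numbers; with many parameters, as in the MCMC sample, the same quantity is usually obtained from the inverse of the full correlation matrix. All names and data here are hypothetical.

```python
import math

def pearson(x, y):
    """Ordinary Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_correlation(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Made-up sample: parameter_a and output both depend on parameter_b only
parameter_b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
parameter_a = [2.0, 0.0, 4.0, 5.0, 3.0, 7.0]
output = [2.0, 2.0, 2.0, 3.0, 5.0, 7.0]

raw = pearson(parameter_a, output)
partial = partial_correlation(parameter_a, output, parameter_b)
```

In this constructed sample the raw correlation between `parameter_a` and `output` is clearly positive, while the partial correlation is essentially zero, because the apparent link runs entirely through `parameter_b`.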
Should we measure the sensitive parameters?
Yes, because the sensitive parameters:
- are obviously important for prediction
No, because model parameters:
- are model-specific
- are correlated with each other, which we do not measure
- cannot really be measured at all
So, it may be better to measure output variables, because they:
- are what we are interested in
- are better defined, in models and measurements
- help determine parameter correlations if used in Bayesian calibration
The value of NPP-data
[Figure panels: Wood C, Height, NPP; Skogaby, calibrated on NPP-data only (m ± σ) and Skogaby, not calibrated (m ± σ)]
Data of height growth: poor quality
[Figure panels: Wood C, Height, NPP; Skogaby, calibrated on poor height-data only (m ± σ) and Skogaby, not calibrated (m ± σ)]
Data of height growth: high quality
[Figure panels: Wood C, Height, NPP; Skogaby, calibrated on poor height-data only (m ± σ), Skogaby, calibrated on good height-data only (m ± σ) and Skogaby, not calibrated (m ± σ)]
Model application to forest growth in Rajec (Czechia)
Rajec (CZ):
Planted: 1903 (6000 trees ha^-1)
Tree data: Wood-C, Height
[Map: Skogaby, Rajec]
Rajec (CZ): Uncalibrated and calibrated on Skogaby (S)
[Figure panels: Wood C, Height, NPP; Rajec, Skogaby-calibrated (m ± σ) and Rajec, not calibrated (m ± σ)]
Rajec (CZ): further calibration on Rajec-data
[Figure panels: Wood C, Height, NPP; Rajec, Skogaby-calibrated (m ± σ), Rajec, Skogaby- and Rajec-calibrated (m ± σ) and Rajec, not calibrated (m ± σ)]
Summary of procedure
Inputs: data D ± σ, model f, prior P(p)
Error function, e.g. N(0, σ)
MCMC
Samples of p (10^4 – 10^5): posterior P(p|D); calibrated parameters, with covariances
Samples of f(p) (10^4 – 10^5): P(f(p)|D); uncertainty analysis of model output
PCC: sensitivity analysis of model parameters
Model selection
[Diagram: inputs (environmental scenarios, initial values, parameters; imperfect input data) → Model (imperfect understanding) → outputs (soil C, NPP, height; imperfect output data)]
Model selection
Bayesian calibration: P(p|D) ∝ P(p) L(f(p)|D)
Bayesian model selection: P(M|D) ∝ P(M) L(f_M(p_M)|D)
By-products of MCMC:
Model (parameters)    Max(log(L))   Mean(log(L))
BASFOR (39)           -5.7          -6.4
Expolinear (4)        -6.9          -8.7
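As a rough illustration of P(M|D) ∝ P(M) L(f_M(p_M)|D), the sketch below turns one log-likelihood per model into relative model probabilities. It rests on several assumptions that the slide does not state: the mean log-likelihoods are assumed to be listed in model order (BASFOR first, Expolinear second), they are used as a crude stand-in for each model's overall likelihood, and both models get equal prior probability. The function name `model_probabilities` is hypothetical.

```python
import math

def model_probabilities(log_likelihoods, priors=None):
    """Relative model probabilities from one log-likelihood value per model.

    log_likelihoods: dict mapping model name to a log-likelihood.
    priors: dict of prior model probabilities P(M); equal if omitted.
    """
    names = list(log_likelihoods)
    if priors is None:
        priors = {name: 1.0 / len(names) for name in names}
    # Subtract the largest value before exponentiating, for numerical safety
    top = max(log_likelihoods.values())
    weights = {
        name: priors[name] * math.exp(log_likelihoods[name] - top)
        for name in names
    }
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Mean(log(L)) values from the slide, under the ordering assumption above
probs = model_probabilities({"BASFOR": -6.4, "Expolinear": -8.7})
```

Only differences between log-likelihoods matter here, so the result depends on how far apart the two models are rather than on the absolute values.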
Conclusions (1)
Reducing parameter uncertainty:
- Reduces predictive uncertainty
- Reveals magnitude of errors in model structure
- Benefits little from parameter measurement:
  i. model parameter ≠ what you measure
  ii. parameter covariances are more important than variances
- Requires calibration on measured outputs (eddy fluxes, C-inventories, height-measurement, ...)
Calibration:
- Requires precise data
- Central output variables are more useful than peripheral (NPP/gas exchange > height)
Conclusions (2)
MCMC-calibration:
- Works on all models
- Conceptually simple, grounded in probability theory
- Algorithmically simple (Metropolis)
- Not fast ( model runs)
- Produces:
  1. Sample from parameter pdf (means, variances and covariances), with likelihoods
  2. Corresponding sample of model outputs (UA)
  3. Partial correlation analysis of outputs vs parameters (SA)
Model selection:
- Can use the same probabilistic approach as calibration
- Can use mean model log-likelihoods produced by MCMC
Acknowledgements
Göran Ågren (S) & Emil Klimo (CZ)
Peter Levy, Renate Wendler, Peter Millard (UK)
Ron Smith (UK)
Appendix 1: Calculation times per MCMC-step
MCMC: to do
1. Burn-in
2. Multiple chains
3. Mixing criteria (from characteristics of individual chains and from comparison of multiple chains)
4. Better (dynamic? f(prior?)) choice of step-length for generating candidate next points in p-space
5. Other speeding-up tricks?
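The slide does not say which mixing criterion is intended. One widely used criterion based on comparing multiple chains is the Gelman-Rubin potential scale reduction factor, sketched here for a single parameter as an illustration only. Values close to 1 suggest the chains have mixed; values well above 1 suggest they are still exploring different regions. The function name and the example chains are hypothetical.

```python
import statistics

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: list of equal-length lists of sampled values, one list per
    chain, with burn-in already removed.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(chain) for chain in chains]
    # Average within-chain variance
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    # Between-chain variance, scaled by chain length
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

# Two chains covering the same values, and two chains stuck far apart
mixed = [[0.0, 1.0] * 50, [1.0, 0.0] * 50]
stuck = [[0.0, 1.0] * 50, [10.0, 11.0] * 50]
r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
```

Running several chains from different starting points costs extra model runs, but it is the only way this kind of between-chain comparison can be made.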