Emulation of a Stochastic Forest Simulator Using Kernel Stick-Breaking Processes (Work in Progress) James L. Crooks (SAMSI, Duke University)
Background We desire to predict the distribution of tree species in the North Carolina forest under a variety of future climate change scenarios. Toward this end we can use the forest simulator developed by J. Clark and P. Agarwal's joint research group. This simulator models the life cycle of individual trees within a tree stand of pre-specified area. Growth and fecundity are in part mediated by the climate-influenced variables temperature and soil moisture.
Motivation The forest simulator has the following properties that make emulation both important and difficult: – Its speed limits the physical area that can be simulated in reasonable time (the current standard is 128 m × 128 m) – Its output is stochastic – Its output distribution can be non-Gaussian – Its output distribution can vary over the input space. Thus there is a need for a local, nonparametric statistical method to emulate the entire output distribution across the input space.
Objectives Run simulator with 3 species under standard climatic conditions for 1000+ years to establish equilibrium initial conditions. Run simulator for a further 100 years at each of various points in the climate input space (temperature and soil moisture increase rates). Emulate the output over this input space using the Kernel Stick-Breaking Processes idea of Dunson and Park (2006).
Summary of Input and Output Variables Here i indexes the run of the simulator. Simulator climate input variables (design matrix, see below): x_{i1} = mean temperature increase per century; x_{i2} = mean soil moisture increase per century. Simulator output variables: y_{i1} = final number of adult trees of species 1; y_{i2} = final number of adult trees of species 2; y_{i3} = final number of adult trees of species 3.
Forest simulator output for the 1001-year initialization run. We will focus on the number of adult trees. [Figure legend: Total, Species 1, Species 2, Species 3]
Justifying the Choice of Model We expect that the mean response will be suppressed at extreme values of the climate variables. Therefore we model the mean response with a design matrix having up to quadratic terms, where i indexes the simulator run, j indexes the tree species, and k indexes the regression coefficient. [Figure: single regression surface; x-axis: climate variable (temperature or soil moisture increase rate); y-axis: number of trees]
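A minimal sketch of what a design row "with up to quadratic terms" could look like for the two climate inputs. The function names, the optional interaction term, and the coefficient values are illustrative assumptions, not taken from the slides; the slide's own equation and any link function it used are not reproduced here.

```python
def quadratic_design_row(temp_rate, moist_rate, include_interaction=True):
    """Return one design-matrix row with terms up to second order."""
    row = [1.0, temp_rate, moist_rate, temp_rate ** 2, moist_rate ** 2]
    if include_interaction:
        # Cross term is an assumption; the slides only say "up to quadratic".
        row.append(temp_rate * moist_rate)
    return row


def mean_response(row, coefs):
    """Linear predictor for one species: sum over k of X[i][k] * beta[k]."""
    if len(row) != len(coefs):
        raise ValueError("row and coefs must have the same length")
    return sum(x * b for x, b in zip(row, coefs))


# Hypothetical coefficients: negative quadratic terms suppress the mean
# at extreme values of either climate variable.
beta = [50.0, 0.0, 0.0, -5.0, -5.0, 0.0]
center = mean_response(quadratic_design_row(0.0, 0.0), beta)
extreme = mean_response(quadratic_design_row(2.0, -2.0), beta)
```

With these made-up coefficients the mean at the center of the input space is larger than at the hot, dry corner, which is the qualitative shape the slide describes.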
We do not a priori expect the output distribution to be Gaussian anywhere on the input space, so we use a nonparametric (Dirichlet process) infinite mixture of regression surfaces instead of a single surface. We do not a priori expect the shape of the output distribution to be constant over the input space, so we use the Kernel Stick-Breaking Process of Dunson and Park (2006) to allow the DP mixture to be predictor-dependent. [Figure: finite (truncated) mixture of regression surfaces; x-axis: climate variable (temperature or soil moisture increase rate); y-axis: number of trees]
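To make "predictor-dependent" concrete, here is a sketch of truncated kernel stick-breaking weights, pi_h(x) = V_h K(x, Gamma_h) * prod_{l<h} (1 - V_l K(x, Gamma_l)), following the general construction of Dunson and Park. The Gaussian kernel, the Beta(1, 1) sticks, the truncation level, and the uniform kernel locations are assumptions chosen for illustration; the talk does not state its choices.

```python
import math
import random


def gaussian_kernel(x, location, bandwidth):
    """Kernel K(x, location) in (0, 1], equal to 1 when x == location."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, location))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def ksbp_weights(x, sticks, locations, bandwidth=1.0):
    """Truncated kernel stick-breaking weights at predictor value x.

    Returns the list of component weights and the leftover mass that the
    truncation leaves unassigned.
    """
    weights = []
    remaining = 1.0
    for v, loc in zip(sticks, locations):
        broken = v * gaussian_kernel(x, loc, bandwidth)
        weights.append(remaining * broken)
        remaining *= 1.0 - broken
    return weights, remaining


rng = random.Random(0)
H = 20  # truncation level (assumed)
sticks = [rng.betavariate(1.0, 1.0) for _ in range(H)]
# Kernel locations scattered over the (temperature, soil moisture) rate ranges.
locations = [(rng.uniform(-1, 2), rng.uniform(-2, 1)) for _ in range(H)]

w_hot_dry, left_a = ksbp_weights((2.0, -2.0), sticks, locations)
w_cool_wet, left_b = ksbp_weights((-1.0, 1.0), sticks, locations)
```

Components whose kernel location lies near x receive more weight there, so the mixture of regression surfaces can change shape across the input space. In a truncated sampler the leftover mass is usually small or is folded into the final component.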
Negative Binomial Likelihood The output variable of interest is the number of adult trees of each species. Why not use a Poisson likelihood? Preliminary data show that Var[y] scales roughly like E[y]^2, not E[y], and that Var[y] is also inversely dependent on the forest area. We therefore use the negative binomial distribution, whose variance can grow quadratically with its mean; the prior range of the parameter controlling this overdispersion can be increased with area.
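For reference, one standard mean/size parameterization of the negative binomial is sketched below. The symbols \mu (mean) and r (size) are assumptions made here; the slide's own notation did not survive in the transcript and may differ.

```latex
p(y \mid \mu, r) = \frac{\Gamma(y + r)}{\Gamma(r)\, y!}
  \left(\frac{r}{r + \mu}\right)^{r}
  \left(\frac{\mu}{r + \mu}\right)^{y}, \qquad y = 0, 1, 2, \dots

\mathrm{E}[y] = \mu, \qquad \mathrm{Var}[y] = \mu + \frac{\mu^{2}}{r}
```

Under this form the variance is dominated by \mu^{2}/r when \mu is large, matching the observed E[y]^2 scaling, and a larger r gives a smaller variance, which would be consistent with letting the prior range of that parameter grow with forest area.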
Comments on the Model This model, unlike Dunson and Park's original, lacks conjugacy between f and G_0; thus two changes must be made to their algorithm: – We no longer have the full conditional for the parameters whose prior is G_0, so we must use a Metropolis-Hastings step to update them. – The integral cannot be evaluated exactly, so we must approximate it numerically using (e.g.) Monte Carlo integration. The original MATLAB code is itself not fast, but once a posterior sample has been generated it is cheap to predict the output pmf at new points in the input space.
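The two workarounds can be sketched in a toy setting. This is not the MATLAB implementation from the talk: it assumes a single parameter theta = log(mean) with a Normal base measure G_0, a fixed size parameter r, a random-walk proposal, and made-up tree counts. The integral approximated is the marginal of f(y | theta) under G_0.

```python
import math
import random


def nb_logpmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and size r."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def mc_marginal(y, r, prior_mean, prior_sd, n_draws, rng):
    """Monte Carlo estimate of the integral of f(y | theta) dG0(theta)."""
    total = 0.0
    for _ in range(n_draws):
        theta = rng.gauss(prior_mean, prior_sd)  # draw from assumed G0
        total += math.exp(nb_logpmf(y, math.exp(theta), r))
    return total / n_draws


def mh_update(theta, ys, r, prior_mean, prior_sd, step, rng):
    """One random-walk Metropolis-Hastings update of theta = log(mu)."""
    def log_post(t):
        log_prior = -0.5 * ((t - prior_mean) / prior_sd) ** 2
        return log_prior + sum(nb_logpmf(y, math.exp(t), r) for y in ys)

    proposal = theta + rng.gauss(0.0, step)
    log_ratio = log_post(proposal) - log_post(theta)
    # 1 - random() lies in (0, 1], so the logarithm is always defined.
    if math.log(1.0 - rng.random()) < log_ratio:
        return proposal, True
    return theta, False


rng = random.Random(1)
counts = [38, 52, 45, 61, 40]  # made-up adult tree counts
theta = math.log(10.0)
accepted = 0
for _ in range(2000):
    theta, ok = mh_update(theta, counts, r=10.0, prior_mean=3.5,
                          prior_sd=1.0, step=0.2, rng=rng)
    accepted += ok
marginal = mc_marginal(45, r=10.0, prior_mean=3.5, prior_sd=1.0,
                       n_draws=5000, rng=rng)
```

The symmetric proposal means the acceptance ratio reduces to a ratio of unnormalized posteriors, so no closed-form full conditional is needed. The Monte Carlo estimate trades exactness for generality, and its error shrinks as the number of draws grows.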
Generating Simple Climate Change Scenarios Ballpark estimates of today's (soil moisture, temperature) mean and covariance are used: the 1000+ year initialization run has temperature and soil moisture generated from a multivariate normal (MVN) distribution with this mean and covariance. Temperature is measured in °C and soil moisture in %.
Future 100-year scenarios are generated assuming the means change linearly in time, with rates taken from a set of design points in the (temperature, soil moisture) rate plane. GCMs generally predict hotter, drier conditions for the Southeastern US. Accordingly, the ranges were [-1, +2] SD/century for temperature and [-2, +1] SD/century for soil moisture.
[Figure: generated soil moisture and temperature used in the initialization run, and three generated future scenarios. Climate change begins at year 1052. Legend: Stable Climate, Hotter/Drier, Cooler/Wetter]
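The scenario construction can be sketched as yearly bivariate normal draws whose means drift linearly at a rate expressed in standard deviations per century. The mean and covariance below are placeholders, not the ballpark estimates from the talk, and the variable order (temperature, soil moisture) is a choice made here.

```python
import math
import random


def cholesky_2x2(cov):
    """Lower-triangular Cholesky factor of a 2x2 covariance matrix."""
    (a, b), (_, d) = cov
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 ** 2)
    return l11, l21, l22


def simulate_climate(mean, cov, rates_sd_per_century, n_years, rng):
    """Yearly (temperature, soil moisture) draws with linearly drifting means.

    rates_sd_per_century gives each variable's mean shift, in units of its
    own standard deviation, accumulated over 100 years.
    """
    l11, l21, l22 = cholesky_2x2(cov)
    sds = (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))
    series = []
    for year in range(n_years):
        frac = year / 100.0
        m0 = mean[0] + rates_sd_per_century[0] * sds[0] * frac
        m1 = mean[1] + rates_sd_per_century[1] * sds[1] * frac
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        series.append((m0 + l11 * z0, m1 + l21 * z0 + l22 * z1))
    return series


# Placeholder values: temperature in deg C, soil moisture in %.
mean = (15.0, 30.0)
cov = ((1.0, -0.5), (-0.5, 4.0))

# Same seed for both runs, so they differ only by the imposed drift.
stable = simulate_climate(mean, cov, (0.0, 0.0), 100, random.Random(0))
hot_dry = simulate_climate(mean, cov, (2.0, -2.0), 100, random.Random(0))
```

Expressing the rates in SD units keeps the two variables on a comparable scale, which is how the slide states its ranges.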
Results I just got the initialization run back last week, so ask me in 3 months. Other Thoughts May need to continue the initialization run another 500-1000 years to get a better equilibrium. Need a lot more runs when using nonparametrics anyway, so the benefits of using a Latin hypercube design are less obvious (in 2-D anyway).
Acknowledgements Jim Clark's group for use of their simulator, and especially Sean McMahon for his invaluable assistance. David Dunson and Ju-Hyun Park for explaining their paper to me and letting me use their algorithm. The SAMSI Methodology and Terrestrial Models Working Groups for fruitful discussions.
References
Dunson, D. B., and J.-H. Park, Kernel Stick-Breaking Processes, ISDS Discussion Paper 22 (2006) and Biometrika (accepted).
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, A scalable simulator for forest dynamics, Symposium on Computational Geometry 2004: 106-115.
Govindarajan, S., M. Dietze, P. Agarwal, and J. S. Clark, A scalable algorithm for dispersing populations, Journal of Intelligent Information Systems 2004 (online).