Data-model integration: Examples from belowground ecosystem ecology

1 Data-model integration: Examples from belowground ecosystem ecology
Kiona Ogle University of Wyoming Departments of Botany & Statistics

2 Today’s Task What are some ecological questions to which sensor network data could be applied? How would those data be used in models? An overview of modeling ecological data and processes.

3

4 Types of Questions What are some ecological questions to which sensor network data could be applied? Spatial & temporal processes Improved ecological understanding More accurate prediction & forecasting Example problems “Biogeochemical exchanges between the atmosphere & biosphere” How do environmental perturbations affect carbon & water exchange? Partitioning ecosystem processes & components Linking processes & mechanisms operating at multiple temporal & spatial scales

5 How to Address Such Questions?
Couple data and models Sensor network data Very rich Real-time; large datasets; spatially extensive and/or temporally intensive Heterogeneous Different locations, processes, and conditions Models & data analysis Less appropriate: “Classical” analyses that assume linearity and normality of data Design-based inference about patterns More appropriate: Coupling of process-based models with diverse and rich datasets Model-based inference about patterns and mechanisms

6 Why Couple Data & Process Models?
Parameter estimation (or “model parameterization”) Quantification of uncertainty Improved predictions and forecasts Decision support, management, conservation Synthesize multiple types of data Relate different system components to each other Learn about important mechanisms Hypothesis generation Use data-informed models to generate testable hypotheses Inform sampling and network design Data analysis Go beyond simple “classical” analyses Explicit integration of multiple data types, diverse scales, and nonlinear and non-Gaussian processes

7 How to Couple Data & Process Models?
Multiple approaches, for example: Maximum likelihood-based models Least squares, minimization of objective functions Hierarchical Bayesian models Hierarchical Bayesian approach Recall, from Jennifer’s talk … Unknown quantities Observed data Data parameters Process parameters Latent (or true) process Posterior Likelihood Probabilistic process model Prior(s)

8 Outline The process model:
Types of ecological models Building process models Examples from belowground ecosystem ecology: Motivating issues Ex 1: Estimating components of soil organic matter decomposition Ex 2: Deconvolution of soil respiration (i.e., CO2 efflux) In both examples, highlight: Data sources Process models Data-model integration Implications of data-model integration for sensor network data & applications

9 Hierarchical Bayesian Model
Observed data Latent (or true) process Data parameters Process parameters Unknown quantities Posterior Likelihood Probabilistic process model Prior(s) Data model (likelihood) Probabilistic process model The “process model”
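
The labels on this slide annotate an equation that did not survive in the transcript. A minimal sketch of the usual hierarchical Bayesian factorization those labels describe is given below. The symbols are assumptions introduced here for illustration, not the presenter's notation: Y for observed data, Z for the latent (true) process, θ_D for data parameters, and θ_P for process parameters.

```latex
% Posterior  \propto  likelihood  x  probabilistic process model  x  prior(s)
% Y: observed data, Z: latent process,
% \theta_D: data parameters, \theta_P: process parameters (assumed symbols)
\underbrace{P(Z, \theta_D, \theta_P \mid Y)}_{\text{posterior}}
\;\propto\;
\underbrace{P(Y \mid Z, \theta_D)}_{\text{data model (likelihood)}}
\;\times\;
\underbrace{P(Z \mid \theta_P)}_{\text{probabilistic process model}}
\;\times\;
\underbrace{P(\theta_D, \theta_P)}_{\text{prior(s)}}
```

The unknown quantities on the left are everything that is not observed: the latent process and both sets of parameters.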

10 The Process Model Conceptual model: Model formulation:
Systems diagrams Graphical models Model formulation: Explicit, mathematical eqn’s Systems equations State-space equations Inputs Outputs “Compare” Unobserved quantities (parameters) Conceptual model Mathematical model Observed quantities (data) Analytical output Observed quantities (driving variables) Numerical/ simulation output “Predict” Simulation model Unobserved or latent quantities The “process model”

11 Types of Process Models
Deterministic Stochastic Compartment models (differential or difference equn’s) Matrix models Reductionist models (include lots of details & components) Holistic models (use general principles) Static models Dynamic models Distributed models (system depends on space & time) Lumped models Linear models Nonlinear models Causal/mechanistic models Black box models Analytical models Numerical/simulation models Jorgensen (1986) Fundamentals of Ecological Modelling. 389 pp. Elsevier, Amsterdam.

12 Soil Carbon Cycle Model
Upcoming Example: Soil Carbon Cycle Model Deterministic Stochastic Compartment models (differential or difference equn’s) Matrix models Reductionist models (include lots of details & components) Holistic models (use general principles) Static models Dynamic models Distributed models (system depends on space & time) Lumped models Linear models Nonlinear models Causal/mechanistic models Black box models Analytical models Numerical/simulation models

13 Example Process Model
Simplified systems diagram of the soil carbon cycle in a temperate forest, showing pools (state variables) and flows of carbon. Source: Xu et al. (2006) Global Biogeochemical Cycles Vol. 20 GB2007.

14 Model Formulation A: matrix of flux rates or “carbon transfer coefficients” (parameters) u(t): flux of carbon into the system (e.g., photosynthetic flux) (driving variable or modeled quantity) B: vector of ‘allocation fractions’ (parameters) X: vector of state variables (unobservable latent quantities, outputs) Source: Xu et al. (2006) Global Biogeochemical Cycles Vol. 20 GB2007.
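
The symbols on this slide (A, B, u(t), X) suggest a linear compartment model of the form dX/dt = A X(t) + B u(t); the equation itself is missing from the transcript, so that form is an assumption here. The sketch below integrates a hypothetical two-pool version with forward Euler steps. The pool structure, the rate constants `k1` and `k2`, the transfer fraction `f`, and the input `u` are invented for illustration and are not the values in Xu et al. (2006).

```python
def step(x, A, B, u, dt):
    """One forward-Euler step of dX/dt = A X + B u."""
    n = len(x)
    dx = [sum(A[i][j] * x[j] for j in range(n)) + B[i] * u for i in range(n)]
    return [x[i] + dt * dx[i] for i in range(n)]


def simulate(x0, A, B, u, dt, n_steps):
    """Integrate the pools forward from x0 under a constant carbon input u."""
    x = list(x0)
    for _ in range(n_steps):
        x = step(x, A, B, u, dt)
    return x


# Hypothetical two-pool system: a fast pool feeding a slow pool.
k1, k2, f = 0.5, 0.05, 0.3          # turnover rates and transfer fraction (assumed)
A = [[-k1, 0.0],
     [f * k1, -k2]]                 # carbon transfer coefficients
B = [1.0, 0.0]                      # all input is allocated to the fast pool
u = 2.0                             # constant carbon input (assumed)

x_final = simulate([0.0, 0.0], A, B, u, dt=0.01, n_steps=50000)

# Analytical steady state, obtained by setting A X + B u = 0.
x1_steady = B[0] * u / k1
x2_steady = f * k1 * x1_steady / k2
```

In a data-model integration setting, the entries of A and B would be unknown parameters and the pool sizes X would be latent quantities, both estimated from the observable outputs.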

15 Model Formulation Observable (data)
Source: Xu et al. (2006) Global Biogeochemical Cycles Vol. 20 GB2007.

16 How to Couple Data & Process Models?
Hierarchical Bayesian approach Unknown quantities Observed data Data parameters Process parameters Latent (or true) process Posterior Likelihood Probabilistic process model Prior(s) Data model (likelihood) Probabilistic process model

17 Outline The process model:
Types of ecological models Building process models Examples from belowground ecosystem ecology: Motivating issues Ex 1: Estimating components of soil organic matter decomposition Ex 2: Deconvolution of soil respiration (i.e., CO2 efflux) In both examples, highlight: Data sources Process models Data-model integration Implications of data-model integration for sensor network data & applications

18 Ecosystem Processes Emphasis on aboveground What about belowground?

19 Biogeochemical Cycles
N H2O N H2O C C P

20 Biogeochemical Cycles
N, H2O, C, P. Belowground system is critical. Tightly linked to aboveground system.

21 Belowground “Issues”
Aboveground: lots of info; easy to measure. Belowground: little info; difficult to measure. Aboveground measurements (helpful but limited).
As plant ecologists, we know quite a bit about aboveground responses of plants and ecosystems to environmental effects, which can be attributed to our ability to easily measure aboveground responses. For example, we can measure leaf-level photosynthesis and transpiration, we can monitor the phenology of leaf production and fruiting, we can quantify shifts in species composition, and we can install sapflow gauges on plant stems and continuously measure whole-plant transpiration. But much less is known about belowground processes, yet they are critical to understanding ecosystem-level behavior. For example, soils, soil microfauna, and plant roots are critical players in ecosystem carbon and water fluxes. Now, we can use aboveground measurements to help infer belowground processes to some extent. For example, we can use soil respiration chambers to measure soil CO2 efflux, and we can install eddy flux towers to measure net ecosystem carbon and water exchange at relatively large scales. Currently these data provide only limited insight into belowground dynamics, but they have the potential to provide much greater insight if complemented by other data sources and models. In summary, two challenges facing plant and ecosystem ecologists are: partitioning the different components contributing to ecosystem carbon and water fluxes, and identifying key belowground processes and their effects on ecosystem fluxes.
Outstanding issues: Partitioning above- & belowground. Quantifying & partitioning belowground. Implications for ecosystem function. Examples: arid & semiarid systems. Figure from Kieft et al. (1998) Ecology 79:

22 Motivating Questions: Soil Carbon Cycle
Where in the soil is CO2 coming from? What are the relative contributions of autotrophs vs. heterotrophs? What factors control decomposition rates & heterotrophic activity? How does pulse precipitation affect sources of respired CO2? Implications of climate change for desert soil carbon cycling? These are some questions related to ecosystem carbon dynamics and, specifically, soil respiration. I will address these questions in this talk, but I want to note that this project emerged out of conversations with Jessica Cable and Travis Huxman and uses data from Jessie’s dissertation work.

23 Integrative Approach Diverse data sources Process-based models
Experimental & observational Lab & field studies Multiple scales Varying “amounts” & “completeness” Process-based models Key mechanisms, processes, components Balance detail & simplicity Multiple scales & interactions Statistical models: data-model integration Hierarchical Bayesian framework Markov chain Monte Carlo

24 Examples Presented Today
Deterministic Stochastic Compartment models (differential or difference equn’s) Matrix models Reductionist models (include lots of details & components) Holistic models (use general principles) Static models Dynamic models (implicit dependence on time) Distributed models (implicit dependence on space & time) Lumped models Linear models Nonlinear models Causal/mechanistic models Black box models Analytical models Numerical/simulation models

25 Ex 1: Soil organic matter decomposition
Objectives: Identify soil & microbial processes affecting decomposition Learn how vegetation (i.e., microsite) controls these processes

26 Experimental Design
Mesquite shrubland in southern Arizona. Microsite types: bare ground, grass, small mesquite, big mesquite. 3 cores (reps).

27 Experimental Design
8 depths (layers). Add water; add sugar + water. Incubate at 25 °C. Measure CO2 efflux (soil respiration rate) at 24 & 48 hours.

28 Experimental Design
8 depths (layers). Add water; add sugar + water. Incubate at 25 °C. Measure CO2 efflux (soil respiration rate) at 24 & 48 hours. Measure: microbial biomass, soil organic carbon, soil nitrogen.

29 Design & Data Overview
Full-factorial design:
Microsite: 4 levels (bare, grass, small mesq, big mesq)
Soil layer: 8 levels (0-2, 2-5, ..., cm)
Substrate addition type: 2 levels (water only, sugar + water)
Incubation time: 2 levels (24, 48 hrs)
Soil core or rep: 3 cores per microsite
Stochastic data:
Soil respiration rate: N = 359 (25 missing)
Microbial biomass: N = 18 (14 missing)
Soil organic carbon: N = 89 (7 missing)
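
As a quick check on the counts above, the sketch below enumerates the design cells and compares them with the observed plus missing totals on the slide. The factor levels come from the slide; the level labels for soil layers are placeholders, since the slide elides the depth ranges. Which factors index microbial biomass and soil organic carbon is an inference from the totals (32 and 96), not something the slide states.

```python
from itertools import product

microsites = ["bare", "grass", "small mesq", "big mesq"]
layers = list(range(1, 9))            # 8 soil layers (depth ranges elided on the slide)
substrates = ["water only", "sugar + water"]
times_hr = [24, 48]
cores = [1, 2, 3]

# Soil respiration: one rate per cell of the full factorial design.
respiration_cells = list(product(microsites, layers, substrates, times_hr, cores))

# Assumed indexing (inferred from the totals, not stated on the slide):
# soil organic carbon per microsite x layer x core,
# microbial biomass per microsite x layer.
carbon_cells = list(product(microsites, layers, cores))
biomass_cells = list(product(microsites, layers))

n_respiration = len(respiration_cells)   # compare with 359 observed + 25 missing
n_carbon = len(carbon_cells)             # compare with 89 observed + 7 missing
n_biomass = len(biomass_cells)           # compare with 18 observed + 14 missing
```

Under these assumptions the cell counts match the slide's observed plus missing totals, which is why the likelihoods later in the talk are written for both observed and missing data.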

30 Some Data

31 Analysis Objectives
Estimate microbial respiration (decomposition) parameters (i.e., process parameters). [Diagram labels: data; soil depth; microbes; soil C; CO2 flux; respiration; biomass & activity; microbial biomass; carbon substrate; unknowns marked “?”.]

32 Process Model: Soil Respiration
Estimate microbial respiration (decomposition) parameters (i.e., process parameters). Respiration (R) as a function of microbial biomass (B) and carbon substrate (C), from low C to saturating C. Michaelis-Menten type model: microbial “base-line” metabolic rate; microbial carbon-use efficiency. Assume Ac related to “substrate quality”:
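
The equation on this slide is missing from the transcript, so the sketch below uses a generic Michaelis-Menten form rather than the presenter's exact model. The function name `respiration_rate` and the parameters `base_rate` and `half_sat` are hypothetical stand-ins; how they map to the slide's “base-line” metabolic rate, carbon-use efficiency, and Ac is not specified in the source.

```python
def respiration_rate(biomass, carbon, base_rate, half_sat):
    """Generic Michaelis-Menten form (assumed): R = base_rate * B * C / (half_sat + C).

    biomass   -- microbial biomass B
    carbon    -- available carbon substrate C
    base_rate -- maximum respiration per unit biomass (hypothetical parameter)
    half_sat  -- carbon level at which R is half its maximum (hypothetical parameter)
    """
    if carbon < 0 or biomass < 0:
        raise ValueError("biomass and carbon must be non-negative")
    return base_rate * biomass * carbon / (half_sat + carbon)


# Illustrative values only; units are arbitrary.
B = 10.0
low_c = respiration_rate(B, carbon=1.0, base_rate=0.2, half_sat=50.0)
half_c = respiration_rate(B, carbon=50.0, base_rate=0.2, half_sat=50.0)
saturating_c = respiration_rate(B, carbon=1e6, base_rate=0.2, half_sat=50.0)
```

At low C the rate rises roughly linearly with substrate, and at saturating C it levels off near `base_rate * B`, which is the contrast the sugar-addition treatment is designed to expose.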

33 Data-Model Integration
Full-factorial design: Microsite Soil layer Substrate addition type Incubation time Soil core or rep B C R N Stochastic data: Soil respiration rate Microbial biomass Soil organic carbon Things to consider: Multiple data types Nonlinear model Missing data Experimental design some data some data

34 Data Model (Likelihood)
Let LR = log(R). For microsite m, soil depth d, soil core r, substrate-addition type s, and time period t: [equation labels: observed rate; mean (“truth”) (latent process); observation precision (= 1/variance)]

35 Data Model (Likelihood)
Now, for the covariates... For microsite m, soil depth d, and soil core r: [equation labels: observed; mean (“truth”) (latent process); observation precision (= 1/variance)]. Note: the likelihoods are for both the observed and missing data.
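
The likelihood equations on slides 34 and 35 are missing from the transcript. A plausible sketch consistent with the labels (observed value, latent mean, observation precision) is a normal data model parameterized by precision, as below. The symbols μ and τ are assumptions introduced here, and the normal form for the covariates is likewise assumed rather than taken from the source; the covariate is written as a generic X with the indices given on slide 35.

```latex
% Assumed notation: \mu = latent ("true") mean, \tau = observation precision = 1/variance.
% Respiration, on the log scale (slide 34):
LR_{m,d,r,s,t} \sim \mathrm{Normal}\!\left(\mu^{LR}_{m,d,r,s,t},\; \tau_{LR}\right)

% Covariates (slide 35), written generically for an observed covariate X
% such as microbial biomass or soil organic carbon:
X_{m,d,r} \sim \mathrm{Normal}\!\left(\mu^{X}_{m,d,r},\; \tau_{X}\right)
```

Because missing values are given the same distribution as observed ones, they are treated as additional unknowns and estimated along with the parameters.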

36 Data Model (Likelihood) Likelihood components
Data parameters Latent processes

37 Probabilistic Process Model
Latent processes Deterministic model for soil microbes & carbon contents Stochastic model for latent respiration

38 Probabilistic Process Model
Stochastic model for latent respiration Specify expected process: Michaelis-Menten (process) model

39 Probabilistic Process Model
Process components Process parameters

40 Parameter Model (Priors)
Data parameters Process parameters Conjugate, relatively non-informative priors for precision terms

41 Parameter Model (Priors)
Data parameters Process parameters Non-informative Dirichlet priors for relative distributions of microbes and carbon Multivariate version of the beta distribution (with all parameters set to 1: multidimensional uniform)
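
To illustrate the prior described here, the sketch below draws one sample from a Dirichlet distribution with all parameters set to 1, which is uniform over vectors of non-negative fractions that sum to one. It uses the standard construction of normalizing independent Gamma(1, 1) draws. Using 8 components to match the 8 soil layers is an assumption about how the prior is applied.

```python
import random


def sample_dirichlet_ones(k, rng):
    """Draw one sample from Dirichlet(1, ..., 1) with k components.

    Normalizing k independent Gamma(shape=1, scale=1) draws gives a
    vector distributed uniformly over the (k-1)-dimensional simplex.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(1)
# Hypothetical use: relative distribution of microbial biomass across 8 soil layers.
layer_fractions = sample_dirichlet_ones(8, rng)
```

Each draw is one candidate depth profile of relative abundance; the prior favors no profile over any other.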

42 Parameter Model (Priors)
Data parameters Process parameters Relatively non-informative (diffuse) normal priors for the rest:

43 The Posterior

44 The Posterior No analytical solution for the joint posterior distribution No analytical solution for most of the marginal distributions Approximate the posterior: Markov chain Monte Carlo methods, implemented in WinBUGS
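
The talk implements its models in WinBUGS, whose code is not included in the transcript. As a stand-in, the sketch below shows the basic idea of Markov chain Monte Carlo with a random-walk Metropolis sampler on a deliberately simple problem: estimating the mean of normal data with known unit variance under a diffuse normal prior. The toy model, function names, and tuning values are all assumptions for illustration and are not the sampler WinBUGS uses for the soil respiration model.

```python
import math
import random


def log_posterior(mu, data, prior_sd=100.0):
    """Unnormalized log posterior: Normal(0, prior_sd^2) prior, unit-variance normal likelihood."""
    log_prior = -0.5 * (mu / prior_sd) ** 2
    log_lik = -0.5 * sum((y - mu) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, n_iter, step_sd, rng, start=0.0):
    """Random-walk Metropolis: propose a nearby value, accept with probability min(1, ratio)."""
    samples = []
    current = start
    current_lp = log_posterior(current, data)
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_posterior(proposal, data)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples


rng = random.Random(42)
data = [rng.gauss(3.0, 1.0) for _ in range(20)]   # simulated observations
samples = metropolis(data, n_iter=20000, step_sd=0.5, rng=rng)
kept = samples[2000:]                              # discard burn-in
posterior_mean = sum(kept) / len(kept)
data_mean = sum(data) / len(data)
```

With a prior this diffuse, the posterior mean should sit very close to the sample mean, which gives a simple check that the chain is sampling the right distribution.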

45 Model Implementation: WinBUGS

46 Model Goodness-of-fit

47 Example Results
[Figure: C* (total soil carbon, g C/m2) and B* (microbial biomass, g dw/m2) by microsite: Bare, Big mesq., Med. mesq., Grass.]

48 Example Results
[Figure: relative amount of microbial biomass versus soil depth (or layer), from surface to deep, for bare ground and big mesquite.]

49 Sensitivity to Data Sources

50 Ex 2: Deconvolution of Soil Respiration
Where in the soil is CO2 coming from? What are the relative contributions of autotrophs vs. heterotrophs? What factors control decomposition rates & heterotrophic activity? How does pulse precipitation affect sources of respired CO2? Multiple data sources (figure labels: lots of data; limited data).

51 The Field Sites Sonoran Desert San Pedro River Basin
Santa Rita Experimental Range Sonoran Desert

52 Stable Isotope Tracers
Respired CO2 signature (CO2: 12C, 13C). Source isotope signatures. Important data source: facilitates “partitioning”.
Aspects of both of these belowground processes can be inferred from stable isotope measurements made on both aboveground and belowground components. For example, water is composed of hydrogen and oxygen, both of which occur in different isotopic forms. E.g., some water molecules may contain the heavy stable isotope of hydrogen (deuterium) or the heavy stable isotope of oxygen (O-18). The relative abundance of the heavy isotopes in soil water often varies with soil depth. Likewise, the distribution of plant roots varies with depth, and the plant takes up water from various depths in the soil. The amount of water taken up from different depths depends on the soil water availability at each depth and the distribution of roots that are actively acquiring water. Thus, water in the plant stem is derived from different soil water sources, and its isotopic composition or signature reflects the soil layers from which the roots are extracting water. Likewise, consider soil respiration and the CO2 escaping from the soil. Respired CO2 is derived from substrates such as decomposing organic matter or actively respiring roots, and these substrates or sources vary in their composition of light and heavy carbon. And the distribution of these substrates varies with depth, and their respiration rates at different depths depend on environmental conditions such as temperature and soil water content. As for the stem isotope signature, the 13C isotope signature of the respired CO2 reflects the weighted or average signature of the different sources.
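
The last point in the notes, that the signature of respired CO2 is a weighted average of the source signatures, can be sketched as a simple linear mixing model. The function name and the example numbers below are hypothetical; the two signatures are round illustrative values for a C3-like and a C4-like source, not measurements from the study. Treating δ13C values as mixing linearly is itself a common approximation.

```python
def mixed_signature(fluxes, signatures):
    """Flux-weighted mixing: returns (relative contributions, mixed delta-13C).

    fluxes     -- respiration rate of each source (same units for all sources)
    signatures -- delta-13C signature of each source (per mil)
    """
    if len(fluxes) != len(signatures):
        raise ValueError("need one signature per source")
    total = sum(fluxes)
    if total <= 0:
        raise ValueError("total flux must be positive")
    contributions = [f / total for f in fluxes]
    delta_total = sum(p * d for p, d in zip(contributions, signatures))
    return contributions, delta_total


# Illustrative two-source example (values are assumptions, not data).
source_fluxes = [1.0, 3.0]            # e.g. source 1 and source 2
source_signatures = [-26.0, -14.0]    # per mil
contributions, delta_total = mixed_signature(source_fluxes, source_signatures)
```

Deconvolution runs this logic in reverse: given the measured total flux and its mixed signature, infer the contributions, which is only possible when the sources differ in signature and other data constrain the fluxes.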

53 Data Source Examples
Labels: stochastic data; covariate data; literature data; datasets (field/lab pubs); potential sensor network data.
Pool isotopes (δ13Ci) (roots, soil, litter; Keeling plots); Soil isotopes (δ13CTot) (automated chambers & Keeling plots); Litter (arid systems; total mass, carbon, microbes); Soil CO2 flux (automated chambers); Soil CO2 flux (manual chambers); Root mass (arid systems; total mass); Soil samples (carbon content, C:N, root mass); Microbial mass (arid systems; total mass); Soil incubations (root-free, carbon substrate, microbial mass, heterotrophic activity); Root respiration (in situ gas exchange); Soil temp & water (automated, multiple locations, many depths); Root distributions (arid systems, different functional types); Soil carbon (arid systems; total C); Root respiration (arid systems, different functional types); (arid systems; total mass, heterotrophic activity).
I am just starting to develop a hierarchical Bayesian melding scheme for merging these data sources and the individual-based growth model. First, let’s revisit the basic Bayesian framework. That is, the Bayesian framework uses probability models or statements for everything, including data, parameters, and all other unknown quantities. What I’m ultimately interested in is the posterior distribution for the unknown quantities, such as the growth parameters (theta), variance terms (sigma), and model outputs or predictions (O). This posterior quantifies our uncertainty in these unknowns given our observed data. And Bayes’ rule says that this posterior is proportional to the likelihood of all data times the prior distribution for the unknowns. So, once I know the posterior, I can use it to make inferences about the growth parameters, how they vary among species, and what the implications are for tree growth and survival and forest dynamics. Now, the likelihood function is really important because it explicitly links the model, data, and parameters. First, let’s consider the FIA inventory, which gives measurements for tree i, of species j, at time t, growing in location s.
We input to the model an initial diameter for this tree, a set of species- and site-specific parameter values, and some covariates (X), and the model gives a slew of predictions for this particular tree, including height, trunk diameter, biomass totals for each structural compartment, leaf area, sapwood area, and a bunch of other quantities. Note, though, that some of these outputs can be matched up with FIA data and go into the likelihood. E.g., the observed height and diameter are assumed to come from normal distributions with means that are given by the model outputs. The other quantities are not associated with data, but we have some information about them based on previous studies or the literature, and we can use this information to come up with priors for these unobservable outputs. Now, what I really want are parameter estimates for each species, and I want to account for spatial random effects that might influence these estimates. So, I want to estimate the posterior distribution for theta_js, and I first assume that it can be broken up into 2 independent pieces: one is the species effect, and one is the random spatial effect. Recall that the literature database provides species-specific parameter estimates, and this also contributes to part of the likelihood. For the spatial effects, I will begin by using a fairly simple model that allows for spatial autocorrelation and that can accommodate the huge, spatially extensive FIA dataset. I will use a conditionally autoregressive model, which assumes that the spatial effect at location s comes from a normal distribution with a mean equal to a weighted average of its neighboring locations’ effects. But, ultimately, what I am really interested in are the theta_tilde_j’s, i.e., the species-specific parameter estimates.

54 Example Data Santa Rita pulse experiment
San Pedro automated flux measurements Respiration (mmol / m2 / s) Santa Rita pulse experiment – d13C San Pedro incubation experiment

55 Hierarchical Bayesian Model: Deconvolution Approach
Integrate multiple sources of information Diverse data sources Different temporal & spatial scales Literature information Lab & field studies Detailed flux models Respiration rates by source type & soil depth Dynamic models Mechanistic isotope mixing models Multiple sources

56 Data Source Examples
Labels: stochastic data; covariate data; literature data.
Pool isotopes (δ13Ci) (roots, soil, litter; Keeling plots); Soil isotopes (δ13CTot) (automated chambers & Keeling plots); Litter (arid systems; total mass, carbon, microbes); Soil CO2 flux (automated chambers); Soil CO2 flux (manual chambers); Root mass (arid systems; total mass); Soil samples (carbon content, C:N, root mass); Microbial mass (arid systems; total mass); Soil incubations (root-free, carbon substrate, microbial mass, heterotrophic activity); Root respiration (in situ gas exchange); Soil temp & water (automated, multiple locations, many depths); Root distributions (arid systems, different functional types); Soil carbon (arid systems; total C); Root respiration (arid systems, different functional types); (arid systems; total mass, heterotrophic activity).

57 Bayesian Deconvolution
The Hierarchical Bayesian Model: Some Likelihood Components. Likelihood of data (isotopes & soil flux). Observations (data). Latent processes: from isotope mixing model & flux models. Functions of parameters → Define process models…

58 The Deconvolution Problem
Theory & Process Models Isotope mixing model (multiple sources & depths) ?? Contributions by source (i ) and depth (z )? Temporal variability? Relative contributions (by source & depth) ?? Source-specific respiration? Spatial & temporal variability? Total flux (at soil surface) Flux model (source- & depth- specific) (Q10 Function, Energy of Activation) From previous “incubation/decomposition” study (Ex 1) Mass profiles (substrate, microbes, roots)

59 The Deconvolution Problem
Objectives. Flux model (source- & depth-specific). Covariate data. What is θi? (source-specific parameters) → Component fluxes → Total soil flux → Contributions. How to estimate θi?

60 Bayesian Deconvolution
The Parameter Model (Priors). Example: Lloyd & Taylor (1994) model. Informative priors for Eo and To:
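
The slide's equation is not in the transcript. The sketch below uses the commonly cited form of the Lloyd & Taylor (1994) temperature response, R = R_ref · exp(Eo · (1/(T_ref − To) − 1/(T − To))), with temperatures in kelvin. The default values for `e0` and `t0` are the widely quoted fitted constants from that paper and are used here only as illustrative defaults; in the talk, Eo and To are source-specific parameters with informative priors, and the exact parameterization used there is not shown.

```python
import math


def lloyd_taylor(temp_k, r_ref, e0=308.56, t0=227.13, t_ref=283.15):
    """Lloyd & Taylor (1994)-style temperature response of respiration.

    temp_k -- soil temperature (K)
    r_ref  -- respiration rate at the reference temperature t_ref
    e0     -- temperature sensitivity parameter Eo (K)
    t0     -- lower temperature limit To (K); temp_k must exceed it
    t_ref  -- reference temperature (K), here 10 degrees C
    """
    if temp_k <= t0:
        raise ValueError("temperature must be above t0")
    return r_ref * math.exp(e0 * (1.0 / (t_ref - t0) - 1.0 / (temp_k - t0)))


# Illustrative use: respiration at 10, 20, and 30 degrees C for a unit reference rate.
rate_10c = lloyd_taylor(283.15, r_ref=1.0)
rate_20c = lloyd_taylor(293.15, r_ref=1.0)
rate_30c = lloyd_taylor(303.15, r_ref=1.0)
```

Unlike a fixed Q10, the ratio between successive 10-degree increases shrinks as temperature rises, which is the main reason this form is often preferred for soil respiration.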

61 Implementation Markov chain Monte Carlo (MCMC) WinBUGS
Sample parameters (θi ) from posterior Posteriors for: θi’s, ri(z,t)’s, pi(z,t)’s, etc. Means, medians, uncertainty WinBUGS
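
Once MCMC samples are available, the summaries listed on this slide (means, medians, uncertainty) are computed directly from the draws. The sketch below assumes a plain list of posterior draws for one quantity. The simulated draws stand in for real sampler output, and the 95% interval based on sorted-sample positions is one simple convention among several.

```python
import random
import statistics


def summarize(draws, lower=0.025, upper=0.975):
    """Posterior mean, median, and a central credible interval from MCMC draws."""
    ordered = sorted(draws)
    n = len(ordered)
    interval = (ordered[int(lower * (n - 1))], ordered[int(upper * (n - 1))])
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "interval": interval,
    }


# Stand-in for sampler output: simulated draws for a hypothetical parameter.
rng = random.Random(0)
fake_draws = [rng.gauss(1.5, 0.2) for _ in range(5000)]
summary = summarize(fake_draws)
```

The same function applies to any derived quantity, such as a component flux or a relative contribution, as long as it is computed for every posterior draw first and summarized afterward.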

62 Results: Dynamic Source Contributions San Pedro Site – Monsoon Season
Zoom-in

63 Results: Root Respiration Responses
Zoom-in: July 27 – August 4. [Figure: total root respiration (µmol m⁻² s⁻¹), soil water (v/v), and rain (mm) by date, Jul 27 to Aug 4; series: mesquite (C3 shrub), sacaton (C4 grass), soil water.]

64 Results: Contributions Vary by Depth
[Figure: total root respiration (µmol m⁻² s⁻¹) and soil water (v/v) by date for mesquite (C3 shrub) and sacaton (C4 grass); relative contributions by depth (0-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-40, 40-50 cm) on days 210, 213, and 216.]

65 Summary Sources of soil CO2 efflux
Mesquite (shrub): major contributor, stable source. Sacaton (grass): minor contributor, threshold response. Microbes (bare): minor contributor, coupled to pulses. Deconvolution & data-model integration: soil depth (including litter); by species or functional groups; quantify spatial & temporal variability; incorporate environmental drivers. Implications & applications: identify mechanisms; predictions & forward modeling.

66 Outline The process model:
Types of ecological models Building process models Examples from belowground ecosystem ecology: Motivating issues Ex 1: Estimating components of soil organic matter decomposition Ex 2: Deconvolution of soil respiration (i.e., CO2 efflux) In both examples, highlight: Data sources Process models Data-model integration Implications of data-model integration for sensor network data & applications

67 Implications for Sensor Networks
Parameter estimation (or “model parameterization”) Process models related to “biogeochemical exchanges between the atmosphere & biosphere” Quantification of uncertainty Improved predictions and forecasts Synthesize data Go beyond simple “classical” analyses Explicit integration of multiple data types & scales Relate different system components to each other Learn about important mechanisms Hypothesis generation & sampling design Use data-informed models to generate testable hypotheses Inform sampling and network design Where (spatial), when (temporal), what (components)?

68 Questions? Photo by Travis Huxman
Monsoon flood, San Pedro River Basin; Sonoran desert


71 Results: Dynamic Source Contributions

72 Example WinBUGS Output
Eo, To

73 The Inverse Problem
Diagram labels: Plant water uptake; Soil respiration; Isotope mixing model; Fractional contributions (??); Total flux; Flux model (Q10 function, energy of activation); Substrate or root profiles.

74 The Inverse Problem Isotope mixing model (multiple sources & depths) ?? Contributions by source (i ) and depth (z )? Temporal variability? Relative contributions (by source & depth) ?? Source-specific respiration? Spatial & temporal variability? Total flux (at soil surface) Flux model (source- & depth- specific) (Q10 Function, Energy of Activation) Mass profiles (substrate, microbes, roots)

75 The Deconvolution Problem
Data-Model Integration. Flux model (source- & depth-specific); covariate data. What is θi? (source-specific parameters) → Total soil flux → Contributions. Likelihood of data (isotopes & soil flux): depends on θi, from isotope mixing model & flux models.

76 Data Source Examples
Categories shown: stochastic data, literature data, covariate data.
Pool isotopes (δ13Ci) (roots, soil, litter; Keeling plots); Soil isotopes (δ13CTot) (automated chambers & Keeling plots); Litter (arid systems; total mass, carbon, microbes); Soil CO2 flux (automated chambers); Soil CO2 flux (manual chambers); Root mass (arid systems; total mass); Soil samples (carbon content, C:N, root mass); Microbial mass (arid systems; total mass); Soil incubations (root-free, carbon substrate, microbial mass, heterotrophic activity); Root respiration (in situ gas exchange); Soil temp & water (automated, multiple locations, many depths); Root distributions (arid systems, different functional types); Soil carbon (arid systems; total C); Root respiration (arid systems, different functional types); an unlabeled entry (arid systems; total mass, heterotrophic activity).

77 The Deconvolution Problem
Diagram labels: Plant water uptake; Soil respiration; Isotope mixing model; Fractional contributions (??); Total flux; Flux model (Q10 function, energy of activation); Substrate or root profiles.
These water uptake, soil respiration, and isotopic mixing processes can be expressed as an inverse problem. First, note that the stem isotope signature at time t integrates over the soil isotope signatures at depth z times the contribution of depth z to water uptake. We integrate over all depths from which roots are acquiring water, and this gives us the stem signature. Likewise, the C-13 signature of the respired CO2 sums over all heterotrophic and autotrophic sources times their fractional contributions. Note that the fractional contribution of each source is determined by taking each source's contribution at depth z and integrating it over all depths. These equations describe the isotope mixing model, and you should notice that they are very similar to the simple linear mixing model equation, which sums over potential sources.
Note that there are certain components of the mixing model that we can measure: the isotopic signatures of the stem water, soil water at different depths, the respired CO2, and the source signatures. But what we don't know, and what we really want to tease apart, are the fractional contributions, i.e., the contribution of each soil layer to plant water uptake (the plant water sources) and the contributions of each source and each soil layer to total soil respiration. Thus, this is an inverse problem because the response variable that we can measure is of lower dimension, or contains less information, than the thing inside the integral, and we essentially need to invert this integral in order to solve for the higher-dimensional variables, i.e., the fractional contributions. Thus, just as in the simple linear mixing model, we wish to estimate the p's and q's, but we are going to do this using a mechanistic modeling framework.
First, note that the fractional contribution of different soil layers to plant water uptake is simply the amount of water taken up from layer z divided by the total amount of water uptake. And the fractional contribution of different sources and depths to soil respiration is simply the respiration rate of source i at depth z times the amount or mass of source i at depth z, divided by total respiration. Note then that total water uptake is determined by integrating (or summing) water uptake over all depths from which water is acquired, and total respiration is determined by summing over all sources and integrating over all depths the respiration of each source at each depth.
Note that we can measure components of the contributions and total fluxes. E.g., we can collect soil cores and get estimates of the mass distributions, and we can use soil respiration chambers to measure CO2 flux at the soil surface. But we can't directly measure the mass-specific respiration rate of each source at each depth, and we can't directly measure water acquisition at all depths. I assume that we can use some sort of biophysical or semi-mechanistic model to describe both U and little r. For example, here's a simplified version of a relatively simple water uptake model that I employed. This part of the uptake model describes a driving gradient for water uptake as given by differences between the water potentials and hydraulic conductivities of the bulk soil and at the root surface. But water uptake also depends on the active root area for water uptake, as shown by this term that includes RA (active root area). Similarly, we can use some sort of model to describe mass-specific respiration, and examples of such models include the commonly used Q10 function or the energy of activation model, both of which describe the dependency of respiration on temperature.
I am currently exploring different respiration models, but in all cases I'm incorporating the effects of soil water availability and, of course, temperature on respiration rates. The function is defined by a set of parameters (theta) that capture differences between sources and that reflect the sensitivity of respiration to changes in temperature or soil water. Note that, once again, we can estimate components of the flux models with field data. For example, measurements of soil water content and soil texture can be used to estimate the driving gradient for water uptake, and soil water content and soil temperature can also be measured as inputs to the respiration model. But you should notice that, at least for the water uptake model, we still don't know an important component, and that is the active root area profile. This is something that we can't directly measure in the field. The analogs in the respiration model are the mass or substrate profiles, but as I mentioned earlier, we can estimate these from field data.
So, what to do about the active root area profile? Again, we can assume some sort of model to describe RA, and I'm going to assume that the active root area profile can be described by a mixture of gamma densities. I chose a gamma mixture model because it is extremely flexible and can capture an infinite number of shapes that are consistent with rooting profiles that have been measured in the field, including unimodal, bimodal, and relatively uniform profiles. This gamma mixture has 5 unknown parameters (omega, alpha1, ....).
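The respiration side of the notes above can be sketched in a few lines: a Q10 function gives a mass-specific rate, the fractional contribution of source i at depth z is rate times mass divided by total respiration, and the mixed isotope signature is the contribution-weighted sum of the source signatures. All names and numbers below are hypothetical, depths are treated as discrete layers rather than integrated, and soil water effects are left out.

```python
def q10_rate(r_base, q10, temp_c, ref_temp_c=10.0):
    """Mass-specific respiration rate from a Q10 temperature function."""
    return r_base * q10 ** ((temp_c - ref_temp_c) / 10.0)


def contributions(rates, masses):
    """Return total flux and fractional contributions.

    rates[i][z] and masses[i][z] hold the mass-specific rate and the mass
    of source i in depth layer z.
    """
    fluxes = [[r * m for r, m in zip(rate_row, mass_row)]
              for rate_row, mass_row in zip(rates, masses)]
    total = sum(sum(row) for row in fluxes)
    fractions = [[f / total for f in row] for row in fluxes]
    return total, fractions


def mixed_signature(source_deltas, fractions):
    """Isotope signature of total respired CO2 (linear mixing over sources)."""
    return sum(delta * sum(row) for delta, row in zip(source_deltas, fractions))


# Hypothetical example: two sources, two depth layers.
rates = [[1.0, 0.5], [0.2, 0.1]]
masses = [[2.0, 1.0], [5.0, 5.0]]
total_flux, frac = contributions(rates, masses)
delta_total = mixed_signature([-25.0, -15.0], frac)
```

This is the forward calculation. The inverse problem described in the notes runs the other way: the total flux and mixed signature are observed, and the rate parameters behind `rates` are what must be estimated.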

78 The Deconvolution Problem
Plant water uptake: what are ω, α1, μ1, α2, μ2? Soil respiration: what is θi? Likelihood of data: from isotope mixing model & flux model.
Thus, the goal in both of these problems is to estimate the unknown quantities and parameters, including the gamma mixture parameters omega, alpha1, mu1, alpha2, mu2, and the source- or substrate-specific respiration parameters as depicted by theta i. Once we know these parameters, they will tell us the active root area profile, which will give us the water uptake profile, which will ultimately tell us the fractional contributions of the different soil depths to plant water uptake. Likewise, once we know theta, we have an estimate of the mass-specific respiration rates, which gives us total soil respiration, and together these tell us the fractional contribution of different sources and soil layers to total soil respiration. We get estimates of these parameters by fitting the isotope mixing and flux models that I described in the previous slides to field data. In particular, a simple example is to assume that the observed stem signatures, soil CO2 signatures, and total soil respiration data come from normal distributions whose means are defined by the isotope mixing models and the flux models.
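The normal-likelihood assumption in the notes above can be written as a short sketch: each observation is treated as normally distributed around the corresponding model prediction, and log-likelihoods are added across data types. The dictionary keys, standard deviations, and numbers are hypothetical, and adding the terms assumes the data types are conditionally independent given the parameters.

```python
import math


def normal_logpdf(y, mean, sd):
    """Log density of Normal(mean, sd^2) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi * sd ** 2) - 0.5 * ((y - mean) / sd) ** 2


def joint_log_likelihood(observed, predicted, sds):
    """Sum normal log-likelihoods over data types and observations.

    observed and predicted map a data-type name to equal-length lists;
    sds maps each data-type name to its observation standard deviation.
    """
    total = 0.0
    for name, values in observed.items():
        for y, mu in zip(values, predicted[name]):
            total += normal_logpdf(y, mu, sds[name])
    return total


# Hypothetical observations and model predictions.
observed = {"soil_co2_delta": [-21.0, -20.5], "total_flux": [4.1, 3.8]}
good_fit = {"soil_co2_delta": [-21.2, -20.4], "total_flux": [4.0, 3.9]}
poor_fit = {"soil_co2_delta": [-25.0, -15.0], "total_flux": [6.0, 2.0]}
sds = {"soil_co2_delta": 0.5, "total_flux": 0.3}

ll_good = joint_log_likelihood(observed, good_fit, sds)
ll_poor = joint_log_likelihood(observed, poor_fit, sds)
```

Parameter values whose predictions track the data more closely receive a higher log-likelihood, which is what a sampler uses when exploring the posterior.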

79 Types of data provided by sensor networks
High-frequency tunable diode laser (TDL) measurement of stable isotopes; eddy covariance for measuring concentrations and fluxes of gases (e.g., water vapor and CO2); soil environmental data: temperature, water content, water potential, etc.; micro-met data: air temp, RH, VPD, light, wind speed, etc.; plant ecophys/ecosystem data: sap flux, ET, albedo & reflectance.

80 Data-model integration
Key components: data; process models; statistical tools; P(θ | X).
I am employing an integrative approach in developing the mechanistic framework and for addressing important and interesting ecological questions. A key element of my approach is the strong integration of data and models. I incorporate as much information as possible by conducting targeted field experiments and using large databases and existing data from the literature. I develop and employ mathematical and simulation models that explicitly translate what I see as the key ecological processes associated with a particular problem into a series of equations. Importantly, I use the models as a way to analyze the disparate data sources, and I couple the models and data via statistical modeling methods. I tend to use Bayesian approaches because they are particularly appealing for merging large datasets from diverse sources with process-based models: they are relatively straightforward to implement and can easily accommodate prior information such as that from the literature. This integrative approach greatly helps to tease apart complicated plant-environment interactions and facilitates linking processes operating at different scales.

81 The Process Model
Conceptual models: systems diagrams; graphical models. Model formulation: explicit, mathematical equations; systems equations; state-space equations.
Diagram labels: Observations of real system; Conceptual model; Mathematical model; Analytical output; Simulation model; Numerical/simulation output; "Compare"; Observational data.

82 Examples Presented Today
Deterministic vs. Stochastic
Compartment models (differential or difference equations) vs. Matrix models
Reductionist models (include lots of details & components) vs. Holistic models (use general principles)
Static models vs. Dynamic models (implicit dependence on time)
Distributed models (implicit dependence on space & time) vs. Lumped models
Linear models vs. Nonlinear models
Causal/mechanistic models vs. Black box models
Analytical models vs. Numerical/simulation models
Jorgensen (1986) Fundamentals of Ecological Modelling. 389 pp. Elsevier, Amsterdam.

83 Data Model (Likelihood)
Likelihood components Assuming conditional independence, likelihood of all data is:
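A generic form of such a likelihood is sketched below as an assumption. It takes two likelihood components, the soil isotope data and the total soil flux data, each conditionally independent given the source-specific parameters θ. The symbol R_Tot for total soil flux is introduced here for illustration only.

```latex
\[
p(\text{data} \mid \theta)
  \;=\; \prod_{t} p\!\left(\delta^{13}C_{Tot}(t) \mid \theta\right)
  \;\times\; \prod_{t} p\!\left(R_{Tot}(t) \mid \theta\right)
\]
```

Under conditional independence the joint likelihood factors into a product over data types and observation times, so each component can be specified separately (for example, as a normal density centered on the model prediction).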

