State-Space Models for Biological Monitoring Data Devin S. Johnson University of Alaska Fairbanks and Jennifer A. Hoeting Colorado State University.

Slides:



Advertisements
Similar presentations
Ordinary Least-Squares
Advertisements

Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Hierarchical Dirichlet Processes
VARYING RESIDUAL VARIABILITY SEQUENCE OF GRAPHS TO ILLUSTRATE r 2 VARYING RESIDUAL VARIABILITY N. Scott Urquhart Director, STARMAP Department of Statistics.
Combining Information from Related Regressions Duke University Machine Learning Group Presented by Kai Ni Apr. 27, 2007 F. Dominici, G. Parmigiani, K.
1 Colorado State University’s EPA-FUNDED PROGRAM ON SPACE-TIME AQUATIC RESOURCE MODELING and ANALYSIS PROGRAM (STARMAP) Jennifer A. Hoeting and N. Scott.
An Overview STARMAP Project I Jennifer Hoeting Department of Statistics Colorado State University
2 – In previous chapters: – We could design an optimal classifier if we knew the prior probabilities P(wi) and the class- conditional probabilities P(x|wi)
Robust sampling of natural resources using a GIS implementation of GRTS David Theobald Natural Resource Ecology Lab Dept of Recreation & Tourism Colorado.
Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design Mark Delorey, F. Jay Breidt, Colorado State University Abstract In aquatic resources,
Bayesian Models for Radio Telemetry Habitat Data Megan C. Dailey* Alix I. Gitelman Fred L. Ramsey Steve Starcevich * Department of Statistics, Colorado.
1 STARMAP: Project 2 Causal Modeling for Aquatic Resources Alix I Gitelman Stephen Jensen Statistics Department Oregon State University August 2003 Corvallis,
State-Space Models for Within-Stream Network Dependence William Coar Department of Statistics Colorado State University Joint work with F. Jay Breidt This.
Semiparametric Mixed Models in Small Area Estimation Mark Delorey F. Jay Breidt Colorado State University September 22, 2002.
Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer Hoeting Brian Bledsoe* Department of Statistics, Colorado.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Models for the Analysis of Discrete Compositional Data An Application of Random Effects Graphical Models Devin S. Johnson STARMAP Department of Statistics.
Habitat selection models to account for seasonal persistence in radio telemetry data Megan C. Dailey* Alix I. Gitelman Fred L. Ramsey Steve Starcevich.
What is a Multi-Scale Analysis? Implications for Modeling Presence/Absence of Bird Species Kathryn M. Georgitis 1, Alix I. Gitelman 1, Don L. Stevens 1,
1 Accounting for Spatial Dependence in Bayesian Belief Networks Alix I Gitelman Statistics Department Oregon State University August 2003 JSM, San Francisco.
PAGE # 1 Presented by Stacey Hancock Advised by Scott Urquhart Colorado State University Developing Learning Materials for Surface Water Monitoring.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Two-Phase Sampling Approach for Augmenting Fixed Grid Designs to Improve Local Estimation for Mapping Aquatic Resources Kerry J. Ritter Molly Leecaster.
Example For simplicity, assume Z i |F i are independent. Let the relative frame size of the incomplete frame as well as the expected cost vary. Relative.
Chapter 11 Multiple Regression.
Habitat association models  Independent Multinomial Selections (IMS): (McCracken, Manly, & Vander Heyden, 1998) Product multinomial likelihood with multinomial.
PAGE # 1 STARMAP OUTREACH Scott Urquhart Department of Statistics Colorado State University.
Modeling Gene Interactions in Disease CS 686 Bioinformatics.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Continuous Random Variables and Probability Distributions
Models for the Analysis of Discrete Compositional Data An Application of Random Effects Graphical Models Devin S. Johnson STARMAP Department of Statistics.
Computer vision: models, learning and inference Chapter 10 Graphical Models.
1 Learning Materials for Surface Water Monitoring Gerald Scarzella.
Optimal Sample Designs for Mapping EMAP Data Molly Leecaster, Ph.D. Idaho National Engineering & Environmental Laboratory Jennifer Hoeting, Ph. D. Colorado.
Statistical Models for Stream Ecology Data: Random Effects Graphical Models Devin S. Johnson Jennifer A. Hoeting STARMAP Department of Statistics Colorado.
Random Effects Graphical Models and the Analysis of Compositional Data Devin S. Johnson and Jennifer A. Hoeting STARMAP Department of Statistics Colorado.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
1 Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys Breda Munoz Virginia Lesser R
Binary Variables (1) Coin flipping: heads=1, tails=0 Bernoulli Distribution.
1 Spatial and Spatio-temporal modeling of the abundance of spawning coho salmon on the Oregon coast R Ruben Smith Don L. Stevens Jr. September.
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
● Final exam Wednesday, 6/10, 11:30-2:30. ● Bring your own blue books ● Closed book. Calculators and 2-page cheat sheet allowed. No cell phone/computer.
The Dirichlet Labeling Process for Functional Data Analysis XuanLong Nguyen & Alan E. Gelfand Duke University Machine Learning Group Presented by Lu Ren.
DAMARS/STARMAP 8/11/03# 1 STARMAP YEAR 2 N. Scott Urquhart STARMAP Director Department of Statistics Colorado State University Fort Collins, CO
: Chapter 3: Maximum-Likelihood and Baysian Parameter Estimation 1 Montri Karnjanadecha ac.th/~montri.
Learning In Bayesian Networks. General Learning Problem Set of random variables X = {X 1, X 2, X 3, X 4, …} Training set D = { X (1), X (2), …, X (N)
A generalized bivariate Bernoulli model with covariate dependence Fan Zhang.
Lecture 2: Statistical learning primer for biologists
Probabilistic models Jouni Tuomisto THL. Outline Deterministic models with probabilistic parameters Hierarchical Bayesian models Bayesian belief nets.
Global Analyzing community data with joint species distribution models abundance, traits, phylogeny, co-occurrence and spatio-temporal structures Otso.
A shared random effects transition model for longitudinal count data with informative missingness Jinhui Li Joint work with Yingnian Wu, Xiaowei Yang.
Continuous Random Variables and Probability Distributions
Lecture 1: Basic Statistical Tools. A random variable (RV) = outcome (realization) not a set value, but rather drawn from some probability distribution.
Discriminative Training and Machine Learning Approaches Machine Learning Lab, Dept. of CSIE, NCKU Chih-Pin Liao.
Multi-label Prediction via Sparse Infinite CCA Piyush Rai and Hal Daume III NIPS 2009 Presented by Lingbo Li ECE, Duke University July 16th, 2010 Note:
Statistical Methods. 2 Concepts and Notations Sample unit – the basic landscape unit at which we wish to establish the presence/absence of the species.
VARYING DEVIATION BETWEEN H 0 AND TRUE  SEQUENCE OF GRAPHS TO ILLUSTRATE POWER VARYING DEVIATION BETWEEN H 0 AND TRUE  N. Scott Urquhart Director, STARMAP.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
Bayesian Semi-Parametric Multiple Shrinkage
Bayesian data analysis
Estimation and Model Selection for Geostatistical Models
Distributions and Concepts in Probability Theory
Remember that our objective is for some density f(y|) for observations where y and  are vectors of data and parameters,  being sampled from a prior.
Kernel Stick-Breaking Process
Hidden Markov Models Part 2: Algorithms
Chapter 11: Inference for Distributions of Categorical Data
Shashi Shekhar Weili Wu Sanjay Chawla Ranga Raju Vatsavai
Stochastic Methods.
Presentation transcript:

State-Space Models for Biological Monitoring Data Devin S. Johnson University of Alaska Fairbanks and Jennifer A. Hoeting Colorado State University

# CR The work reported here was developed under the STAR Research Assistance Agreement CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of presenter and the STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this presentation.

Outline Biological monitoring data Previous methods Bayesian hierarchical models previous models Multiple trait models Continuous trait models Analysis of fish functional traits

Biological Monitoring Data Organisms are sampled at several sites. “Individuals” are classified according to a set  of traits Example: Individual response vector: Longevity 1.  6 years 2. > 6 years Trophic guild 1. herbivore 2. omivore 3. invertivore 4. picsivore

Environmental Conditions A set, , of site specific environmental measurements (covariates) are also typically recorded. Example: stream order, watershed area, elevation Let Denote the vector of environmental covariates for a single sampling site

Functional Trait vs. Species Analysis Distributions of functional traits are often more interesting Species are geographically constrained Limited ecological inference Analysis of functional traits is portable Functional traits allow inference of the biological root of species distribution and environmental adaptation.

Previous Functional Trait Analysis Methods Ordination methods Canonical Correspondence Analysis (CCA) (ter Braak, 1985) Ordinate traits along a set of environmental axes Product moment correlations “Solution to the 4 th Corner Problem” (Legendre et al. 1997) Estimate correlation measure between trait counts and environmental covariates

Shortcomings of previous methods Measure marginal association between environmental variables and traits Conditional relationships give a more detailed measure of association Interaction between traits can give a different view No predictive ability Cannot predict community structure at a site using remotely sensed covariates (GIS)

State-space models for a single trait Billhiemer and Guttorp (1997) C si = number of individuals belonging to category i at site s = 1,…, S x s = site specific environmental covariate Parameter estimation using a Gibbs sampler

Extending the Billheimer-Guttorp model Generalize the BG model to explicitly allow for multiple trait inference Allow for a range of trait interaction Parameterize to allow parsimonious modeling BG model based on random effect categorical data models Use graphical model structure with random effects Allow inference for trait interactions

Multiple trait analysis Notation: i Realization of Y  cell) I Sample space of Y  not necessarily I 1 ×…× I |  | ) P si Probability density of Y at site s = 1,…, S (cell probability) C si number of individuals of type i at site s (cell count)  Single trait (    ) a Subset of traits ( a   )

Bayesian hierarchical model Data model: where Parameter model:

and  measure interaction between the traits in a For model identifiability choose reference cell i * and set: Interaction parameterization

Conditional independence statements If I = I 1 ×…× I |  |, then if For certain model specifications: if

Interaction example Data:  = {1, 2}; Y = {Y 1, Y 2 }; no covariates Saturated model: Conditional independence model: implies Y 1  Y 2 |  and in this case Y 1  Y 2 )

Continuous traits In addition, for each individual, a set, , of continuous traits are measured Example: Shape = Body length / Body depth (How hydrodynamic is the individual?) Individual response vector: Y   R |  | is a vector of interval valued traits (i, y) represents a realization of Y

Conditional Gaussian distribution The conditional Gaussian distribution (Lauritzen, 1996, Graphical Models) η (a) and Ψ (a) measure interactions between discrete and continuous traits Homogeneous CG: = 0 for a  

Random effects CG Regression where, Reference cell identifiability constraints imposed Conditional independence inferred from zero-valued parameters and random effects

RECG Hierarchical model However, note the simplification

Parameter estimation A Gibbs sampling approach is used for parameter estimation Analyze ( , , T) and (η, Ψ, , K) with 2 separate Gibbs chains The CG to Mult×N formulation of the likelihood Independent priors for ( , , T) and (η, Ψ, , K) Problem: Rich random effects structure can lead to poor convergence Solution: Hierarchical centering

Hierarchical centering  (a),  (a), T a and K a have closed form full conditional distributions  and  need to be updated with a Metropolis step in the Gibbs sampler.

Fish species trait richness 119 stream sites visited in an EPA EMAP study Discrete traits: Continuous trait: Shape factor = Body length / Body depth i * = (  6 years, Herbivore) = (1, 1) Longevity 1.  6 years 2. > 6 years Trophic guild 1. herbivore 2. omivore 3. invertivore 4. picsivore

Stream covariates Environmental covariates: values were measured at each site for the following covariates 1. Stream order 2. Minimum watershed elevation 3. Watershed area 4. % area impacted by human use 5. Areal % fish cover

Fish trait richness model Interaction models: Random effects:

Environment effects on Longevity Table 1. Comparison of “null” model to model including specified covariate for Longevity. Values presented are ≈2ln(BF). Covariate> 6 years Stream order (  ) Elevation4.139 Area Use7.319 Fish cover5.032

Environmental effects on Trophic Guild Table 2. Comparison of “null” model to model including specified covariate for trophic guild. Values presented are ≈2ln(BF). Trophic Guild CovariateOmnivoreInvertivorePiscivore Stream order Elevation (  ) Area Use (  ) Fish cover

Environmental effects on Trophic Guild Table 3. Comparison of “null” model to model including specified covariate for Shape. Values presented are ≈2ln(BF). CovariateShape Stream order8.116 Elevation8.228 Area8.225 % impacted Fish cover7.636

Trait interaction Table 4. Comparison of “null” model to model including interaction parameters. Interaction2ln(BF) {L, T} {L, S} {T, S} {L, T, S}

Comments / Conclusions Multiple traits can be analyzed with specified interaction Continuous traits can also be included Markov Random Field interpretation for trait interactions BG model obtainable for multi-way traits Allow full interaction and correlated R.E. MVN random effects imply that the cell probabilities have a “constrained” LN distribution