Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer Hoeting Brian Bledsoe* Department of Statistics, Colorado.

1 Bayesian modeling for ordinal substrate size using EPA stream data. A spatial model for ordered categorical data. Megan Dailey Higgs, Jennifer Hoeting, Brian Bledsoe*. Department of Statistics, Colorado State University. *Department of Civil Engineering, Colorado State University.

2 Substrate size in streams ► Influences in-stream physical habitat ► Often indicative of stream health ► EPA collected data at 485 sites in Washington and Oregon between 1994 and 2004

3 Data Collection Protocol ► At a site: ▪ 11 transects × 5 points along each transect ▪ Choose particle under the sharp end of a stick ▪ Visually estimate and classify size

4 Creating the response ► For a site: ▪ Transform the original size classes to $\log_{10}$(geometric mean) for all sample points ▪ Find the median for the site ► Geometric mean
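The two steps on this slide can be sketched in Python as follows. This is only an illustration, not the EPA protocol: the class names and diameter bounds in `CLASS_BOUNDS_MM` are hypothetical placeholders, and taking a class's geometric mean as the square root of the product of its lower and upper bounds is an assumption, since the slide's formula did not survive in the transcript.

```python
import math
from statistics import median

# Hypothetical size classes with (lower, upper) diameter bounds in mm.
# These values are placeholders for illustration, not the EPA class table.
CLASS_BOUNDS_MM = {
    "sand": (0.06, 2.0),
    "fine_gravel": (2.0, 16.0),
    "coarse_gravel": (16.0, 64.0),
    "cobble": (64.0, 250.0),
}


def log10_geometric_mean(size_class):
    """log10 of the geometric mean of a class's bounds (assumed definition)."""
    lower, upper = CLASS_BOUNDS_MM[size_class]
    return math.log10(math.sqrt(lower * upper))


def site_response(point_classes):
    """Median of the transformed values over all sample points at one site."""
    return median(log10_geometric_mean(c) for c in point_classes)


# Five sample points stand in for the 55 points collected at a real site.
example_points = ["sand", "cobble", "fine_gravel", "fine_gravel", "coarse_gravel"]
y_site = site_response(example_points)
```

Because every sample point takes one of a small number of class values, the site median can only land on a limited set of values, which is why the response behaves as an ordered categorical variable.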

5 The response ► $Y_i = \mathrm{median}[\log_{10}(\text{geometric mean})]$ for site $i$ ► Transformation provides a more symmetric, continuous-like variable ▪ Typically modeled as a continuous variable ▪ Predictive models have performed poorly ► Response is an ordered categorical variable ▪ 12 categories (6 with very few observations)

6 Ordered categorical data ► $Y_i$ is a categorical response variable with $K$ ordered values: $\{1, \dots, K\}$ ► Modeling objectives: ▪ Explain the variation in the ordered response from covariate(s) ▪ Incorporate the spatial dependence ▪ Estimate, predict, and create maps of $\Pr(Y_i \le k)$ and $\Pr(Y_i = k)$

7 Formulating the spatial model ► Spatial model for ordered categorical data = non-spatial model for ordered categorical data (Albert & Chib 1993, 1997) + spatial model for binary and count data (Diggle, Tawn, & Moyeed 1998; Gelfand & Ravishanker 1998) ► Generalized geostatistical models with a latent Gaussian process ► Metropolis-Hastings within Gibbs sampling approach

8 Latent variable formulation ► Define latent variable, $Z_i$, such that $Z_i = X_i'\beta + \varepsilon_i$ ▪ $\varepsilon_i \sim N(0,1)$ for the probit model ▪ $\varepsilon_i \sim$ Standard Logistic for logit model ► Define the categorical response, $Y_i \in \{1, \dots, K\}$, using $Z_i$ and ordered cut-points, $\theta = (\theta_1, \dots, \theta_{K-1})$, where $0 = \theta_1 < \theta_2 < \dots < \theta_{K-1} < \theta_K = \infty$: ▪ $Y_i = 1$ if $Z_i < \theta_1$ ▪ $Y_i = k$ if $\theta_{k-1} \le Z_i < \theta_k$ ▪ $Y_i = K$ if $Z_i \ge \theta_{K-1}$
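The cut-point rule on this slide amounts to finding which interval a latent value falls in. A minimal Python sketch, assuming the finite cut-points are passed as a sorted list; the function name `categorize` and the example cut-point values are hypothetical:

```python
from bisect import bisect_right


def categorize(z, cutpoints):
    """Map a latent value z to a category in 1..K.

    cutpoints holds the finite, strictly increasing values
    theta_1, ..., theta_{K-1}; theta_K = infinity is implicit.
    bisect_right counts the cut-points that are <= z, so
    z < theta_1 gives 1 and theta_{k-1} <= z < theta_k gives k.
    """
    return bisect_right(cutpoints, z) + 1


# Illustrative cut-points with theta_1 = 0 and K = 4 categories.
theta = [0.0, 1.0, 2.5]
latent_values = [-0.5, 0.0, 0.99, 1.7, 2.5, 4.0]
categories = [categorize(z, theta) for z in latent_values]
```

Using `bisect_right` rather than `bisect_left` matters at the boundaries: a latent value exactly equal to a cut-point belongs to the higher category, matching the "≤ on the left, < on the right" convention of the slide.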

9 Latent variable formulation ► Thus, $\Pr(Y_i \le k \mid \theta, \beta) = \Pr(Z_i < \theta_k)$ and $\Pr(Y_i = k \mid \theta, \beta) = \Pr(\theta_{k-1} \le Z_i < \theta_k)$ ▪ If $Z_i \sim N(X_i'\beta, 1)$, then $\Pr(Y_i \le k \mid \theta, \beta) = \Phi(\theta_k - X_i'\beta)$ and $\Pr(Y_i = k \mid \theta, \beta) = \Phi(\theta_k - X_i'\beta) - \Phi(\theta_{k-1} - X_i'\beta)$, where $\Phi$ is the $N(0,1)$ cdf
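For the probit case, the category probabilities are differences of normal cdf values at the shifted cut-points. The sketch below uses only the Python standard library; the names `std_normal_cdf` and `category_probs` and the example values are hypothetical, and `eta` stands for the linear predictor X_i'β.

```python
import math


def std_normal_cdf(x):
    """N(0,1) cdf, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probs(eta, cutpoints):
    """Return [Pr(Y = 1), ..., Pr(Y = K)] for linear predictor eta.

    cutpoints holds the finite values theta_1, ..., theta_{K-1}.
    Cumulative probabilities are Phi(theta_k - eta), with the last
    one (theta_K = infinity) equal to 1.
    """
    cumulative = [std_normal_cdf(t - eta) for t in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Illustrative values: three categories, eta = 0.
probs = category_probs(0.0, [0.0, 1.0])
```

Increasing `eta` shifts probability mass toward the higher categories while the cut-points stay fixed, which is how a covariate moves the predicted substrate class.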

10 Spatial cumulative model ► $Z_i = X_i'\beta + W_i + \varepsilon_i$ is the latent variable ▪ where $\varepsilon_i \sim N(0,1)$, $W \sim N(0, \sigma^2 H(\phi))$, $(H(\phi))_{ij} = \rho(s_i - s_j; \phi)$, and $Z_i \mid \beta, W_i \sim N(X_i'\beta + W_i, 1)$ ► $\Pr(Y_i \le k \mid \beta, \theta, W_i) = \Pr(Z_i < \theta_k) = \Phi(\theta_k - X_i'\beta - W_i)$, where $\theta = (\theta_1, \dots, \theta_K)$ is a vector of cut-points such that $0 = \theta_1 < \theta_2 < \dots < \theta_{K-1} < \theta_K = \infty$
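Differencing the cumulative probabilities, as in the non-spatial case, gives the category probabilities conditional on the spatial effect. This step is not written on the slide; it follows from the definitions above, with $\theta_0 = -\infty$ introduced here as a convention so that the first category needs no special case.

```latex
\begin{align*}
\Pr(Y_i = k \mid \beta, \theta, W_i)
  &= \Pr(Y_i \le k \mid \beta, \theta, W_i)
   - \Pr(Y_i \le k-1 \mid \beta, \theta, W_i) \\
  &= \Phi(\theta_k - X_i'\beta - W_i)
   - \Phi(\theta_{k-1} - X_i'\beta - W_i),
  \qquad k = 1, \dots, K,
\end{align*}
% Conventions: \theta_0 = -\infty (so \Phi(\cdot) = 0) and
% \theta_K = \infty (so \Phi(\cdot) = 1).
```

The spatial effect $W_i$ enters exactly like an extra site-specific intercept, so nearby sites with similar $W_i$ have similar category probabilities even when their covariates differ.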

11 Fitting the spatial model ► The likelihood ► Estimating $\beta = (\beta_0, \beta_1)$, the spatial covariance parameters $(\sigma^2, \phi)$, and $\theta = (\theta_2, \dots, \theta_{K-1})$ ► Transform $\theta$ to real-valued, unrestricted cut-points $\alpha = (\alpha_2, \dots, \alpha_{K-1})$, where $\alpha_2 = \log(\theta_2)$ and $\alpha_k = \log(\theta_k - \theta_{k-1})$ for $k = 3, \dots, K-1$ ► MCMC sampling ▪ Metropolis-Hastings within Gibbs sampling ▪ Prior: ► $\beta$: flat and conjugate Normal ► $\sigma^2$ and $\phi$: independent uniform priors ► $\alpha$: multivariate normal
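The cut-point transformation on this slide maps ordered, positive cut-points to unrestricted real values by taking logs of successive gaps, so a sampler can propose moves without checking the ordering. A minimal sketch of the transform and its inverse, assuming θ_1 = 0 is fixed as on the earlier slides; the function names, the name `alpha` for the unrestricted values, and the example numbers are hypothetical:

```python
import math


def cutpoints_to_unconstrained(theta):
    """theta = [theta_2, ..., theta_{K-1}], increasing and positive.

    Returns the log of each gap theta_k - theta_{k-1}, starting
    from the fixed cut-point theta_1 = 0.
    """
    alpha = []
    previous = 0.0  # theta_1, fixed at zero
    for t in theta:
        alpha.append(math.log(t - previous))
        previous = t
    return alpha


def unconstrained_to_cutpoints(alpha):
    """Inverse transform: cumulative sums of exp(alpha_k)."""
    theta = []
    previous = 0.0  # theta_1, fixed at zero
    for a in alpha:
        previous += math.exp(a)
        theta.append(previous)
    return theta


theta_example = [0.8, 1.5, 3.0]
alpha_example = cutpoints_to_unconstrained(theta_example)
theta_back = unconstrained_to_cutpoints(alpha_example)
```

Any real-valued vector maps back to strictly increasing, positive cut-points, which is what makes a multivariate normal prior or proposal on the transformed scale workable.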

12 Simulated data ► Simulated data at a subset of the original locations ($n = 82$) ▪ Cluster infill around the 82 sites ($n = 120$) ▪ Spatial process: ► $W$ is a stationary Gaussian process with $E[W(s)] = 0$ and $\mathrm{Cov}[W(s_i), W(s_j)] = \sigma^2 \rho(s_i - s_j; \phi)$ ► Exponential correlation function: $\rho(d) = \exp(-d\phi)$ ▪ Covariate: ► Distance weighted stream power
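One common way to draw a zero-mean Gaussian process with exponential correlation at a finite set of sites is to build the covariance matrix and multiply its Cholesky factor by independent standard normal draws. The sketch below is not the authors' simulation code: the coordinates and parameter values are made up, Euclidean distance is assumed, and `sigma2` and `phi` are this sketch's labels for the variance and correlation-decay parameters.

```python
import math
import random


def exponential_covariance(coords, sigma2, phi):
    """Covariance matrix with entries sigma2 * exp(-phi * distance)."""
    n = len(coords)
    return [
        [sigma2 * math.exp(-phi * math.dist(coords[i], coords[j]))
         for j in range(n)]
        for i in range(n)
    ]


def cholesky(a):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def simulate_spatial_process(coords, sigma2, phi, rng):
    """One draw of W at the given coordinates."""
    lower = cholesky(exponential_covariance(coords, sigma2, phi))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(len(coords))]


# Made-up site coordinates and parameter values for illustration.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
w = simulate_spatial_process(coords, sigma2=1.5, phi=0.7, rng=random.Random(0))
```

Adding the simulated `w` to the linear predictor and independent N(0,1) noise, then applying the cut-points, would produce simulated ordered responses in the same form as the model.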

13 Preliminary Results ► Posterior quantities ▪ Based on 1000 iterations (burn-in = 1000)

14 Posterior mean of the spatial process

15 Posterior SD of the spatial process

16 Posterior mean and SD for $\Pr(Y_i = 2)$

17 Posterior mean and SD for $\Pr(Y_i = 5)$

18 Posterior mean and SD for $\Pr(Y_i \le 5)$

19 Future Work ► Convergence and mixing for the spatial model ► Models and methods for large data sets ▪ Spectral parameterization of the spatial process ► Wikle (2002), Paciorek & Ryan (2005), Royle & Wikle (2005) ▪ Importance sampling ► Gelfand & Ravishanker (1998), Gelfand, Ravishanker, & Ecker (2000) ▪ Sub-sampling ► Investigate different spatial correlation functions and distance metrics ▪ Traditional ▪ Stream based ► Model selection for the spatial model

20 Funding and Affiliations FUNDING/DISCLAIMER The work reported here was developed under the STAR Research Assistance Agreement CR-829095 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of the authors and STARMAP, the Program they represent. EPA does not endorse any products or commercial services mentioned in this presentation. Megan’s research is also partially supported by the PRIMES National Science Foundation Grant DGE-0221595003.

21 Thank you

23 Subset of data ($n_{\text{small}} = 82$)

24 Sample path plot - Example

25 Surface for estimating $(\sigma^2, \phi)$

26 Sample path plot – Avoiding plateau

