Random Effects Graphical Models and the Analysis of Compositional Data Devin S. Johnson and Jennifer A. Hoeting STARMAP Department of Statistics Colorado.

Slides:



Advertisements
Similar presentations
MCMC estimation in MlwiN
Advertisements

Chapter 2 Describing Contingency Tables Reported by Liu Qi.
VARYING RESIDUAL VARIABILITY SEQUENCE OF GRAPHS TO ILLUSTRATE r 2 VARYING RESIDUAL VARIABILITY N. Scott Urquhart Director, STARMAP Department of Statistics.
Predicting the likelihood of water quality impaired stream reaches using landscape scale data and a hierarchical methodology Erin Peterson Geosciences.
# 1 METADATA: A LEGACY FOR OUR GRANDCHILDREN N. Scott Urquhart STARMAP Program Director Department of Statistics Colorado State University.
1 Colorado State University’s EPA-FUNDED PROGRAM ON SPACE-TIME AQUATIC RESOURCE MODELING and ANALYSIS PROGRAM (STARMAP) Jennifer A. Hoeting and N. Scott.
An Overview STARMAP Project I Jennifer Hoeting Department of Statistics Colorado State University
Multi-Lag Cluster Enhancement of Fixed Grids for Variogram Estimation for Near Coastal Systems Kerry J. Ritter, SCCWRP Molly Leecaster, SCCWRP N. Scott.
# 1 STATISTICAL ASPECTS OF COLLECTIONS OF BEES TO STUDY PESTICIDES N. SCOTT URQUHART SENIOR RESEARCH SCIENTIST DEPARTMENT OF STATISTICS COLORADO STATE.
Loglinear Models for Contingency Tables. Consider an IxJ contingency table that cross- classifies a multinomial sample of n subjects on two categorical.
Robust sampling of natural resources using a GIS implementation of GRTS David Theobald Natural Resource Ecology Lab Dept of Recreation & Tourism Colorado.
Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design Mark Delorey, F. Jay Breidt, Colorado State University Abstract In aquatic resources,
Bayesian Models for Radio Telemetry Habitat Data Megan C. Dailey* Alix I. Gitelman Fred L. Ramsey Steve Starcevich * Department of Statistics, Colorado.
1 STARMAP: Project 2 Causal Modeling for Aquatic Resources Alix I Gitelman Stephen Jensen Statistics Department Oregon State University August 2003 Corvallis,
State-Space Models for Within-Stream Network Dependence William Coar Department of Statistics Colorado State University Joint work with F. Jay Breidt This.
Linking watersheds and streams through functional modeling of watershed processes David Theobald, Silvio Ferraz, Erin Poston, and Jeff Deems Natural Resource.
Semiparametric Mixed Models in Small Area Estimation Mark Delorey F. Jay Breidt Colorado State University September 22, 2002.
Bayesian modeling for ordinal substrate size using EPA stream data Megan Dailey Higgs Jennifer Hoeting Brian Bledsoe* Department of Statistics, Colorado.
Statistics for Business and Economics
Models for the Analysis of Discrete Compositional Data An Application of Random Effects Graphical Models Devin S. Johnson STARMAP Department of Statistics.
Habitat selection models to account for seasonal persistence in radio telemetry data Megan C. Dailey* Alix I. Gitelman Fred L. Ramsey Steve Starcevich.
1 Accounting for Spatial Dependence in Bayesian Belief Networks Alix I Gitelman Statistics Department Oregon State University August 2003 JSM, San Francisco.
Quantifying fragmentation of freshwater systems using a measure of discharge modification (and other applications) David Theobald, John Norman, David Merritt.
Strength of Spatial Correlation and Spatial Designs: Effects on Covariance Estimation Kathryn M. Irvine Oregon State University Alix I. Gitelman Sandra.
PAGE # 1 Presented by Stacey Hancock Advised by Scott Urquhart Colorado State University Developing Learning Materials for Surface Water Monitoring.
Overview of STAT 270 Ch 1-9 of Devore + Various Applications.
Quantifying fragmentation of freshwater systems using a measure of discharge modification (and other applications) David Theobald, John Norman, David Merritt.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Two-Phase Sampling Approach for Augmenting Fixed Grid Designs to Improve Local Estimation for Mapping Aquatic Resources Kerry J. Ritter Molly Leecaster.
Example For simplicity, assume Z i |F i are independent. Let the relative frame size of the incomplete frame as well as the expected cost vary. Relative.
Habitat association models  Independent Multinomial Selections (IMS): (McCracken, Manly, & Vander Heyden, 1998) Product multinomial likelihood with multinomial.
Copyright (c) 2004 Brooks/Cole, a division of Thomson Learning, Inc. Chapter 14 Goodness-of-Fit Tests and Categorical Data Analysis.
PAGE # 1 STARMAP OUTREACH Scott Urquhart Department of Statistics Colorado State University.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Models for the Analysis of Discrete Compositional Data An Application of Random Effects Graphical Models Devin S. Johnson STARMAP Department of Statistics.
State-Space Models for Biological Monitoring Data Devin S. Johnson University of Alaska Fairbanks and Jennifer A. Hoeting Colorado State University.
1 Learning Materials for Surface Water Monitoring Gerald Scarzella.
Optimal Sample Designs for Mapping EMAP Data Molly Leecaster, Ph.D. Idaho National Engineering & Environmental Laboratory Jennifer Hoeting, Ph. D. Colorado.
Applications of Nonparametric Survey Regression Estimation in Aquatic Resources F. Jay Breidt, Siobhan Everson-Stewart, Alicia Johnson, Jean D. Opsomer.
Statistical Models for Stream Ecology Data: Random Effects Graphical Models Devin S. Johnson Jennifer A. Hoeting STARMAP Department of Statistics Colorado.
1 Learning Materials for Surface Water Monitoring Gerald Scarzella.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
Distribution Function Estimation in Small Areas for Aquatic Resources Spatial Ensemble Estimates of Temporal Trends in Acid Neutralizing Capacity Mark.
1 Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys Breda Munoz Virginia Lesser R
1 Spatial and Spatio-temporal modeling of the abundance of spawning coho salmon on the Oregon coast R Ruben Smith Don L. Stevens Jr. September.
1 Dr. Jerrell T. Stracener EMIS 7370 STAT 5340 Probability and Statistics for Scientists and Engineers Department of Engineering Management, Information.
Multinomial Distribution
Simulation of the matrix Bingham-von Mises- Fisher distribution, with applications to multivariate and relational data Discussion led by Chunping Wang.
Discrete Multivariate Analysis Analysis of Multivariate Categorical Data.
DAMARS/STARMAP 8/11/03# 1 STARMAP YEAR 2 N. Scott Urquhart STARMAP Director Department of Statistics Colorado State University Fort Collins, CO
Logistic regression. Recall the simple linear regression model: y =  0 +  1 x +  where we are trying to predict a continuous dependent variable y from.
PAGE # 1 EaGLes Conference EaGLes Conference December 3, 2002 CASE STUDY 6: INDICATORS OF MARSH MAINTENANCE OF ELEVATION comments by N. Scott Urquhart.
1 Follow the three R’s: Respect for self, Respect for others and Responsibility for all your actions.
Math 4030 – 6a Joint Distributions (Discrete)
Lecture 2: Statistical learning primer for biologists
1 Always be mindful of the kindness and not the faults of others.
A shared random effects transition model for longitudinal count data with informative missingness Jinhui Li Joint work with Yingnian Wu, Xiaowei Yang.
Lecture 1: Basic Statistical Tools. A random variable (RV) = outcome (realization) not a set value, but rather drawn from some probability distribution.
Introduction to Sampling Methods Qi Zhao Oct.27,2004.
VARYING DEVIATION BETWEEN H 0 AND TRUE  SEQUENCE OF GRAPHS TO ILLUSTRATE POWER VARYING DEVIATION BETWEEN H 0 AND TRUE  N. Scott Urquhart Director, STARMAP.
Using Regional Models to Assess the Relative Effects of Stressors Lester L. Yuan National Center for Environmental Assessment U.S. Environmental Protection.
Log-linear Models Please read Chapter Two. We are interested in relationships between variables White VictimBlack Victim White Prisoner151 (151/160=0.94)
Logistic Regression Binary response variable Y (1 – Success, 0 – Failure) Continuous, Categorical independent Variables –Similar to Multiple Regression.
Hierarchical Models. Conceptual: What are we talking about? – What makes a statistical model hierarchical? – How does that fit into population analysis?
Biointelligence Laboratory, Seoul National University
Estimation and Model Selection for Geostatistical Models
Estimating mean abundance from repeated presence-absence surveys
TROUBLESOME CONCEPTS IN STATISTICS: r2 AND POWER
Multivariate Methods Berlin Chen
Multivariate Methods Berlin Chen, 2005 References:
Presentation transcript:

Random Effects Graphical Models and the Analysis of Compositional Data Devin S. Johnson and Jennifer A. Hoeting STARMAP Department of Statistics Colorado State University Developed under the EPA STAR Research Assistance Agreement CR

Motivating Problem Various stream sites in the Mid-Atlantic region of the United States were visited in Summer –For each site, each observed fish species was cross- categorized according to several traits –Environmental variables are also measured at each site (e.g. precipitation, chloride concentration, …) Relative proportions are more informative (species composition). How can we examine complex relationships between and within the covariates and response traits ?

Graphical Models (Chain Graphs) Graphical models (e.g. log-linear models, lattice spatial models) have been explored for examination of conditional dependencies within multivariate random variables Model: f (y 1, y 2 | x 1, x 2 )  f (x 1 )  f (x 2 ) X1X1 X2X2 Y2Y2 Y1Y1

Probability Model for Individuals Response variables –Set  of discrete categorical variables –Notation: y is a specific cell Explanatory variables –Set  of explanatory variables (covariates) –Notation: x refers to a specific explanatory observation Random effects –Allows flexibility when sampling many “sites” –Unobserved covariates –Notation:  f, f   refers to a random effect.

Probability Model and Extended Chain Graph, G  Joint distribution f (y, x,  ) = f (y|x,  )  f (x)  f (  ) Graph illustrating possible dependence relationships for the full model, G . X1X1 X2X2 Y2Y2 Y1Y1  {1,2} 22 11

Graphical Models for Discrete Compositions Sampling many individuals at a site results in cell counts, C(y) i = # individuals in cell y at site i = 1,…,S. Conditional count likelihood [C(y) i ] y | x i ~ multinomial (N i ; [f(y|x i,  i )] y ), Joint covariate count likelihood multinomial (N i ; [f(y|x i,  i )] y )  MVN ( ,  -1 ) Parameter estimation: Gibbs sampler with hierarchical centering. –Easier to impliment –Improved convergence

Fish Species Richness in the Mid-Atlantic Highlands 91 stream sites in the Mid Atlantic region of the United States were visited in an EPA EMAP study Response composition: Observed fish species were cross-categorized according to 2 discrete variables: 1.Habit Column species Benthic species 2.Pollution tolerance Intolerant Intermediate Tolerant

Fish Species Richness in the Mid-Atlantic Highlands Environmental covariates values were measured at each site for the following covariates 1.Mean watershed precipitation (m) 2.Minimum watershed elevation (m) 3.Turbidity (ln NTU) 4.Chloride concentration (ln  eq/L) 5.Sulfate concentration (ln  eq/L) 6.Watershed area (ln km 2 )

Fish Species Richness Model Composition Graphical Model and Prior distributions

Fish Species Functional Groups Edge exclusion determined from 95% HPD intervals for  parameters and off-diagonal elements of  Posterior suggested chain graph for independence model (lower DIC than dependent response model) Tolerance Precipitation Chloride Elevation Turbidity Area Sulfate Habit

Comments and Conclusions Using the proposed state-space model for discrete compositional data, –Relationships evaluated as a Markov random field –Multi-way compositions can be analyzed with specified dependence structure between cells –MVN random effects imply that the cell probabilities have a constrained LN distribution DR models also extend the capabilities of graphical models. –Data can be analyzed from many multiple sites. –Over dispersion in cell counts can be added.

The work reported here was developed under the STAR Research Assistance Agreement CR awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This presentation has not been formally reviewed by EPA. The views expressed here are solely those of presenter and the STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this presentation. # CR