Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys. Breda Munoz, Virginia Lesser. R82-9096-01



2 This presentation was supported under STAR Research Assistance Agreement No. CR awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.

3 Outline
- Missing data in environmental surveys
- Nonignorable missing-data mechanism
- Model-based approach for nonignorable missing data
- Design-based estimation and nonignorable missing data
- Illustration
- Summary

4 Missing Data in Environmental Surveys
- Researchers in environmental studies must obtain access to selected sites to gather field data.
- Denial of access:
  - is a common problem in environmental surveys;
  - produces unit nonresponse;
  - affects the results of data analysis.

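5 Response Disposition, 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)

| Result | 1995 | 1996 |
|---|---|---|
| Private landowners: agreed to access | 43% | 40% |
| Private landowners: refused access | 36% | 37% |
| Private landowners: undeliverable | 2% | 2% |
| Private landowners: not returned/no contact | 16% | 14% |
| Public land | 3% | 7% |
| Total | 100% | 100% |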

6 Introduction
- The Maryland Biological Stream Survey (Boward et al., 1999) reported an overall access denial rate of 10%.
- ODFW habitat surveys reported the following overall access denial rates (Flitcroft et al., 2002):
  - 1998: 10.0%
  - 1999: 6.0%
  - 2000: 12.5%

7 Assumptions
- A probability sampling design is used to collect outcomes of a spatial random process Y.
- The sample, $\{s_1, \ldots, s_n\}$, is a collection of sampling sites selected using the probability sampling design.
- Auxiliary variables are available.

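8 Missing-data mechanism: missing completely at random (MCAR) (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)
[Diagram: auxiliary variables X1 and X2, response Y, and response indicator R.]
Under MCAR, the response indicator R is independent of both X and Y: $P(R \mid X, Y) = P(R)$.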

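9 Missing-data mechanism: missing at random (MAR) (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)
[Diagram: auxiliary variables X1 and X2, response Y, and response indicator R.]
Under MAR, R depends only on the fully observed auxiliary variables: $P(R \mid X, Y) = P(R \mid X)$.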

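10 Missing-data mechanism: nonignorable (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)
[Diagram: auxiliary variables X1 and X2, response Y, and response indicator R.]
Under a nonignorable mechanism, R depends on Y even after conditioning on X, so $P(R \mid X, Y)$ cannot be reduced to $P(R \mid X)$.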
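The three mechanisms can be contrasted with a small simulation. This is a hypothetical sketch, not the authors' setup: y is normal, a binary auxiliary class x is correlated with y, and the response probability is constant (MCAR), depends on x only (MAR), or depends on y itself (nonignorable). It compares the plain respondent mean with a weighting-class estimator that reweights respondents to each x class's share of the sample, both measured against the full-sample mean.

```python
import math
import random


def logistic(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def simulate(mechanism, n=20000, seed=1):
    """Simulate (x, y, r) triples, r = 1 for respondents (hypothetical setup).

    y ~ N(10, 2^2); x is a binary class correlated with y. The response
    probability is constant (MCAR), depends on x only (MAR), or on y (nonignorable).
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        y = rng.gauss(10.0, 2.0)
        x = 1 if y + rng.gauss(0.0, 2.0) > 10.0 else 0
        if mechanism == "MCAR":
            p = 0.7
        elif mechanism == "MAR":
            p = 0.8 if x == 1 else 0.4
        elif mechanism == "nonignorable":
            p = logistic(y - 10.0)  # larger y -> more likely to respond
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        r = 1 if rng.random() < p else 0
        data.append((x, y, r))
    return data


def full_mean(data):
    """Mean of y over the whole sample (what we would get with no missing data)."""
    return sum(y for _, y, _ in data) / len(data)


def respondent_mean(data):
    """Mean of y over respondents only (complete-case estimate)."""
    ys = [y for _, y, r in data if r == 1]
    return sum(ys) / len(ys)


def class_adjusted_mean(data):
    """Weighting-class estimate: respondent means per x class, weighted by class share."""
    n = len(data)
    estimate = 0.0
    for cls in (0, 1):
        in_class = [(y, r) for x, y, r in data if x == cls]
        ys = [y for y, r in in_class if r == 1]
        estimate += (len(in_class) / n) * (sum(ys) / len(ys))
    return estimate


for mech in ("MCAR", "MAR", "nonignorable"):
    d = simulate(mech)
    print(mech,
          round(respondent_mean(d) - full_mean(d), 3),
          round(class_adjusted_mean(d) - full_mean(d), 3))
```

Under MAR the class-adjusted estimate should be close to the full-sample mean, while under the nonignorable mechanism it stays biased, because within each x class the respondents still over-represent large values of y.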

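11 Model-based Approach
- Under a nonignorable mechanism, we model the joint probability of the data and the missing-mechanism indicator (the "response" indicator), $R(s_i) \sim \text{Bernoulli}(p_i)$:
$$f\big(y(s_i), r(s_i) \mid \mathbf{x}_i\big) = \underbrace{f\big(y(s_i) \mid \mathbf{x}_i\big)}_{\text{data model}}\;\underbrace{P\big(R(s_i) = r(s_i) \mid y(s_i), \mathbf{x}_i\big)}_{\text{missing-mechanism model}},$$
where $\mathbf{x}_i$ denotes the covariates.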

12 Model-assisted estimation and nonignorable missing data
- Assume the parameter of interest is the total of the response Y over the study region $\mathcal{R}$: $T = \int_{\mathcal{R}} Y(s)\,ds$.

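13 Model-assisted estimation and nonignorable missing data
- Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993): $\hat{T} = \sum_{i=1}^{n} y(s_i)/\pi(s_i)$, where $\pi(s)$ is the inclusion density of the design at site $s$.
- Let … be a collection of fixed values.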
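A minimal sketch of the Horvitz-Thompson estimator, assuming the inclusion densities $\pi(s_i)$ at the sampled sites are known. The helper `ht_category_sizes` is hypothetical: it assumes the fixed values act as cut points that classify y into categories, and it estimates the population size of each category with the same inverse-inclusion weighting.

```python
from bisect import bisect_left


def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y(s_i) / pi(s_i).

    For a continuous population (Cordy, 1993), pi(s_i) is the inclusion
    density at site s_i rather than a finite-population inclusion probability.
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 for p in pi):
        raise ValueError("inclusion probabilities/densities must be positive")
    return sum(yi / p for yi, p in zip(y, pi))


def ht_category_sizes(y, pi, cuts):
    """HT estimate of the population size in each category of y (hypothetical helper).

    Sorted cut points c_1 < ... < c_{J-1} define J categories; category j
    holds c_{j-1} < y <= c_j (bisect_left places y == c_j in category j).
    """
    sizes = [0.0] * (len(cuts) + 1)
    for yi, p in zip(y, pi):
        sizes[bisect_left(cuts, yi)] += 1.0 / p
    return sizes


# Three sites with inclusion probabilities 0.5, 0.5, 0.25.
print(horvitz_thompson_total([1, 2, 3], [0.5, 0.5, 0.25]))        # 18.0
print(ht_category_sizes([1, 2, 3], [0.5, 0.5, 0.25], [1.5]))      # [2.0, 6.0]
```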

14 Model-assisted estimation (cont.)
- Sample size n: n* observed and n - n* missing under a nonignorable mechanism.

15 Model-assisted estimation (cont.) denotes the

16 Model-assisted estimation (cont.)
- Likelihood:

17 Model-assisted estimation (cont.)
- Reparameterize the model parameters in terms of expected cell counts (Baker and Laird, 1988).
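As a hedged illustration (notation assumed, not taken from the slides), suppose respondents are cross-classified by auxiliary class $i$ and response category $j$ (counts $n_{ij}$), while nonrespondents are classified by $i$ only (counts $u_i$). Writing $\mu_{ijr}$ for the cell probabilities, with $r = 1$ for respondents and $r = 0$ for nonrespondents, so that the expected cell counts are $m_{ijr} = N\mu_{ijr}$, the observed-data multinomial log-likelihood of a Baker and Laird (1988)-style contingency-table model is

$$\ell(\mu) = \sum_{i}\sum_{j} n_{ij}\,\log \mu_{ij1} \;+\; \sum_{i} u_i\,\log\Big(\sum_{j} \mu_{ij0}\Big), \qquad \sum_{i,j,r} \mu_{ijr} = 1.$$

The mechanism is nonignorable when $\mu_{ij0}/\mu_{ij1}$ varies with $j$ within a class $i$, that is, when the odds of nonresponse depend on the unobserved category of Y.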

18 Model-assisted estimation (cont.)
- Use the EM algorithm to estimate the expected counts of the missing cells, $M_{ij}$.
- E-step:
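For a contingency-table model of this kind, the standard E-step can be written as follows, assuming $u_i$ is the number of nonrespondents in auxiliary class $i$ and $\hat m_{ij0}^{(t)}$ is the fitted expected count of nonrespondents in cell $(i, j)$ at iteration $t$ (notation assumed). Each observed nonrespondent total is split across the categories of Y in proportion to the current fitted counts:

$$\hat M_{ij}^{(t+1)} = u_i\,\frac{\hat m_{ij0}^{(t)}}{\sum_{k} \hat m_{ik0}^{(t)}}.$$

By construction $\sum_j \hat M_{ij}^{(t+1)} = u_i$, so the completed table always reproduces the observed nonrespondent totals.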

19 Model-assisted estimation (cont.)
- M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm based on fitting marginal totals.
- The EM algorithm always converges to a solution when IPF is used in the M-step (Baker and Laird, 1988).
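The EM/IPF scheme can be sketched for one simple nonignorable log-linear model of the Baker and Laird (1988) type. This assumes a table with auxiliary class i (rows), response category j (columns), and response indicator r, and the model [XY][YR], under which the probability of response depends on the category of Y but not on X given Y. The model choice, function names, and data layout are assumptions for illustration; the slides do not state which log-linear model was fitted.

```python
def ipf_xy_yr(c, max_cycles=100, tol=1e-12):
    """Fit the log-linear model [XY][YR] to a complete I x J x 2 table by IPF.

    c[i][j][r]: counts for auxiliary class i, response category j, and
    response indicator r (0 = nonrespondent, 1 = respondent). The fitted
    table reproduces the X-Y and Y-R marginal totals of c.
    """
    I, J = len(c), len(c[0])
    m = [[[1.0, 1.0] for _ in range(J)] for _ in range(I)]
    for _ in range(max_cycles):
        # Step 1: scale so the X-Y margin (summed over r) matches c.
        for i in range(I):
            for j in range(J):
                ratio = (c[i][j][0] + c[i][j][1]) / (m[i][j][0] + m[i][j][1])
                m[i][j][0] *= ratio
                m[i][j][1] *= ratio
        # Step 2: scale so the Y-R margin (summed over i) matches c.
        max_gap = 0.0
        for j in range(J):
            for r in (0, 1):
                target = sum(c[i][j][r] for i in range(I))
                fitted = sum(m[i][j][r] for i in range(I))
                max_gap = max(max_gap, abs(target - fitted))
                for i in range(I):
                    m[i][j][r] *= target / fitted
        if max_gap < tol:  # Y-R margin already matched before step 2: converged
            break
    return m


def em_nonignorable(n_obs, u_miss, max_iter=20000, tol=1e-10):
    """EM estimate of the missing-cell counts M[i][j] under the [XY][YR] model.

    n_obs[i][j]: respondents with auxiliary class i and response category j.
    u_miss[i]:   nonrespondents in auxiliary class i (category j unknown).
    """
    I, J = len(n_obs), len(n_obs[0])
    M = [[u_miss[i] / J for _ in range(J)] for i in range(I)]  # uniform start
    for _ in range(max_iter):
        # M-step: fit the log-linear model to the completed table by IPF.
        completed = [[[M[i][j], n_obs[i][j]] for j in range(J)] for i in range(I)]
        m = ipf_xy_yr(completed)
        # E-step: split each nonrespondent total u_i across categories j
        # in proportion to the fitted nonrespondent counts m[i][j][0].
        new_M = []
        for i in range(I):
            row_total = sum(m[i][k][0] for k in range(J))
            new_M.append([u_miss[i] * m[i][j][0] / row_total for j in range(J)])
        change = max(abs(new_M[i][j] - M[i][j]) for i in range(I) for j in range(J))
        M = new_M
        if change < tol:
            break
    return M
```

For example, with respondent counts [[240, 100], [80, 200]] and nonrespondent totals [160, 220] (constructed from response probabilities 0.8 and 0.5 for the two categories of Y), `em_nonignorable` should approach M = [[60, 100], [20, 200]]. Because [XY][YR] is decomposable, IPF converges in a single cycle here; the loop is kept so the same structure carries over to models that need several cycles.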

20 Model-assisted estimation (cont.)
- Possible estimators for the total of Y:
  - Cell adjustment, using an adjustment weight (Little and Rubin, 2002)

21 Model-assisted estimation (cont.)
- Column adjustment:

22 Model-assisted estimation (cont.)
- Row adjustment:
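The slides do not spell out the three adjustment formulas here. The sketch below shows one plausible reading, assuming weighting-class adjustments in which each respondent's design weight w_k is inflated by the ratio of (respondents + estimated nonrespondents) to respondents, computed within a cell, a column (category of Y), or a row (auxiliary class). Using unweighted counts for these ratios is an additional simplifying assumption, and the function names are hypothetical.

```python
def adjustment_factors(n_obs, M, kind):
    """Weight-adjustment factors f[i][j] for respondents in cell (i, j).

    n_obs[i][j]: respondent counts; M[i][j]: estimated nonrespondent counts.
    kind == "cell":   (n_ij + M_ij) / n_ij
    kind == "column": (n_+j + M_+j) / n_+j   (pooled over auxiliary classes)
    kind == "row":    (n_i+ + M_i+) / n_i+   (pooled over response categories)
    """
    I, J = len(n_obs), len(n_obs[0])
    if kind == "cell":
        return [[(n_obs[i][j] + M[i][j]) / n_obs[i][j] for j in range(J)]
                for i in range(I)]
    if kind == "column":
        col = [sum(n_obs[i][j] + M[i][j] for i in range(I))
               / sum(n_obs[i][j] for i in range(I)) for j in range(J)]
        return [list(col) for _ in range(I)]
    if kind == "row":
        row = [sum(n_obs[i][j] + M[i][j] for j in range(J)) / sum(n_obs[i])
               for i in range(I)]
        return [[row[i]] * J for i in range(I)]
    raise ValueError(f"unknown adjustment: {kind}")


def adjusted_total(y, weights, cells, factors):
    """Sum over respondents of y_k * w_k * f[i][j], where (i, j) = cells[k]."""
    return sum(yk * wk * factors[i][j]
               for yk, wk, (i, j) in zip(y, weights, cells))


n_obs = [[240.0, 100.0], [80.0, 200.0]]
M = [[60.0, 100.0], [20.0, 200.0]]
f_cell = adjustment_factors(n_obs, M, "cell")   # [[1.25, 2.0], [1.25, 2.0]]
print(adjusted_total([1.0, 2.0], [10.0, 10.0], [(0, 0), (1, 1)], f_cell))  # 52.5
```

Under this reading the row adjustment uses only $M_{i+} = u_i$, the observed nonrespondent count in class i, so it coincides with the usual weighting-class adjustment by the auxiliary variable; the cell and column adjustments are the ones that draw on the EM estimates $M_{ij}$.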

23 Model-assisted estimation (cont.)
- Variance estimators are obtained using the bootstrap (Efron, 1994), which produces asymptotically valid variance estimates.

24 Illustration
- We simulate a continuous multivariate normal spatial random process for y.
- Population: John Day Middle Fork stream reaches
  - 143 stream reaches divided into survey segments (~1 mile each)
  - 6536 survey segments
  - Area of 785 mi²
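The spatial covariance model behind the simulated process is not given on the slide. A minimal sketch, assuming an exponential covariance with hypothetical sill and range parameters and coordinates such as segment midpoints, draws one realization of a multivariate normal field through a Cholesky factor ($y = \mu + Lz$ with $z \sim N(0, I)$). The pure-Python Cholesky keeps the example self-contained; for 6536 segments a linear-algebra library would be the practical choice.

```python
import math
import random


def exponential_covariance(coords, sill=1.0, range_param=1.0, nugget=1e-10):
    """Covariance matrix C[i][j] = sill * exp(-d_ij / range_param).

    A tiny nugget on the diagonal keeps the Cholesky factorization stable.
    """
    n = len(coords)
    cov = [[sill * math.exp(-math.dist(coords[i], coords[j]) / range_param)
            for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += nugget
    return cov


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def simulate_gaussian_field(coords, mean=0.0, sill=1.0, range_param=1.0, seed=None):
    """One draw y = mean + L z with z ~ N(0, I), so that Cov(y) = C."""
    rng = random.Random(seed)
    low = cholesky(exponential_covariance(coords, sill, range_param))
    z = [rng.gauss(0.0, 1.0) for _ in coords]
    return [mean + sum(low[i][k] * z[k] for k in range(i + 1))
            for i in range(len(coords))]
```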

25 Illustration
- The population of stream reaches was stratified into 6 strata based on the number of survey segments per reach: <10, 10-20, 20-30, 30-50, 50-100, and >100.
- Nonignorable missing data were generated as:
- Missing rates of 15%, 30%, and 50% were created.
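The missingness model used to generate the nonignorable data is not recoverable here. One common way to hit a chosen missing rate is sketched below, assuming the probability that a site is missing follows a logistic model in y itself (which makes the mechanism nonignorable) with a hypothetical slope, and solving for the intercept by bisection so that the average missing probability equals the target rate (for example 0.15, 0.30, or 0.50).

```python
import math
import random


def logistic(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def calibrate_intercept(y, slope, target_rate, lo=-50.0, hi=50.0, tol=1e-10):
    """Bisection for the intercept a with mean_k logistic(a + slope * y_k) == target_rate.

    The mean missing probability increases with a, so bisection applies as
    long as the target lies between its values at lo and hi.
    """
    def mean_prob(a):
        return sum(logistic(a + slope * yk) for yk in y) / len(y)

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_prob(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def generate_nonignorable_missing(y, slope, target_rate, seed=None):
    """Flags (True = missing) where P(missing) depends on the value of y itself."""
    a = calibrate_intercept(y, slope, target_rate)
    rng = random.Random(seed)
    return [rng.random() < logistic(a + slope * yk) for yk in y]
```

The calibration fixes the expected missing rate; the realized rate in any one simulated sample will vary around it.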


27 Population Summary

| Size class | Stratum 1 | Stratum 2 | Stratum 3 | Stratum 4 | Stratum 5 | Stratum 6 |
|---|---|---|---|---|---|---|
| Class 1 | | 65.13% | 64.31% | 65.44% | 65.48% | 61.70% |
| Class 2 | 35.77% | 34.87% | 35.69% | 34.56% | 34.52% | 38.30% |

28 Illustration
- Sample size n = 100
- Allocation proportional to the number of survey segments in each stratum
- Q1 = first sample quartile
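Proportional allocation of n = 100 across the six strata needs a rounding rule; the sketch below assumes largest-remainder rounding, which is one choice among several. The stratum sizes passed in would be the numbers of survey segments per stratum, which are not listed here, so any values used with it are hypothetical. Q1 is computed with Python's `statistics.quantiles` (default "exclusive" method); sample-quartile definitions differ across software, so the exact value can depend on the method.

```python
import statistics


def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units to strata in proportion to stratum size.

    Each stratum gets floor(n * N_h / N); the leftover units go to the strata
    with the largest fractional parts (largest-remainder rounding).
    """
    total = sum(stratum_sizes)
    exact = [n * size / total for size in stratum_sizes]
    alloc = [int(e) for e in exact]  # floor, since all values are nonnegative
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(exact)),
                          key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc


def first_quartile(values):
    """Sample first quartile Q1 via statistics.quantiles (default 'exclusive' method)."""
    return statistics.quantiles(values, n=4)[0]


print(proportional_allocation([500, 300, 200], 10))  # [5, 3, 2]
print(first_quartile(list(range(1, 10))))            # 2.5
```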


30 Modified Bootstrap
- We draw 1000 random samples of size 100 from the observed sample:
  - independently across strata;
  - maintaining proportional allocation;
  - maintaining the row totals by the auxiliary variable.
- For each of the 1000 bootstrap samples, we compute the estimates.
- We obtain a standard error and MSE for each estimate.
- We repeat this process 1000 times.
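The resampling scheme above can be sketched as follows, assuming each sampled unit records its stratum, its auxiliary class x, and y (None when missing), and that the estimator is any function of a resample. Resampling with replacement independently within each (stratum, x) group is one way to keep both the stratum sample sizes and the x-class (row) totals fixed; the function name and data layout are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def modified_bootstrap(sample, estimator, n_boot=1000, seed=None):
    """Stratified bootstrap that preserves stratum and auxiliary-class sample sizes.

    sample: list of dicts with keys "stratum", "x" (auxiliary class), and
            "y" (None for a nonrespondent).
    estimator: function mapping a list of such dicts to a number.
    Units are resampled with replacement independently within each
    (stratum, x) group, so stratum sample sizes (hence the proportional
    allocation) and the count in each x class are unchanged.
    Returns the bootstrap standard error and the replicate estimates.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for unit in sample:
        groups[(unit["stratum"], unit["x"])].append(unit)
    replicates = []
    for _ in range(n_boot):
        resample = []
        for members in groups.values():
            resample.extend(rng.choices(members, k=len(members)))
        replicates.append(estimator(resample))
    return statistics.stdev(replicates), replicates
```

In the study described on the slide, this bootstrap step is itself repeated 1000 times.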

31 Summary