1 Adjustment Procedures to Account for Nonignorable Missing Data in Environmental Surveys
Breda Munoz, Virginia Lesser, R
2 This presentation was supported under STAR Research Assistance Agreement No. CR awarded by the U.S. Environmental Protection Agency to Oregon State University. It has not been formally reviewed by EPA. The views expressed in this presentation are solely those of the authors, and EPA does not endorse any products or commercial services mentioned in this presentation.
3 Outline
- Missing data in environmental surveys
- Nonignorable missing data mechanism
- Model-based approach for nonignorable missing data
- Design-based estimation and nonignorable missing data
- Illustration
- Summary
4 Missing Data in Environmental Surveys
- Researchers in environmental studies must obtain access to selected sites to gather field data.
- Denial of access is a common problem in environmental surveys.
- The resulting unit nonresponse affects the results of data analysis.
5 Response Disposition: 1995/1996 EMAP North Dakota Prairie Wetlands Studies (Lesser, 2001)
Result                          1995    1996
Private landowners
  Agreed to access               43%     40%
  Refused access                 36%     37%
  Undeliverable                   2%      2%
  Not returned/no contact        16%     14%
Public land                       3%      7%
Total                           100%    100%
6 Introduction
- Maryland Biological Stream Survey (Boward et al., 1999): overall access-denial rate of 10%.
- ODFW habitat surveys, overall access-denial rate (Flitcroft et al., 2002): 10.0% in 1998, 6.0% in 1999 and 12.5% in 2000.
7 Assumptions
- A probability sampling design is used to collect outcomes of a spatial random process $Y$.
- $\{s_1, \ldots, s_n\}$ is a collection of sampling sites selected using the probability sampling design.
- $X_1$ and $X_2$ are auxiliary variables.
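8 Missing Mechanism: Missing Completely at Random (MCAR)
[Diagram: auxiliary variables X1 and X2, response Y, response indicator R] (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)
9 Missing Mechanism: Missing at Random (MAR)
[Diagram: auxiliary variables X1 and X2, response Y, response indicator R] (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)
10 Missing Mechanism: Nonignorable
[Diagram: auxiliary variables X1 and X2, response Y, response indicator R] (Smith, Skinner and Clark, 1999; Little and Rubin, 2002)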
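As a rough illustration of the three mechanisms, the Python sketch below (not the authors' simulation) draws a toy population with one auxiliary variable X1 and a response Y, then generates the response indicator R in three ways: with constant probability (MCAR), with a probability that depends on X1 only (MAR), and with a probability that depends on Y itself (nonignorable, labelled "NMAR" in the code). The logistic response model, its coefficients, the omission of X2 and the function name `simulate` are all assumptions made for the example.

```python
import math
import random
import statistics


def logistic(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(mechanism, n=20000, seed=1):
    """Return (mean of Y over all units, mean of Y over respondents).

    Toy population (hypothetical): X1 ~ N(0, 1), Y = X1 + N(0, 1) noise.
    X2 is omitted for brevity.
    """
    rng = random.Random(seed)
    all_y, resp_y = [], []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        y = x1 + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p = 0.7                    # response probability unrelated to X1 or Y
        elif mechanism == "MAR":
            p = logistic(0.8 + x1)     # depends only on the observed auxiliary X1
        elif mechanism == "NMAR":
            p = logistic(0.8 + y)      # depends on Y itself: nonignorable
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        all_y.append(y)
        if rng.random() < p:           # R = 1: the site is observed
            resp_y.append(y)
    return statistics.fmean(all_y), statistics.fmean(resp_y)


for mech in ("MCAR", "MAR", "NMAR"):
    print(mech, simulate(mech))
```

Comparing the two returned means shows what separates the mechanisms: under MCAR the respondent mean stays close to the full mean; under MAR it is shifted, but the shift can be removed by conditioning on X1; under the nonignorable mechanism R depends on the unobserved Y, so conditioning on X1 alone cannot remove the bias.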
11 Model-based Approach
Under a nonignorable mechanism, we model the joint probability of the data and the missing-mechanism indicator (the “response” indicator):
$f(y, r \mid x) = f(y \mid x)\, f(r \mid y, x)$, i.e. data model × missing-mechanism model, with covariates $x$,
where $R(s_i) \sim \text{Bernoulli}(p_i)$.
12 Model-assisted estimation and nonignorable missing data
Assume the parameter of interest is the total of the response Y.
13 Model-assisted estimation and nonignorable missing data
Continuous form of the Horvitz-Thompson estimator for the total (Cordy, 1993):
Let be a collection of fixed values
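The continuous-universe form of the Horvitz-Thompson estimator (Cordy, 1993) is $\hat{T}_Y = \sum_{i=1}^{n} Y(s_i)/\pi(s_i)$, where $\pi(s)$ is the inclusion density of the design at location $s$. A minimal Python sketch, assuming the inclusion densities at the sampled sites are known; the function name is hypothetical.

```python
def horvitz_thompson_total(y, pi):
    """Horvitz-Thompson estimate of a total: sum of y_i / pi_i over sampled sites.

    y  -- observed responses Y(s_i)
    pi -- inclusion densities (or probabilities) pi(s_i) at the same sites
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(p <= 0 for p in pi):
        raise ValueError("inclusion densities must be positive")
    return sum(yi / pi_i for yi, pi_i in zip(y, pi))
```

For example, n uniformly random points on a region of area A have $\pi(s) = n/A$, so the estimator reduces to A times the sample mean: four points on an area of 10 with y = 2, 4, 6, 8 give 10 × 5 = 50. With nonignorable missing data, summing over the observed sites alone would understate the total of a nonnegative response, which is what the adjustment weights are meant to correct.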
14 Model-assisted estimation (cont.)
Sample size $n$: $n^*$ sites observed and $n - n^*$ sites missing (nonignorably).
15 Model-assisted estimation (cont.) denotes the
16 Model-assisted estimation (cont.) Likelihood:
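For a partially classified table of this kind, a standard form of the multinomial likelihood, written here under assumed notation and ignoring design weights, cross-classifies sampled sites by auxiliary class $i$, class $j$ of $Y$, and response indicator $k$ ($k = 1$ observed, $k = 0$ missing). With $n_{ij}$ observed sites in cell $(i, j)$, $m_i$ sites in auxiliary class $i$ whose $Y$ is missing, and cell probabilities $\pi_{ijk}$:

$$L(\pi) \propto \prod_{i}\prod_{j} \pi_{ij1}^{\,n_{ij}} \;\prod_{i} \Big(\sum_{j} \pi_{ij0}\Big)^{m_i}$$

The second product is where the mechanism matters: the missing sites contribute only through the row sums of $\pi_{ij0}$, so how they split across the classes of $Y$ must come from the model for $R$.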
17 Model-assisted estimation (cont.)
Reparameterize the model parameters (Baker and Laird, 1988). Expected cell counts:
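As an illustration only (the slide's own parameterization may differ), a log-linear model for expected cell counts $\mu_{ijk}$, with $i$ the auxiliary class, $j$ the class of $Y$ and $k$ the response indicator, makes nonresponse nonignorable by letting $R$ interact with $Y$:

$$\log \mu_{ijk} = \lambda + \lambda^{X}_{i} + \lambda^{Y}_{j} + \lambda^{R}_{k} + \lambda^{XY}_{ij} + \lambda^{YR}_{jk}$$

Replacing $\lambda^{YR}_{jk}$ with $\lambda^{XR}_{ik}$ gives a MAR model instead. With $I$ auxiliary classes and $J$ classes of $Y$, this model has $IJ + J$ free parameters, while the observed data supply $IJ + I$ counts (the $n_{ij}$ and the row totals $m_i$ of missing sites), so identification requires $J \le I$.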
18 Model-assisted estimation (cont.)
Use the EM algorithm to estimate the expected counts of the missing cells, $M_{ij}$.
E-step:
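In the standard EM algorithm for such a table (assumed here), with $m_i$ missing sites in auxiliary class $i$ and current fitted counts $\hat{\mu}^{(t)}_{ij0}$ for the missing-response cells, the E-step splits each row's missing total across the classes of $Y$ in proportion to the fitted counts:

$$M^{(t)}_{ij} = m_i \, \frac{\hat{\mu}^{(t)}_{ij0}}{\sum_{j'} \hat{\mu}^{(t)}_{ij'0}}$$

Because $\sum_j M^{(t)}_{ij} = m_i$, the completed table always reproduces the observed row totals.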
19 Model-assisted estimation (cont.)
- M-step: iterative proportional fitting (IPF) (Bishop et al., 1975), an algorithm that repeatedly rescales the fitted counts to match the observed marginal totals.
- The EM algorithm always converges to a solution when IPF is used in the M-step (Baker and Laird, 1988).
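A compact pure-Python sketch of the EM loop, under assumptions: the E-step splits each row's missing count in proportion to the fitted nonrespondent counts, and the M-step runs one IPF cycle matching the [XY] and [YR] margins of the completed table, i.e. it fits a nonignorable log-linear model in which R depends on Y. The model choice, the function name `em_ipf` and the example counts are illustrative, and design weights are ignored. Because [XY][YR] is decomposable, a single IPF cycle reproduces the M-step maximum likelihood fit exactly.

```python
def em_ipf(n_obs, m_miss, max_iter=20000, tol=1e-10):
    """EM with an IPF M-step for a table whose Y classification is missing for nonrespondents.

    n_obs[i][j] -- respondent counts: auxiliary class i, Y class j (assumed positive)
    m_miss[i]   -- number of nonrespondents (Y unobserved) in auxiliary class i
    Returns M[i][j], the estimated expected counts of the missing cells.
    """
    I, J = len(n_obs), len(n_obs[0])
    # fit[k][i][j]: fitted counts, k = 1 respondents, k = 0 nonrespondents; uniform start
    fit = {k: [[1.0] * J for _ in range(I)] for k in (0, 1)}

    def e_step(fit):
        # Split each row's missing count in proportion to the fitted nonrespondent counts
        return [[m_miss[i] * fit[0][i][j] / sum(fit[0][i]) for j in range(J)]
                for i in range(I)]

    for _ in range(max_iter):
        comp = {1: n_obs, 0: e_step(fit)}          # completed table
        new = {k: [row[:] for row in fit[k]] for k in (0, 1)}
        # IPF step 1: match the X-by-Y margin (summed over k)
        for i in range(I):
            for j in range(J):
                ratio = (comp[0][i][j] + comp[1][i][j]) / (new[0][i][j] + new[1][i][j])
                for k in (0, 1):
                    new[k][i][j] *= ratio
        # IPF step 2: match the Y-by-R margin (summed over i)
        for k in (0, 1):
            for j in range(J):
                ratio = (sum(comp[k][i][j] for i in range(I))
                         / sum(new[k][i][j] for i in range(I)))
                for i in range(I):
                    new[k][i][j] *= ratio
        change = max(abs(new[k][i][j] - fit[k][i][j])
                     for k in (0, 1) for i in range(I) for j in range(J))
        fit = new
        if change < tol:
            break
    return e_step(fit)


M = em_ipf([[160, 50], [120, 75], [80, 150]], [90, 105, 170])
print(M)
```

The example counts are hypothetical and were generated exactly from this model with response probabilities 0.8 and 0.5 in the two Y classes, so the estimates should approach the generating missing-cell counts [[40, 50], [30, 75], [20, 150]].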
20 Model-assisted estimation (cont.)
Possible estimators for the total of Y:
Cell adjustment, using an adjustment weight (Little and Rubin, 2002):
21 Model-assisted estimation (cont.)
Column adjustment:
22 Model-assisted estimation (cont.)
Row adjustment:
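The three estimators differ in the group within which nonrespondents are reweighted. As a hedged sketch of one plausible reading (an assumption, not the authors' formulas), a weighting-class adjustment in the spirit of Little and Rubin (2002) multiplies each observed site's design weight $1/\pi(s_i)$ by the ratio of estimated total to observed counts in its cell $(i, j)$, its column $j$ (class of Y), or its row $i$ (auxiliary class), using EM-estimated missing-cell counts $M_{ij}$. The function names and the tuple layout are hypothetical.

```python
def adjustment_factors(n_obs, M, kind):
    """Hypothetical weighting-class factors from observed counts n_obs[i][j]
    and EM-estimated missing-cell counts M[i][j]; returns {(i, j): factor}."""
    I, J = len(n_obs), len(n_obs[0])
    factors = {}
    for i in range(I):
        for j in range(J):
            if kind == "cell":        # reweight within cell (i, j)
                num, den = n_obs[i][j] + M[i][j], n_obs[i][j]
            elif kind == "column":    # reweight within Y class j
                num = sum(n_obs[r][j] + M[r][j] for r in range(I))
                den = sum(n_obs[r][j] for r in range(I))
            elif kind == "row":       # reweight within auxiliary class i
                num = sum(n_obs[i][c] + M[i][c] for c in range(J))
                den = sum(n_obs[i][c] for c in range(J))
            else:
                raise ValueError(f"unknown adjustment: {kind}")
            factors[(i, j)] = num / den
    return factors


def adjusted_total(units, factors):
    """units: (y, pi, i, j) for each observed site; weight = factor / pi."""
    return sum(factors[(i, j)] * y / pi for y, pi, i, j in units)
```

When nonresponse depends only on the class of Y, the cell and column factors coincide: with n_obs = [[160, 50], [120, 75], [80, 150]] and M = [[40, 50], [30, 75], [20, 150]], both give 1.25 in the first column and 2.0 in the second. The row factor uses the missing-cell estimates only through their row totals $m_i$, so it corresponds to a MAR-type adjustment on the auxiliary variable.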
23 Model-assisted estimation (cont.)
Variance estimators are obtained using the bootstrap (Efron, 1994); the bootstrap produces asymptotically valid variance estimates.
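A minimal sketch of the basic nonparametric bootstrap standard error, assuming sampled units are resampled with replacement and the estimator is recomputed on each replicate; the stratified resampling used in the illustration adds constraints on top of this. The function name is hypothetical.

```python
import random
import statistics


def bootstrap_se(sample, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error: resample units with replacement, re-estimate,
    and take the standard deviation of the replicate estimates."""
    rng = random.Random(seed)
    reps = [estimator(rng.choices(sample, k=len(sample))) for _ in range(n_boot)]
    return statistics.stdev(reps)


print(bootstrap_se(list(range(100)), statistics.fmean))
```

For the mean of 0, 1, ..., 99 the bootstrap standard error should land near the textbook value of about 2.9 (population standard deviation 28.9 divided by √100), up to Monte Carlo noise.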
24 Illustration
- We simulate a continuous multivariate normal spatial random process for Y.
- Population: John Day Middle Fork stream reaches.
- 143 stream reaches divided into survey segments (~1 mile each), giving 6,536 survey segments.
- Area of 785 mi².
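One common way to simulate a multivariate normal spatial process at a set of site coordinates is to build a covariance matrix, take its Cholesky factor L, and return μ + Lz with z standard normal. The sketch below assumes an exponential covariance $C(d) = \sigma^2 \exp(-d/\phi)$; the covariance model, parameter names and coordinates are hypothetical, not taken from the study.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def simulate_gaussian_process(coords, mean=0.0, sill=1.0, range_=1.0, seed=0):
    """One realization of a Gaussian field with exponential covariance
    sill * exp(-distance / range_) at the given (x, y) coordinates."""
    rng = random.Random(seed)
    n = len(coords)
    cov = [[sill * math.exp(-math.dist(coords[a], coords[b]) / range_)
            for b in range(n)] for a in range(n)]
    L = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated values: mean + L z (only the lower triangle of L is nonzero)
    return [mean + sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(n)]


print(simulate_gaussian_process([(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)], mean=5.0))
```

The pure-Python Cholesky is O(n³) and only suitable for small examples; for thousands of survey segments a library routine such as `numpy.linalg.cholesky` would be the practical choice.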
25 Illustration
- The population of stream reaches was stratified into 6 strata based on the number of survey segments: “<10”, “10-20”, “20-30”, “30-50”, “50-100” and “>100”.
- Nonignorable missing data were generated as:
- Missing rates of 15%, 30% and 50% were created.
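One common way to produce nonignorable missingness at a prescribed rate, offered only as an assumed sketch, is a logistic response model in which the response probability depends on y itself, with the intercept tuned by bisection so that the expected missing rate matches the target (for example 15%, 30% or 50%). The slope value and all function names are hypothetical.

```python
import math
import random


def logistic(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def calibrate_intercept(y, target_missing, slope=1.0, lo=-20.0, hi=20.0, iters=60):
    """Bisection for a0 such that mean(1 - logistic(a0 + slope * y)) == target_missing."""
    def missing_rate(a0):
        return sum(1.0 - logistic(a0 + slope * v) for v in y) / len(y)

    for _ in range(iters):
        mid = (lo + hi) / 2
        # The missing rate decreases as the intercept increases
        if missing_rate(mid) > target_missing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def make_missing(y, target_missing, slope=1.0, seed=0):
    """Response indicators R (1 = observed); with slope > 0, larger y is more likely observed."""
    a0 = calibrate_intercept(y, target_missing, slope)
    rng = random.Random(seed)
    return [1 if rng.random() < logistic(a0 + slope * v) else 0 for v in y]
```

The calibration fixes the expected missing rate exactly; the realized rate in any one draw varies around it by sampling noise.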
27 Population Summary
Stratum    Class split (%)
1          64.23 / 35.77
2          65.13 / 34.87
3          64.31 / 35.69
4          65.44 / 34.56
5          65.48 / 34.52
6          61.70 / 38.30
Also reported by stratum: size; summary (minimum, mean, maximum).
28 Illustration
- Sample size n = 100.
- Allocation proportional to the number of survey segments in each stratum.
- $Q_1$ = first sample quantile.
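Proportional allocation sets $n_h \approx n N_h / N$, where $N_h$ is the number of survey segments in stratum $h$ and $N = \sum_h N_h$. Because $n N_h / N$ is rarely an integer, some rounding rule is needed; the largest-remainder rule below is an assumption, and the function name and example sizes are hypothetical.

```python
def proportional_allocation(stratum_sizes, n):
    """Allocate n sample units to strata in proportion to stratum_sizes,
    rounding with the largest-remainder rule (exact integer arithmetic)."""
    total = sum(stratum_sizes)
    alloc = [n * size // total for size in stratum_sizes]
    remainders = [n * size % total for size in stratum_sizes]
    leftover = n - sum(alloc)
    # Give the remaining units to the strata with the largest remainders
    order = sorted(range(len(stratum_sizes)), key=lambda h: remainders[h], reverse=True)
    for h in order[:leftover]:
        alloc[h] += 1
    return alloc


print(proportional_allocation([150, 330, 520], 10))
```

For sizes 150, 330 and 520 with n = 10, the exact quotas are 1.5, 3.3 and 5.2, so the floors 1, 3, 5 leave one unit, which goes to the first stratum. Very small strata can receive zero units under proportional allocation, which matters when a within-stratum variance estimate is needed.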
30 Modified Bootstrap
- We draw 1000 random samples of size 100 from the observed sample:
  - independently across strata;
  - maintaining proportional allocation;
  - maintaining the row totals defined by the auxiliary variable.
- For each of the 1000 samples, we estimate the total of Y.
- We obtain a standard error and MSE for each estimate.
- We repeat this process 1000 times.
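A sketch of one way to implement the constrained resampling above, assuming each observed unit carries a stratum label and an auxiliary row class: units are resampled with replacement separately within each (stratum, row) cell, so every replicate reproduces the original stratum sample sizes (and hence the proportional allocation) and the row totals exactly. Resampling within these cells is an assumption, not necessarily the authors' procedure; the function name and record layout are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def constrained_bootstrap(units, estimator, n_boot=1000, seed=0):
    """units: list of dicts with keys 'stratum', 'row' and whatever the estimator needs.

    Resamples with replacement within each (stratum, row) cell, so every replicate
    keeps the original stratum sizes and row totals. Returns (replicates, SE).
    """
    rng = random.Random(seed)
    cells = defaultdict(list)
    for u in units:
        cells[(u["stratum"], u["row"])].append(u)
    reps = []
    for _ in range(n_boot):
        replicate = []
        for members in cells.values():
            replicate.extend(rng.choices(members, k=len(members)))
        reps.append(estimator(replicate))
    return reps, statistics.stdev(reps)
```

Any statistic that depends only on the stratum and row counts is constant across replicates under this scheme, while estimators of the total of Y still vary and yield a standard error.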
31 Summary