Nonparametric, Model-Assisted Estimation for a Two-Stage Sampling Design

Mark Delorey, F. Jay Breidt, Colorado State University

Abstract

In aquatic resources, a two-stage sampling design can be employed to make the best use of what are often limited time and financial resources. Even with the ability to focus such resources, it is often the case that the sample sizes are not sufficiently large to make model-free inferences. The presence of auxiliary information for the regions of interest suggests employing a model in our inferences. Breidt, Claeskens, and Opsomer (2003) propose incorporating this auxiliary information through a class of model-assisted estimators based on penalized spline regression in single-stage sampling. Zheng and Little (2003) also use penalized spline regression in a model-based approach for finite population estimation in a two-stage sample. In a survey context, weights computed from a set of auxiliary information are often applied to many study variables. With this approach, model-assisted estimators should fare better than model-based estimators. We compare the two through a series of simulations.

This research is funded by the U.S. EPA Science To Achieve Results (STAR) Program, Cooperative Agreements # CR-829095 and # CR-829096.

Funding/Disclaimer

The work reported here was developed under the STAR Research Assistance Agreement CR-829095 and CR-829096 awarded by the U.S. Environmental Protection Agency (EPA) to Colorado State University. This poster has not been formally reviewed by EPA. The views expressed here are solely those of the presenter and the STARMAP, the Program he represents. EPA does not endorse any products or commercial services mentioned in this poster.

Case A: Cluster Level Auxiliaries (Our focus)
- The auxiliary information is available for all clusters in the population
- Leads to regression modeling of quantities associated with the clusters, such as cluster totals
- Cluster quantities can be computed for all clusters
- Population quantities can be computed from cluster estimates
- Example: Lake represents a cluster; auxiliary information is elevation

Case B: Complete Element Level Auxiliaries
- The auxiliary information is available for all elements in the population
- Leads to regression modeling of quantities associated with the elements
- Cluster and population quantities can then be computed from element estimates and observations
- Example: EMAP hexagon is cluster; lake is element; auxiliary information is elevation

Case C: Limited Element Level Auxiliaries
- The auxiliary information is available for all elements in selected clusters only
- Leads to regression modeling of quantities associated with the elements
- Regression estimators can be used for cluster-level quantities only for the clusters selected in the first-stage sample
- Example: Aerial photography of selected sites (clusters); for each point (element) in site, we have percent forested, urban, industrial

Case D: Limited Cluster Level Auxiliaries
- The auxiliary information is available for all clusters in the first-stage sample
- Not a very interesting case
- Design-based estimator can be used for population quantities
- In some cases, good estimators for population quantities are not available
- Example: Cluster is lake; auxiliary information is measure of size which is not available until site is visited

Generating Responses
- 500 PSUs; the number of SSUs per cluster ~ Uniform(50, 400)
- $\mu_{PSU} = m(\pi_I) + \eta$, where $m(\cdot)$ is one of the eight functions below and $\eta \sim N(0, \tau^2 I)$
  - We use first order inclusion probabilities proportional to size (pps)
  - Auxiliary data is often proportional to size of cluster
- Response of interest: $y_{ij} = \mu_i + \epsilon_{ij}$,
where $y_{ij}$ is the $j$th element in the $i$th cluster and $\epsilon_{ij} \sim$ iid $N(0, \sigma^2)$

Two-Stage Sampling
- The population of elements $U = \{1, \ldots, k, \ldots, N\}$ is partitioned into clusters or primary sampling units (PSUs), $U_1, \ldots, U_i, \ldots, U_{N_I}$. So $N = \sum_{i=1}^{N_I} N_i$, where $N_i$ is the number of elements or secondary sampling units (SSUs) in $U_i$.
- First stage: A sample of clusters, $s_I$, is selected based on a design $p_I(\cdot)$ with inclusion probabilities $\pi_{Ii}$ and $\pi_{Iij}$
  - $\pi_{Ii}$ and $\pi_{Iij}$ are the first and second order inclusion probabilities, respectively
- Second stage: For every $i \in s_I$, a sample $s_i$ is drawn from $U_i$ based on the design $p_i(\cdot \mid s_I)$
- Typically require second stage design to be invariant and independent of the first stage

Two-Stage Sampling with Aquatic Resources
- Time and expense constraints may make two-stage sampling more efficient
- Auxiliary information may be available on different scales

The Estimators (for population totals)
- Horvitz-Thompson (HT) [formula and its definitions missing in source]
- Model-assisted [formula missing in source], where the missing symbol is the PSU total predicted by the model
- Model-based [formula missing in source], where the missing symbol is the $i$th cluster mean predicted by the model

Comments on Simulation Results
- 500 samples from each of the populations were drawn
- H-T = Horvitz-Thompson estimator
- M-A: lin = Model-assisted estimator using a linear model
- M-B: pmmra = Model-based estimator using a penalized spline and including a random effect for PSU
- M-A: pmm = Model-assisted estimator using a penalized spline with no random effect for PSU
- Each point represents the ratio MSE(estimator) : MSE(model-assisted estimator with random effect for PSU)
- Vertical black bars represent approximate 95% confidence intervals
- The model-assisted estimator with random effect for PSU is as efficient as or more efficient than the model-based estimator; we do not appear to lose efficiency (with respect to MSE) by using model-assisted nonparametric methods

Notes on the Models and Model Parameters
- 3 different models used
  - Linear
  - Penalized spline with random effect for PSU
  - Penalized spline with no random effect for PSU
- In a survey context, such as those found in environmental monitoring, it is often desirable to obtain a single set of survey weights that can be used to predict any study variable. To accommodate this:
  - The smoothing parameter for the spline is selected by fixing the degrees of freedom for the smooth rather than using a data-driven approach
  - The variance component for the PSU effect is computed for the linear model, and the resulting covariance matrix and corresponding survey weights are applied to samples from other data sets
  - In this kind of survey context, model-assisted estimators have good efficiency properties and should be superior to model-based estimators, which rely on correct specification of variance components
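
The estimator formulas on the poster are not reproduced in this transcript, so the following is only a minimal sketch of the simulation setup under stated assumptions, not the authors' code. It generates one population as described under Generating Responses, draws a two-stage sample, and computes the textbook Horvitz-Thompson total, $\sum_{i \in s_I} \hat{t}_i / \pi_{Ii}$, and a linear model-assisted (difference) total for Case A, $\sum_{i \in U_I} \hat{t}_i^{m} + \sum_{i \in s_I} (\hat{t}_i - \hat{t}_i^{m}) / \pi_{Ii}$, where $\hat{t}_i^{m}$ is the PSU total predicted by the model. Everything not stated on the poster is an assumption: the function names, Poisson sampling at the first stage, simple random sampling of 20 SSUs at the second stage, an expected first-stage sample size of 50, the linear mean function, the two standard deviations, and the use of cluster size as the auxiliary variable. It uses a linear working model only; the penalized spline variants are not shown.

```python
import numpy as np


def ht_total(t_hat_s, pi_s):
    """Horvitz-Thompson total: estimated PSU totals weighted by 1 / pi_Ii."""
    return float(np.sum(t_hat_s / pi_s))


def model_assisted_total(t_hat_s, x_s, pi_s, x_all):
    """Linear model-assisted (difference) estimator with a cluster-level auxiliary.

    Fits a design-weighted least squares line to the estimated PSU totals,
    predicts a total for every PSU in the population, and adds the
    design-weighted sum of the sample residuals.
    """
    X_s = np.column_stack([np.ones_like(x_s, dtype=float), x_s])
    X_all = np.column_stack([np.ones_like(x_all, dtype=float), x_all])
    w = 1.0 / pi_s
    beta = np.linalg.solve(X_s.T @ (w[:, None] * X_s), X_s.T @ (w * t_hat_s))
    resid_s = t_hat_s - X_s @ beta
    return float((X_all @ beta).sum() + np.sum(resid_s / pi_s))


rng = np.random.default_rng(2003)

# Population: 500 PSUs, number of SSUs per PSU ~ Uniform(50, 400).
n_psu = 500
N_i = rng.integers(50, 401, size=n_psu)

# First-order inclusion probabilities proportional to size (pps).
n_I = 50                                  # assumed expected number of sampled PSUs
pi_I = n_I * N_i / N_i.sum()


def m(p):
    """Hypothetical mean function; the poster's eight functions are not available."""
    return 10.0 + 50.0 * p


# PSU means and element responses; both standard deviations are assumed values.
mu = m(pi_I) + rng.normal(0.0, 1.0, size=n_psu)
y = [mu_i + rng.normal(0.0, 2.0, size=n) for mu_i, n in zip(mu, N_i)]
t_true = float(sum(y_i.sum() for y_i in y))

# First stage: Poisson sampling with probabilities pi_I (an assumed design).
s_I = np.flatnonzero(rng.random(n_psu) < pi_I)

# Second stage: simple random sample of n_i SSUs within each selected PSU.
n_i = 20
t_hat = np.array(
    [N_i[i] * rng.choice(y[i], size=n_i, replace=False).mean() for i in s_I]
)

ht = ht_total(t_hat, pi_I[s_I])
ma = model_assisted_total(t_hat, N_i[s_I], pi_I[s_I], N_i)
print(f"true total     {t_true:12.1f}")
print(f"HT             {ht:12.1f}")
print(f"model-assisted {ma:12.1f}")
```

Repeating the sampling steps many times for a fixed population and averaging the squared errors would give MSE ratios of the kind summarized under Comments on Simulation Results. Because the working model includes an intercept and is fit with design weights, the residual correction term is zero for the sample used in the fit; it is kept in the code so the difference form stays visible.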
