Model- vs. design-based sampling and variance estimation on continuous domains Cynthia Cooper OSU Statistics September 11, 2004 R82-9096-01.

Presentation transcript:

1 Model- vs. design-based sampling and variance estimation on continuous domains
  Cynthia Cooper, OSU Statistics
  September 11, 2004

2 Introduction
  Research on model- and design-based sampling and estimation on continuous domains
  Compare...
    Basis of inference of each
    Sampling concepts
    Interpretation of variance
    Variance estimation

3 Duality in Environmental Monitoring
  Design-based estimates
    – Status and trend
    – No model of underlying stochastic process
  Defensible
    – Probability sample
        Avoid selection bias
        Control sample process variance
  Model-based predictions
    – Stochastic behavior of response
    – Forecasting/prediction conditional on the observed data

4 General Outline
  Introduction
  Summary comparison of approaches
  Summary characterization of variance estimators
  Proposed model-assisted variance estimator
  Simulation methods
  Design-based context results
  Model-based (kriging) results
  Conclusion

5 Comparison of approaches - Design-based
  Probability samples – unbiased estimates
  Basis for long-run frequency properties
    – Design-induced randomness – sample process variance
  Basic linear estimator scales up sample responses to extrapolate to population
    – Inclusion probabilities
  Examples
    – EPA EMAP
    – ODFW Monitoring Plan Augmented Rotating Panel
    – USFS Forest Inventory and Analysis

6 Inclusion probability
    – Element-wise: sum of probabilities of all samples which include the i-th element (π_i)
    – Pair-wise: sum of … which include the i-th and j-th elements (π_ij)
  For continuous domains
    – Inclusion probability densities (IPD) (Cordy (1993))
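
The two definitions above can be checked by brute force on a tiny finite population. The following is a minimal Python sketch, not taken from the slides: it assumes simple random sampling without replacement (so every sample is equally likely) and a hypothetical four-unit population, and it sums sample probabilities exactly as the definitions state.

```python
from itertools import combinations

def inclusion_probabilities(units, n):
    """Enumerate every size-n sample under simple random sampling without
    replacement and sum sample probabilities to get the element-wise (pi_i)
    and pair-wise (pi_ij) inclusion probabilities."""
    samples = list(combinations(units, n))
    p_sample = 1.0 / len(samples)  # assumption: all samples equally likely
    pi = {u: 0.0 for u in units}
    pi_pair = {pair: 0.0 for pair in combinations(units, 2)}
    for s in samples:
        for u in s:
            pi[u] += p_sample          # sample s contains element u
        for pair in combinations(s, 2):
            pi_pair[pair] += p_sample  # sample s contains both elements
    return pi, pi_pair

# Hypothetical population of four units, samples of size two.
pi, pi_pair = inclusion_probabilities(["a", "b", "c", "d"], n=2)
```

For a fixed sample size the element-wise probabilities sum to n. On a continuous domain the sums become densities, which is why the slides refer to IPDs.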

7 Comparison of approaches - Model-based
  Response generated by a stochastic process
  Likelihood-based approaches to estimating parameters of model
  BLUP
    – Conditional on values observed in sample
  Examples
    – Mining surveys
    – Soil and hydrology surveys
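
As an illustration of a BLUP (best linear unbiased predictor) conditional on the observed sample, here is a minimal Python sketch of ordinary kriging. It assumes an exponential covariance with known sill and range; in practice these would be estimated. The function names and the three-point data set are hypothetical, and this is not the code used in the study.

```python
import numpy as np

def exp_cov(h, sill=1.0, range_=1.0):
    """Exponential covariance sill * exp(-h / range_)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_)

def ordinary_kriging(coords, z, target, sill=1.0, range_=1.0):
    """Return the ordinary-kriging prediction, its MSPE, and the weights."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    target = np.asarray(target, dtype=float)
    n = len(z)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d0 = np.linalg.norm(coords - target, axis=-1)
    # Kriging system [[C, 1], [1', 0]] [w; m] = [c0; 1]; the last row forces
    # the weights to sum to one (unbiasedness for an unknown constant mean).
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = exp_cov(d, sill, range_)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    rhs = np.append(exp_cov(d0, sill, range_), 1.0)
    sol = np.linalg.solve(A, rhs)
    weights, lagrange = sol[:n], sol[n]
    pred = float(weights @ z)
    mspe = float(sill - weights @ rhs[:n] - lagrange)
    return pred, mspe, weights

# Hypothetical observations at three locations.
obs_xy = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
obs_z = [1.0, 2.0, 3.0]
pred, mspe, w = ordinary_kriging(obs_xy, obs_z, target=[0.4, 0.4])
```

Predicting at an observed location returns the observed value with zero prediction variance, which is one way to see that the predictor is conditional on the sample.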

8 Variance estimators - Design-based
  Quantifies variability induced by sampling process
  Variance of linear estimators
    – Scale up square and cross-product terms with inverse marginal and pair-wise inclusion probability densities (IPDs)
  For continuous domains
    – Congruent tessellation stratified samples w/ one observation per stratum
        Require randomized grid origin to achieve non-zero cross-product terms (π_ij − π_i π_j) (Stevens (1997))

9 Variance estimators - Design-based
  Horvitz-Thompson (HT)
  Can be negative
    – Especially samples with a point pair in close proximity
  Requires randomly-located tessellation grid
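
The formula on this slide did not survive in the transcript. For orientation, a commonly cited form of the HT estimator of the total and its variance estimator on a continuous domain, in the π_i, π_ij notation of the slides, is given below. It is offered as an assumption about what the slide displayed, not as a reconstruction of it; z_i denotes the response at the i-th sample point.

```latex
\hat{z} = \sum_{i=1}^{n} \frac{z_i}{\pi_i},
\qquad
\hat{V}_{HT}(\hat{z}) = \sum_{i=1}^{n} \frac{z_i^{2}}{\pi_i^{2}}
  + \sum_{i=1}^{n} \sum_{j \neq i}
    \frac{\pi_{ij} - \pi_i \pi_j}{\pi_{ij}\, \pi_i \pi_j}\; z_i z_j
```

When two sample points fall close together, π_ij can be very small, so the cross-product weight becomes large and negative. That is consistent with the slide's remark that the estimator can be negative.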

10 Variance estimators - Design-based
  Yates-Grundy (YG)
  Assumes fixed effective sample size
  Point pairs with close proximity can destabilize (Stevens (2003))
  Requires randomly-located tessellation grid
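
As with the HT slide, the formula itself is missing from the transcript. The usual Yates-Grundy form for a fixed sample size n, written in the same notation, is shown here as an assumption about the slide's content:

```latex
\hat{V}_{YG}(\hat{z}) = \frac{1}{2} \sum_{i=1}^{n} \sum_{j \neq i}
  \frac{\pi_i \pi_j - \pi_{ij}}{\pi_{ij}}
  \left( \frac{z_i}{\pi_i} - \frac{z_j}{\pi_j} \right)^{2}
```

Each term divides by π_ij, so a point pair in close proximity (small π_ij) can dominate the sum, which matches the instability noted on the slide.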

11 Variance estimators - Model-based
  Estimating MSPE of BLUP
    – Involves variances and covariances associated with square and cross-product terms of error
  Assume form of covariance that describes rate of decay of covariance
      Exponential
      Spherical
      Must result in positive-definite covariance matrix
  Incremental stationarity
    – E[(z(s_i) − z(s_o))^2] = g(||s_i − s_o||) = g(h)
    – Typically, E[…] increases as h increases
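
The covariance forms named above can be written down directly. This Python sketch uses the standard exponential and spherical models with hypothetical parameter values (sill 4, range 2), checks positive definiteness on a random set of points, and evaluates the mean squared increment under the additional assumption of second-order stationarity, where it equals 2(C(0) − C(h)).

```python
import numpy as np

def exponential_cov(h, sill, range_):
    """Exponential model: sill * exp(-h / range_)."""
    h = np.asarray(h, dtype=float)
    return sill * np.exp(-h / range_)

def spherical_cov(h, sill, range_):
    """Spherical model: decays to exactly zero at h = range_."""
    u = np.minimum(np.asarray(h, dtype=float) / range_, 1.0)
    return sill * (1.0 - 1.5 * u + 0.5 * u**3)

def mean_squared_increment(h, cov, sill, range_):
    """E[(z(s_i) - z(s_o))^2] = 2 * (C(0) - C(h)), assuming second-order
    stationarity (a stronger assumption than incremental stationarity)."""
    return 2.0 * (cov(0.0, sill, range_) - cov(h, sill, range_))

# Positive-definiteness check on 30 hypothetical locations in a 10 x 10 square.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 10.0, size=(30, 2))
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
min_eig = {
    "exponential": np.linalg.eigvalsh(exponential_cov(d, 4.0, 2.0)).min(),
    "spherical": np.linalg.eigvalsh(spherical_cov(d, 4.0, 2.0)).min(),
}
increments = mean_squared_increment(np.array([0.0, 0.5, 1.0, 2.0]),
                                    exponential_cov, 4.0, 2.0)
```

A strictly positive smallest eigenvalue indicates a positive-definite matrix for this set of points; the increments grow with h, as the slide states.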

12 Variance estimators - Model-based
  Variance
    – Quantifies stochastic variability of expected value of response
    – Vanishes as ||s_i − s_o|| → 0
  Mean-square prediction error (MSPE)
    – a.k.a. MSE
    – Variance + bias^2
  Sample process variability of BLUP
    – Weighted averages vary less
    – Varies more as sample range increases relative to resolution

13 Proposed model-assisted variance (V_MA)
  Predict variance within a stratum
  Variance is reduced by mean covariance (assuming positively correlated elements)
    – Similar to error variance computations (Ripley (1981))
  Within-stratum variance estimated as
    – Sill reduced by within-stratum average covariance
  Linear estimator variance estimated as sum of squared coefficients times within-stratum variance
  Use covariance structure of response to model variability due to sampling process
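
The recipe on this slide (sill minus within-stratum average covariance, times the sum of squared coefficients) can be sketched in Python as follows. This is an illustration under stated assumptions, not the author's implementation: the covariance is exponential with plugged-in parameters, the average covariance is approximated by Monte Carlo over pairs of uniform points in a square stratum, and the coefficients are taken to be the inverse IPD of a one-observation-per-stratum design (the stratum area).

```python
import numpy as np

def mean_within_stratum_cov(sill, range_, side, n_pairs=200_000, seed=0):
    """Monte Carlo average of sill * exp(-h / range_) over pairs of
    independent uniform points in a side x side square stratum."""
    rng = np.random.default_rng(seed)
    s = rng.uniform(0.0, side, size=(n_pairs, 2))
    t = rng.uniform(0.0, side, size=(n_pairs, 2))
    h = np.linalg.norm(s - t, axis=1)
    return float(np.mean(sill * np.exp(-h / range_)))

def model_assisted_variance(coefs, sill, range_, side):
    """Sum of squared coefficients times (sill - mean within-stratum cov)."""
    within = sill - mean_within_stratum_cov(sill, range_, side)
    return float(np.sum(np.asarray(coefs, dtype=float) ** 2) * within)

# Hypothetical setup: 100 strata of size 2 x 2, so each coefficient is the
# stratum area 4; sill and range stand in for estimated values.
coefs = np.full(100, 4.0)
v_ma = model_assisted_variance(coefs, sill=4.0, range_=2.0, side=2.0)
```

Because the average covariance lies between zero and the sill for a positive covariance function, the result is always non-negative, unlike the HT estimator.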

14 Precursors of and precedent for modeling covariance
  Cochran (1946)
    – Finite population
    – Serial correlation w/ discrete lags
  Bellhouse (1977)
    – Continued extension of Cochran’s work to finite populations ordered on two dimensions
  Small-area estimation model-assisted approaches
    – J.N.K. Rao (2003)

15 Methods – part 1
  Random field (background) generated in R
    M. Schlather's GaussRF() of R package RandomFields
  Exponential covariance structure b*exp(-h/r)
    – (e.g. 4*exp(-h/2))
    h is distance; b and r are "sill" and "range" parameters
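
The study generated its fields with GaussRF() in R. As a stand-in for readers without that package, the following Python sketch simulates a Gaussian field with covariance b*exp(-h/r) on a small grid by Cholesky factorization. The grid size, spacing, and jitter term are assumptions made for the illustration; this is a different technique from whatever method GaussRF() used internally.

```python
import numpy as np

def simulate_exponential_field(nx, ny, spacing, sill, range_, seed=0):
    """Simulate a zero-mean Gaussian field with covariance
    sill * exp(-h / range_) on an nx x ny grid via Cholesky factorization."""
    rng = np.random.default_rng(seed)
    xs = np.arange(nx) * spacing
    ys = np.arange(ny) * spacing
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    coords = np.column_stack([gx.ravel(), gy.ravel()])
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    cov = sill * np.exp(-d / range_)
    # Small diagonal jitter (an assumption) guards against round-off.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(coords)))
    z = chol @ rng.standard_normal(len(coords))
    return coords, z.reshape(nx, ny)

# The slide's example parameters: b = 4 (sill), r = 2 (range).
coords, field = simulate_exponential_field(20, 20, 1.0, sill=4.0, range_=2.0)
```

Cholesky simulation is exact but scales cubically with the number of grid nodes, so it suits only small grids like this one.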

16 Methods – part 1a
  Repeat 1000 times per realization
  Stratified sample
    – n=100; one observation per stratum; stratum size 2x2
    – Simple square-grid tessellation
        Randomized origin
        Constant origin
  REML estimate of covariance parameters (b,r)
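
The sample design described above (n=100, one observation per 2x2 stratum, square-grid tessellation with randomized or constant origin) might be drawn as in this Python sketch. How the study treated strata that extend past the domain boundary after the origin shift is not stated in the transcript, so the sketch simply lets the grid shift with the origin; that choice is an assumption.

```python
import numpy as np

def one_per_stratum_sample(n_side=10, stratum=2.0, randomize_origin=True, seed=0):
    """Draw one uniform point in each cell of an n_side x n_side square grid.
    With randomize_origin=True the whole grid is shifted by a uniform offset
    in [0, stratum)^2; otherwise the origin is fixed at (0, 0)."""
    rng = np.random.default_rng(seed)
    if randomize_origin:
        origin = rng.uniform(0.0, stratum, size=2)
    else:
        origin = np.zeros(2)
    ix, iy = np.meshgrid(np.arange(n_side), np.arange(n_side), indexing="ij")
    corners = np.column_stack([ix.ravel(), iy.ravel()]) * stratum + origin
    points = corners + rng.uniform(0.0, stratum, size=corners.shape)
    return origin, points

origin, pts = one_per_stratum_sample()                        # randomized origin
_, pts_fixed = one_per_stratum_sample(randomize_origin=False) # constant origin
```

Each returned row is the location of the single observation in one stratum, so the default call yields 100 points.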

17 Methods – part 2
  Repeat 1000 times per realization (continued)
  For the design-based context
    – Estimate total (z_hat)
        HT estimator for continuous domain
    – Compute V_HT, V_YG and V_MA
    – Compare estimated variances with empirical variance (V[z_hat])
  For the model-based context example (Kriging)
    – Randomly selected z_o at fixed location over 1000 trials
    – Obtain z_hat, V_OK, V_MA
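
Two pieces of the design-based comparison are simple enough to sketch in Python: the HT estimate of the total, which divides each sampled response by its inclusion probability density, and a summary of how estimated variances compare with the empirical variance. The definition of relative error used here, (estimate − empirical) / empirical, is an assumption; the transcript does not give the study's exact formula.

```python
import numpy as np

def ht_total(z, ipd):
    """HT estimator of the total: sum of z(s_i) / pi(s_i)."""
    return float(np.sum(np.asarray(z, dtype=float) / np.asarray(ipd, dtype=float)))

def median_relative_error(var_estimates, empirical_var):
    """Median of (estimate - empirical) / empirical over repeated samples."""
    v = np.asarray(var_estimates, dtype=float)
    return float(np.median((v - empirical_var) / empirical_var))

# Hypothetical check: a constant response of 3 on a 20 x 20 domain, sampled
# with one point per 2 x 2 stratum, so the IPD is 1/4 at every sample point.
z = np.full(100, 3.0)
ipd = np.full(100, 0.25)
total = ht_total(z, ipd)  # 1200.0, i.e. 3 times the domain area of 400

mre = median_relative_error([0.9, 1.0, 1.3], empirical_var=1.0)  # 0.0
```

In a simulation like the one described, the empirical variance would be the variance of the 1000 totals and var_estimates would hold the 1000 values of V_HT, V_YG, or V_MA.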

19 Results – Design-based application
  Empirical median relative error
  Compares estimated variances with empirical variance of estimate of total (V[z_hat])
  (Stratified sample with randomized origin)

20 Results – Design-based application
  Exponential covariance with range= 2 and sill=
  [Figure: model-assisted, Yates-Grundy, and Horvitz-Thompson variance estimates (Avg and Med), each shown against observed V[zhat]]

21 Results – Design-based application
  Ratios of empirical standard deviations
  (Stratified sample with randomized origin)

22 Results – Model-based application
  Exponential covariance with range= 1 and sill= 1 (stratified sample with randomized origin)
  [Figure: kriging variance (MSPE) and model-assisted variance (Avg), each shown against observed V[zhat]]

23 Concluding - Model-assisted approach
  Small-area precedent
  Application to systematic and one-observation-per-stratum samples
  Effective alternative to direct estimators of continuous-domain randomized-origin tessellation stratified samples
    – Empirical results – less bias, better efficiency
  Doesn’t require randomly-located tessellation grid on continuous domain for non-zero π_ij

24 Acknowledgements
  Thanks to
    Don Stevens
    Committee members
    OSU Statistics Faculty
    UW QERM Faculty

25 The research described in this presentation has been funded by the U.S. Environmental Protection Agency through the STAR Cooperative Agreement CR National Research Program on Design-Based/Model-Assisted Survey Methodology for Aquatic Resources at Oregon State University. It has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred