Applications of Nonparametric Survey Regression Estimation in Aquatic Resources
F. Jay Breidt, Siobhan Everson-Stewart, Alicia Johnson, Jean D. Opsomer

Nonparametric Model-Assisted Survey Regression Estimation
F. Jay Breidt & Jean D. Opsomer

Model-Assisted Estimation

Auxiliary Information
• Use auxiliary information available for the entire aquatic resource of interest, in addition to the sample data.
  - Example: the spatial location of every lake in the population is known for the Northeastern Lakes study of EPA's Environmental Monitoring and Assessment Program (EMAP).

General Form of the Model-Assisted Estimator
• Estimate the population total as the sum of model-based predictions for all population elements, plus a design-bias adjustment.

Classical Parametric Survey Regression Estimator
• Model-based predictions come from regressing the sample response on the auxiliary variable.

A Nonparametric Approach

Motivation for Nonparametric Methods
• The regression estimator is inefficient if the true relationship between the response and the auxiliary information is not linear.
• Breidt and Opsomer (2000) replaced parametric regression with nonparametric regression.
• Model-based predictions come from a local linear smooth (kernel regression).

Local Linear Regression
• Smooth at a point by performing locally weighted least squares regression.
• Weights come from a kernel function, K.
  - The kernel may be a density or another function, such as the Epanechnikov kernel, K(u) = ¾(1 - u²) I{|u| < 1}.
  - The kernel is scaled by a bandwidth, h.
  - Large h leads to a smoother, more global linear regression.
  - Small h leads to a rougher, more local linear regression.
• The intercept in the locally weighted least squares fit is the smooth at the point.
• Modify for the survey context by incorporating design weights.
• Plug into the model-assisted estimator.

[Figure: Illustration of local linear regression. Curves at the bottom of the graph are kernel weights. The solid lines show the local weighted least squares fit at the points of interest. The dotted line is the kernel smooth.]

Nonparametric Survey Regression Estimator
• The nonparametric estimator of the total has the model-assisted form, with the nonparametric model-based prediction computed from a local design matrix and a local weighting matrix.
• The estimator:
  - is asymptotically design unbiased and consistent
  - is competitive with the classical survey regression estimator when the parametric model is correct
  - dominates the classical estimator when the parametric model is misspecified
  - admits a consistent variance estimator

For more information, see Breidt, F.J. and Opsomer, J.D. (2000). Local Polynomial Regression Estimation in Survey Sampling. Annals of Statistics 28.

Extension to CDF Estimation
Colorado State MS Project: Alicia Johnson

Objectives
Extend nonparametric regression estimation to finite population cumulative distribution function (CDF) estimation and compare to parametric techniques.

Approach
• Replaced the response variable by an indicator equal to 1 if the response is at or below the point at which the CDF is evaluated, and 0 otherwise.
• Smoothed the indicator versus the auxiliary variable, x.
• Generated seven populations with various mean functions and variance terms.
• Performed a simulation study to compare the nonparametric regression CDF estimator to standard CDF estimators:
  - for estimation of the CDF at the median
  - for estimation of the median

Findings
• For both CDF estimation and estimation of the median:
  - Compared the nonparametric regression estimator to Horvitz-Thompson and parametric estimators.
  - The nonparametric regression estimator performed well in terms of mean square error, especially when the parametric model was misspecified.
  - Model-assisted approaches had lower relative bias than model-based approaches.

[Figure: Illustration of the model mean and standard deviation bounds (left) and the CDF (right) for one of seven generated populations.]

[Table: Relative biases and mean square error ratios (relative to model-assisted local linear, LLR) for DB (design-based Horvitz-Thompson), CD0 and CD1 (parametric model-based using ratio and regression models), RKM0 and RKM1 (parametric model-assisted using ratio and regression models), and LLRB (local linear model-based).]

For more information, see Johnson, A. (2003), Estimating Distribution Functions from Survey Data, unpublished masters project, Colorado State University.

Extension to Spatial Sampling
Colorado State MS Project: Siobhan Everson-Stewart

Objectives
Extend nonparametric regression estimation to spatial sampling and compare to parametric techniques.

Approach
• Replaced univariate kernel regression with bivariate kernel regression.
• Used a product Epanechnikov kernel.
• Performed a simulation study to compare the nonparametric regression estimator to standard estimators.
• Created a smooth, spatially correlated surface over the unit square; varied the strength of correlation, planar trend, variation in the surface, random noise, and sample size.

Findings
• Compared the performance of the Horvitz-Thompson, regression, and kernel regression estimators.
• Parametric planar regression did well when the surface contained a planar portion.
• The local planar regression estimator performed well, especially when the parametric model was misspecified.

For more information, see Everson-Stewart, S. (2003), Nonparametric survey regression estimation in two-stage spatial sampling, unpublished masters project, Colorado State University.

Application to Northeastern Lakes

• Population and Study Design
  - EMAP surveyed lakes in the northeastern United States.
  - The aquatic resource of interest comprises over 20,000 lakes in 8 states.
  - 330 individual lakes were visited, each from one to six times.
  - Many measurements were taken on each lake, including several lake chemistry levels.
  - Acid neutralizing capacity (ANC) is a measure of a lake's ability to buffer itself.
• Auxiliary Information
  - For every lake in the region of interest, auxiliary information included spatial location, elevation, and ecoregion.
  - Spatial location is used here for illustration.
  - The approach is easy to extend semiparametrically with parametric terms for elevation and ecoregion.
• CDF Estimation in Spatial Sampling
  - Applied to the Northeastern Lakes data set.
  - Combined the CDF estimation and spatial location extensions.
  - Estimated the CDF of ANC using local planar regression (LPR).
• Confidence Interval Calculation
  - Lakes are considered acidic if ANC < 0.
  - Calculated a 95% confidence interval for the CDF at zero, which estimates the proportion of acidic lakes in the region.
  - EPA's National Surface Waters Survey estimated 4.2% of lakes in the northeastern region of the US to be acidic.
  - The 95% LPR confidence interval, (3.0%, 7.5%), contains the National Surface Waters Survey estimate.

[Figure: Map of lake population and lakes included in the EMAP Northeastern Lakes survey.]

[Figure: Cumulative distribution function of ANC based on local planar regression (LPR) smooth on spatial location, with 95% pointwise confidence intervals. For comparison, design-based empirical CDF and confidence bounds are also shown.]

[Figure: CI for proportion of acidic lakes with National Surface Waters Survey estimate.]

The research described in this poster has been funded by the U.S. Environmental Protection Agency (EPA) through Science To Achieve Results (STAR) Program Cooperative Agreements CR awarded to Colorado State University and CR awarded to Oregon State University. The poster has not been subjected to the Agency's review and therefore does not necessarily reflect the views of the Agency, and no official endorsement should be inferred.
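The estimator described in this poster can be illustrated with a short Python sketch. It is not the authors' code. It assumes a single auxiliary variable known for every population element, known first-order inclusion probabilities, and the model-assisted form stated in words above (the total of the predictions over the population plus the inverse-probability-weighted total of the sample residuals). The function names (`epanechnikov`, `local_linear_fit`, `llr_total`, `llr_cdf`), the simulated population, the simple random sample, and the bandwidth are all hypothetical choices made for illustration.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 3/4 (1 - u^2) for |u| < 1, else 0."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def local_linear_fit(x0, x_s, y_s, pi_s, h):
    """Design-weighted local linear smooth at x0.

    Solves a weighted least squares problem with weights K((x_j - x0)/h) / pi_j
    and returns the intercept, which is the smooth at x0. Any constant factor
    in the weights (such as 1/h) cancels in the fit. Fails with a singular
    system if fewer than two distinct sample points fall within h of x0.
    """
    d = x_s - x0
    w = epanechnikov(d / h) / pi_s
    X = np.column_stack([np.ones_like(d), d])   # local design matrix
    XtW = X.T * w                                # X' W without forming diag(W)
    beta = np.linalg.solve(XtW @ X, XtW @ y_s)
    return beta[0]

def llr_total(x_pop, sample_idx, y_s, pi_s, h):
    """Model-assisted total: sum of predictions over the population plus
    the inverse-probability-weighted sum of sample residuals."""
    x_s = x_pop[sample_idx]
    m_pop = np.array([local_linear_fit(x0, x_s, y_s, pi_s, h) for x0 in x_pop])
    resid = y_s - m_pop[sample_idx]
    return m_pop.sum() + np.sum(resid / pi_s)

def llr_cdf(t, x_pop, sample_idx, y_s, pi_s, h):
    """CDF at t: apply the total estimator to the indicator 1{y <= t}
    and divide by the population size."""
    indicator = (y_s <= t).astype(float)
    return llr_total(x_pop, sample_idx, indicator, pi_s, h) / len(x_pop)

# Hypothetical population with a nonlinear mean function (illustration only).
rng = np.random.default_rng(0)
N, n = 500, 100
x_pop = rng.uniform(0.0, 1.0, size=N)
y_pop = 1.0 + 2.0 * (x_pop - 0.5) ** 2 + rng.normal(0.0, 0.1, size=N)

# Simple random sample without replacement, so pi_i = n / N for every element.
sample_idx = rng.choice(N, size=n, replace=False)
pi_s = np.full(n, n / N)

est_total = llr_total(x_pop, sample_idx, y_pop[sample_idx], pi_s, h=0.25)
est_cdf = llr_cdf(np.median(y_pop), x_pop, sample_idx, y_pop[sample_idx], pi_s, h=0.25)
print("estimated total:", est_total, "true total:", y_pop.sum())
print("estimated CDF at the population median:", est_cdf)
```

Because a local linear fit reproduces an exactly linear response, the residual term vanishes in that case and the estimate equals the population total, which gives a convenient sanity check. The spatial extension described above would replace the scalar distance with two coordinates and a product kernel; that variant is not shown here.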