Ratio estimation under SRS

Presentation transcript:

1 Ratio estimation under SRS
    Assume
        Absence of nonsampling error
        SRS of size n from a population of size N
    Ratio estimation is an alternative to the usual SRS estimators that uses “auxiliary” information (X)
    Sample data: observe $y_i$ and $x_i$
    Population information
        Have $y_i$ and $x_i$ on all individual units, or
        Have summary statistics from the population distribution of X, such as the population mean or total of X
    Ratio estimation is also used to estimate a population parameter called a ratio (B)

2 Uses
    Estimate a ratio
        Tree volume or bushels per acre
        Per capita income
        Liability to asset ratio
    More precise estimator of population parameters
        If X and Y are correlated, can improve upon the SRS estimator $\bar{y}$
    Estimating totals when population size N is unknown
        Avoids need to know N in formula for $\hat{t} = N \bar{y}$
    Domain estimation
        Obtaining estimates of subsamples
    Incorporate known information into estimates
        Poststratification
    Adjust for nonresponse

3 Estimating a ratio, B
    Population parameter for the ratio: $B = t_y / t_x = \bar{y}_U / \bar{x}_U$
    Examples
        Number of bushels harvested (y) per acre (x)
        Number of children (y) per single-parent household (x)
        Total usable weight (y) relative to total shipment weight (x) for chickens

4 Estimating a ratio
    SRS of n observation units (OUs)
    Collect data on y and x for each OU
    Natural estimator for B?

5 Estimating a ratio – 2
    Estimator for B: $\hat{B} = \bar{y} / \bar{x} = \hat{t}_y / \hat{t}_x$
    $\hat{B}$ is a biased estimator for B
    $\hat{B}$ is a ratio of random variables
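
As a rough illustration of the estimator of B, the sketch below divides the sample mean of y by the sample mean of x. The function name `ratio_estimate` and the four-farm data set are hypothetical and are not taken from the slides.

```python
from statistics import mean

def ratio_estimate(y, x):
    """Return B_hat = ybar / xbar for paired sample values (y_i, x_i)."""
    if len(y) != len(x) or not y:
        raise ValueError("y and x must be non-empty and of equal length")
    return mean(y) / mean(x)

# Hypothetical SRS of four farms: bushels harvested (y) and acres (x).
y = [120.0, 200.0, 90.0, 150.0]
x = [10.0, 16.0, 8.0, 12.0]

b_hat = ratio_estimate(y, x)                                # ratio of the means
mean_of_ratios = mean(yi / xi for yi, xi in zip(y, x))      # a different quantity
```

Note that `b_hat` is a ratio of two sample means, which in general differs from the mean of the unit-level ratios `mean_of_ratios`.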

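6 Bias of $\hat{B}$
    $\text{Bias}(\hat{B}) = E[\hat{B}] - B \approx \left(1 - \frac{n}{N}\right) \frac{1}{n \bar{x}_U^2} \left(B S_x^2 - R S_x S_y\right)$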

7 Bias of $\hat{B}$ – 2
    Bias is small if
        Sample size n is large
        Sampling fraction n/N is large
        $\bar{x}_U$ is large
        $S_x$ is small (population standard deviation for x)
        High positive correlation between X and Y (see Lohr p. 67)

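8 Estimated variance of estimator for B
    Estimator for $V(\hat{B})$: $\hat{V}(\hat{B}) = \left(1 - \frac{n}{N}\right) \frac{s_e^2}{n \bar{x}_U^2}$, where $s_e^2 = \frac{1}{n-1} \sum_{i \in S} (y_i - \hat{B} x_i)^2$
    If $\bar{x}_U$ is unknown? Use the sample mean $\bar{x}$ in its place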

9 Variance of $\hat{B}$
    Variance is small if
        Sample size n is large
        Sampling fraction n/N is large
        Deviations about the line, e = y − Bx, are small
        Correlation between X and Y is close to ±1
        $\bar{x}_U$ is large
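
A minimal sketch of the usual estimated variance of the ratio estimator, assuming the textbook form (1 − n/N) · s_e² / (n · x̄²), with the sample mean of x substituted when the population mean is not supplied. The function name, its arguments and the data are hypothetical.

```python
from math import sqrt

def ratio_variance(y, x, N, xbar_U=None):
    """Estimated variance of B_hat = ybar / xbar under SRS without replacement."""
    n = len(y)
    b_hat = sum(y) / sum(x)
    # Residuals about the fitted line through the origin; they sum to zero.
    resid = [yi - b_hat * xi for yi, xi in zip(y, x)]
    s2_e = sum(e * e for e in resid) / (n - 1)
    # Use the population mean of x if known, otherwise the sample mean.
    x_mean = sum(x) / n if xbar_U is None else xbar_U
    return (1 - n / N) * s2_e / (n * x_mean ** 2)

# Hypothetical sample of n = 4 units from a population of N = 100.
y = [120.0, 200.0, 90.0, 150.0]
x = [10.0, 16.0, 8.0, 12.0]
v_hat = ratio_variance(y, x, N=100)
se_hat = sqrt(v_hat)

# When y is exactly proportional to x, every residual is zero.
v_exact = ratio_variance([2.0, 4.0, 6.0], [1.0, 2.0, 3.0], N=30)
```

The second call illustrates the point that small deviations about the line give a small variance: with no deviations at all, the estimated variance is zero.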

10 Ag example – 1
    Frame: 1987 Agricultural Census
    Take SRS of 300 counties from 3078 counties to estimate conditions in 1992
    Collect data on y, have data on x for sample
    Existing knowledge about the population

11 Ag example – 2 Estimate farm acres in 1992 relative to 1987 farm acres

12 Ag example – 3 Need to calculate variance of e i ’s

13 Ag example – 4
    For each county i, calculate $e_i = y_i - \hat{B} x_i$
    Coffee Co, AL example
    Sum of squares for $e_i$

14 Ag example – 5

15 Estimating proportions
    If denominator variable is random, use ratio estimator to estimate the proportion p
    Example (p. 72)
        10 plots under protected oak trees used to assess effect of feral pigs on native vegetation on Santa Cruz Island, CA
        Count live seedlings y and total number of seedlings x per plot
        Y and X correlated due to common environmental factors
        Estimate proportion of live seedlings to total number of seedlings

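16 Estimating population mean
    Estimator for $\bar{y}_U$: $\hat{\bar{y}}_r = \hat{B} \bar{x}_U = \bar{y} \, (\bar{x}_U / \bar{x})$
    “Adjustment factor” for sample mean: $\bar{x}_U / \bar{x}$
        A measure of discrepancy between sample and population information, $\bar{x}$ and $\bar{x}_U$
        Improves precision if X and Y are positively correlated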

17 Underlying model with B > 0
    B is a slope
    B > 0 indicates X and Y are positively correlated
    Absence of intercept implies line must go through origin (0, 0)

18 Using population mean of X to adjust sample mean
    Discrepancy between sample and population information for X is viewed as evidence that the same relative discrepancy exists between $\bar{y}$ and $\bar{y}_U$
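
The adjustment idea can be sketched as follows: the sample mean of y is multiplied by the ratio of the known population mean of x to the sample mean of x. The function name `ratio_mean`, the data and the value 12.65 for the population mean are all hypothetical.

```python
def ratio_mean(y, x, xbar_U):
    """Ratio estimator of the population mean of y, plus its adjustment factor."""
    ybar = sum(y) / len(y)
    xbar = sum(x) / len(x)
    adjustment = xbar_U / xbar          # > 1 when the sample mean of x is too low
    return ybar * adjustment, adjustment

# Hypothetical sample; the population mean of x is assumed known to be 12.65.
y = [120.0, 200.0, 90.0, 150.0]
x = [10.0, 16.0, 8.0, 12.0]
estimate, adjustment = ratio_mean(y, x, xbar_U=12.65)
```

Here the sample mean of x (11.5) falls about 10% short of the assumed population mean, so the sample mean of y (140) is scaled up by the same relative amount.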

19 Bias of $\hat{\bar{y}}_r$
    Ratio estimator for the population mean is biased
    Rules of thumb for bias of $\hat{B}$ apply

20 Estimator for variance of $\hat{\bar{y}}_r$

21 Ag example – 6

22 Ag example - 8

23 Ag example – 9
    Expect a linear relationship between X and Y (Figure 3.1)
    Note that sample mean is not equal to population mean for X

24 MSE under ratio estimation
    Recall: MSE = Variance + Bias²
        SRS estimators are unbiased, so MSE = Variance
        Ratio estimators are biased, so MSE > Variance
    Use MSE to compare design/estimation strategies
        EX: compare sample mean under SRS with ratio estimator for population mean under SRS

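25 Sample mean vs. ratio estimator of mean
    $MSE(\hat{\bar{y}}_r)$ is smaller than $V(\bar{y})$ if and only if $R > \frac{B S_x}{2 S_y} = \frac{CV(x)}{2 \, CV(y)}$
    For example, if $CV(x) \approx CV(y)$ and $R > 1/2$, ratio estimation will be better than SRS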
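
A small sketch of this comparison, assuming the standard large-sample condition that the ratio estimator of the mean has smaller MSE than the SRS sample mean when the correlation R exceeds CV(x) / (2 · CV(y)). The function name and the input values are hypothetical.

```python
def ratio_beats_srs(r, cv_x, cv_y):
    """True if the approximate MSE of the ratio estimator is below V(ybar).

    Assumes the usual first-order condition R > CV(x) / (2 * CV(y)).
    """
    return r > cv_x / (2.0 * cv_y)

# With equal coefficients of variation the threshold is R = 1/2.
strong = ratio_beats_srs(0.8, cv_x=0.5, cv_y=0.5)   # correlation above 1/2
weak = ratio_beats_srs(0.3, cv_x=0.5, cv_y=0.5)     # correlation below 1/2
```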

26 Estimating the MSE
    Estimate MSE with sample estimates of bias and variance of estimator
        This tends to underestimate MSE
        The bias and variance expressions are approximations
    Estimated MSE is less biased if
        Bias is small (see earlier slide)
            Large sample size or sampling fraction
            High + correlation for X and Y
        $\bar{x}$ is a precise estimate (small CV for $\bar{x}$)
        We have a reasonably large sample size (n > 30)

27 Ag example – 10

28 Estimating population total t
    Estimator for t: $\hat{t}_{yr} = \hat{B} t_x$
    Is $\hat{t}_{yr}$ biased?
    Estimator for $V(\hat{t}_{yr})$
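
The ratio estimator of a total can be sketched as the estimated ratio times the known population total of x, which is why the population size N is not needed. The function name, the data and the total of 4600 acres are hypothetical.

```python
def ratio_total(y, x, t_x):
    """Ratio estimator of the population total of y; N does not appear."""
    b_hat = sum(y) / sum(x)
    return b_hat * t_x

# Hypothetical sample; the population total of x is assumed known to be 4600.
y = [120.0, 200.0, 90.0, 150.0]
x = [10.0, 16.0, 8.0, 12.0]
t_hat = ratio_total(y, x, t_x=4600.0)
```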

29 Ag example – 11

30 Summary of ratio estimation

31 Summary of ratio estn – 2

32 Regression estimation
    What if the relationship between y and x is linear, but does NOT pass through the origin?
    Better model in this case is $y = B_0 + B_1 x$, with intercept $B_0$ and slope $B_1$

33 Regression estimation – 2
    New estimator is a regression estimator
    To estimate $\bar{y}_U$, $\hat{\bar{y}}_{reg}$ is the predicted value from the regression of y on x at $x = \bar{x}_U$
    Adjustment factor for sample mean is linear, rather than multiplicative

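34 Estimating population mean
    Regression estimator: $\hat{\bar{y}}_{reg} = \hat{B}_0 + \hat{B}_1 \bar{x}_U = \bar{y} + \hat{B}_1 (\bar{x}_U - \bar{x})$
    Estimating regression parameters: $\hat{B}_1 = \frac{\sum_{i \in S} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i \in S} (x_i - \bar{x})^2}$, $\hat{B}_0 = \bar{y} - \hat{B}_1 \bar{x}$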
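
A minimal sketch of the regression estimator of the mean, assuming ordinary least squares for the slope and intercept and evaluating the fitted line at the population mean of x. The function name and the data are hypothetical; this is not the PROC SURVEYREG computation.

```python
def regression_mean(y, x, xbar_U):
    """Regression estimator of the population mean of y under SRS."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx                 # least-squares slope
    b0 = ybar - b1 * xbar          # least-squares intercept
    return b0 + b1 * xbar_U, b0, b1

# Hypothetical data lying exactly on the line y = 3 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [5.0, 7.0, 9.0, 11.0]
estimate, b0, b1 = regression_mean(y, x, xbar_U=3.0)
```

Because the sample mean of x (2.5) is below the assumed population mean (3.0), the estimate is the sample mean of y (8) plus the additive adjustment `b1 * (3.0 - 2.5)`.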

35 Estimating pop mean – 2 Sample variances, correlation, covariance

36 Bias in regression estimator

37 Estimating variance Note: This is a different residual than ratio estimation (predicted values differ)

38 Estimating the MSE Plugging sample estimates into Lohr, equation 3.13:

39 Estimating population total t Is regression estimator for t unbiased?

40 Tree example
    Goal: obtain a precise estimate of number of dead trees in an area
    Sample
        Select n = 25 out of N = 100 plots
        Make field determination of number of dead trees per plot, $y_i$
    Population
        For all N = 100 plots, have photo determination of number of dead trees per plot, $x_i$
        Calculate $\bar{x}_U$ = 11.3 dead trees per plot

41 Tree example – 2
    Lohr, p
        Data
        Plot of y vs. x
    Output from PROC REG
        Components for calculating estimators and estimating the variance of the estimators
    We will use PROC SURVEYREG, which will give you the correct output for regression estimators

42 Tree example – 3 Estimated mean number of dead trees/plot Estimated total number of dead trees

43 Tree example – 4
    Due to small sample size, Lohr uses t-distribution with n − 2 degrees of freedom
    Half-width for 95% CI
    Approx 95% CI for $t_y$ is (1115, 1283) dead trees
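
The interval construction can be sketched as estimate ± (critical value × standard error). In the sketch below the critical value is passed in by the caller (for example, looked up in a t table for n − 2 degrees of freedom) rather than computed, and both function names are hypothetical. The second helper only reverses the arithmetic of the interval reported on the slide.

```python
def t_interval(estimate, std_error, t_crit):
    """Return (lower, upper) for estimate +/- t_crit * std_error."""
    half_width = t_crit * std_error
    return estimate - half_width, estimate + half_width

def summary_from_interval(lower, upper):
    """Recover the midpoint and half-width implied by a symmetric interval."""
    return (lower + upper) / 2, (upper - lower) / 2

# Midpoint and half-width implied by the reported interval (1115, 1283).
midpoint, half_width = summary_from_interval(1115, 1283)

# Hypothetical inputs, used only to show how the first helper is called.
lower, upper = t_interval(10.0, std_error=2.0, t_crit=2.0)
```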

44 Related estimators
    Ratio estimator
        $B_0 = 0$: ratio model
        Ratio estimator vs. regression estimator with no intercept
    Difference estimation
        $B_1 = 1$: slope is assumed to be 1

45 Domain estimation under SRS
    Usually interested in estimates and inferences for subpopulations, called domains
    If we have not used stratification to set the sample size for each domain, then we should use domain estimation
        We will assume SRS for this discussion
    If we use stratified sampling with strata = domains, then use stratum estimators (Ch 4)
        To use stratification, need to know domain assignment for each unit in the sampling frame prior to sampling

46 Stratification vs. domain estimation
    In stratified random sampling
        Define sample size in each stratum before collecting data
        Sample size in stratum h is fixed, or known
        In other words, the sample size $n_h$ is the same for each sample selected under the specified design
    In domain estimation
        $n_d$ = sample size in domain d is random
        Don’t know $n_d$ until after the data have been collected
        The value of $n_d$ changes from sample to sample
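
A quick simulation can illustrate why the domain sample size is random under SRS. The sketch below repeatedly draws simple random samples from a hypothetical population of 1,000 units, 300 of which belong to the domain, and records how many domain members land in each sample. The function name, population and seed are assumptions made for illustration.

```python
import random

def domain_sample_sizes(domain_flags, n, draws, seed=0):
    """Return the number of domain members in each of `draws` SRS samples."""
    rng = random.Random(seed)
    sizes = []
    for _ in range(draws):
        sample = rng.sample(domain_flags, n)   # SRS without replacement
        sizes.append(sum(sample))              # flags are 1 (in domain) or 0
    return sizes

# Hypothetical population: 300 units in the domain, 700 outside it.
population = [1] * 300 + [0] * 700
sizes = domain_sample_sizes(population, n=100, draws=20)
```

The total sample size is 100 in every draw, but the entries of `sizes` vary from draw to draw, unlike a stratum sample size fixed by design.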

47 Population partitioned into domains
    Recall U = index set for population = {1, 2, …, N}
    Domain index set for domain d = 1, 2, …, D
        $U_d$ = {1, 2, …, $N_d$}, where $N_d$ = number of OUs in domain d in the population
    In sample of size n
        $n_d$ = number of sample units from domain d that are in the sample
        $S_d$ = index set for sample units belonging to domain d

48 Boat owner example
    Population: N = 400,000 boat owners (currently licensed)
    Sample: n = 1,500 owners selected using SRS
    Divide universe (population) into 2 domains
        d = 1: own open motor boat > 16 ft. (large boat)
        d = 2: do not own this type of boat
    Of the n = 1,500 sample owners
        $n_1$ = 472 own an open motor boat > 16 ft.
        $n_2$ = 1,028 do not own this kind of boat

49 New population parameters
    Domain mean: $\bar{y}_{U_d} = \frac{1}{N_d} \sum_{i \in U_d} y_i$
    Domain total: $t_d = \sum_{i \in U_d} y_i$

50 Boat owner example – 2
    Estimate population domain mean
        Estimate the average number of children for boat owners from domain 1
        Estimate proportion of boat owners from domain 1 who have children
    Estimate population domain total
        Estimate the total number of children for large boat owners (domain 1)

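51 New population parameter – 2
    Ratio form of population mean for domain d: $\bar{y}_{U_d} = t_u / t_x$
    Numerator variable: $u_i = y_i$ if unit i is in domain d, 0 otherwise
    Denominator variable: $x_i = 1$ if unit i is in domain d, 0 otherwise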

52 Boat owner example – 3
    Estimate mean number of children for owners from domain 1
    Zero values for OUs that are not in domain 1
    Applies to whole pop

53 Boat example – 4

54 Estimator for population domain mean

55 Boat example – 5 Domain 1 data

56 Boat example – 6
    Domain 1 and domain 2 data combined
    1104 zeros = 76 zeros from domain 1 + 1028 zeros from domain 2

57 Boat example – 7
    Two ways of estimating mean
        Whole data set
        Domain 1 data only
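
The two routes to the domain mean can be sketched as follows, assuming the usual construction in which the numerator variable equals y inside the domain and 0 outside, and the denominator variable is the 0/1 domain indicator. The function names and the six-owner data set are hypothetical.

```python
def domain_mean_ratio(y, in_domain):
    """Whole data set: ratio of sums of u_i (y or 0) and x_i (1 or 0)."""
    u = [yi if d else 0 for yi, d in zip(y, in_domain)]
    x = [1 if d else 0 for d in in_domain]
    return sum(u) / sum(x)

def domain_mean_subset(y, in_domain):
    """Domain data only: plain mean of y over sampled units in the domain."""
    values = [yi for yi, d in zip(y, in_domain) if d]
    return sum(values) / len(values)

# Hypothetical sample of six owners: number of children and domain membership.
children = [2, 0, 1, 3, 0, 2]
large_boat = [True, False, True, True, False, False]

mean_whole = domain_mean_ratio(children, large_boat)
mean_subset = domain_mean_subset(children, large_boat)
```

Both calls give the same point estimate. The whole-data-set form is the one that carries the ratio structure needed when estimating the variance.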

58 Estimator for variance of $\bar{y}_d$

59 Boat example – 8

60 Boat example – 9

61 Approximation for estimator of variance of $\bar{y}_d$
    Domain 1 data only

62 Estimated variance of $\hat{B}$
    Estimator for $V(\hat{B})$
    Domain variance estimator is directly related

63 Relationship to estimating a ratio with
    Population mean of X
    Residual

64 Relationship to estimating a ratio with – 2
    Residual variance

65 Estimator for variance of

66 Estimating a population domain total
    If we know the domain sizes, $N_d$: $\hat{t}_d = N_d \bar{y}_d$

67 Estimating a population domain total – 2
    If we do NOT know the domain sizes: $\hat{t}_d = N \bar{u}$
    Standard SRS estimator using u as the variable
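
The two domain-total estimators can be sketched side by side, assuming the standard forms: domain size times domain sample mean when the domain size is known, and population size times the sample mean of u (y inside the domain, 0 outside) when it is not. The function names, data and population sizes are hypothetical.

```python
def domain_total_known(N_d, y, in_domain):
    """Domain size N_d known: N_d times the sample mean within the domain."""
    values = [yi for yi, d in zip(y, in_domain) if d]
    return N_d * sum(values) / len(values)

def domain_total_unknown(N, y, in_domain):
    """Domain size unknown: N times the sample mean of u over the whole sample."""
    u = [yi if d else 0 for yi, d in zip(y, in_domain)]
    return N * sum(u) / len(u)

# Hypothetical sample of six owners from a population of N = 60,
# of whom N_d = 30 are assumed to be in the domain.
children = [2, 0, 1, 3, 0, 2]
large_boat = [True, False, True, True, False, False]

total_known = domain_total_known(30, children, large_boat)
total_unknown = domain_total_unknown(60, children, large_boat)
```

The two estimates coincide here only because half of the hypothetical sample falls in the domain, matching the assumed population share. In general they differ.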

68 Boat example – 10 Do not know the domain size, N 1

69 Comparing 2 domain means
    Suppose we want to test the hypothesis that two domain means are equal
    Construct a z-test with Type 1 error rate α (for falsely rejecting the null hypothesis)
    Test statistic: $z = (\bar{y}_1 - \bar{y}_2) / \sqrt{\hat{V}(\bar{y}_1) + \hat{V}(\bar{y}_2)}$
    Critical value: $z_{\alpha/2}$
    Reject $H_0$ if $|z| > z_{\alpha/2}$

70 Boat example - 10 Large boat owners (d = 1) Other boat owners (d = 2)

71 Boat example – 11
    Test whether domain means are equal at α = 0.05
    Calculate z-statistic
    Critical value $z_{\alpha/2} = z_{0.025} = 1.96$
    Apply rejection rule: |z| = |−1.04| = 1.04 < 1.96 = $z_{0.025}$
    Fail to reject $H_0$
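
The rejection rule can be checked with a short sketch that computes the two-sided critical value from the standard normal distribution and compares it with the z-statistic of −1.04 reported on the slide. The function name `two_sided_z_test` is hypothetical.

```python
from statistics import NormalDist

def two_sided_z_test(z, alpha=0.05):
    """Return (critical value, reject?) for a two-sided z-test at level alpha."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 quantile
    return critical, abs(z) > critical

critical, reject = two_sided_z_test(-1.04, alpha=0.05)
```

With α = 0.05 the critical value rounds to 1.96, and since 1.04 is smaller, `reject` is `False`, matching the slide's conclusion.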

72 Overview
    Population parameters
        Mean
        Total
        Proportion (w/ fixed denominator)
        Ratio
            Includes proportion w/ random denominator
        Domain mean
        Domain total

73 Overview – 2
    Estimation strategies
        No auxiliary information
        Auxiliary information X, no intercept
            Y and X positively correlated
            Linear relationship passes through origin
        Auxiliary information X, intercept
            Y and X positively correlated
            Linear relationship does not pass through origin

74 Overview – 3
    Make a table of population parameters (rows) by estimation strategy (columns)
    In each cell, write down
        Estimator for population parameter
        Estimator for variance of estimated parameter
        Residual $e_i$
    Notes
        Some cells will be blank
        Look for relationship between mean and total, and mean and proportion
        Look at how the variance formulas for many of the estimators are essentially the same form