Variance Estimation in Complex Surveys. Drew Hardin, Kinfemichael Gedif



So far...
Variance for the estimated mean and total under SRS, stratified, and cluster (single- and multi-stage) sampling
Variance for estimating a ratio of two means under SRS (using the linearization method)

What about other cases?
Variance for estimators that are not linear combinations of means and totals
–Ratios
Variance for estimating other statistics from complex surveys
–Medians, quantiles, functions of the empirical distribution function, etc.
Other approaches are necessary

Outline
Variance estimation methods
–Linearization
–Random group methods
–Balanced repeated replication (BRR)
–Resampling techniques: jackknife, bootstrap
Adapting to complex surveys
Hot research areas
References

Linearization (Taylor Series Methods)
We have seen this before (ratio estimator and other courses).
Suppose our statistic is non-linear. It can often be approximated using Taylor's theorem.
We know how to calculate variances of linear functions of means and totals.

Linearization (Taylor Series Methods)
Linearize the statistic, then calculate the variance of the linear approximation.
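As a concrete illustration of the two steps, here is a minimal sketch (the function name and the SRS ratio-estimator setting are my own choices, not from the slides) of the linearization variance for the ratio θ̂ = ȳ/x̄:

```python
# Taylor-linearization variance for a ratio estimator under SRS.
# Illustrative sketch only: names and the SRS setting are assumptions.

def ratio_linearization_variance(y, x, N):
    """Return (theta_hat, v_hat) for theta = ybar / xbar under SRS."""
    n = len(y)
    ybar = sum(y) / n
    xbar = sum(x) / n
    theta = ybar / xbar
    # Linearized residuals: the first-order Taylor expansion of ybar/xbar
    # has influence values d_i / xbar, with d_i = y_i - theta * x_i.
    d = [yi - theta * xi for yi, xi in zip(y, x)]
    s2 = sum(di ** 2 for di in d) / (n - 1)  # the mean of d is exactly 0 here
    fpc = 1 - n / N                           # finite population correction
    return theta, fpc * s2 / (n * xbar ** 2)
```

When y is exactly proportional to x the residuals vanish and the estimated variance is zero, as expected.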

Linearization (Taylor Series) Methods
–Pro:
Can be applied to general sampling designs
Theory is well developed
Software is available
–Con:
Finding partial derivatives may be difficult
A different method is needed for each statistic
The function of interest may not be expressible as a smooth function of population totals or means
Accuracy of the linearization approximation

Random Group Methods
Based on the concept of replicating the survey design
It is not usually possible to merely go and replicate the survey
However, the survey can often be divided into R groups so that each group forms a miniature version of the survey

Random Group Methods
[Diagram: five strata, each containing units 1-8; one unit is drawn from each stratum, and the combined draws are treated as a miniature sample]

Unbiased estimator (the average of the group estimates)
Slightly biased estimator (computed from all the data)

Random Group Methods
Pro:
–Easy to calculate
–General method (can also be used for non-smooth functions)
Con:
–Assumption of independent groups (a problem when N is small)
–Small number of groups (particularly if one stratum is sampled only a few times)
–The survey design must be replicated in each random group (the presence of strata and clusters remains the same)
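The "easy to calculate" point can be made concrete. A minimal sketch of the random-group variance for the average-of-groups estimator, assuming the R group estimates have already been computed (the function name is illustrative):

```python
def random_group_variance(group_estimates):
    """Variance estimate for the average of R independent group estimates:
    v = sum_r (theta_r - theta_bar)^2 / (R * (R - 1))."""
    R = len(group_estimates)
    theta_bar = sum(group_estimates) / R
    return sum((t - theta_bar) ** 2 for t in group_estimates) / (R * (R - 1))
```

With identical group estimates the variance estimate is zero; the small number of groups R is what makes this estimator unstable.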

Resampling and Replication Methods
Balanced repeated replication (BRR)
–Special case when n_h = 2
Jackknife (Quenouille 1949; Tukey 1958)
Bootstrap (Efron 1979; Shao and Tu 1995)
These methods
–extend the idea of the random group method,
–allow replicate groups to overlap,
–are all-purpose methods.
Asymptotic properties??

Balanced Repeated Replication
Suppose we sampled 2 PSUs per stratum.
There are 2^H ways to pick 1 from each stratum.
Each combination could be treated as a sample.
Pick R samples.

Balanced Repeated Replication
Which samples should we include?
–Assign each value either 1 or -1 within the stratum
–Select samples that are orthogonal to one another to create balance
–You can use the design matrix of a fractional factorial
–Specify a vector r of 1, -1 values for each stratum
Estimator: v_BRR(θ̂) = (1/R) Σ_{r=1}^{R} (θ̂_r - θ̂)², where θ̂_r is computed from the r-th half-sample
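A minimal sketch of BRR for a stratified mean with n_h = 2, using the columns of a Sylvester Hadamard matrix as the orthogonal design (all names are illustrative; the replicate variance v = (1/R) Σ_r (θ̂_r - θ̂)² is assumed):

```python
def hadamard(k):
    """Sylvester Hadamard matrix of order 2**k (entries +/-1)."""
    H = [[1]]
    for _ in range(k):
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

def brr_variance(pairs, weights):
    """BRR variance of a stratified mean with n_h = 2 PSUs per stratum.
    pairs[h] = (y_h1, y_h2); weights[h] = stratum weight W_h.
    Column h of the Hadamard matrix assigns PSU 1 (+1) or PSU 2 (-1)
    of stratum h to each half-sample replicate."""
    n_strata = len(pairs)
    k = 0
    while 2 ** k < n_strata:      # smallest Sylvester order covering all strata
        k += 1
    H = hadamard(k)
    R = 2 ** k
    full = sum(w * (a + b) / 2 for w, (a, b) in zip(weights, pairs))
    v = 0.0
    for r in range(R):
        rep = sum(w * (a if H[r][h] == 1 else b)
                  for h, (w, (a, b)) in enumerate(zip(weights, pairs)))
        v += (rep - full) ** 2
    return v / R
```

For this linear statistic the orthogonality of the columns makes the BRR estimate equal the textbook stratified variance Σ_h W_h² (y_h1 - y_h2)²/4 exactly.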

Balanced Repeated Replication
Pro:
–Relatively few computations
–Asymptotically equivalent to linearization methods for smooth functions of population totals and for quantiles
–Can be extended to use weights
Con:
–Requires 2 PSUs per stratum
Can be extended with more complex schemes

The Jackknife: SRS with replacement
Quenouille (1949); Tukey (1958); Shao and Tu (1995)
Let θ̂_(i) be the estimator of θ after omitting the i-th observation.
Jackknife estimate: θ̂_J = (1/n) Σ_i θ̃_i, with pseudovalues θ̃_i = n θ̂ - (n - 1) θ̂_(i)
Jackknife estimator of the variance: v_J(θ̂) = ((n - 1)/n) Σ_i (θ̂_(i) - θ̂_(·))², where θ̂_(·) = (1/n) Σ_i θ̂_(i)
For stratified SRS without replacement: Jones (1974)
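A minimal delete-one jackknife sketch for the SRS case (the estimator is passed in as a function; the names are illustrative):

```python
def jackknife_variance(data, estimator):
    """Delete-one jackknife variance:
    v_J = (n - 1)/n * sum_i (theta_(i) - theta_(.))^2,
    where theta_(i) is the estimate with observation i omitted."""
    n = len(data)
    theta_i = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    theta_dot = sum(theta_i) / n
    return (n - 1) / n * sum((t - theta_dot) ** 2 for t in theta_i)
```

For the sample mean this reproduces the usual s²/n, which is a standard sanity check.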

The Jackknife: stratified multistage design
In stratum h, delete one PSU at a time.
Let θ̂_(hi) be the estimator, of the same form as θ̂, when PSU i of stratum h is omitted.
Jackknife estimate: formed from the θ̂_(hi), or from the pseudovalues θ̃_hi = n_h θ̂ - (n_h - 1) θ̂_(hi).

The Jackknife: stratified multistage design
Different formulae exist for v_J(θ̂); a standard one is
v_J(θ̂) = Σ_h ((n_h - 1)/n_h) Σ_{i=1}^{n_h} (θ̂_(hi) - θ̂_(h·))²,
where θ̂_(h·) = (1/n_h) Σ_i θ̂_(hi) (θ̂ itself may be used in place of θ̂_(h·)).
An equivalent estimate can be written using the pseudovalues.
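A minimal sketch of the delete-one-PSU jackknife for a stratified single-stage design, applied to an expansion estimator of a total (the design, estimator, and names are assumptions for illustration):

```python
def stratified_jackknife_variance(psu_totals, Nh):
    """Delete-one-PSU jackknife for an expansion estimator of a total.
    psu_totals[h]: sampled PSU totals in stratum h (single-stage assumed).
    Nh[h]: number of PSUs in the population of stratum h."""
    H = len(psu_totals)

    def estimate(drop=None):
        total = 0.0
        for h in range(H):
            t = list(psu_totals[h])
            if drop is not None and drop[0] == h:
                del t[drop[1]]                   # omit PSU i of stratum h
            total += Nh[h] / len(t) * sum(t)     # reweight the remaining PSUs
        return total

    theta = estimate()
    v = 0.0
    for h in range(H):
        nh = len(psu_totals[h])
        for i in range(nh):
            v += (nh - 1) / nh * (estimate(drop=(h, i)) - theta) ** 2
    return v
```

Note the jackknife here targets the with-replacement variance, so it carries no finite population correction.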

The Jackknife: asymptotics
Krewski and Rao (1981)
Based on the concept of a sequence of finite populations with L strata, where L → ∞.
Under conditions C1-C6 given in the paper,
(θ̂ - θ) / √(v_method(θ̂)) → N(0, 1),
where "method" is the estimator used (linearization, BRR, jackknife).

The Bootstrap: naïve bootstrap
Efron (1979); Rao and Wu (1988); Shao and Tu (1995)
Resample n_h observations with replacement in stratum h.
Estimate: θ̂* computed from the resample, in the same form as θ̂.
Variance: v_B(θ̂) = Var_*(θ̂*)
–Or approximate it by Monte Carlo: (1/B) Σ_b (θ̂*_b - θ̂)²
The estimator is not a consistent estimator of the variance of a general nonlinear statistic.
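A minimal Monte Carlo sketch of the naïve stratified bootstrap for a stratified mean (the statistic, the weights, and all names are illustrative assumptions):

```python
import random

def naive_bootstrap_variance(strata, weights, B=1000, seed=7):
    """Naive stratified bootstrap: resample n_h values with replacement
    within each stratum, B times, and take the Monte Carlo variance of
    the replicate stratified means."""
    rng = random.Random(seed)

    def strat_mean(samples):
        return sum(w * sum(s) / len(s) for w, s in zip(weights, samples))

    reps = []
    for _ in range(B):
        boot = [[rng.choice(s) for _ in s] for s in strata]
        reps.append(strat_mean(boot))
    rbar = sum(reps) / B
    return sum((t - rbar) ** 2 for t in reps) / B
```

For small n_h this underestimates: each stratum contributes roughly (n_h - 1)/n_h of the customary unbiased variance, which is the inconsistency the slides point to.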

The Bootstrap: naïve bootstrap
For a linear statistic such as the stratified mean, the bootstrap variance satisfies E_*(v_B) = Σ_h W_h² ((n_h - 1)/n_h) s_h²/n_h.
Comparing with the customary unbiased estimator Σ_h W_h² s_h²/n_h,
the ratio (n_h - 1)/n_h does not converge to 1 for a bounded n_h.

The Bootstrap: modified bootstrap
Resample m_h < n_h observations with replacement in stratum h.
Calculate: the rescaled values ỹ_hi = ȳ_h + √(m_h/(n_h - 1)) (y*_hi - ȳ_h), and θ̂* from them.
Variance: v_B(θ̂) = Var_*(θ̂*)
Can be approximated with Monte Carlo.
For the linear case, it reduces to the customary unbiased variance estimator.
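A minimal sketch of the rescaling bootstrap for a stratified mean, taking m_h = n_h - 1 (the with-replacement setting, the statistic, and all names are assumptions for illustration):

```python
import random

def rescaling_bootstrap_variance(strata, weights, B=1000, seed=1):
    """Rescaling bootstrap for a stratified mean: draw m_h = n_h - 1
    values with replacement per stratum, rescale them toward the stratum
    mean by sqrt(m_h / (n_h - 1)), and take the Monte Carlo variance of
    the replicate estimates around the full-sample estimate."""
    rng = random.Random(seed)

    def strat_mean(samples):
        return sum(w * sum(s) / len(s) for w, s in zip(weights, samples))

    theta = strat_mean(strata)
    reps = []
    for _ in range(B):
        boot = []
        for s in strata:
            nh = len(s)
            mh = nh - 1                       # one common default choice
            ybar = sum(s) / nh
            scale = (mh / (nh - 1)) ** 0.5    # equals 1 when m_h = n_h - 1
            boot.append([ybar + scale * (rng.choice(s) - ybar)
                         for _ in range(mh)])
        reps.append(strat_mean(boot))
    return sum((t - theta) ** 2 for t in reps) / B
```

With m_h = n_h - 1 the scale factor is 1 and, for this linear statistic, the Monte Carlo variance centers on the customary unbiased estimator.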

More on the bootstrap
The method can be extended to stratified SRS without replacement by simply changing the scale factor.
For m_h = n_h - 1, this method reduces to the naïve bootstrap.
For n_h = 2, m_h = 1, the method reduces to the random half-sample replication method.
For n_h ≥ 3, for the choice of m_h see Rao and Wu (1988).

Simulation: Rao and Wu (1988)
Jackknife and linearization intervals showed substantial bias for nonlinear statistics in one-sided intervals.
The bootstrap performs best for one-sided intervals (especially when m_h = n_h - 1).
For two-sided intervals, the three methods have similar coverage probabilities.
The jackknife and linearization methods are more stable than the bootstrap.
B = 200 is sufficient.

Hot topics
Jackknife with non-smooth functions (Rao and Sitter 1996)
Two-phase variance estimation (Graubard and Korn 2002; Rubin-Bleuer and Schiopu-Kratina 2005)
Estimating function (EF) bootstrap method (Rao and Tausi 2004)

Software
OSIRIS – BRR, jackknife
SAS – linearization
Stata – linearization
SUDAAN – linearization, bootstrap, jackknife
WesVar – BRR, jackknife, bootstrap

References
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.
Graubard, B. I., and Korn, E. L. (2002). Inference for superpopulation parameters using sample surveys. Statistical Science, 17, 73–96.
Krewski, D., and Rao, J. N. K. (1981). Inference from stratified samples: Properties of the linearization, jackknife and balanced repeated replication methods. The Annals of Statistics, 9, 1010–1019.
Quenouille, M. H. (1949). Problems in plane sampling. Annals of Mathematical Statistics, 20, 355–375.
Rao, J. N. K., and Wu, C. F. J. (1988). Resampling inference with complex survey data. Journal of the American Statistical Association, 83, 231–241.
Rao, J. N. K., and Tausi, M. (2004). Estimating function variance estimation under stratified multistage sampling. Communications in Statistics, 33, 2087–2095.
Rao, J. N. K., and Sitter, R. R. (1996). Discussion of Shao's paper. Statistics, 27, 246–247.
Rubin-Bleuer, S., and Schiopu-Kratina, I. (2005). On the two-phase framework for joint model and design-based inference. Annals of Statistics (to appear).
Shao, J., and Tu, D. (1995). The Jackknife and Bootstrap. New York: Springer-Verlag.
Tukey, J. W. (1958). Bias and confidence in not-quite large samples (abstract). Annals of Mathematical Statistics, 29, 614.
Not referred to in the presentation:
Wolter, K. M. (1985). Introduction to Variance Estimation. New York: Springer-Verlag.
Shao, J. (1996). Resampling methods in sample surveys (with discussion). Statistics, 27, 203–254.