Presented: 2009 Canadian Users Stata Group Meeting

Slides:



Advertisements
Similar presentations
Session 1: Introduction to Complex Survey Design
Advertisements

Automating the Production of Descriptive Tables at Statistics Canada mog.ado, a user-written program with quality controls Questions and comments may be.
Mean, Proportion, CLT Bootstrap
Dr. Chris L. S. Coryn Spring 2012
Formalizing the Concepts: Simple Random Sampling.
Understanding sample survey data
Impact Evaluation Session VII Sampling and Power Jishnu Das November 2006.
Definitions Observation unit Target population Sample Sampled population Sampling unit Sampling frame.
Chap 20-1 Statistics for Business and Economics, 6e © 2007 Pearson Education, Inc. Chapter 20 Sampling: Additional Topics in Sampling Statistics for Business.
ICCS 2009 IDB Workshop, 18 th February 2010, Madrid 1 Training Workshop on the ICCS 2009 database Weighting and Variance Estimation picture.
Statistics Canada Citizenship and Immigration Canada Methodological issues.
University of Warwick, Department of Sociology, 2012/13 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Survey Design: Some Implications for.
Institute of Professional Studies School of Research and Graduate Studies Selecting Samples and Negotiating Access Lecture Eight.
Lecture 5.  It is done to ensure the questions asked would generate the data that would answer the research questions n research objectives  The respondents.
PISA (and PIAAC) Data analysis using Stata (July 2017)
Sample Surveys.
Chapter 8: Estimating with Confidence
Comparing Two Proportions
More on Inference.
Chapter 8: Estimating with Confidence
Chapter Ten Basic Sampling Issues Chapter Ten.
Sampling.
Overview of probability and statistics
Dr. Unnikrishnan P.C. Professor, EEE
University of Warwick, Department of Sociology, 2012/13 SO 201: SSAASS (Surveys and Statistics) (Richard Lampard) Survey Design: Some Implications for.
Ch 9 實習.
Chapter 6 Inferences Based on a Single Sample: Estimation with Confidence Intervals Slides for Optional Sections Section 7.5 Finite Population Correction.
Peter Linde, Interviewservice Statistics Denmark
Working with the ECLS-B Datasets Weights and other issues.
Chapter 23 Comparing Means.
Marissa Gargano Sarrynna Sou Susan Edwards Chris Cummiskey
Section 9.2 – Sample Proportions
Sampling.
Part III – Gathering Data
1. Estimation ESTIMATION.
Introduction to secondary analysis of complex survey data
Sampling And Sampling Methods.
SAMPLING (Zikmund, Chapter 12.
Chapter 5 Sampling Distributions
CHAPTER 12 Sample Surveys.
Chapter 5 Sampling Distributions
Introduction to Survey Data Analysis
Power, Sample Size, & Effect Size:
STRATIFIED SAMPLING.
Making Statistical Inferences
Sampling Design.
More on Inference.
Variables and Measurement (2.1)
Random sampling Carlo Azzarri IFPRI Datathon APSU, Dhaka
Sampling.
SASU manual: sampling issues
Warmup To check the accuracy of a scale, a weight is weighed repeatedly. The scale readings are normally distributed with a standard deviation of
SAMPLING (Zikmund, Chapter 12).
Chapter 8: Estimating with Confidence
Sampling and Power Slides by Jishnu Das.
6A Types of Data, 6E Measuring the Centre of Data
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Sampling.
Chapter 8: Estimating with Confidence
RESEARCH METHODS Lecture 26
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Chapter 8: Estimating with Confidence
Ch 9 實習.
Presentation transcript:

Presented: 2009 Canadian Users Stata Group Meeting Using Stata with Statistics Canada data: Incorporating complex survey design into analysis Presented: 2009 Canadian Users Stata Group Meeting Leslie-Anne Keown, Ph.D. & Georgia Roberts, Ph.D. Statistics Canada

Overview of Presentation Statistics Canada- What we do Survey Design at Statistics Canada: What makes it complex Accounting for complex survey design in analysis Design based approach Survey weight Bootstrapped variance estimation Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada: Our Mandate In Canada, providing statistics is a federal responsibility. As Canada’s central statistical agency, Statistics Canada is legislated to serve this function for the whole of Canada and each of the provinces. In addition to conducting a Census every five years, there are about 350 active surveys on virtually all aspects of Canadian life. We at Statistics Canada are committed to protecting the confidentiality of all information entrusted to us and to ensuring that the information we deliver is timely and relevant to Canadians.

Survey Design at Statistics Canada Our samples are not simple random samples (SRS) but rather complex survey designs Many different sampling frames Complicated by needing to protect confidentiality Thus, we produce 'weights' to 'correct' for the fact that we do not have SRS and for non-response etc. in estimates We produce bootstrap 'weights' or mean 'bootstrap weights' to calculate 'truer' variance estimates given the complex survey design

Complex Surveys Features of complex surveys that can impact analysis: - Stratification - Multi-stage selection (choosing clusters) - Unequal probabilities of selection - Nonresponse - Adjustments in survey weights A complex survey design

Design-based randomization Finite Target Population Sample 1 Survey Population Sample i Sampling Process

Infinite target population Finite populations are generated from the infinite population. Randomization for estimator is based on both the model and the design. Infinite Target Population

DESIGN-BASED APPROACH Survey (or sample) weighting is used to produce an estimate of each unknown quantity Variance is measured by the variability in an estimate that would occur had different samples been selected by the same design (called design-based variance) Statistics Canada uses survey bootstrapping to estimate variance for many of its household surveys. For more specifics see: Phillips, Owen. 2004. 'Using Bootstrap Weights with WesVar and SUDAAN.' The Research Data Centres Information and Technical Bulletin. (Fall) 1(2):1-10. Statistics Canada. Catalogue no. 12-002-XIE.

What is a survey weight and what type of weight should I use? Sampling or survey weight of the ith unit ≈ 1 / (Probability of picking a sample containing that unit – for the particular survey design used) [The survey weight usually also contains adjustments for nonresponse and other factors] Stata uses the terminology 'sampling weight' or 'probability weight' or 'pweight'

What is a bootstrap variance estimate? Let represent a weighted estimate of the quantity of interest using the sampling weight. Let represent a weighted estimate of the quantity of interest using the bth of B (mean) bootstrap weights. Then the survey bootstrap estimate of variance of is where C is the number of bootstrap samples used for each (mean) bootstrap weight.

Design-based analysis with Stata : some essentials Most commands that accept ‘pweights’ will correctly produce survey-weighted estimates of quantities of interest when using Statistics Canada data. ONLY commands with the 'svy' prefix, combined with pweights and additional design information, will produce design-based variance estimates. When the additional design information is in the form of ‘bootstrap weights’, particular options must be specified with the 'svy' prefix in order to produce survey bootstrap variance estimates.

Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada • Statistique Canada 06/11/2018

Specifying the correct options in svyset Best practice is to use a svyset statement New users to this may want to use the dialog box to set options Sampling weight variable : name of weight variable BRR weight variables – usually something like wtsb_001- wtsb_500 (be careful here) Fay’s adjustment – used if mean bootstrap weights provided (more later) Method is BRR with MSE formula Statistics Canada • Statistique Canada 06/11/2018

Sample SVYSET statement svyset [pweight=wght_per], brrweight(wtbs_001- wtbs_200) fay(0.8) vce(brr) mse Element Command Section Sampling Weight [pweight=wght_per] BRR weights brrweight(wtbs_001- wtbs_200) Adjustment for mean bootstrap (if needed) fay(0.8) Variance Estimation Method vce(brr) mse Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada • Statistique Canada Mean Bootstraps Sometimes for confidentiality reasons and file size Statistics Canada produces ‘mean bootstrap weights’ Mean bootstrap weights (at their simplest) are a mean of a set number of bootstrap weights (e.g. – 5 bootstraps are calculated and the file shows the mean of these weights as a single weight) Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada • Statistique Canada Fay’s adjustment Have to adjust for mean bootstraps and this is done using Fay’s adjustment Calculate the value needed through the following formula: 1-C-½ where C is the number of bootstraps in each mean Eg. C=25 so Fay’s adjustment is 0.8 Reference: Phillips, Owen. 2004. 'Using Bootstrap Weights with WesVar and SUDAAN.' The Research Data Centres Information and Technical Bulletin. (Fall) 1(2):1-10. Statistics Canada. Catalogue no. 12-002-XIE. Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada • Statistique Canada Now the analysis Once the data is 'svyset', then analysis can be done Need only to use the 'svy' prefix svy:logistic fert sex age hsdinc Statistics Canada • Statistique Canada 06/11/2018

Statistics Canada • Statistique Canada Tips and tricks Use a log and a do-file 'Svyset' the data at the beginning of each do-file Do not use 'quietly' – you want to see in the output that it all went properly Check a weighted estimate against a svy estimate The estimate should be the same The variance should be different (usually larger but not always) Statistics Canada • Statistique Canada 06/11/2018

Tips and Tricks – Check First You can check the 'svyset' and 'svy' commands are working by using only 10 or so BRR weights to start Be careful: remember to reset using the full set of BRR weights Statistics Canada • Statistique Canada 06/11/2018

Tips and Tricks: Model Building Don’t start using the 'svy' commands Run weighted models first using 'pweights' Use this rule of thumb in models: if the p value for the estimate is .000 then it will likely remain significant Non-significant will remain non-significant ‘Marginally’ significant very likely to become non-significant Statistics Canada • Statistique Canada 06/11/2018

Tips and Tricks: Model Testing Diagnostics and/or model testing after using Survey commands still a matter of some debate Do what you can Check to see how closely the 'weighted only' errors and 'svy' errors match. If there is not much change may consider running diagnostics on the non-svyset models Remember – ‘Svy’ commands allow incorporation of complex survey information but do not correct for ‘operator’ error or substitute for due diligence Statistics Canada • Statistique Canada 06/11/2018