Using Weighted Data Donald Miller Population Research Institute 812 Oswald Tower, 3-3155 December 2008.

Review of David Johnson’s Presentation
A population weight (pweight; Stata's term is a sampling or probability weight) is a variable that indicates how many people in the population of interest an observation represents in a statistical procedure. This differs from a frequency weight (fweight), which indicates that a row of the dataset actually stands for more than one identical observation. Weights can correct for the sampling design (over- and undersampling) and for non-response bias. Most software packages treat pweights properly; a notable exception is SPSS outside its Complex Samples module, whose WEIGHT command treats weights as frequency weights. To create a pweight, use either a “raking”-type algorithm or a logistic regression.
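To make the pweight/fweight distinction concrete, here is a minimal Python sketch with made-up values. It shows that the weighted point estimate is the same under either interpretation, while the implied number of observations (which is what software uses when computing standard errors) is not. The helpers weighted_mean and expand_fweights are hypothetical names, not part of any package.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(w * y) / sum(w)."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)


def expand_fweights(values, fweights):
    """Frequency-weight view: row i becomes fweights[i] identical rows."""
    expanded = []
    for y, f in zip(values, fweights):
        expanded.extend([y] * f)
    return expanded


# Hypothetical data: 3 respondents with integer weights.
values = [10.0, 20.0, 30.0]
weights = [3, 1, 2]

# The point estimate is the same either way.
print(weighted_mean(values, weights))
expanded = expand_fweights(values, weights)
print(sum(expanded) / len(expanded))

# The implied sample size differs: 3 real respondents (pweight view)
# versus 6 "observations" (fweight view).
print(len(values), len(expanded))
```

Because the fweight view assumes 6 independent observations when only 3 people were interviewed, treating pweights as fweights bases the standard errors on the wrong number of observations.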

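How to use Population Weights
SAS: Use the “weight” statement in procedures to supply the population weight:

    proc logistic data=mydata descending;
      model finished=age cs_educ sex race_white a1b a2b a3b a4b a5b a6b;
      weight pwgt_variable;  /* population weight */
    run;

In non-survey procedures such as PROC LOGISTIC, the WEIGHT statement gives weighted point estimates, but the standard errors are not design-based; the SURVEY procedures (for example, PROC SURVEYLOGISTIC) compute variances appropriate for sampling weights.
Stata: Use the “pweight” option (you can abbreviate it “pw”):

    regress y x1 x2 x3 [pweight=pwgt_variable]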

Raking 1: Select Census Data
Choose a census dataset (CPS, ACS, etc.) and the variables you will use in your “raking model”. These are usually demographic variables (age, race, education, gender). You will need to recode your survey variables and/or the census variables so that the response categories match; this might require grouping some values together. Match the year of the survey with the census data: if you have 2006 survey data, use the 2006 census data. Match the geographic area as closely as possible. For example, the ACS public-use files identify PUMAs (Public Use Microdata Areas, each containing at least 100,000 people, so a PUMA may be part of a county or a group of counties); select only the PUMA codes for the area of interest. It is a good idea to run some simple descriptives / frequencies to compare the survey to the census. Remember that the ACS already has a person weight (PWGTP), which should be applied when computing the census counts.
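The recoding step can be sketched in Python as follows. The age cut points and the education code mapping are hypothetical; use whatever grouping both your survey and the census file can support. The shares helper (also a hypothetical name) computes optionally weighted category proportions so the survey and census distributions can be compared side by side.

```python
from collections import Counter

# Hypothetical shared age categories (inclusive bounds).
AGE_GROUPS = [(18, 34, "18-34"), (35, 54, "35-54"), (55, 120, "55+")]

# Hypothetical mapping: the survey has 6 education codes, the census grouping has 3.
SURVEY_EDUC_TO_SHARED = {1: "HS or less", 2: "HS or less",
                         3: "Some college", 4: "Some college",
                         5: "BA+", 6: "BA+"}


def age_group(age):
    """Map an age in years to a shared category, or None if out of range."""
    for low, high, label in AGE_GROUPS:
        if low <= age <= high:
            return label
    return None


def shares(labels, weights=None):
    """Proportion of (weighted) cases in each category."""
    if weights is None:
        weights = [1.0] * len(labels)
    totals = Counter()
    for label, w in zip(labels, weights):
        totals[label] += w
    grand_total = sum(totals.values())
    return {label: t / grand_total for label, t in totals.items()}


# Survey side: recode, then compare these shares with the census shares
# (computed on the census file with PWGTP as the weight).
survey_ages = [22, 40, 67, 35, 19]
print(shares([age_group(a) for a in survey_ages]))
print([SURVEY_EDUC_TO_SHARED[c] for c in [1, 4, 6]])
```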

Raking 2: Frequencies (Census data)
Construct 1-way frequency counts for every variable in the raking model. You need a dataset for each variable, with “mrgtotal” holding the counts. SAS code example (do this for gender, race, etc.):

    /* weighted counts of each category, using the ACS person weight */
    proc freq data=acs.acs_myarea_recoded;
      table cs_educ / list missing out=cs_educ;
      weight PWGTP;
    run;

    /* the raking macro expects the counts in a variable named mrgtotal */
    data cs_educ;
      set cs_educ;
      rename COUNT=mrgtotal;
    run;
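For readers working outside SAS, this Python sketch mirrors what the PROC FREQ step computes: for one raking variable, the sum of the ACS person weight PWGTP within each category (the mrgtotal values). The records are made-up placeholders, and margin_totals is a hypothetical helper.

```python
from collections import defaultdict


def margin_totals(rows, var, weight_var="PWGTP"):
    """Weighted 1-way counts for one raking variable: {category: mrgtotal}.

    Like PROC FREQ with the MISSING option, a missing value (None) is kept
    as its own category rather than dropped.
    """
    totals = defaultdict(float)
    for row in rows:
        totals[row[var]] += row[weight_var]
    return dict(totals)


# Hypothetical recoded ACS person records.
acs = [
    {"sex": 1, "cs_educ": 1, "PWGTP": 120},
    {"sex": 2, "cs_educ": 1, "PWGTP": 80},
    {"sex": 2, "cs_educ": 2, "PWGTP": 100},
]

# One table of margin totals per raking variable.
margins = {var: margin_totals(acs, var) for var in ["sex", "cs_educ"]}
print(margins)  # {'sex': {1: 120.0, 2: 180.0}, 'cs_educ': {1: 200.0, 2: 100.0}}
```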

Raking 3: Raking Macro (SAS)
Izrael et al. have provided a SAS macro (RAKINGE) that performs the main raking procedure. It was introduced in Paper , from SAS SUGI 25, available online from SAS at:
Various improvements to the macro were introduced in Paper , from SAS SUGI 29, available online from SAS at:
I uploaded the (corrected version of the) RAKINGE macro here:

Raking 4: Raking Macro (SAS)
You will need to save this macro, edit it slightly, and run it. You will never need to touch the vast majority of the code. Towards the top of the program, change these lines (numvar= is the number of variables listed in varlist=):

    %macro rakinge (inds=INPUTDATASETNAME, outds=OUTPUTDATASETNAME,...
    outwt=NEW_PWEIGHT_VARIABLE_NAME,...
    varlist=LIST OF VARIABLES IN RAKING MODEL, numvar=4,
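Raking itself is iterative proportional fitting: for each raking variable in turn, every weight is multiplied by (census total for its category) / (current weighted total for its category), and the passes repeat until all margins match. The Python sketch below illustrates that idea under simple assumptions (every category in the data has a census total, and all starting weights are positive). It is not a translation of the RAKINGE macro; its tol and max_iter settings only loosely correspond to the macro's convergence options. Here margins is a dict of the form {variable: {category: mrgtotal}}, and the respondents and census totals are hypothetical.

```python
def rake(rows, margins, weight_key="w", max_iter=50, tol=1e-6):
    """Rake case weights to census margins by iterative proportional fitting.

    rows:    list of dicts holding the raking variables and a starting weight.
    margins: {variable: {category: mrgtotal}}.
    Returns the list of raked weights, in the same order as rows.
    """
    weights = [row[weight_key] for row in rows]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for var, targets in margins.items():
            # Current weighted total of each category of this variable.
            current = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each weight so this variable's weighted totals hit the targets.
            for i, row in enumerate(rows):
                factor = targets[row[var]] / current[row[var]]
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
                weights[i] *= factor
        # A full pass that barely changed any weight means all margins match.
        if largest_adjustment < tol:
            return weights
    raise RuntimeError("raking did not converge; consider collapsing categories")


# Hypothetical survey respondents with starting weights of 1.
respondents = [
    {"sex": "M", "educ": "low", "w": 1.0},
    {"sex": "M", "educ": "high", "w": 1.0},
    {"sex": "F", "educ": "low", "w": 1.0},
    {"sex": "F", "educ": "high", "w": 1.0},
    {"sex": "F", "educ": "high", "w": 1.0},
]
census = {"sex": {"M": 50.0, "F": 50.0}, "educ": {"low": 60.0, "high": 40.0}}

raked = rake(respondents, census)
for var, targets in census.items():
    for cat in targets:
        total = sum(w for r, w in zip(respondents, raked) if r[var] == cat)
        print(var, cat, round(total, 3))  # should match the census target
```

Raising an error on non-convergence mirrors the advice in the next slide: when the margins cannot be matched, collapsing sparse categories is usually the fix.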

Normalized Weight
If the raking macro does not converge, look at the census and survey frequencies again. You may need to collapse some categories, or change the convergence settings in the raking macro (you can control these with the TRMPCT= and NUMITER= options in the macro). You may also wish to “normalize” the weight so that the weights in the dataset sum to a predetermined number N (either the sample size or the area’s total population). To do this, calculate SW, the sum of the weights, then multiply each weight by N/SW.
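Written as a formula, the normalized weight is $w_i^{*} = w_i \cdot N / SW$ with $SW = \sum_j w_j$. A minimal Python sketch of the same rescaling (normalize_weights is a hypothetical helper name, and the weights are made up):

```python
def normalize_weights(weights, target_total):
    """Rescale weights so they sum to target_total (N), keeping their ratios."""
    sw = sum(weights)  # SW: the sum of the weights
    return [w * target_total / sw for w in weights]


raw = [2.0, 3.0, 5.0]
print(normalize_weights(raw, len(raw)))  # N = sample size -> [0.6, 0.9, 1.5]
```

Normalizing to the sample size keeps the average weight at 1; normalizing to the population total makes weighted counts estimate population counts. Either way the ratios between weights, and therefore weighted means and proportions, are unchanged.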

Non-Response Bias 1
The probability of completing the survey may differ across people with different characteristics (demographics, chronic conditions, etc.). To address this non-response bias, estimate a logistic regression model such as the following:

$$\ln\frac{\Pr(\text{FINISH}=1)}{1-\Pr(\text{FINISH}=1)} = \beta_0 + \beta_1\,\text{AGE} + \beta_2\,\text{EDUCATION} + \beta_3\,\text{FEMALE} + \beta_4\,\text{WHITE} + \beta_5\,\text{OTHER}$$

where FINISH is 1 if the person finished the survey (0 otherwise). The next four variables (AGE, EDUCATION, FEMALE, WHITE) are from the raking model. OTHER (there can be more than one such variable) stands for other variables that might explain non-response.
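PROC LOGISTIC estimates the β coefficients; turning them into each person's predicted completion probability uses the inverse of the logit link. The Python sketch below shows that step only. The coefficient values and the example person are hypothetical, not estimates from any data, and completion_probability is an illustrative helper name.

```python
import math


def completion_probability(coefs, x):
    """Inverse logit: p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bk*xk))).

    coefs: [b0, b1, ..., bk]; x: [x1, ..., xk] for one person.
    """
    eta = coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical coefficients for the intercept, AGE, EDUCATION, FEMALE, WHITE.
betas = [-1.0, 0.02, 0.3, 0.4, 0.1]
person = [45, 3, 1, 0]  # AGE=45, EDUCATION=3, FEMALE=1, WHITE=0
p = completion_probability(betas, person)
print(round(p, 3), round(1 / p, 3))  # probability and its reciprocal
```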

Non-Response Bias 2
For each respondent, the non-response weight is the reciprocal of the predicted probability of survey completion. It is treated as a weight and should be multiplied by the raking weight to create a total weight. Sample SAS code (continued next slide):

    proc logistic data=area_weighted descending;
      /* finish is the 0/1 response; it does not belong in a CLASS statement */
      model finish=age cs_educ sex race_white a1b;
      output out=logitresults p=p;  /* p = predicted Pr(finish=1) */
      weight pwgt;
    run; /* check output to see if significant */

Non-Response Bias 3 (SAS code continued):

    /* both datasets must be sorted by ID before the MERGE */
    proc sort data=area_weighted; by ID; run;
    proc sort data=logitresults; by ID; run;

    data area_weighted2;  /* merge in pred. prob. */
      merge area_weighted logitresults;
      by ID;
    run;

    data area_weighted2;  /* calculate non-resp wgt and total wgt */
      set area_weighted2;
      nonresp_wgt=1/p;
      total_wgt=pwgt*nonresp_wgt;
    run;
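The merge-and-multiply logic of these data steps can be sketched in Python with dictionaries keyed by ID. The IDs, weights and probabilities are hypothetical, and combine_weights is an illustrative name. Skipping respondents without a usable prediction is an assumption of this sketch; the SAS data step would instead produce a missing nonresp_wgt.

```python
def combine_weights(pwgt_by_id, p_by_id):
    """Return {ID: (nonresp_wgt, total_wgt)} where nonresp_wgt = 1/p and
    total_wgt = pwgt * nonresp_wgt, for IDs with a usable predicted p."""
    combined = {}
    for pid, pwgt in pwgt_by_id.items():
        p = p_by_id.get(pid)  # plays the role of the merge by ID
        if p is None or p <= 0.0:
            continue  # no usable prediction, e.g. missing covariates
        nonresp_wgt = 1.0 / p
        combined[pid] = (nonresp_wgt, pwgt * nonresp_wgt)
    return combined


raking_weights = {101: 2.0, 102: 1.5, 103: 3.0}
predicted_p = {101: 0.5, 102: 0.25}  # ID 103 has no prediction
print(combine_weights(raking_weights, predicted_p))
# {101: (2.0, 4.0), 102: (4.0, 6.0)}
```

Respondents with a low predicted completion probability receive a larger non-response weight, so they also stand in for similar people who did not finish.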

Stata / R
I personally haven’t tried either of these yet, but raking packages exist for Stata and R.
Stata: survwgt (you can get and install this using findit survwgt)

    survwgt rake [pw], by(varlist_raking_model) totvars(varlist_totals) { generate(pwgt) | replace }

R: Rake package

    raked_data <- simpleRake(unraked_data, pop_totals, "rake_var1", "rake_var2", ..., TRUE)