Introduction to Survey Data Analysis

Slides:



Advertisements
Similar presentations
Handling attrition and non- response in longitudinal data Harvey Goldstein University of Bristol.
Advertisements

Treatment of missing values
Analyzing Survey Data Angelina Hill, Associate Director of Academic Assessment 2009 Academic Assessment Workshop May 14 th & 15 th UNLV.
Some birds, a cool cat and a wolf
Adapting to missing data
How to Handle Missing Values in Multivariate Data By Jeff McNeal & Marlen Roberts 1.

How to deal with missing data: INTRODUCTION
Modeling Achievement Trajectories When Attrition is Informative Betsy J. Feldman & Sophia Rabe- Hesketh.
FINAL REPORT: OUTLINE & OVERVIEW OF SURVEY ERRORS
Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.
PEAS wprkshop 2 Non-response and what to do about it Gillian Raab Professor of Applied Statistics Napier University.
Chapter 1 Introduction and Data Collection
Learning Objectives Copyright © 2004 John Wiley & Sons, Inc. Sample Size Determination CHAPTER Eleven.
Design Effects: What are they and how do they affect your analysis? David R. Johnson Population Research Institute & Department of Sociology The Pennsylvania.
Secondary Data Analysis Linda K. Owens, PhD Assistant Director for Sampling and Analysis Survey Research Laboratory University of Illinois.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
G Lecture 11 G Session 12 Analyses with missing data What should be reported?  Hoyle and Panter  McDonald and Moon-Ho (2002)
Handling Attrition and Non- response in the 1970 British Cohort Study Tarek Mostafa Institute of Education – University of London.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 1-1 Statistics for Managers Using Microsoft ® Excel 4 th Edition Chapter.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,
Introduction to Secondary Data Analysis Young Ik Cho, PhD Research Associate Professor Survey Research Laboratory University of Illinois at Chicago Fall,
1 G Lect 13W Imputation (data augmentation) of missing data Multiple imputation Examples G Multiple Regression Week 13 (Wednesday)
Missing Values Raymond Kim Pink Preechavanichwong Andrew Wendel October 27, 2015.
Statistics Canada Citizenship and Immigration Canada Methodological issues.
Tutorial I: Missing Value Analysis
1 of 22 INTRODUCTION TO SURVEY SAMPLING October 6, 2010 Linda Owens Survey Research Laboratory University of Illinois at Chicago
INFO 4470/ILRLE 4470 Visualization Tools and Data Quality John M. Abowd and Lars Vilhuber March 16, 2011.
Pre-Processing & Item Analysis DeShon Pre-Processing Method of Pre-processing depends on the type of measurement instrument used Method of Pre-processing.
Weighting and imputation PHC 6716 July 13, 2011 Chris McCarty.
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 1-1 Statistics for Managers Using Microsoft ® Excel 4 th Edition Chapter.
Best Practices for Handling Missing Data
HANDLING MISSING DATA.
Statistics Statistics is that field of science concerned with the collection, organization, presentation, and summarization of data, and the drawing of.
Missing data: Why you should care about it and what to do about it
HW Page 23 Have HW out to be checked.
Handling Attrition and Non-response in the 1970 British Cohort Study
Peter Linde, Interviewservice Statistics Denmark
Working with the ECLS-B Datasets Weights and other issues.
Understanding and Evaluating Survey Data Documentation: A User’s Guide
MISSING DATA AND DROPOUT
Analysis of Survey Results
Medical Expenditure Panel Survey
The Centre for Longitudinal Studies Missing Data Strategy
Maximum Likelihood & Missing data
Multiple Imputation.
Multiple Imputation Using Stata
Complex Surveys
Chapter 7 Sampling Distributions
Variables and Measurement (2.1)
Sampling Sampling relates to the degree to which those surveyed are representative of a specific population The sample frame is the set of people who have.
Data Collection and Sampling
Nonresponse in Survey Sampling
Presenter: Ting-Ting Chung July 11, 2017
Random sampling Carlo Azzarri IFPRI Datathon APSU, Dhaka
Missing Data Imputation in the Bayesian Framework
Sampling Sampling relates to the degree to which those surveyed are representative of a specific population The sample frame is the set of people who have.
The European Statistical Training Programme (ESTP)
CH2. Cleaning and Transforming Data
Missing Data Mechanisms
15.1 The Role of Statistics in the Research Process
Analysis of missing responses to the sexual experience question in evaluation of an adolescent HIV risk reduction intervention Yu-li Hsieh, Barbara L.
Lecture 1: Descriptive Statistics and Exploratory
New Techniques and Technologies for Statistics 2017  Estimation of Response Propensities and Indicators of Representative Response Using Population-Level.
Chapter 4: Missing data mechanisms
The European Statistical Training Programme (ESTP)
Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015
Chapter 2 Examining Your Data
Chapter 13: Item nonresponse
Presentation transcript:

Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois at Chicago

Focus of the seminar Data cleaning/missing data Sampling bias reduction  

When analyzing survey data... 1. Understand & evaluate survey design 2. Screen the data 3. Adjust for sampling design

1. Understand & evaluate survey Conductor of survey Sponsor of survey Measured variables Unit of analysis Mode of data collection Dates of data collection

1. Understand & evaluate survey Geographic coverage Respondent eligibility criteria Sample design Sample size & response rate

Levels of measurement Nominal Ordinal Interval Ratio

2. Data screening Examine raw frequency distributions for…   (a) out-of-range values (outliers) (b) missing values

2. Data screening Out-of-range values:   Delete data Recode values

Missing data: can reduce effective sample size may introduce bias

Reasons for missing data Refusals (question sensitivity) Don’t know responses (cognitive problems, memory problems) Not applicable Data processing errors Questionnaire programming errors Design factors Attrition in panel studies

Effects of ignoring missing data Reduced sample size – loss of statistical power Data may no longer be representative– introduces bias Difficult to identify effects

Assumptions on missing data Missing completely at random (MCAR) Missing at random (MAR) Ignorable Nonignorable

Missing completely at random (MCAR) Being missing is independent from any variables. Cases with complete data are indistinguishable from cases with missing data. Missing cases are a random sub-sample of original sample.

Missing at random (MR) The probability of a variable being observed is independent of the true value of that variable controlling for one or more variables. Example: Probability of missing income is unrelated to income within levels of education.

Ignorable missing data The data are MAR. The missing data mechanism is unrelated to the parameters we want to estimate.

Nonignorable missing data The pattern of data missingness is non-MAR.

Methods of handling missing data Listwise (casewise) deletion: uses only complete cases Pairwise deletion: uses all available cases Dummy variable adjustment: Missing value indicator method Mean substitution: substitute mean value computed from available cases (cf. unconditional or conditional)  

Methods of handling missing data Regression methods: predict value based on regression equation with other variables as predictors Hot deck: identify the most similar case to the case with a missing and impute the value

Methods of handling missing data Maximum likelihood methods: use all available data to generate maximum likelihood-based statistics.

Methods of handling missing data Multiple imputation: combines the methods of ML to produce multiple data sets with imputed values for missing cases

Types of survey sample designs Simple Random Sampling Systematic Sampling Complex sample designs stratified designs cluster designs mixed mode designs

Why complex sample designs? Increased efficiency Decreased costs

Why complex sample designs? Statistical software packages with an assumption of SRS underestimate the sampling variance. Not accounting for the impact of complex sample design can lead to a biased estimate of the sampling variance (Type I error).

Sample weights Used to adjust for differing probabilities of selection. In theory, simple random samples are self-weighted. In practice, simple random samples are likely to also require adjustments for nonresponse.

Types of sample weights Poststratification weights: designed to bring the sample proportions in demographic subgroups into agreement with the population proportion in the subgroups. Nonresponse weights: designed to inflate the weights of survey respondents to compensate for nonrespondents with similar characteristics. “Blow-up” (expansion) weights: provide estimates for the total population of interest.

Syntax examples of design-based analysis in STATA, SUDAAN, & SAS svyset strata strata svyset psu psu svyset pweight finalwt svyreg fatitk age male black hispanic SUDAAN proc regress data=”c:\nhanes.sav” filetype=spss desgn=wr; nest strata psu; weight finalwt subpgroup sex race; levels 2 3; model fatintk = age sex race;

Syntax examples of design-based analysis in STATA, SUDAAN, & SAS proc surveyreg data=nhanes; strata strata; cluster psu; class sex race; model fatintk = age sex race; weight finalwt

In summary, when analyzing survey data... Understand & evaluate survey design Screen the data – deal with missing data & outliers. If necessary, adjust for study design using weights and appropriate computer software.  

Thank You! www.srl.uic.edu