Non-response and missing data in longitudinal surveys

Traditional ways of handling attrition and missing data
- Weighting is typically used for attrition:
  - the sample design and initial non-response provide the basic weights;
  - across several waves, typical response pathways are defined and weights are provided for each one (e.g. LSYP may require 12 or more).
- For item non-response, hot deck single imputation is used.
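
A minimal sketch of how a weight for one response pathway might be formed, assuming a standard weighting-class adjustment: within each adjustment cell, a unit's design weight is divided by the weighted share of the cell that followed the pathway. The helper name, record layout and numbers are hypothetical.

```python
from collections import defaultdict

def pathway_weights(records, waves_needed):
    """Weights for units on one response pathway (hypothetical helper).

    records: list of dicts with keys 'cell' (adjustment cell),
        'design_weight', and 'responded' (tuple of booleans, one per wave).
    waves_needed: indices of the waves that an analysis requires.
    Returns {record index: weight} for units that responded at every wave
    in waves_needed; weight = design weight / weighted response rate of
    the pathway within the unit's cell.
    """
    def on_pathway(r):
        return all(r["responded"][w] for w in waves_needed)

    total = defaultdict(float)  # weighted count of all sampled units per cell
    kept = defaultdict(float)   # weighted count of units on the pathway per cell
    for r in records:
        total[r["cell"]] += r["design_weight"]
        if on_pathway(r):
            kept[r["cell"]] += r["design_weight"]

    return {
        i: r["design_weight"] * total[r["cell"]] / kept[r["cell"]]
        for i, r in enumerate(records)
        if on_pathway(r)
    }

records = [
    {"cell": "A", "design_weight": 1.0, "responded": (True, True, True)},
    {"cell": "A", "design_weight": 1.0, "responded": (True, True, False)},
    {"cell": "B", "design_weight": 2.0, "responded": (True, True, True)},
    {"cell": "B", "design_weight": 2.0, "responded": (True, True, True)},
    {"cell": "B", "design_weight": 2.0, "responded": (True, False, False)},
    {"cell": "B", "design_weight": 2.0, "responded": (True, False, False)},
]
print(pathway_weights(records, (0, 1)))     # pathway: waves 1 and 2
print(pathway_weights(records, (0, 1, 2)))  # pathway: all three waves
```

Each set of waves an analysis needs yields its own weight variable, which is how a study with several waves ends up carrying a dozen or more of them.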

Problems with weighting procedures
- Inefficient: only complete data can be used for each combination of variables analysed.
- Restrictive, since weights are only provided for the chosen pathways.
- Possibly inconsistent results, because different analyses use different weights.
- Not very transparent to use.
- Problematic for structurally missing items.
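
To illustrate the inefficiency, a small simulation under an assumed mechanism: each variable is missing independently for 10% of cases (a made-up rate). The share of complete cases then falls roughly as 0.9^k as k variables enter an analysis.

```python
import random

def complete_case_share(n_cases, n_vars, p_missing, seed=1):
    """Fraction of cases with no missing value across n_vars variables,
    when each value is missing independently with probability p_missing."""
    rng = random.Random(seed)
    complete = 0
    for _ in range(n_cases):
        if all(rng.random() >= p_missing for _ in range(n_vars)):
            complete += 1
    return complete / n_cases

for k in (1, 3, 5, 10):
    # simulated share next to the theoretical value 0.9 ** k
    print(k, round(complete_case_share(10_000, k, 0.10), 3), round(0.9 ** k, 3))
```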

Problems with hot deck imputation
- Not theoretically based.
- Selecting matched cases may not always be possible, especially in multilevel data.
- Single imputation does not allow easy computation of standard errors.
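
A minimal sketch of random hot deck imputation within adjustment cells, assuming a donor is drawn at random from observed cases in the same cell; the function name and cell labels are hypothetical. A cell with no observed cases has no donor, which is the matching problem noted above.

```python
import random

def hot_deck(values, cells, seed=0):
    """Single random hot-deck imputation (illustrative sketch).

    values: list of numbers, with None marking a missing item.
    cells:  adjustment-cell label for each value.
    Each missing value is replaced by the value of a randomly chosen
    donor (an observed case) from the same cell.
    """
    rng = random.Random(seed)
    donors = {}
    for v, c in zip(values, cells):
        if v is not None:
            donors.setdefault(c, []).append(v)
    imputed = []
    for v, c in zip(values, cells):
        if v is None:
            if c not in donors:
                raise ValueError(f"no donor available in cell {c!r}")
            v = rng.choice(donors[c])
        imputed.append(v)
    return imputed

print(hot_deck([1.0, None, 3.0, None], ["a", "a", "b", "b"]))
```

Because each gap is filled only once, analysing the completed file treats imputed values as if they were observed, so the usual standard errors are too small.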

Multiple imputation: briefly and simply
- Consider the model of interest (MOI).
- We turn this into a multivariate response model and obtain estimates of the residuals (from an MCMC chain) where x or y is missing.
- Use these to fill in the missing values and produce a complete data set.
- Do this independently n times (e.g. n = 20).
- Fit the MOI to each completed data set and combine the results according to Rubin's rules to get estimates and standard errors.
- Note that at the imputation stage we can use auxiliary data.
- Note also that we can handle attrition as missing data.
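
The combining step is usually done with Rubin's rules. A minimal sketch for a single parameter, assuming each of the m completed data sets has already been analysed with the MOI; the function name is illustrative.

```python
import math
import statistics

def combine_rubin(estimates, variances):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    estimates: point estimate from each completed data set
    variances: squared standard error from each completed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    w = statistics.fmean(variances)       # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance (m - 1 divisor)
    t = w + (1 + 1 / m) * b               # total variance
    return q_bar, math.sqrt(t)

q, se = combine_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(q, round(se, 3))
```

The between-imputation term b is what single imputation cannot estimate: it carries the extra uncertainty due to the missing values.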

What not to do
- Omit all records with missing data: inefficient.
- In categorical data, use an extra category for missing: biased.
- Plug in the mean over the non-missing values: biased.
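
A quick check of why plugging in the mean is biased: the mean is preserved, but the spread is understated because every imputed value sits exactly at the centre. The data values are made up for illustration.

```python
import statistics

def mean_impute(values):
    """Replace None entries by the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]

data = [2.0, 4.0, 6.0, 8.0, None, None]
observed = [v for v in data if v is not None]
filled = mean_impute(data)
print(statistics.variance(observed))  # variance of the observed values
print(statistics.variance(filled))    # smaller: same squared deviations, larger n
```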

Multiple imputation in MLwiN
- Existing methods assume normality.
- For multilevel data they cannot handle level 2 variables with missing data.
- They cannot handle discrete variables with missing data.
- REALCOM-IMPUTE links REALCOM with MLwiN and can handle level 2 and discrete variables. It works by transforming discrete variables to normality using a latent variable model, so that all response variables have a joint multivariate normal distribution, and then applies MI theory.
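
The latent-variable idea can be sketched for one binary variable, assuming a probit threshold model: the observed 0/1 value is the sign of a latent z ~ N(mu, 1), and given the observed category, z is drawn from the normal truncated to the matching side of 0. This is a generic illustration of the technique under those assumptions, not REALCOM-IMPUTE's code; in practice mu would come from the current state of the imputation model.

```python
import random
from statistics import NormalDist

def draw_latent(x, mu, rng):
    """Draw z ~ N(mu, 1) conditional on observed binary x,
    where x = 1 if z > 0 and x = 0 otherwise (probit threshold at 0).
    Uses inverse-CDF sampling from the truncated normal."""
    dist = NormalDist(mu, 1.0)
    p0 = dist.cdf(0.0)                   # P(z <= 0), i.e. P(x = 0)
    if x == 1:
        u = rng.uniform(p0, 1.0)         # upper tail: z > 0
    else:
        u = rng.uniform(0.0, p0)         # lower tail: z <= 0
    u = min(max(u, 1e-12), 1 - 1e-12)    # inv_cdf needs 0 < u < 1
    return dist.inv_cdf(u)

rng = random.Random(3)
print([round(draw_latent(1, 0.5, rng), 2) for _ in range(5)])
```

Once every discrete variable has such a latent normal counterpart, all responses can be imputed jointly under a multivariate normal model.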

Partially observed data values
- Where we have a prior (estimated) probability distribution (PD) for a missing discrete variable value, we simply insert an extra MCMC step that accepts the standard MI value with a probability given by the PD. A corresponding step is used for normal data.
- This uses all of the data efficiently: no data are discarded, so long as it is possible to assign a PD. It may also reduce partial response bias.
- Several completed data sets are produced and combined as in standard MI.
- These procedures are computationally intensive, but once the completed data sets are produced they can be used for many different models, so long as a model uses only variables that were involved in the imputation procedure.
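
One possible reading of the extra step, sketched for a discrete variable as rejection sampling: propose a value from the standard MI draw, accept it with probability equal to the PD for that value, and propose again if it is rejected. Accepted values then follow the imputation-model probabilities reweighted by the PD. The function name and the propose-until-accepted loop are assumptions; the slide does not say what happens on rejection.

```python
import random

def constrained_draw(propose, prior_pd, rng, max_tries=10_000):
    """Combine a standard MI draw with a prior PD (illustrative sketch).

    propose:  function rng -> candidate category (the standard MI draw)
    prior_pd: dict mapping each category to its prior probability
    Each candidate is accepted with probability prior_pd[candidate], so
    accepted values occur in proportion to P_mi(c) * prior_pd[c].
    """
    for _ in range(max_tries):
        candidate = propose(rng)
        if rng.random() < prior_pd.get(candidate, 0.0):
            return candidate
    raise RuntimeError("no candidate accepted; check the prior PD")

rng = random.Random(42)
draws = [constrained_draw(lambda r: r.choice([0, 1]), {0: 0.2, 1: 0.8}, rng)
         for _ in range(5000)]
print(sum(draws) / len(draws))  # close to 0.8 when the MI draw is 50/50
```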

References
- Goldstein, H., Carpenter, J., Kenward, M. and Levin, K. (2009). Multilevel models with multivariate mixed response types. Statistical Modelling (to appear). Gives the methodological background.
- Handling attrition and non-response in longitudinal data. International Journal of Longitudinal and Life Course Studies, April. Discusses the issues for longitudinal studies in detail.

Sampling weights
- Consider a 2-level model. Write the level 2 weights as $w_j^{(2)}$ and the level 1 weights for the j-th level 2 unit as $w_{i|j}^{(1)}$.
- The final level 1 weights are $w_{ij} = w_j^{(2)} w_{i|j}^{(1)}$.
- We use $w_{ij}^{-1/2}$ as the level 1 random part explanatory variable instead of the constant = 1.
- This will be used for imputation and for the MOI.
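
A worked sketch of the weight variables, under two stated assumptions: the final level 1 weight is the product of the level 2 weight and the conditional level 1 weight, and the level 1 random part explanatory variable is the final weight raised to the power -1/2. With a random coefficient on that variable, the level 1 variance for unit ij becomes sigma_e^2 / w_ij. The function name and numbers are hypothetical.

```python
import math

def level1_weight_variables(w2, w1_given_j):
    """Weight variables for the level 1 units within one level 2 unit j.

    w2:         level 2 weight for unit j
    w1_given_j: level 1 weights for the units i within unit j
    Returns (final level 1 weights, random-part explanatory variable),
    assuming final = w2 * w1 and explanatory variable = final ** -0.5.
    """
    final = [w2 * w for w in w1_given_j]
    return final, [1 / math.sqrt(w) for w in final]

final, z = level1_weight_variables(2.0, [1.0, 2.0, 8.0])
sigma_e2 = 4.0
print(final)                            # final level 1 weights
print([sigma_e2 * zi**2 for zi in z])   # implied variances sigma_e^2 / w_ij
```

Replacing the constant 1 by this variable means units with larger weights get a smaller level 1 variance, and so carry more influence in the fit.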