Presenter: Ting-Ting Chung July 11, 2017

Slides:



Advertisements
Similar presentations
Handling attrition and non- response in longitudinal data Harvey Goldstein University of Bristol.
Advertisements

Non response and missing data in longitudinal surveys.
Treatment of missing values
Some birds, a cool cat and a wolf
Adapting to missing data
How to Handle Missing Values in Multivariate Data By Jeff McNeal & Marlen Roberts 1.

Missing Data in Randomized Control Trials
How to deal with missing data: INTRODUCTION
Modeling Achievement Trajectories When Attrition is Informative Betsy J. Feldman & Sophia Rabe- Hesketh.
Robert Voogt Dutch Ministery Of Social Affairs and Employment (formerly of the University Of Amsterdam) Nonresponse in survey research: why is it a problem?
FINAL REPORT: OUTLINE & OVERVIEW OF SURVEY ERRORS
Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.
PEAS wprkshop 2 Non-response and what to do about it Gillian Raab Professor of Applied Statistics Napier University.
Multiple imputation using ICE: A simulation study on a binary response Jochen Hardt Kai Görgen 6 th German Stata Meeting, Berlin June, 27 th 2008 Göteborg.
18b. PROC SURVEY Procedures in SAS ®. 1 Prerequisites Recommended modules to complete before viewing this module  1. Introduction to the NLTS2 Training.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
G Lecture 11 G Session 12 Analyses with missing data What should be reported?  Hoyle and Panter  McDonald and Moon-Ho (2002)
Handling Attrition and Non- response in the 1970 British Cohort Study Tarek Mostafa Institute of Education – University of London.
Applied Epidemiologic Analysis - P8400 Fall 2002 Lab 10 Missing Data Henian Chen, M.D., Ph.D.
Introduction to Multiple Imputation CFDR Workshop Series Spring 2008.
Imputation for Multi Care Data Naren Meadem. Introduction What is certain in life? –Death –Taxes What is certain in research? –Measurement error –Missing.
1 Introduction to Survey Data Analysis Linda K. Owens, PhD Assistant Director for Sampling & Analysis Survey Research Laboratory University of Illinois.
BUSI 6480 Lecture 8 Repeated Measures.
SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,
© John M. Abowd 2007, all rights reserved General Methods for Missing Data John M. Abowd March 2007.
1 G Lect 13W Imputation (data augmentation) of missing data Multiple imputation Examples G Multiple Regression Week 13 (Wednesday)
The Impact of Missing Data on the Detection of Nonuniform Differential Item Functioning W. Holmes Finch.
Missing Values Raymond Kim Pink Preechavanichwong Andrew Wendel October 27, 2015.
A REVIEW By Chi-Ming Kam Surajit Ray April 23, 2001 April 23, 2001.
29 th TRF 2003, Denver July 14 th, Jenny H. Qin and Mike Singleton Kentucky CODES Kentucky Injury Prevention & Research Center University.
Tutorial I: Missing Value Analysis
Multiple Imputation using SAS Don Miller 812 Oswald Tower
Biostatistics Case Studies Peter D. Christenson Biostatistician Session 3: Missing Data in Longitudinal Studies.
Pre-Processing & Item Analysis DeShon Pre-Processing Method of Pre-processing depends on the type of measurement instrument used Method of Pre-processing.
Multiple Imputation Multiple Regression. Input From SPSS *** Mult-Imput_M-Reg.sas ***; PROC IMPORT OUT= WORK.IntroQuest DATAFILE= "C:\Users\Vati\Documents\StatData\IntroQ\IntroQ.sav"
A framework for multiple imputation & clustering -Mainly basic idea for imputation- Tokei Benkyokai 2013/10/28 T. Kawaguchi 1.
DATA STRUCTURES AND LONGITUDINAL DATA ANALYSIS Nidhi Kohli, Ph.D. Quantitative Methods in Education (QME) Department of Educational Psychology 1.
Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.
Multiple Imputation using SAS Don Miller 812 Oswald Tower
Stats Methods at IC Lecture 3: Regression.
Best Practices for Handling Missing Data
Sampling and Sampling Distributions
Missing data: Why you should care about it and what to do about it
Marketing Research Aaker, Kumar, Leone and Day Eleventh Edition
Handling Attrition and Non-response in the 1970 British Cohort Study
MISSING DATA AND DROPOUT
Modeling approaches for the allocation of costs
The Centre for Longitudinal Studies Missing Data Strategy
Maximum Likelihood & Missing data
Introduction to Survey Data Analysis
Multiple Imputation.
Multiple Imputation Using Stata
How to handle missing data values
Dealing with missing data
An Online CNN Poll.
Discrete Event Simulation - 4
The bane of data analysis
Missing Data Imputation in the Bayesian Framework
The European Statistical Training Programme (ESTP)
EM for Inference in MV Data
Missing Data Mechanisms
Non response and missing data in longitudinal surveys
Analysis of missing responses to the sexual experience question in evaluation of an adolescent HIV risk reduction intervention Yu-li Hsieh, Barbara L.
EM for Inference in MV Data
Chapter 4: Missing data mechanisms
The European Statistical Training Programme (ESTP)
Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015
Chapter 13: Item nonresponse
Considerations for the use of multiple imputation in a noninferiority trial setting Kimberly Walters, Jie Zhou, Janet Wittes, Lisa Weissfeld Joint Statistical.
Presentation transcript:

Presenter: Ting-Ting Chung July 11, 2017 Multiple Imputation Presenter: Ting-Ting Chung July 11, 2017

Introduction Missing data is a pervasive and persistent problem in many data sets Representative? Deliberately results in missing data (questions asked only of women), refusal to answer (sensitive questions), insufficient knowledge (month of first words spoken), and attrition due to death or loss of contact with respondents in longitudinal surveys

Introduction Analysts often make assumptions about the nature of missing data including categorizations Missing at Random (MAR) The probability of missing data on Y is unrelated to the value of Y after controlling for other variables in the analysis (say X) Missing Completely at Random (MCAR) Suppose variable Y has some missing values. We will say that these values are MCAR if the probability of missing data on Y is unrelated to the value of Y itself or to the values of any other variable in the data set. However, it does allow for the possibility that “missingness” on Y is related to the “missingness” on some other variable X Not Missing at Random (NMAR) Missing values do depend on unobserved values

Type of missing We can distinguish between two main patterns of missingness Missing monotone missing data is a pattern in which the missing data exists at the end (reading from left to right) of the data record with no gaps between full and missing data Missing arbitrarily missing data pattern that has missingness interspersed among full data

Methods for handling missing data Conventional methods Listwise deletion (or complete case analysis) If a case has missing data for any of the variables, then simply exclude that case from the analysis Imputation methods Substitute each missing value for a reasonable guess, and then carry out the analysis as if there were not missing values Advanced Methods Multiple Imputation It replaces each missing item with two or more acceptable values, representing a distribution of possibilities Maximum Likelihood Bayesian simulation methods Hot deck imputation methods

Steps of Multiple Imputation Running an imputation model defined by the chosen variables to create imputed data sets. In other words, the missing values are filled in m times to generate m complete data sets. m=20 is considered good enough. Correct model choices require considering: Firstly, we should identify which are the variables with missing values Secondly, we should compute the proportion of missing values for each variable Thirdly, we should assess whether different missing value patterns exist in the data, and try to understand the nature of the missing values The m complete data sets are analyzed by using standard procedures The parameter estimates from each imputed data set are combined to get a final set of parameter estimates

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES Step 1. proc mi data=one nimpute=0 ; var hhinc age sex ed4cat region weight ; run ; PROC MI with a NIMPUTE=0 option to initially examine the existing missing data pattern shows that 1.79% of the n of 5692 have missing data on the WEIGHT variable

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES Step 2. proc mi data=one nimpute=5 out=outimputex2 seed=20102 ; class region ed4cat sex ; monotone regression ; var hhinc age region sex ed4cat weight ; run ; The MONOTONE REGRESSION statement requests the use of the regression method for imputation of the continuous variable WEIGHT. The regression method is recommended for imputation of a continuous variable with a monotone missing data pattern and categorical covariates contributing to the imputation step

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES RE >0.9good

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES Step 3. proc surveyreg data=outimputex2 ; strata sestrat ; cluster seclustr ; weight ncsrwtlg ; class region sex ed4cat ; domain _imputation_ ; model weight=hhinc age region sex ed4cat / solution ; ods output ParameterEstimates=outregex2 (where=(_imputation_ ne . )); run ;

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES Step 4. proc mianalyze parms=outregex2; modeleffects intercept hhinc age region1 region2 region3 sex1 ed4cat1 ed4cat2 ed4cat3 ; run;

Example MONOTONE REGRESSION IMPUTATION WITH MISSING DATA ON CONTINUOUS VARIABLE AND CATEGORICAL COVARIATES Table 2.5 includes the synthesized output for the 5 imputed data sets with the complex sample adjusted standard errors (Step 2 using PROC SURVEYREG) and the variance further adjusted by PROC MIANALYZE. The parameter estimates and other statistics are mean values from the 5 imputed values and are now adjusted correctly to account for the multiple imputation and the complex survey features.

Example 2 proc mianalyze parms=lgparms proc logistic data=OutFish; covb=lgcovb; modeleffects Intercept Length; run; proc mi data=Fish seed=1305417 out=OutFish; class Species; monotone reg(Height Width/ details) logistic( Species= Length Height Width Height*Width/ details); var Length Height Width Species; run; proc logistic data=OutFish; class Species; model Species= Length / covb; by _Imputation_; ods output ParameterEstimates=lgparms CovB=lgcovb; run;