Introduction to Multiple Imputation CFDR Workshop Series Spring 2008.

2 Outline
Missing data mechanisms
What is Multiple Imputation?
SAS Proc MI, Proc MIANALYZE
Stata ICE, MICOMBINE
SAS IVEware
What’s the difference?
Problems with categorical imputation

3 Missing data mechanisms
Missing Completely At Random (MCAR)
  – The probability of missingness does not depend on any variable, observed or unobserved.
Missing At Random (MAR)
  – The probability of missingness does not depend on the unobserved value of the missing variable, but it can depend on any of the other variables in your dataset.
Not Missing At Random (NMAR)
  – The probability of missingness depends on the unobserved value of the missing variable itself.
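The three mechanisms can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the function name `simulate_missingness`, the logistic missingness model, and all constants are assumptions chosen for illustration. It compares the mean of the observed values of y under each mechanism with the mean of the full data.

```python
import math
import random


def simulate_missingness(n=10000, seed=0):
    """Return the full-data mean of y and the observed-case mean under each mechanism."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]      # always observed
    y = [xi + rng.gauss(0, 1) for xi in x]       # correlated with x, sometimes missing

    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Probability that y is missing (illustrative choices, not from the slides).
    p_missing = {
        "MCAR": lambda xi, yi: 0.3,                 # depends on nothing
        "MAR": lambda xi, yi: logistic(xi - 1),     # depends only on observed x
        "NMAR": lambda xi, yi: logistic(yi - 1),    # depends on the value of y itself
    }

    result = {"full": sum(y) / n}
    for name, p in p_missing.items():
        observed = [yi for xi, yi in zip(x, y) if rng.random() >= p(xi, yi)]
        result[name] = sum(observed) / len(observed)
    return result


print(simulate_missingness())
```

In this setup the observed-case mean stays close to the full-data mean only under MCAR. Under MAR the shift can be explained by x, which is observed; under NMAR it cannot be explained by anything in the data.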

5 What is Multiple Imputation?
1. Imputation: make M = 3 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values.
2. Analysis: analyze each data set separately.
3. Pooling:
  – Point estimates: average across the M analyses.
  – Standard errors: combine the within-imputation and between-imputation variances.
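The pooling step follows Rubin's rules: the pooled estimate is the average of the M estimates, and the total variance is the average within-imputation variance plus (1 + 1/M) times the between-imputation variance. A minimal sketch, where the function name `pool` and the numbers are made up for illustration:

```python
from statistics import mean, variance


def pool(estimates, variances):
    """Pool M point estimates and their squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b        # total variance
    return q_bar, t ** 0.5


# Hypothetical coefficient estimates and squared standard errors from M = 5 analyses.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
variances = [0.04, 0.05, 0.04, 0.05, 0.04]
pooled_estimate, pooled_se = pool(estimates, variances)
print(pooled_estimate, pooled_se)
```

The pooled standard error is larger than any single-imputation standard error here because it also reflects the disagreement between the imputed data sets.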

6 1. Imputation: Multiple Copies of Dataset

7 Three steps
1. Imputation: make M = 2 to 10 copies of the incomplete data set, filling in the missing values with conditionally random values.
2. Analysis: analyze each data set separately.
3. Pooling:
  – Point estimates: average across the M analyses.
  – Standard errors: combine the within-imputation and between-imputation variances.

8 What is MI?
Stata
  – based on each conditional density
  – chained equations
SAS
  – joint distribution of all the variables
  – assumed multivariate normal distribution
SAS IVEware
  – same as Stata, more options
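The chained-equations idea can be sketched for two continuous variables: each variable with missing values is regressed in turn on the other, and its missing entries are replaced by a prediction plus random noise. This is a simplified illustration, not the ICE or IVEware algorithm: the names `fit_line` and `chained_impute` are hypothetical, and the sketch does not draw the regression parameters from their posterior distribution, which a proper implementation would do.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD for ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - intercept - slope * x for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sd


def chained_impute(a, b, cycles=10, seed=0):
    """Fill None entries in two lists by alternating the regressions a~b and b~a."""
    rng = random.Random(seed)
    miss_a = [v is None for v in a]
    miss_b = [v is None for v in b]
    # Start from mean fills so every row is usable in the first regression.
    mean_a = mean(v for v in a if v is not None)
    mean_b = mean(v for v in b if v is not None)
    a = [mean_a if m else v for v, m in zip(a, miss_a)]
    b = [mean_b if m else v for v, m in zip(b, miss_b)]
    for _ in range(cycles):
        for target, pred, miss in ((a, b, miss_a), (b, a, miss_b)):
            # Fit on rows where the target was observed, then redraw the missing rows.
            obs = [i for i, m in enumerate(miss) if not m]
            b0, b1, sd = fit_line([pred[i] for i in obs], [target[i] for i in obs])
            for i, m in enumerate(miss):
                if m:
                    target[i] = b0 + b1 * pred[i] + rng.gauss(0, sd)
    return a, b


a = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0]
b = [1.1, None, 3.2, 3.9, 5.1, None, 7.0, 8.2]
imputed_a, imputed_b = chained_impute(a, b)
```

Calling `chained_impute` with M different seeds would give M completed data sets. The joint-model approach instead draws all missing values at once from an assumed multivariate normal distribution.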

9 Stata Example
ICE to impute
  – Regression commands may be logistic, mlogit, ologit, or regress.
MICOMBINE to analyze and combine the results
  – Supported regression commands are clogit, cnreg, glm, logistic, logit, mlogit, ologit, oprobit, poisson, probit, qreg, regress, rreg, stcox, streg, or xtgee.
Easy to use, nice documentation

10 SAS example

11 Step 1: Proc MI
Typical syntax:
    proc mi data=mi_example out=outmi seed=1234;
      var Oxygen RunTime RunPulse;
    run;

12 Step 2: Run Models
    proc reg data=outmi outest=outreg covout noprint;
      model Oxygen = RunTime RunPulse;
      by _Imputation_;
    run;
Note that the regression output is stored as the dataset “outreg”.
Procs: Reg, Logistic, Genmod, Mixed, GLM

13 Parameter Estimates & Covariance Matrices
    proc print data=outreg(obs=8);
      var _Imputation_ _Type_ _Name_ Intercept RunTime RunPulse;
    run;

14 Step 3. Proc Mianalyze
    proc mianalyze data=outreg;
      modeleffects Intercept RunTime RunPulse;
    run;

15 Irritating Parameter Est. & Covariance Matrices
Syntax depends on which procedure you used in the previous step:
    proc mianalyze data=parmcov;
(or)
    proc mianalyze parms=parmsdat covb=covbdat;
(or)
    proc mianalyze parms=parmsdat xpxi=xpxidat;
Procs: reg, genmod, logistic, mixed, glm.

16 SAS IVEware: 4 Components
1. IMPUTE: nice options.
2. DESCRIBE estimates the population means, proportions, subgroup differences, contrasts and linear combinations of means and proportions. A Taylor Series approach is used to obtain variance estimates appropriate for a user-specified complex sample design.
3. REGRESS fits linear, logistic, polytomous, Poisson, Tobit and proportional hazard regression models for data resulting from a complex sample design.
4. SASMOD allows users to take into account complex sample design features when analyzing data with several SAS procedures. SAS PROCs that can be called: CALIS, CATMOD, GENMOD, LIFEREG, MIXED, NLIN, PHREG, and PROBIT.

17 IVEware Impute
IMPUTE assumes the variables in the data set are one of the following five types:
(1) continuous
(2) binary
(3) categorical (polytomous with more than two categories)
(4) counts
(5) mixed
The types of regression models used are linear, logistic, Poisson, generalized logit, or mixed logistic/linear, depending on the type of variable being imputed.

19 A Few Issues
Do I impute the dependent variable?
Which model has more information: the imputation model or the analyst model?
How many imputations do I need to do?
Can I impute in one language and analyze in another?
How do I get summary statistics such as R squared?
Can I do this in SPSS?
Where do I go with questions?

20 Thanks
Next up:
“COLLATERAL CONSEQUENCES OF VIOLENCE IN DISADVANTAGED NEIGHBORHOODS”, Dr. David Harding, Wednesday, February 13, Noon - 1:00 pm
Accessing and Analyzing Add Health Data, Instructor: Dr. Meredith Porter, Monday, February 25, 12:00-1:00 pm