Multiple Imputation using SAS
Don Miller, 812 Oswald Tower


Introduction
- Missing values occur often in research: refusals and "don't know" answers, attrition, skip patterns, and so on.
- Dropping cases with missing values may bias results (e.g., women and overweight respondents tend to disclose their weight less often than others).
- Imputation attempts to "fill in" the missing values.
- Single imputation (e.g., replacing every missing value with the mean) is biased: it understates variability and gives no measure of the uncertainty caused by imputing.
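A minimal Python sketch of why mean imputation understates variability. The weights are simulated and hypothetical, and the missingness here is purely random, so this shows only the variance problem and not the selection bias described above. Every filled-in value sits exactly at the observed mean. As a result, the sum of squared deviations stays the same while the sample size grows, and the standard deviation shrinks.

```python
import random
import statistics

def mean_impute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    m = statistics.fmean(observed)
    return [m if v is None else v for v in values]

random.seed(1)
# Hypothetical weights (kg); every third respondent does not report.
true_weights = [random.gauss(75, 12) for _ in range(300)]
reported = [None if i % 3 == 0 else w for i, w in enumerate(true_weights)]

observed = [v for v in reported if v is not None]
filled = mean_impute(reported)

print(f"SD of observed values:    {statistics.stdev(observed):.2f}")
print(f"SD after mean imputation: {statistics.stdev(filled):.2f}")
```

With 200 observed and 300 total values, the filled-in SD is smaller by a factor of sqrt(199/299). Standard errors computed from the filled data are too small for the same reason.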

Multiple Imputation: Simple Procedure
1. For categorical variables, construct binary dummy variables and drop the reference category (e.g., Race: 1="white", 2="black", 3="other" becomes the variables Black and Other).
2. Impute using PROC MI.
3. Round off the imputed dummies if you want plausible values (this will bias your results).
4. Do the analysis (PROC REG, LOGISTIC, etc.) with a "by _imputation_;" statement in the procedure.
5. Combine the results using PROC MIANALYZE.
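Steps 1 and 3 can be sketched in Python. The helper names are hypothetical, and the 0.5 cutoff is an assumed rounding rule. The last lines show one concrete reason rounding misbehaves: each dummy is imputed as a continuous value, so a single case can round to Black=1 and Other=1, which is an impossible race.

```python
def race_dummies(race):
    """Code race (1=white, 2=black, 3=other) as (Black, Other); white is the reference.

    A missing race gives missing dummies, which PROC MI would then impute.
    """
    if race is None:
        return (None, None)
    return (1 if race == 2 else 0, 1 if race == 3 else 0)

def round_dummy(value):
    """Round an imputed (continuous, possibly out-of-range) dummy to 0 or 1."""
    return 1 if value >= 0.5 else 0

print([race_dummies(r) for r in (1, 2, 3, None)])

# Hypothetical imputed values for one case: both round up to 1.
imputed_black, imputed_other = 0.6, 0.55
print(round_dummy(imputed_black), round_dummy(imputed_other))
```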

PROC MI
Typical syntax:

    proc mi data=rawdat seed= out=impdat;
       var sex black other age drivesfast;
    run;

- data= the input data set: one copy of the data, with missing values
- out= the output data set: 5 stacked copies of the data with imputed values (the imputed values differ across copies, which are numbered by the variable _Imputation_)
- seed= the random seed (a positive integer); keep the same seed to reproduce your results
- var the variables with missing values that you need imputed, the variables in your model, and any others that may help with the imputation
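To make the shape of the out= data set concrete, here is a toy Python stand-in. The function impute_stacked and its seed value are hypothetical. It draws each missing value from a normal distribution fitted to the observed values. This is NOT how PROC MI imputes (its default method is MCMC), and it also skips a step that proper multiple imputation requires: drawing the mean and SD themselves anew for each copy. What it does show faithfully is the layout. There are nimpute stacked copies tagged _Imputation_ = 1..nimpute, observed values are repeated unchanged, imputed values differ across copies, and the same seed reproduces the same output.

```python
import random
import statistics

def impute_stacked(rows, var, nimpute=5, seed=12345):
    """Toy illustration of the layout of PROC MI's out= data set (hypothetical helper)."""
    rng = random.Random(seed)
    observed = [r[var] for r in rows if r[var] is not None]
    mu, sd = statistics.fmean(observed), statistics.stdev(observed)
    stacked = []
    for m in range(1, nimpute + 1):
        for r in rows:
            copy = dict(r, _Imputation_=m)  # tag each copy with its imputation number
            if copy[var] is None:
                copy[var] = rng.gauss(mu, sd)  # toy draw, not PROC MI's method
            stacked.append(copy)
    return stacked

rows = [{"age": 34}, {"age": None}, {"age": 51}, {"age": None}]
out = impute_stacked(rows, "age")
for r in out[:8]:
    print(r)
```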

PROC MI Sample Output

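PROC MI Options
- nimpute=5: number of imputations (default 5); nimpute=0 gives only the missing-data patterns
- minimum=, maximum=: set minimum and maximum values for the imputed variables; with these bounds the imputation sometimes doesn't converge as well
- round=: round off the imputed values
- alpha=0.05: level for the confidence limits
- mu0=: null hypothesis value for t tests of the means (H0: μ = μ0)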
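One way to see why bounds can hurt convergence: imputed values that fall outside [minimum, maximum] are rejected and redrawn, and tight bounds mean many rejections. The Python sketch below is a hypothetical helper (draw_bounded, its ntries limit, and its rounding rule are all assumptions, not SAS internals). It rounds each draw first and then checks the bounds, so the returned value is always in range. PROC MI's exact order of operations may differ.

```python
import random

def draw_bounded(draw, minimum, maximum, round_to=None, ntries=100):
    """Redraw an imputed value until it lies in [minimum, maximum] (hypothetical helper)."""
    for _ in range(ntries):
        value = draw()
        if round_to is not None:
            value = round(value / round_to) * round_to  # e.g. round_to=1 -> whole years
        if minimum <= value <= maximum:
            return value
    raise RuntimeError(f"no draw in [{minimum}, {maximum}] after {ntries} tries")

rng = random.Random(7)
# Hypothetical age imputation: normal draws restricted to 18-65, rounded to whole years.
age = draw_bounded(lambda: rng.gauss(40, 15), minimum=18, maximum=65, round_to=1)
print(age)
```

If the bounds sit far from where the imputation model puts its mass, every draw is rejected and the helper gives up. This mirrors the convergence trouble noted on the slide.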

PROC MI Statements
- em maxiter=200 out=emdata; EM algorithm: maximum likelihood estimates of the means and covariances from the incomplete data
- freq fweight; weights observations by a frequency weight variable
- mcmc (options); modify the imputation method
- class sex race; specify categorical variables, so dummies are not needed (new / experimental)
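A minimal Python sketch of the EM idea in the simplest case: two normal variables, with x fully observed and y missing at random given x. PROC MI's EM statement handles any number of variables and arbitrary missing patterns; em_bivariate is a hypothetical helper, not SAS code. The E-step fills in the expected values of y and y² for each missing case, given x and the current estimates. The M-step recomputes the means and covariances from those filled-in sufficient statistics. The cycle repeats until the estimates stop changing or maxiter is reached.

```python
def em_bivariate(x, y, maxiter=200, tol=1e-10):
    """EM estimates of the means and (co)variances of (x, y) when some y are None."""
    n = len(x)
    obs = [i for i in range(n) if y[i] is not None]
    mx = sum(x) / n                               # x is complete: its MLEs are fixed
    sxx = sum((xi - mx) ** 2 for xi in x) / n
    my = sum(y[i] for i in obs) / len(obs)        # starting values from complete cases
    syy = sum((y[i] - my) ** 2 for i in obs) / len(obs)
    sxy = 0.0
    for _ in range(maxiter):
        # E-step: expected y and y^2 for the missing cases, given x
        beta = sxy / sxx
        resid_var = syy - beta * sxy              # Var(y | x) under current estimates
        ey, ey2 = [], []
        for i in range(n):
            if y[i] is None:
                pred = my + beta * (x[i] - mx)
                ey.append(pred)
                ey2.append(pred ** 2 + resid_var)
            else:
                ey.append(y[i])
                ey2.append(y[i] ** 2)
        # M-step: update the parameters from the completed sufficient statistics
        new_my = sum(ey) / n
        new_syy = sum(ey2) / n - new_my ** 2
        new_sxy = sum(x[i] * ey[i] for i in range(n)) / n - mx * new_my
        done = max(abs(new_my - my), abs(new_syy - syy), abs(new_sxy - sxy)) < tol
        my, syy, sxy = new_my, new_syy, new_sxy
        if done:
            break
    return {"mean_x": mx, "mean_y": my, "var_x": sxx, "var_y": syy, "cov_xy": sxy}
```

For this two-variable pattern the maximum likelihood mean of y also has a closed form: the complete-case mean of y, adjusted by the complete-case slope times the gap between the full and complete-case means of x. The EM result converges to that same value.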

Regression
Fit your model as if the data had no missing values, adding "by _imputation_;":

    proc reg data=impdat outest=parmcov covout;
       model drivesfast=sex black other age;
       by _imputation_;
    run;

- You'll get nimpute (usually 5) sets of output.
- Estimates, covariances, and standard errors will be combined in MIANALYZE; R² is not combined there, so simply report its mean across the imputations.
- You need to generate a parameter-estimate and covariance data set; how you do this varies by procedure.
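What "by _imputation_;" does can be sketched in Python: one separate fit per imputed copy, then the R² values averaged. To stay dependency-free, this sketch uses a hypothetical one-predictor least-squares helper instead of the slide's four-predictor model. The stacked rows in the usage lines are made-up numbers.

```python
import statistics

def simple_ols(x, y):
    """Intercept, slope, and R^2 of y regressed on a single predictor x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    return {"intercept": my - slope * mx, "slope": slope, "r2": sxy ** 2 / (sxx * syy)}

def fit_by_imputation(stacked, xvar, yvar):
    """Analogue of 'by _imputation_;': one fit per imputed copy of the data."""
    fits = {}
    for m in sorted({r["_Imputation_"] for r in stacked}):
        rows = [r for r in stacked if r["_Imputation_"] == m]
        fits[m] = simple_ols([r[xvar] for r in rows], [r[yvar] for r in rows])
    return fits

stacked = [
    {"_Imputation_": 1, "x": 1, "y": 2}, {"_Imputation_": 1, "x": 2, "y": 4},
    {"_Imputation_": 1, "x": 3, "y": 6}, {"_Imputation_": 2, "x": 1, "y": 2},
    {"_Imputation_": 2, "x": 2, "y": 4}, {"_Imputation_": 2, "x": 3, "y": 5},
]
fits = fit_by_imputation(stacked, "x", "y")
mean_r2 = statistics.fmean([f["r2"] for f in fits.values()])
print(fits, mean_r2)
```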

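Parameter Estimates & Covariance Matrix

    proc logistic data=impdat descending;
       model drivesfast=sex black other age / covb;
       by _imputation_;
       ods output ParameterEstimates=parmsdat CovB=covbdat;
    run;

    proc mixed data=impdat;
       model drivesfast=sex black other age / solution covb;
       by _imputation_;
       ods output SolutionF=parmsdat CovB=covbdat;
    run;

For PROC MIXED, save the SolutionF (fixed-effect estimates) and CovB tables; the CovParms table holds the variance-component estimates, not the covariance matrix of the fixed effects.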

Parameter Estimates & Covariance Matrix

    proc genmod data=impdat;
       model drivesfast=sex black other age / covb;
       by _imputation_;
       ods output ParameterEstimates=parmsdat ParmInfo=pinfodat CovB=covbdat;
    run;

    proc glm data=impdat;
       model drivesfast=sex black other age / inverse;
       by _imputation_;
       ods output ParameterEstimates=parmsdat InvXPX=xpxidat;
    run;

GENMOD labels the rows of its CovB table Prm1, Prm2, ..., so the ParmInfo table is saved too, to map those labels back to the effects.
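Why GLM's inverse X'X matrix can stand in for a covariance matrix: for least squares, Cov(b) = MSE · (X'X)⁻¹. SAS's InvXPX table is the augmented inverse, which also carries the error sum of squares needed for the MSE. The Python sketch below works through that arithmetic for an intercept plus one predictor. The helper name and the data are hypothetical.

```python
def ols_cov_from_xpxi(x, y):
    """Least-squares estimates b and Cov(b) = MSE * inverse(X'X) for the design [1, x]."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    det = n * sxx - sx * sx
    xpxi = [[sxx / det, -sx / det], [-sx / det, n / det]]   # inverse of X'X (2 x 2)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    b = [xpxi[0][0] * sy + xpxi[0][1] * sxy,                 # b = inverse(X'X) X'y
         xpxi[1][0] * sy + xpxi[1][1] * sxy]
    sse = sum((yi - b[0] - b[1] * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)                                      # n minus 2 estimated parameters
    cov = [[mse * v for v in row] for row in xpxi]
    return b, cov

b, cov = ols_cov_from_xpxi([1, 2, 3, 4], [2, 3, 5, 4])
print(b, cov)
```

Here b = (1.5, 0.8), MSE = 0.9, and the slope variance is 0.9 × 0.2 = 0.18. That matches the textbook formula MSE / Σ(x − x̄)² = 0.9 / 5.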

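PROC MIANALYZE
The syntax depends on which procedure you used in the previous step. Use one of:

    proc mianalyze data=parmcov;                  /* PROC REG with outest= and covout */
    proc mianalyze parms=parmsdat covb=covbdat;   /* PROC LOGISTIC, GENMOD */
    proc mianalyze parms=parmsdat xpxi=xpxidat;   /* PROC GLM */

followed by:

       modeleffects intercept sex black other age;
    run;

- Note that the "var" statement is now "modeleffects".
- Note that the dependent variable is omitted.
- For PROC GENMOD, also save the ParmInfo ODS table in the previous step and pass it with parminfo=, because GENMOD labels the CovB rows Prm1, Prm2, ...
- For PROC MIXED, save the SolutionF and CovB tables and read the covariance matrix with covb(effectvar=rowcol)=.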
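The core of what PROC MIANALYZE computes for each effect is Rubin's rules, sketched here in Python for one parameter. The helper name and the input numbers are hypothetical. The combined estimate is the mean of the m estimates. The within-imputation variance is the mean of the m squared standard errors, and the between-imputation variance is the sample variance of the estimates. Total variance = within + (1 + 1/m) × between. The degrees of freedom are Rubin's original formula. MIANALYZE can instead use a small-sample adjustment when given the complete-data degrees of freedom, which this sketch does not attempt.

```python
import math
import statistics

def rubin_combine(estimates, variances):
    """Combine one parameter across m imputations using Rubin's rules."""
    m = len(estimates)
    qbar = statistics.fmean(estimates)        # combined point estimate
    ubar = statistics.fmean(variances)        # within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (divisor m - 1)
    t = ubar + (1 + 1 / m) * b                # total variance
    r = (1 + 1 / m) * b / ubar                # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if b > 0 else math.inf
    return {"estimate": qbar, "std_err": math.sqrt(t), "df": df}

# Hypothetical slope estimates from 5 imputations, each with standard error 0.2.
print(rubin_combine([1.0, 1.2, 0.8, 1.1, 0.9], [0.04] * 5))
```

Here the estimate is 1.0, the total variance is 0.04 + 1.2 × 0.025 = 0.07, and df = 4 × (1 + 1/0.75)² ≈ 21.8.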

PROC MIANALYZE Output