Multiple Imputation using SAS
Don Miller, 812 Oswald Tower, 814-863-3155

Introduction
Missing values occur often in research: refused/don't know answers, attrition, skip patterns, ...
Dropping observations with missing values may bias results (e.g., women and overweight people disclose their weight less often than others).
Imputation attempts to "fill in" the missing values.
Single imputation (e.g., with the mean) is biased and gives no measure of the uncertainty added by imputing, so standard errors come out too small.

Paris datasets
Open Windows Explorer (or My Computer)
Tools > Map Network Drive
Drive: P:
Folder: \\paris\sas_data
For help: Stat help
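Once the drive is mapped, SAS reaches the files through a library reference. A minimal sketch, assuming the mapping above; the libref paris and the member name bmx are illustrative choices, not confirmed by these slides.

```sas
/* Assumption: \\paris\sas_data is mapped to drive P: as described above */
/* The libref "paris" is our own choice; any name of up to 8 characters works */
libname paris 'P:\';

/* Alternative without a mapped drive: point at the UNC path directly */
/* libname paris '\\paris\sas_data'; */

/* Hypothetical: assumes the body-measures data set is stored there as bmx */
data bmx;
   set paris.bmx;
run;
```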

Data Setup
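Before imputing, it helps to see how much is missing in each variable. A minimal sketch, assuming the data set bmx with the five body-measure variables used later in these slides is already available in the WORK library:

```sas
/* N = non-missing count, NMISS = missing count for each analysis variable */
proc means data=bmx n nmiss mean min max;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
```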

Multiple Imputation Simple Procedure
1. Impute using PROC MI.
2. Round off the imputed values if you want plausible values (caution: this will bias your results).
3. Run the analysis (PROC REG, LOGISTIC, etc.) with by _imputation_; in the procedure.
4. Combine the results using PROC MIANALYZE.
For categorical variables: construct binary dummy variables, leaving out the reference category (e.g., race coded 1="white", 2="black", 3="other" becomes the variables black and other).
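The dummy-variable step can be sketched as a DATA step. This assumes a hypothetical numeric variable race coded 1=white, 2=black, 3=other has been merged into bmx; white is the reference category, so it gets no dummy of its own.

```sas
/* Hypothetical: race (1=white, 2=black, 3=other) is assumed to be in bmx */
data bmx2;
   set bmx;
   if missing(race) then do;
      /* keep the dummies missing so PROC MI can impute them */
      black = .;
      other = .;
   end;
   else do;
      /* a SAS comparison returns 1 (true) or 0 (false) */
      black = (race = 2);
      other = (race = 3);
   end;
run;
```

Leaving the dummies missing, rather than setting them to 0, keeps the missingness visible to PROC MI.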

PROC MI
Typical syntax:
proc mi data=bmx out=impdat seed=33155;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
data= input data set: one copy of the data, with missing values
out= output data set: 5 copies of the data with imputed values (the imputed values differ across copies)
seed= random-number seed; keep it the same to reproduce your results
var lists the variables with missing values that need imputing, the variables in your analysis model, and any others that may help the imputation

PROC MI Sample Output

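PROC MI Options
nimpute=5   number of imputations (default 5); nimpute=0 shows only the missing-data patterns
minimum=, maximum=   set minimum and maximum imputed values (sometimes the imputation doesn't converge as well)
round=   round off the imputed values
alpha=0.05   level for the confidence limits
mu0=   null-hypothesis mean for the t tests (H0: μ = μ0)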
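Two of these options in use. A sketch only: the bound of 0 and rounding to 0.1 are illustrative values, and it assumes (per our reading of the PROC MI documentation) that a single MINIMUM= or ROUND= value is applied to every VAR variable.

```sas
/* nimpute=0: no imputation, just the missing-data pattern and summary tables */
proc mi data=bmx nimpute=0;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;

/* Illustrative bounds: no negative imputed values, rounded to 0.1 */
proc mi data=bmx out=impdat seed=33155 nimpute=5
        minimum=0 round=0.1;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
```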

PROC MI Statements
em maxiter=200 out=emdata;   EM algorithm: maximum-likelihood estimates for the missing data
freq fweight;   weights observations by a frequency variable (here fweight)
mcmc (options);   modifies the MCMC imputation method
class sex race;   specifies categorical variables, so you don't need dummies (used with the FCS or MONOTONE statement; new / experimental)
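A sketch of these statements together. The iteration counts are illustrative, and the second step assumes a hypothetical race variable has been merged into bmx; there, FCS with a discriminant-function method imputes race directly instead of through dummies.

```sas
/* EM: MLE of means/covariances; OUT= holds a single EM-based fill-in, not MI */
/* MCMC: one chain, 500 burn-in iterations, 200 iterations between imputations */
proc mi data=bmx out=impdat seed=33155 nimpute=5;
   em maxiter=200 out=emdata;
   mcmc chain=single nbiter=500 niter=200;
   var bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;

/* Hypothetical: race (1=white, 2=black, 3=other) assumed present in bmx */
proc mi data=bmx out=impdat_fcs seed=33155 nimpute=5;
   class race;
   /* CLASS variables need FCS or MONOTONE; MCMC assumes multivariate normality */
   fcs nbiter=20 discrim(race) reg(bmxbmi bmxht bmxwt bmxarmc bmxarml);
   var race bmxbmi bmxht bmxwt bmxarmc bmxarml;
run;
```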

Output dataset
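The OUT= data set stacks all imputed copies and numbers them in the variable _Imputation_. A quick check, assuming impdat was created by the PROC MI step shown earlier:

```sas
/* One block of rows per imputation (1 to nimpute) */
proc freq data=impdat;
   tables _imputation_;
run;

/* Summary statistics for one imputed variable, compared across copies */
proc means data=impdat n mean std;
   class _imputation_;
   var bmxbmi;
run;
```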

Regression
Fit your model as if the data had no missing values, using by _imputation_;
proc reg data=impdat outest=parmcov covout;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml;
   by _imputation_;
run;
You'll get nimpute (usually 5) sets of output.
Estimates, covariances, and standard errors will be combined in MIANALYZE (for R², just take the mean across imputations).
You need to generate a parameter-estimate and covariance data set (how depends on the procedure).
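Averaging R² by hand can be sketched as follows. It assumes PROC REG's EDF option, which writes _RSQ_ to the OUTEST= data set; the data set name regest is our own.

```sas
/* EDF adds _RSQ_ (model R-square) to OUTEST=; one row per imputation */
proc reg data=impdat outest=regest edf noprint;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml;
   by _imputation_;
run;

/* Mean R-square across the imputations */
proc means data=regest mean;
   var _rsq_;
run;
```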

Parameter Est. & Covariance Matrix
proc logistic data=impdat descending;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml /covb;   /* LOGISTIC needs a binary or ordinal outcome; bmxbmi is only a placeholder */
   by _imputation_;
   ods output ParameterEstimates=parmsdat CovB=covbdat;
run;
proc mixed data=impdat;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml /solution covb;
   by _imputation_;
   ods output SolutionF=parmsdat CovB=covbdat;   /* fixed effects; CovParms holds only the variance components */
run;
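For PROC MIXED, the fixed-effect estimates are in the SolutionF table, and its CovB table uses a Row/Col1..Colk layout. A sketch, assuming (per our reading of the MIANALYZE documentation) that EFFECTVAR=ROWCOL is the setting for that layout; mixparms and mixcovb are our own names.

```sas
proc mixed data=impdat;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml / solution covb;
   by _imputation_;
   ods output SolutionF=mixparms CovB=mixcovb;
run;

/* EFFECTVAR=ROWCOL: covariance matrix stored as Row and Col1, Col2, ... */
proc mianalyze parms=mixparms covb(effectvar=rowcol)=mixcovb;
   modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
run;
```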

Parameter Est. & Covariance Matrix
proc genmod data=impdat;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml /covb;
   by _imputation_;
   ods output ParameterEstimates=parmsdat CovB=covbdat;
run;
proc glm data=impdat;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml /inverse;
   by _imputation_;
   ods output ParameterEstimates=parmsdat InvXPX=xpxidat;
run;
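PROC GENMOD labels the rows of its CovB table Prm1, Prm2, ..., so MIANALYZE needs the ParmInfo table to match them to effect names. A sketch, assuming the PARMINFO= option behaves as we understand from the MIANALYZE documentation; gmparms, gmpinfo, and gmcovb are our own names.

```sas
proc genmod data=impdat;
   model bmxbmi=bmxht bmxwt bmxarmc bmxarml / covb;
   by _imputation_;
   /* ParmInfo maps Prm1, Prm2, ... to the model effects */
   ods output ParameterEstimates=gmparms ParmInfo=gmpinfo CovB=gmcovb;
run;

proc mianalyze parms=gmparms covb=gmcovb parminfo=gmpinfo;
   modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
run;
```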

PROC MIANALYZE
Syntax depends on which procedure you used in the previous step:
proc mianalyze data=parmcov;   (PROC REG with outest= and covout)
or: proc mianalyze parms=parmsdat covb=covbdat;   (LOGISTIC, GENMOD, MIXED; for MIXED use covb(effectvar=rowcol)=covbdat)
or: proc mianalyze parms=parmsdat xpxi=xpxidat;   (GLM)
then type this:
   modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
run;
Note that the "var" statement is now "modeleffects".
Note that the dependent variable is omitted.
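What MIANALYZE computes for each parameter follows Rubin's combining rules. For m imputations with estimates Q̂_i and squared standard errors Û_i, the combined estimate, total variance, and degrees of freedom are:

```latex
\bar{Q} = \frac{1}{m}\sum_{i=1}^{m}\hat{Q}_i, \qquad
\bar{U} = \frac{1}{m}\sum_{i=1}^{m}\hat{U}_i, \qquad
B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{Q}_i - \bar{Q}\right)^2

T = \bar{U} + \left(1 + \frac{1}{m}\right) B, \qquad
\nu = (m-1)\left[1 + \frac{\bar{U}}{\left(1 + m^{-1}\right) B}\right]^2

\frac{\bar{Q} - Q}{\sqrt{T}} \sim t_{\nu}
```

Here Ū is the within-imputation variance and B the between-imputation variance. Because T includes B, the combined standard error reflects the uncertainty that single imputation ignores.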

PROC MIANALYZE Output
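To keep the combined results as data sets rather than only printed output, ODS can capture them. A sketch, assuming parmcov came from the PROC REG step with OUTEST= and COVOUT; mi_est and mi_var are our own names.

```sas
proc mianalyze data=parmcov;
   modeleffects intercept bmxht bmxwt bmxarmc bmxarml;
   /* combined estimates, and the within/between variance breakdown */
   ods output ParameterEstimates=mi_est VarianceInfo=mi_var;
run;

proc print data=mi_est;
run;
```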