1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

Slides:

Advertisements

Similar presentations

Handling attrition and nonresponse in longitudinal data Harvey Goldstein University of Bristol.

Advertisements

Treatment of missing values

CountrySTAT Team-I November 2014, ECO Secretariat,Teheran.

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

CmpE 104 SOFTWARE STATISTICAL TOOLS & METHODS MEASURING & ESTIMATING SOFTWARE SIZE AND RESOURCE & SCHEDULE ESTIMATING.

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

1 SSS II Lecture 1: Correlation and Regression Graduate School 2008/2009 Social Science Statistics II Gwilym Pryce

Mitigating Risk of Out-of-Specification Results During Stability Testing of Biopharmaceutical Products Jeff Gardner Principal Consultant 36 th Annual Midwest.

Objectives (BPS chapter 24)

Psychology 202b Advanced Psychological Statistics, II April 7, 2011.

Longitudinal Experiments Larry V. Hedges Northwestern University Prepared for the IES Summer Research Training Institute July 28, 2010.

Resampling techniques Why resampling? Jacknife Cross-validation Bootstrap Examples of application of bootstrap.

Chapter 10 Simple Regression.

Additional Topics in Regression Analysis

Resampling techniques

Missing Data in Randomized Control Trials

How to deal with missing data: INTRODUCTION

Stat 112: Lecture 9 Notes Homework 3: Due next Thursday

Statistical Methods for Missing Data Roberta Harnett MAR 550 October 30, 2007.

Structural Equation Modeling Intro to SEM Psy 524 Ainsworth.

Introduction to Multilevel Modeling Using SPSS

Multiple imputation using ICE: A simulation study on a binary response Jochen Hardt Kai Görgen 6 th German Stata Meeting, Berlin June, 27 th 2008 Göteborg.

Inference for regression - Simple linear regression

Hypothesis Testing in Linear Regression Analysis

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

CJT 765: Structural Equation Modeling Class 7: fitting a model, fit indices, comparingmodels, statistical power.

 Collecting Quantitative  Data  By: Zainab Aidroos.

Slide 1 Estimating Performance Below the National Level Applying Simulation Methods to TIMSS Fourth Annual IES Research Conference Dan Sherman, Ph.D. American.

Evaluating a Research Report

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

Random Regressors and Moment Based Estimation Prepared by Vera Tabakova, East Carolina University.

Multiple Regression The Basics. Multiple Regression (MR) Predicting one DV from a set of predictors, the DV should be interval/ratio or at least assumed.

G Lecture 11 G Session 12 Analyses with missing data What should be reported?  Hoyle and Panter  McDonald and Moon-Ho (2002)

Introduction to Multilevel Modeling Stephen R. Porter Associate Professor Dept. of Educational Leadership and Policy Studies Iowa State University Lagomarcino.

Applied Epidemiologic Analysis - P8400 Fall 2002 Lab 10 Missing Data Henian Chen, M.D., Ph.D.

Lecture 8 Simple Linear Regression (cont.). Section Objectives: Statistical model for linear regression Data for simple linear regression Estimation.

SW 983 Missing Data Treatment Most of the slides presented here are from the Modern Missing Data Methods, 2011, 5 day course presented by the KUCRMDA,

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

Stat 112 Notes 9 Today: –Multicollinearity (Chapter 4.6) –Multiple regression and causal inference.

Question paper 1997.

School-level Correlates of Achievement: Linking NAEP, State Assessments, and SASS NAEP State Analysis Project Sami Kitmitto CCSSO National Conference on.

Missing Values Raymond Kim Pink Preechavanichwong Andrew Wendel October 27, 2015.

Lecture 2: Statistical learning primer for biologists

Multivariate Analysis and Data Reduction. Multivariate Analysis Multivariate analysis tries to find patterns and relationships among multiple dependent.

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director,

Tutorial I: Missing Value Analysis

1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.

Stats Term Test 4 Solutions. c) d) An alternative solution is to use the probability mass function and.

Pre-Processing & Item Analysis DeShon Pre-Processing Method of Pre-processing depends on the type of measurement instrument used Method of Pre-processing.

Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”

A framework for multiple imputation & clustering -Mainly basic idea for imputation- Tokei Benkyokai 2013/10/28 T. Kawaguchi 1.

Chapter 17 STRUCTURAL EQUATION MODELING. Structural Equation Modeling (SEM)  Relatively new statistical technique used to test theoretical or causal.

DATA STRUCTURES AND LONGITUDINAL DATA ANALYSIS Nidhi Kohli, Ph.D. Quantitative Methods in Education (QME) Department of Educational Psychology 1.

Research and Evaluation Methodology Program College of Education A comparison of methods for imputation of missing covariate data prior to propensity score.

Stats Methods at IC Lecture 3: Regression.

An Introduction to Latent Curve Models

Missing data: Why you should care about it and what to do about it

Structural Equation Modeling

CJT 765: Structural Equation Modeling

The Centre for Longitudinal Studies Missing Data Strategy

CJT 765: Structural Equation Modeling

Maximum Likelihood & Missing data

Multiple Imputation Using Stata

How to handle missing data values

Probabilistic Models with Latent Variables

The European Statistical Training Programme (ESTP)

Rachael Bedford Mplus: Longitudinal Analysis Workshop 23/06/2015

Chapter 13: Item nonresponse

Structural Equation Modeling

Missing data: Is it all the same?

Presentation transcript:

1crmda.KU.edu Todd D. Little University of Kansas Director, Quantitative Training Program Director, Center for Research Methods and Data Analysis Director, Undergraduate Social and Behavioral Sciences Methodology Minor Member, Developmental Psychology Training Program crmda. KU.edu Workshop presented University of Turku, Finland Special Thanks to: Mijke Rhemtulla & Wei Wu On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design

2crmda.KU.edu Road Map Learn about the different types of missing data Learn about ways in which the missing data process can be recovered Understand why imputing missing data is not cheating Learn why NOT imputing missing data is more likely to lead to errors in generalization! Learn about intentionally missing designs

3crmda.KU.edu Key Considerations Recoverability Is it possible to recover what the sufficient statistics would have been if there was no missing data? (sufficient statistics = means, variances, and covariances) Is it possible to recover what the parameter estimates of a model would have been if there was no missing data. Bias Are the sufficient statistics/parameter estimates systematically different than what they would have been had there not been any missing data? Power Do we have the same or similar rates of power (1 – Type II error rate) as we would without missing data?

4crmda.KU.edu Types of Missing Data Missing Completely at Random (MCAR) No association with unobserved variables (selective process) and no association with observed variables Missing at Random (MAR) No association with unobserved variables, but maybe related to observed variables Random in the statistical sense of predictable Non-random (Selective) Missing (MNAR) Some association with unobserved variables and maybe with observed variables

5crmda.KU.edu Effects of imputing missing data

6crmda.KU.edu Effects of imputing missing data No Association with Observed Variable(s) An Association with Observed Variable(s) No Association with Unobserved /Unmeasured Variable(s) MCAR Fully recoverable Fully unbiased MAR Partly to fully recoverable Less biased to unbiased An Association with Unobserved /Unmeasured Variable(s) NMAR Unrecoverable Biased (same bias as not estimating) MAR/NMAR Partly recoverable Same to unbiased

7crmda.KU.edu No Association with ANY Observed Variable An Association with Analyzed Variables An Association with Unanalyzed Variables No Association with Unobserved /Unmeasured Variable(s) MCAR Fully recoverable Fully unbiased MAR Partly to fully recoverable Less biased to unbiased MAR Partly to fully recoverable Less biased to unbiased An Association with Unobserved /Unmeasured Variable(s) NMAR Unrecoverable Biased (same bias as not estimating) MAR/NMAR Partly to fully recoverable Same to unbiased MAR/NMAR Partly to fully recoverable Same to unbiased Effects of imputing missing data Statistical Power: Will always be greater when missing data is imputed!

8crmda.KU.edu Modern Missing Data Analysis In 1978, Rubin proposed Multiple Imputation (MI) An approach especially well suited for use with large public-use databases. First suggested in 1978 and developed more fully in MI primarily uses the Expectation Maximization (EM) algorithm and/or the Markov Chain Monte Carlo (MCMC) algorithm. Beginning in the 1980’s, likelihood approaches developed. Multiple group SEM Full Information Maximum Likelihood (FIML). An approach well suited to more circumscribed models MI or FIML

Figure 7. Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary variable. 60% MAR correlation estimates with 1 PCA auxiliary variable (r =.60) 9 crmda.KU.edu

What goes in the Common Set? Three-form design FormCommon Set X Variable Set AVariable Set BVariable Set C 1¼ of items missing 2¼ of items missing¼ of items 3 missing¼ of items 10crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. 21 questions made up of 7 3-question subtests 11crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Common Set (X) crmda.KU.edu

Three-form design: Example crmda.ku.edu SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Common Set (X) 13

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set A I have a rich vocabulary. I start conversations. I get stressed out easily. I am always prepared. I am interested in people. 14crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set B I have excellent ideas. I am the life of the party. I get irritated easily. I like order. I have a soft heart. 15 crmda.KU.edu

Three-form design: Example SubtestItem DemographicsHow old are you? Are you male or female? What is your occupation? Musical TasteWhat is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? OpennessI have a rich vocabulary. I have excellent ideas. I have a vivid imagination. SubtestItem ExtraversionI start conversations. I am the life of the party. I am comfortable around people. NeuroticismI get stressed out easily. I get irritated easily. I have frequent mood swings. ConscientiousnessI am always prepared. I like order. I pay attention to details. AgreeablenessI am interested in people. I have a soft heart. I take time out for others. Set C I have a vivid imagination. I am comfortable around people. I have frequent mood swings. I pay attention to details. I take time out for others. 16crmda.KU.edu

17 Form 1 (XAB)Form 2 (XAC)Form 3 (XBC) How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? How old are you? Are you male or female? What is your occupation? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? What is your favorite genre of music? Do you like to listen to music while you work? Do you prefer music played loud or softly? I have a rich vocabulary. I have excellent ideas. I have a rich vocabulary. I have a vivid imagination. I have excellent ideas. I have a vivid imagination. I start conversations. I am the life of the party. I start conversations. I am comfortable around people. I am the life of the party. I am comfortable around people. I get stressed out easily. I get irritated easily. I get stressed out easily. I have frequent mood swings. I get irritated easily. I have frequent mood swings. I am always prepared. I like order. I am always prepared. I pay attention to details. I like order. I pay attention to details. I am interested in people. I have a soft heart. I am interested in people. I take time out for others. I have a soft heart. I take time out for others.

18crmda.KU.edu

Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu19 FormsItem Set Order1X A B 2X A C 3X B C 4A X B 5A X C 6B X C 7A B X 8A C X 9B C X

Expansions of 3-Form Design (Graham, Taylor, Olchowski, & Cumsille, 2006) crmda.KU.edu20

21 2-Method Planned Missing Design crmda.KU.edu

Self- Report 1 Self- Report 2 COCotinine Smoking Self-Report Bias 2-Method Planned Missing Design 22crmda.KU.edu

2-Method Measurement Expensive Measure 1 Gold standard– highly valid (unbiased) measure of the construct under investigation Problem: Measure 1 is time-consuming and/or costly to collect, so it is not feasible to collect from a large sample Inexpenseive Measure 2 Practical– inexpensive and/or quick to collect on a large sample Problem: Measure 2 is systematically biased so not ideal crmda.KU.edu23

e.g., measuring stress Expensive Measure 1 = collect spit samples, measure cortisol Inexpensive Measure 2 = survey querying stressful thoughts e.g., measuring intelligence Expensive Measure 1 = WAIS IQ scale Inexpensive Measure 2 = multiple choice IQ test e.g., measuring smoking Expensive Measure 1 = carbon monoxide measure Inexpensive Measure 2 = self-report e.g., Student Attention Expensive Measure 1 = Classroom observations Inexpensive Measure 2 = Teacher report 2-Method Measurement crmda.KU.edu24

How it works ALL participants receive Measure 2 (the cheap one) A subset of participants also receive Measure 1 (the gold standard) Using both measures (on a subset of participants) enables us to estimate and remove the bias from the inexpensive measure (for all participants) using a latent variable model 2-Method Measurement crmda.KU.edu25

Example Does child’s level of classroom attention in Grade 1 predict math ability in Grade 3? Attention Measures 1) Direct Classroom Assessment (2 items, N = 60) 2) Teacher Report (2 items, N = 200) Math Ability Measure, 1 item (test score, N = 200) 2-Method Measurement crmda.KU.edu26

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) crmda.KU.edu27

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) crmda.KU.edu28

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) 29 crmda.KU.edu29

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) 30 crmda.KU.edu30

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) crmda.KU.edu31

Teacher Report 1 (N = 200) Teacher Report 2 (N = 200) Direct Assessment 1 (N = 60) Direct Assessment 2 (N = 60) Attention (Grade 1) Teacher Bias Math Score (Grade 3) Math Score (Grade 3) (N = 200) crmda.KU.edu32

2-Method Planned Missing Design 33crmda.KU.edu

2-Method Planned Missing Design 34crmda.KU.edu

Use 2-time point data with variable time-lags to measure a growth trajectory + practice effects (McArdle & Woodcock, 1997) Developmental time-lag model 35crmda.KU.edu

T1T2 Age 1 student Time ;6 5;3 4;9 4;6 4;11 5;7 5;2 5;4 5;7 5;8 4;11 5;0 5;4 5;10 5;3 5;8 36crmda.KU.edu

T0T1 T2T3T4 T5T6 37crmda.KU.edu

T0T1 T2T3T4 T5T6 Intercept crmda.KU.edu

T0T1 T2T3T4 T5T6 39 Intercept Growth 0 Linear growth crmda.KU.edu

T0T1 T2T3T4 T5T6 Intercept Practice 1 Growth Constant Practice Effect crmda.KU.edu

T0T1 T2T3T4 T5T6 Intercept PracticeGrowth Exponential Practice Decline crmda.KU.edu

The Equations for Each Time Point Constant Practice Effect Declining Practice Effect 42crmda.KU.edu

Summary 2 measured time points are formatted according to time-lag This formatting allows a growth-curve to be fit, measuring growth and practice effects Developmental time-lag model 43crmda.KU.edu

K1 2 grade 1 student ;6- 4;11 5;0- 5;5 5;6- 5;11 age 6;0- 6;5 6;6- 6;11 7;0- 7;5 7;6- 7;11 5;6 5;3 4;9 4;6 4;11 5;7 5;2 5;4 6;7 6;0 5;11 5;5 5;9 6;7 6;1 6;5 7;4 6;10 7;3 6;10 7;5 6;4 7;3 7;6 44crmda.KU.edu

4;6- 4;11 5;0- 5;5 5;6- 5;11 age 6;0- 6;5 6;6- 6;11 7;0- 7;5 7;6- 7;11 5;6 5;3 4;9 4;6 4;11 5;7 5;2 5;4 6;7 6;0 5;11 5;5 5;9 6;7 6;1 6;5 7;4 6;10 7;3 6;10 7;5 6;4 7;3 7;6  Out of 3 waves, we create 7 waves of data with high missingness  Allows for more fine- tuned age-specific growth modeling  Even high amounts of missing data are not typically a problem for estimation 45crmda.KU.edu

Growth-Curve Design GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxxmissing 3xxx x 4xx xx 5x xxx 6 xxxx 46crmda.KU.edu

Growth Curve Design II GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxmissing 3xx x 4x xx 5 xxx 6xx x 7x x x 8 xx x 9x xx 10missingx xx 11missing xxx 47crmda.KU.edu

Growth Curve Design II GroupTime 1Time 2Time 3Time 4Time 5 1xxxxx 2xxxmissing 3xx x 4x xx 5 xxx 6xx x 7x x x 8 xx x 9x xx 10missingx xx 11missing xxx 48crmda.KU.edu

The Impact of Auxiliary Variables Monte Carlo simulation Consider the following Monte Carlo simulation: 60% MAR (i.e., Aux1) missing data 1,000 samples of N = crmda.KU.edu

Excluding A Correlate of Missingness 50 crmda.KU.edu

Simulation Results Showing the Bias Associated with Omitting a Correlate of Missingness. 51 crmda.KU.edu

MNAR improvements 52 crmda.KU.edu

Simulation Results Showing the Bias Reduction Associated with Including Auxiliary Variables in a MNAR Situation. 53 crmda.KU.edu

Simulation results showing the relative power associated with including auxiliary variables in a MCAR Situation. Q Q Q Q Q Improvement in power relative to the power of a model with no auxiliary variables. 54 crmda.KU.edu

PCA Auxiliary Variables Use PCA to reduce the dimensionality of the auxiliary variables in a data set. Use PCA to reduce the dimensionality of the auxiliary variables in a data set. A new smaller set of auxiliary variables are created (e.g., principal components) that contain all the useful information (both linear and non-linear) in the original data set. These principal component scores are then used to inform the missing data handling procedure (i.e., FIML, MI). These principal component scores are then used to inform the missing data handling procedure (i.e., FIML, MI) crmda.KU.edu

The Use of PCA Auxiliary Variables Consider a series of simulations: MCAR, MAR, MNAR (10-60%) missing data 1,000 samples of N = crmda.KU.edu

60% MAR correlation estimates with no auxiliary variables Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation. 57 crmda.KU.edu

Bias – Linear MAR process ρ Aux,Y =.60; 60% MAR 58 crmda.KU.edu

Non-Linear Missingness 59 crmda.KU.edu

Bias – Non-Linear MAR process ρ Aux,Y =.60; 60% non-linear MAR 60 crmda.KU.edu

Bias ρ Aux,Y =.60; 60% MAR 61 crmda.KU.edu

Bias ρ Aux,Y =.60; 60% MAR 62 crmda.KU.edu

Bias ρ Aux,Y =.60; 60% MAR 63 crmda.KU.edu

60% MAR correlation estimates with no auxiliary variables Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation. 64 crmda.KU.edu

60% MAR correlation estimates with all possible auxiliary variables (r =.60) Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 8 auxiliary variables. 65 crmda.KU.edu

Simulation results showing XY correlation estimates (with 95 and 99% confidence intervals) associated with a 60% MAR Situation and 1 PCA auxiliary variable. 60% MAR correlation estimates with 1 PCA auxiliary variable (r =.60) 66 crmda.KU.edu

Auxiliary Variable Power Comparison 1 PCA Auxiliary 1 Auxiliary All 8 Auxiliary Variables 67 crmda.KU.edu

Faster and more reliable convergence 68 crmda.KU.edu

Summary Including principal component auxiliary variables in the imputation model improves parameter estimation compared to the absence of auxiliary variables and beyond the improvement of typical auxiliary variables in most cases, particularly with the non- linear MAR type of missingness. Improve missing data handling procedures when the number of potential auxiliary variables is beyond a practical limit crmda.KU.edu

70 crmda.KU.edu

71 Generate multiply imputed datasets (m). Calculate a single covariance matrix on all N*m observations. By combining information from all m datasets, this matrix should represent the best estimate of the population associations. Run the Analysis model on this single covariance matrix and use the resulting estimates as the basis for inference and hypothesis testing. The fit function from this approach should be the best basis for making inferences about model fit and significance. Using a Monte Carlo Simulation, we test the hypothesis that this approach is reasonable. Simple Significance Testing with MI crmda.KU.edu

Population Model A2 Factor A.6 1 A10A crmda.KU.edu....6 B2 Factor B.6 3 B10 B Covariate1 Covariate

Analysis Model A2 λ 1,1 1* A10A1 θ 1,1 ψ 2,1 (0*) † 73crmda.KU.edu... λ 2, 1 λ 10, 1 B2 λ 11,2 B10 B1... λ 12, 2 λ 20, 2 θ 2,2 θ 10,10 θ 20,20 θ 11,11 θ 12,12 1* Factor A Factor B †: ψ 2,1 is fixed to zero to parameterize the alternative model.

74 Results: Change in Chi-squared Test Percent Relative Bias NPMSuperMatrixNaive crmda.KU.edu Note: The x-axis of the figure above represents levels of PM nested within sample size.

75crmda.KU.edu Thanks for your attention! Questions? crmda. KU.edu Workshop presented University of Turku, Finland On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design On the Merits of Planning and Planning for Missing Data* *You’re a fool for not using planned missing data design

Update Dr. Todd Little is currently at Texas Tech University Director, Institute for Measurement, Methodology, Analysis and Policy (IMMAP) Director, “Stats Camp” Professor, Educational Psychology and Leadership IMMAP (immap.educ.ttu.edu) Stats Camp (Statscamp.org) 76www.Quant.KU.edu