Part 2 Attrition: Bias and Loss of Power. Relevant Papers Graham, J.W., (2009). Missing data analysis: making it work in the real world. Annual Review.


Part 2 Attrition: Bias and Loss of Power

Relevant Papers
Graham, J.W. (2009). Missing data analysis: making it work in the real world. Annual Review of Psychology, 60.
Collins, L.M., Schafer, J.L., & Kam, C.M. (2001). A comparison of inclusive and restrictive strategies in modern missing data procedures. Psychological Methods, 6, 330-351.
Hedeker, D., & Gibbons, R.D. (1997). Application of random-effects pattern-mixture models for missing data in longitudinal studies. Psychological Methods, 2.
Graham, J.W., & Collins, L.M. (2010, forthcoming). Using Modern Missing Data Methods with Auxiliary Variables to Mitigate the Effects of Attrition on Statistical Power. Chapter 10 in Graham (2010, forthcoming), Missing Data: Analysis and Design. New York: Springer.

Relevant Papers
Graham, J.W., Palen, L.A., et al. (2008). Attrition: MAR & MNAR missingness, and estimation bias. Annual Meetings of the Society for Prevention Research, San Francisco, CA. (available upon request)
Also see: Graham, J.W. (2010, forthcoming). Simulations with Missing Data. Chapter 9 in Graham (2010, forthcoming), Missing Data: Analysis and Design. New York: Springer.

What if the cause of missingness is MNAR?
Problems with this statement:
  MAR & MNAR are widely misunderstood concepts
  I argue that the cause of missingness is never purely MNAR
  The cause of missingness is virtually never purely MAR either

MAR vs MNAR
"Pure" MCAR, MAR, MNAR never occur in field research
Each requires untenable assumptions, e.g., that all possible correlations and partial correlations are r = 0

MAR vs MNAR
Better to think of MAR and MNAR as forming a continuum
MAR vs MNAR is NOT even the dimension of interest

MAR vs MNAR: What IS the Dimension of Interest?
How much estimation bias is there when the cause of missingness cannot be included in the model?

Bottom Line...
All missing data situations are partly MAR and partly MNAR
Sometimes it matters: bias affects statistical conclusions
Often it does not matter: bias has tolerably little effect on statistical conclusions (Collins, Schafer, & Kam, Psych Methods, 2001)

Methods: "Old" vs MAR vs MNAR
MAR methods (MI and ML) are ALWAYS at least as good as, and usually better than, "old" methods (e.g., listwise deletion)
Methods designed to handle MNAR missingness are NOT always better than MAR methods

Yardstick for Measuring Bias
\[ \text{Standardized Bias} = \frac{(\text{average parameter est}) - (\text{population value})}{\text{Standard Error (SE)}} \times 100 \]
|bias| < 40 is considered small enough to be tolerable (the t-value is off by 0.4)
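As a rough illustration of the yardstick above, the following Python sketch computes standardized bias from a set of simulation estimates. The function name, the example numbers, and the fallback of using the standard deviation of the estimates as the SE are assumptions made for illustration; they are not taken from the slides.

```python
from statistics import mean, stdev


def standardized_bias(estimates, population_value, standard_error=None):
    """Return 100 * (average estimate - population value) / SE."""
    if standard_error is None:
        # Assumption: with no SE supplied, use the SD of the estimates
        # across replications as the standard error.
        standard_error = stdev(estimates)
    return 100.0 * (mean(estimates) - population_value) / standard_error


# Hypothetical replication results, for illustration only.
estimates = [0.52, 0.48, 0.55, 0.50, 0.53]
bias = standardized_bias(estimates, population_value=0.50, standard_error=0.05)

# |bias| < 40 is the tolerance threshold quoted on the slide.
print(f"standardized bias = {bias:.1f}, tolerable = {abs(bias) < 40}")
```

A standardized bias of 40 corresponds to the average estimate sitting 0.4 standard errors from the population value, which is why the slide describes the t-value as being off by 0.4.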

A little background for Collins, Schafer, & Kam (2001; CSK)
Example model of interest: X → Y
  X = Program (prog vs control)
  Y = Cigarette Smoking
  Z = Cause of missingness: say, Rebelliousness (or smoking itself)
Factors to be considered:
  % Missing (e.g., % attrition)
  r_YZ
  r_ZR

r_YZ
Correlation between the cause of missingness (Z), e.g., rebelliousness (or smoking itself), and the variable of interest (Y), e.g., Cigarette Smoking

r_ZR
Correlation between the cause of missingness (Z), e.g., rebelliousness (or smoking itself), and missingness on the variable of interest, e.g., missingness on the Smoking variable
Missingness on Smoking (often designated R or R_Y) is a dichotomous variable:
  R = 1: Smoking variable not missing
  R = 0: Smoking variable missing
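To make the definition of r_ZR concrete, here is a minimal Python sketch that builds the dichotomous indicator R from a variable with missing values and correlates it with Z. The helper names and the tiny data set are hypothetical, and missing values are assumed to be coded as None.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def missingness_indicator(values):
    """R = 1 when the value is observed, R = 0 when it is missing (None)."""
    return [0 if v is None else 1 for v in values]


# Hypothetical data: Z = rebelliousness, Y = smoking (None = missing).
rebelliousness = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
smoking = [0.5, 1.0, None, 2.0, None, None]

observed = missingness_indicator(smoking)
r_zr = pearson_r(rebelliousness, observed)
print(f"r_ZR = {r_zr:.3f}")
```

With R coded 1 for observed, a negative value means that higher Z goes with more missingness; the sign simply reflects the coding of R.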

CSK Study Design (partial)
Simulations manipulated:
  amount of missingness (25% vs 50%)
  r_ZY (r = .40, r = .90)
r_ZR held constant: r = .45 with 50% missing (applies to "MNAR-Linear" missingness)

CSK Results (partial) (MNAR Missingness)
25% missing, r_YZ = .40: no problem*
25% missing, r_YZ = .90: no problem
50% missing, r_YZ = .40: no problem
50% missing, r_YZ = .90: problem
* "no problem" = bias does not interfere with inference
These results apply to the regression coefficient for X → Y with "MNAR-Linear" missingness (see CSK, 2001, Table 2)

But Even CSK Results Too Conservative
Not considered by CSK: r_ZR
In their simulation, r_ZR = .45
Even with 50% missing and r_YZ = .90, bias can be acceptably small
Graham et al. (2008): bias is acceptably small (standardized bias < 40) as long as r_ZR < .24

r_ZR < .24 Very Plausible
Study                                      r_ZR
HealthWise (Caldwell, Smith, et al., 2004) .106
AAPT (Hansen & Graham, 1991)               .093
Botvin1                                    .044
Botvin2                                    .078
Botvin3                                    .104
All of these yield standardized bias < 10 (estimated)

Attrition in HealthWise
Best (pretest) predictors of attrition in HW:
  Gender
  Lifetime Sex
  Lifetime Alcohol Use
  Lifetime Smoking
  Lifetime Dagga Use

CSK and Follow-up Simulations Results very promising Suggest that even MNAR biases are often tolerably small But these simulations still too narrow

Beginnings of a Taxonomy of Attrition
Causes of Attrition on Y (main DV)
  Case 1: not Program (P), not Y, not PY interaction
  Case 2: P only
  Case 3: Y only... (CSK scenario)
  Case 4: P and Y only
Graham, J. W. (2009). Annual Review of Psychology.

Beginnings of a Taxonomy of Attrition
Causes of Attrition on Y (main DV)
  Case 5: PY interaction only
  Case 6: P + PY interaction
  Case 7: Y + PY interaction
  Case 8: P, Y, and PY interaction

Taxonomy of Attrition
Cases 1-4: often little or no problem
Cases 5-8: jury still out (more research needed)
  Very likely not as much of a problem as previously thought
  Use diagnostics to shed light

Use of Missing Data Diagnostics Diagnostics based on pretest data not much help Hard to predict missing distal outcomes from differences on pretest scores Longitudinal Diagnostics can be much more helpful

Hedeker & Gibbons (1997)
Plot the main DV over time for four groups: Program and Control, each split into those with and those without the last wave of data
Much can be learned
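The four-group diagnostic described above can be sketched in Python as follows. This is only an illustration of the idea, not the procedure used by Hedeker & Gibbons: the record layout, the function name, and the example scores are all hypothetical, and missing waves are assumed to be coded as None.

```python
from collections import defaultdict


def dropout_profiles(records):
    """Mean of the main DV at each wave for condition x (stayer/dropper).

    A "stayer" has data at the last wave; a "dropper" does not.
    """
    n_waves = len(records[0]["scores"])
    cells = defaultdict(lambda: [[] for _ in range(n_waves)])
    for rec in records:
        status = "stayer" if rec["scores"][-1] is not None else "dropper"
        key = (rec["condition"], status)
        for wave, score in enumerate(rec["scores"]):
            if score is not None:
                cells[key][wave].append(score)
    # Average the observed scores in each cell; None where nothing is observed.
    return {
        key: [sum(v) / len(v) if v else None for v in waves]
        for key, waves in cells.items()
    }


# Hypothetical three-wave data, for illustration only.
records = [
    {"condition": "program", "scores": [2.0, 2.5, 3.0]},
    {"condition": "program", "scores": [2.0, 3.0, None]},
    {"condition": "control", "scores": [2.0, 3.0, 4.0]},
    {"condition": "control", "scores": [2.5, 4.0, None]},
]

profiles = dropout_profiles(records)
for group, means in sorted(profiles.items()):
    print(group, means)
```

Plotting the four resulting trajectories against wave shows whether droppers diverge from stayers differently in the program and control conditions.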

Empirical Examples
Hedeker & Gibbons (1997)
  Drug treatment of psychiatric patients
Hansen & Graham (1991)
  Adolescent Alcohol Prevention Trial (AAPT)
  Alcohol, smoking, other drug prevention among normal adolescents (7th – 11th grade)

Empirical Example Used by Hedeker & Gibbons (1997)
IV: Drug Treatment vs. Placebo Control
DV: Inpatient Multidimensional Psychiatric Scale (IMPS)
  1 = normal
  2 = borderline mentally ill
  3 = mildly ill
  4 = moderately ill
  5 = markedly ill
  6 = severely ill
  7 = among the most extremely ill

[Figure from Hedeker & Gibbons (1997): IMPS (low = better outcomes) by Weeks of Treatment; panels: Placebo Control, Drug Treatment]

Longitudinal Diagnostics: Hedeker & Gibbons Example
Treatment droppers do BETTER than stayers
Control droppers do WORSE than stayers
Example of Program × DV interaction
But in this case, the pattern would lead to suppression bias
  Not as bad for internal validity in the presence of a significant program effect

AAPT (Hansen & Graham, 1991)
IV: Normative Education Program vs Information Only Control
DV: Cigarette Smoking (3-item scale)
Measured at one-year intervals, 7th grade – 11th grade

[Figure: AAPT Cigarette Smoking (high = more smoking; arbitrary scale) by grade; panels: Control, Program]

Longitudinal Diagnostics: AAPT Example
Treatment droppers do WORSE than stayers (a little steeper increase)
Control droppers do WORSE than stayers (a little steeper increase)
Little evidence for Program × DV interaction
Very likely MAR methods allow good conclusions (CSK scenario holds)

Use of Auxiliary Variables Reduces attrition bias Restores some power lost due to attrition

What Is an Auxiliary Variable?
A variable correlated with the variables in your model
  but not part of the model
  not necessarily related to missingness
  used to "help" with missing data estimation
Best auxiliary variables: same variable as the main DV, but measured at waves not used in the analysis model

Model of Interest

Benefit of Auxiliary Variables
Example from Graham & Collins (2010)
[Diagram: data layout with columns X, Y, Z; rows split into complete cases and cases missing Y]
X, Y: variables in the model (Y sometimes missing)
Z: auxiliary variable

Benefit of Auxiliary Variables Effective sample size (N') Analysis involving N cases, with auxiliary variable(s) gives statistical power equivalent to N' complete cases without auxiliary variables

Benefit of Auxiliary Variables
It matters how highly Y and Z (the auxiliary variable) are correlated. For example:
r_YZ   N     N' (equivalent power)   increase
.40    500   542                     8%
.60    500   608                     22%
.80    500   733                     47%
.90    500   839                     68%
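The percentage increases quoted on this slide follow directly from the N and N' values. A short Python check of that arithmetic is sketched below; the variable names are hypothetical, and the code only reproduces the percentages, it does not derive N' itself.

```python
# N and N' values as quoted on the slide; keys are r_YZ.
N = 500
effective_n = {0.40: 542, 0.60: 608, 0.80: 733, 0.90: 839}

# Percentage gain in effective sample size relative to N, rounded as on the slide.
pct_increase = {
    r_yz: round(100 * (n_prime - N) / N) for r_yz, n_prime in effective_n.items()
}

for r_yz, pct in pct_increase.items():
    print(f"r_YZ = {r_yz:.2f}: N' = {effective_n[r_yz]} ({pct}% increase)")
```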

[Figure: Effective Sample Size by r_YZ; x-axis: r_YZ, y-axis: Effective Sample Size]

Conclusions
Attrition CAN be bad for internal validity
But often it's NOT nearly as bad as often feared
Don't rush to conclusions, even with rather substantial attrition
Examine evidence (especially longitudinal diagnostics) before drawing conclusions
Use MI and ML missing data procedures!
Use good auxiliary variables to minimize the impact of attrition

Part 3: Illustration of Missing Data Analysis: Multiple Imputation with NORM and Proc MI

Multiple Imputation: Basic Steps
1. Impute
2. Analyze
3. Combine results

Imputation and Analysis
Impute 40 datasets
  a missing value gets a different imputed value in each dataset
Analyze each dataset with USUAL procedures
  e.g., SAS, SPSS, LISREL, EQS, STATA, HLM
Save parameter estimates and SEs

Combine the Results
Parameter Estimates to Report
  Average of the estimate (b-weight) over the 40 imputed datasets

Combine the Results
Standard Errors to Report
Weighted sum of:
  "within imputation" variance
    average squared standard error
    usual kind of variability
  "between imputation" variance
    sample variance of parameter estimates over the 40 datasets
    variability due to missing data
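The combining step can be sketched in Python using the standard multiple-imputation pooling formulas (Rubin's rules), in which the total variance is the within-imputation variance plus (1 + 1/m) times the between-imputation variance. The slides do not spell out the weights, so treating the "weighted sum" as Rubin's rules is an assumption here, as are the function name and the example numbers. Five imputations are used for brevity instead of the 40 mentioned on the slides.

```python
from math import sqrt
from statistics import mean, variance


def pool_estimates(estimates, standard_errors):
    """Pool one parameter over m imputed datasets (Rubin's rules)."""
    m = len(estimates)
    pooled = mean(estimates)  # estimate to report
    within = mean([se ** 2 for se in standard_errors])  # average squared SE
    between = variance(estimates)  # sample variance of the estimates
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical b-weights and SEs from m = 5 imputed datasets.
b_weights = [0.50, 0.54, 0.46, 0.52, 0.48]
ses = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled, pooled_se = pool_estimates(b_weights, ses)
print(f"pooled estimate = {pooled:.3f}, pooled SE = {pooled_se:.4f}")
```

The pooled SE is larger than the per-dataset SE of 0.10 because the between-imputation term adds the uncertainty due to missing data.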

Materials for SPSS Regression
Starting place: downloads (you will need to get a free user ID to download all our free software)
  missing data software
  Joe Schafer's Missing Data Programs
  John Graham's Additional NORM Utilities (this mcgee website is currently down, but I hope to have it up again in the Fall)
Please contact me with any questions.

exit for sample analysis