Practical Approaches for Dealing with Missing Data in Longitudinal Analyses of Adolescent Addiction Programs. Michael Dennis, Ph.D., Chestnut Health Systems

1 Practical Approaches for Dealing with Missing Data in Longitudinal Analyses of Adolescent Addiction Programs
Michael Dennis, Ph.D., Chestnut Health Systems, Bloomington, IL
Presentation at the Advisory Committee Meeting for “Economic Evaluation Methods: Development and Applications (R01 DA018645)”, Coconut Grove, FL, November 10-11, 2006. Preparation of this manuscript was supported by funding from the Center for Substance Abuse Treatment (CSAT Contract no. 270-2003-00006). The contents of this presentation are the opinions of the author and do not reflect the views or policies of the government. Available online at www.chestnut.org/LI/Posters or by contacting Joan Unsicker at 720 West Chestnut, Bloomington, IL 61701, phone: (309) 827-6026, fax: (309) 829-4661, e-mail: junsicker@Chestnut.Org

2 This presentation provides...
- A quick review of the problems of missingness and methods of imputation, based on Schafer & Graham (2002)
- A summary of the practical approach Chestnut uses to deal with missing data
- A focus on the conceptual issues and actual effectiveness, not the math or computational formulas per se

3 Types of Missingness
- By design
- Logical skip outs
- Item missing
- Wave missing
- Unobserved latent constructs

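4 Key Terms (From Rubin)
- Missing Completely at Random (MCAR): missingness has no relationship to predictors or dependent variables
- Missing at Random (MAR): missingness can be predicted from observed data and, given those predictors, has no relationship with the unobserved value of the dependent variable
- Missing Not at Random (MNAR): missingness is related to the unobserved values themselves (e.g., the dependent variable), even after accounting for observed predictors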
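The three mechanisms can be illustrated with a small simulation. This is only a sketch under assumed numbers: the predictor `x`, the outcome `y`, and the missingness probabilities are made up for illustration and do not come from the presentation. It compares the mean of `y` after listwise deletion with the mean after filling missing `y` from a line fit to the complete cases.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=0):
    """Return (x, y) pairs where y is None when missing under the given mechanism."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)          # fully observed predictor (hypothetical)
        y = x + rng.gauss(0, 1)      # outcome; its true mean is 0
        if mechanism == "MCAR":
            p_missing = 0.3                      # unrelated to x or y
        elif mechanism == "MAR":
            p_missing = 0.6 if x > 0 else 0.1    # depends only on observed x
        elif mechanism == "MNAR":
            p_missing = 0.6 if y > 0 else 0.1    # depends on the unobserved y itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, None if rng.random() < p_missing else y))
    return rows

def listwise_mean(rows):
    """Mean of y after dropping every case with missing y (listwise deletion)."""
    return statistics.fmean(y for _, y in rows if y is not None)

def regression_mean(rows):
    """Mean of y after predicting missing y from x with a line fit to complete cases."""
    complete = [(x, y) for x, y in rows if y is not None]
    mx = statistics.fmean(x for x, _ in complete)
    my = statistics.fmean(y for _, y in complete)
    slope = (sum((x - mx) * (y - my) for x, y in complete)
             / sum((x - mx) ** 2 for x, _ in complete))
    intercept = my - slope * mx
    return statistics.fmean(y if y is not None else intercept + slope * x
                            for x, y in rows)

for mechanism in ("MCAR", "MAR", "MNAR"):
    rows = simulate(mechanism)
    print(mechanism, round(listwise_mean(rows), 2), round(regression_mean(rows), 2))
```

In this toy setup the listwise mean stays near the true value of 0 only under MCAR, prediction from `x` recovers it under MAR, and neither approach recovers it under MNAR.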

5 The Problem With Listwise Deletion (default)
- Estimates are increasingly biased as we move away from MCAR
- Smaller SD inflates significance tests
- Unstable
- Changes correlations and relationships
- Loss of sample is also problematic for multivariate analyses
Source: Schafer & Graham (2002)

6 Pair-wise
- Pair-wise deletion is particularly efficient and unbiased under the assumption of MCAR
- Becomes rapidly unstable even under MAR
- Often narrows covariance or variance estimates and distorts relationships in regression or structural equation models (SEM)

7 Problems with other common methods of replacement
- Mean substitution: narrows variance
- Regression estimate: still narrows variance
- Only models using real variance are relatively unbiased
- Hot deck: better but still biased
Source: Schafer & Graham (2002)
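The variance-narrowing effect of mean substitution can be seen in a few lines. This is a sketch on simulated data (the mean of 50, SD of 10, and 30% MCAR missingness are assumptions chosen for illustration), comparing mean substitution with a simple random hot deck that draws replacements from real observed values.

```python
import random
import statistics

rng = random.Random(1)
true_values = [rng.gauss(50, 10) for _ in range(5000)]
# Delete about 30% of the values completely at random (MCAR).
observed = [v if rng.random() > 0.3 else None for v in true_values]
valid = [v for v in observed if v is not None]

# Mean substitution: every missing value gets the same number.
mean_valid = statistics.fmean(valid)
mean_filled = [mean_valid if v is None else v for v in observed]

# Random hot deck: every missing value gets a randomly chosen observed value.
hot_deck_filled = [rng.choice(valid) if v is None else v for v in observed]

sd_valid = statistics.stdev(valid)
sd_mean_filled = statistics.stdev(mean_filled)
sd_hot_deck = statistics.stdev(hot_deck_filled)
print(round(sd_valid, 2), round(sd_mean_filled, 2), round(sd_hot_deck, 2))
```

Under these MCAR assumptions the mean-filled SD shrinks noticeably while the hot-deck SD stays close to the observed SD. The example says nothing about bias under MAR or MNAR, where the slide notes that hot deck is still biased.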

8 Examples of Predictive
- Weighted hot deck: sort people based on related variables, then randomly replace
- Maximum Likelihood (ML): predict from all other available data
- Restricted Maximum Likelihood (RML): predict from all other available data within the same condition (site, time, etc.) to preserve differences
- Multiple imputation: average over several imputations, a form of bootstrapping that does not assume a normal distribution

9 Problem with these methods…
- Complicated with many variables and/or for multiple analyses
- All methods have unknown biases under MNAR unless there is a known a priori basis for modeling missingness (e.g., a common factor)
- In longitudinal analysis, this includes knowing the expected trajectory over time

10 Chestnut Strategy 1: Minimize it
- Train, monitor, and do quality assurance to get staff to minimize missing data
- Use simple logical skips to minimize not-applicable questions and burden
- Differentiate between refusals (rare), don’t knows (more common), and skip outs (common); track and do problem solving if refusals start occurring on specific items (which is MNAR)
- Put more effort into follow-up

11 Follow-up Rates are PRIMARILY related to effort Source: Scott (2004)

12 Accepting a lower follow-up rate “biases” results
- The easiest-to-find people are different on the outcome, which is MNAR
- The differences are as large as or larger than the treatment effects we are looking for
Source: Scott (2004)

13 Strategy 2: Make Logical Edits
1. Design the questionnaire so that there are clear, simple logical edits with implied values
2. Test the logic of edits (not all work, e.g., M1)
3. Replace logical skip outs with the implied value
4. Test the logic of complex edits to create summary measures (not all work, e.g., NHSDA)
5. Make complex edits
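A simple logical edit of the kind described in step 3 might look like the sketch below. The field names (`ever_used_alcohol`, `days_used_alcohol`) and the missing-data codes are hypothetical and are not taken from any Chestnut instrument; the point is only that a skip out gets its implied value while refusals and don't knows stay flagged as missing.

```python
# Hypothetical codes that keep the reasons for missingness distinct.
SKIP, REFUSED, DONT_KNOW = "skip", "refused", "dont_know"

def apply_skip_edit(record, gate="ever_used_alcohol",
                    item="days_used_alcohol", implied=0):
    """Replace a logical skip out with its implied value.

    If the gate question was answered 0 ("no") and the follow-up item was
    skipped, the follow-up is logically `implied`. Refusals and don't knows
    are left alone so they can be tracked separately.
    """
    edited = dict(record)  # copy so the raw record is preserved
    if edited.get(gate) == 0 and edited.get(item) == SKIP:
        edited[item] = implied
    return edited

raw = [
    {"id": 1, "ever_used_alcohol": 0, "days_used_alcohol": SKIP},
    {"id": 2, "ever_used_alcohol": 1, "days_used_alcohol": 12},
    {"id": 3, "ever_used_alcohol": 1, "days_used_alcohol": REFUSED},
]
edited = [apply_skip_edit(r) for r in raw]
for row in edited:
    print(row)
```

Keeping the raw record unchanged makes it possible to test the edit logic (steps 2 and 4) by comparing values before and after.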

14 Strategy 3: Replace missing data within known factors
- Recall that this was one of the few ways to deal with MNAR
- Known common factors should have a Cronbach’s alpha of at least .7
- Evaluate the amount of missing data:
  - by design (e.g., adding an item in a new version) is MCAR
  - systematic refusal is MNAR
- Calculate the scale as the mean of valid items x the expected number of items (require at least 3 valid items)
- Generally do the above within subscale, then sum up to higher-order scales
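The scoring rule on this slide (mean of valid items times the expected number of items, with at least 3 valid items) can be sketched as follows. The function names and the toy data are assumptions for illustration; the alpha function is the standard Cronbach's alpha formula, included only because the slide uses .7 as a threshold.

```python
import statistics

def prorated_score(items, min_valid=3):
    """Mean of valid items x expected number of items; None if too few are valid."""
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid) * len(items)

def cronbach_alpha(item_columns):
    """Cronbach's alpha from complete data, one list of responses per item."""
    k = len(item_columns)
    totals = [sum(person) for person in zip(*item_columns)]
    item_variance = sum(statistics.variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_variance / statistics.variance(totals))

# One person's answers to a 4-item subscale, with one item missing.
print(prorated_score([1, 0, 1, None]))        # mean 2/3 over 3 valid items, times 4
print(prorated_score([1, None, None, None]))  # too few valid items

# A higher-order scale is the sum of its prorated subscale scores.
subscales = [[1, 0, 1, None], [1, 1, 1, 0]]
scores = [prorated_score(s) for s in subscales]
total = None if None in scores else sum(scores)
print(total)
```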

15 Example: GAIN Substance Problems Scale (SPS) Rasch Model Demonstrating That Item Severities Are NOT Equal
[Figure: Rasch persons map of items, truncated above 2 and below -3 on the measure axis; each '#' is 24. Items from the top of the map to the bottom: HlthProbs; Withdrawal/ill; ProbW/Law; Unsafe, GiveUpActs, DespiteMedPsyProbs; DepressedNervous, NeededMoreAOD, UnableCutDown; ResponNotMet, LargerAmnt/more; HideWhenUseAOD, Fights/trouble; SpentTimeGetting; ParentComplained; WeeklyAOD]
Source: Riley et al. (in press)

16 Use of Rasch Measurement Model / Computer Adaptive Test (CAT) Models
Construct validation: comparing alternative measures of the GAIN Substance Problem Scale (SPS) to “expected” correlates
Measure             Withdrawal  Frequency  Emotional  Recovery     Health
                    Symptoms    of Use     Problems   Environment  Problems
Symptom Count (16)  0.53        0.38       0.36       0.37         0.19
Full Rasch (16)     0.54        0.43       0.41       0.39         0.22
CAT (5-11 items)    0.57        0.45       0.44       0.40         0.23
- Weighting items with Rasch does a little better
- CAT can closely approximate with a fraction of the items
Source: Riley et al. (in press)

17 Strategy 4: Replace structural missing data (e.g., by site)
- Where data are missing structurally by design (i.e., MCAR), use regression to impute values based on correlated factors in other sites (seeking a formula with 70% or more of variance explained)
- Simple regression if a small percentage of data is missing (under 5%)
- As the amount of missing data goes up to 15%, it is worth considering the use of ML or MI
- Above 15% missing, all methods are questionable
- At this point we usually have less than 1% missing within wave, but 5-20% or more by wave
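A minimal sketch of this rule, assuming a single correlated predictor and made-up data: fit a line in the sites that collected both measures, and only impute in the site where the item is missing by design if the fit explains at least 70% of the variance. The function names and the single-predictor simplification are assumptions; in practice the formula could use several correlated factors.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns slope, intercept, R^2."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, (sxy * sxy) / (sxx * syy)

def impute_by_regression(donor_x, donor_y, target_x, min_r2=0.70):
    """Predict the missing measure for target_x, or return None if the fit is too weak."""
    slope, intercept, r2 = fit_line(donor_x, donor_y)
    if r2 < min_r2:
        return None  # formula explains too little variance to trust
    return [intercept + slope * x for x in target_x]

# Other sites measured both variables; the target site only has the predictor.
print(impute_by_regression([1, 2, 3, 4], [2, 4, 6, 8], [5]))   # strong fit
print(impute_by_regression([1, 2, 3, 4], [1, 5, 2, 4], [5]))   # weak fit (R^2 = 0.18)
```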

18 Strategy 5: Replacement within wave
- Identify remaining items with more than 1-2% missing and assess the feasibility of replacing them via regression (or ML/MI)
- For the rest, sort the data on key dimensions of variation and do a modified weighted hot deck on the 2-3 people above or below
  - We typically sort on a total symptom count and the baseline dependent variable within count, condition, and site
  - Can replace with the mean, median, or a random choice; we have found that the median is more stable because of the skewed nature of several distributions, and we use it by default
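One way to read this procedure is sketched below. It is not Chestnut's actual code: the record layout, field names, and the choice of `k=2` donors on each side are assumptions. Records are grouped into strata (here site and condition), sorted within each stratum, and each missing value is replaced by the median of the nearest observed values above and below it in the sort order.

```python
import statistics

def hot_deck_median(records, value_key, sort_keys, strata_keys, k=2):
    """Fill missing value_key with the median of up to k observed neighbors on each side."""
    filled = [dict(r) for r in records]            # copy so the input is untouched
    strata = {}
    for rec in filled:
        strata.setdefault(tuple(rec[s] for s in strata_keys), []).append(rec)
    for group in strata.values():
        group.sort(key=lambda rec: tuple(rec[s] for s in sort_keys))
        observed = [rec[value_key] for rec in group]   # donors: real values only
        for i, rec in enumerate(group):
            if observed[i] is not None:
                continue
            above = [v for v in reversed(observed[:i]) if v is not None][:k]
            below = [v for v in observed[i + 1:] if v is not None][:k]
            donors = above + below
            if donors:                                  # stays missing with no donor
                rec[value_key] = statistics.median(donors)
    return filled

people = [
    {"id": 1, "site": "A", "condition": "tx", "symptoms": 2, "days_used": 10},
    {"id": 2, "site": "A", "condition": "tx", "symptoms": 3, "days_used": 12},
    {"id": 3, "site": "A", "condition": "tx", "symptoms": 5, "days_used": None},
    {"id": 4, "site": "A", "condition": "tx", "symptoms": 6, "days_used": 20},
    {"id": 5, "site": "A", "condition": "tx", "symptoms": 9, "days_used": 40},
    {"id": 6, "site": "A", "condition": "control", "symptoms": 5, "days_used": None},
]
result = hot_deck_median(people, "days_used", ["symptoms"], ["site", "condition"])
for row in result:
    print(row["id"], row["days_used"])
```

Because donors are drawn only from the same site and condition, person 6 is not filled from the treatment group; the record stays missing until a donor in the same stratum exists.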

19 Understanding Multidimensional Nature can be used to Create Additional Strata for Replacement
[Figure: risk dimensions (Female Sex Risk, Male Sex Risk, Needle Risk, Crack Risk) and the resulting groups (High Risk Needle Sharers, Male Sex Buyers, Female Sex Traders)]
Source: Dennis et al. (2001)

20 Important to block on Condition in Experiments or Quasi-Experiments
- Unrestricted replacement would average out real variance due to the effect of the experimental condition

21 Strategy 6: Replacement Across Waves
- Create a summary measure based on the average across waves times the expected number of waves to get a total (e.g., total days of abstinence)
  - Works best when most people have only 1-2 waves missing out of several (e.g., 4-8)
  - Can become biased if missing data by wave is high or systematic
- Can regress from first/last or all available waves to fill in
- Need to know the expected trajectory
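The summary measure in the first bullet can be sketched as below. The function name and the `max_missing=2` cutoff are assumptions that mirror the "1-2 waves missing" guidance. Note that the calculation treats the observed waves as representative of the missing ones, so it does not address the trajectory concern raised in the last bullet.

```python
import statistics

def prorated_total(waves, max_missing=2):
    """Average of observed waves x expected number of waves; None if too many are missing."""
    valid = [w for w in waves if w is not None]
    if not valid or len(waves) - len(valid) > max_missing:
        return None
    return statistics.fmean(valid) * len(waves)

# Days abstinent in each of four follow-up waves; the second wave was missed.
print(prorated_total([80, None, 70, 90]))        # mean of 80 over 3 waves, times 4
print(prorated_total([80, None, None, None]))    # too many waves missing
```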

22 Special Case of A Curvilinear Trajectory Source: Godley et al (2004)

23 Special Case of A Curvilinear Trajectory Very Biased Source: Godley et al (2004)

24 Special Case of A Curvilinear Trajectory Much less biased Source: Godley et al (2004)

25 Strategy 7: Use of Maximum Likelihood (ML)
- Where possible, use ML or Restricted ML (RML) as part of software applications like AMOS, Stata, etc.
- Need to evaluate how much data it is replacing
- Need to be confident that the data are MAR (not MNAR) by virtue of a small n missing, knowledge of the reason, or other analyses
- Restricted ML (RML) is preferred to control for site, condition, and/or subject differences
- Alternative: we have not used them, but have been thinking about exploring some of the new methods of multiple imputation

26 References
Dennis, M. L., Wechsberg, W. M., McDermeit (Ives), M., Campbell, R. S., & Rasch, R. R. (2001). The correlates and predictive validity of HIV risk groups among drug users in a community-based sample: Methodological findings from a multi-site cluster analysis. Evaluation and Program Planning, 24, 187-206.
Godley, S. H., Dennis, M. L., Godley, M. D., & Funk, R. R. (2004). Thirty-month relapse trajectory cluster groups among adolescents discharged from outpatient treatment. Addiction, 99, 129-139.
Riley, B. B., Conrad, K. J., Bezruczko, N., & Dennis, M. (in press). Relative precision, efficiency and construct validity of different starting and stopping rules for a Computerized Adaptive Test: The GAIN Substance Problem Scale. Journal of Applied Measurement.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Scott, C. K. (2004). A replicable model for achieving over 90% follow-up rates in longitudinal studies of substance abusers. Drug and Alcohol Dependence, 74, 21-36.
