Handling Missing Data on ALSPAC

1 Handling Missing Data on ALSPAC
Paul Clarke (CMPO, University of Bristol)
ALSPAC Social Science User Group meeting, 21 May 2008

2 Outline
What causes missing data?
‘Types’ of missing data
Methods for missing data: quick overview
ALSPAC ‘Blitz’ on non-respondents
Investigating MNAR data in ALSPAC

3 Example ‘ALSPAC’ analysis
At age 11
Outcome: Mood (ordinal, 3 categories)
  Depressive symptoms, maternally rated
Main exposure: Physical activity (score)
  Measured on actigraph, 3 days
Adjustment: BMI (score), Sex, Age at screening
Ordinal logistic regression

4 Missing Value (MV) pattern [1]
[Table: MV patterns across Sex, Mood, BMI, Physical Activity and Age. Pattern counts: 4220, 673, 1204, 5397, 858. Monotone: 11679 cases (90%). Non-monotone: 12352 cases (95%).]
[1] All MV patterns < 200 cases ignored

5 What causes missing data?
[Diagram labels]
Interviewer effectiveness
Incentive for participant
Loyalty
Letter
Telephone calls
Interviewer visits
Incomplete questionnaire
Refusal
Follow-up
Non-contact
Fail to attend clinic
Parent characteristics
Parent & child characteristics

6 Result of processes leading to:
Refusal to answer questions (item)
Refusal to participate (unit)
No contact (unit)
Longitudinal-specific: attrition & drop-out
Non-response mechanism(s) - NRM

7 Rubin’s definitions [1]
Missing Completely At Random (MCAR)
  NRM independent of observed (and missing) variables
Missing At Random (MAR)
  NRM depends only on observed variables
Missing Not At Random (MNAR)
  NRM depends on missing variables too
[1] Little & Rubin (2002) Statistical Analysis with Missing Data
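
The three definitions can be illustrated with a small simulation. This is only a sketch under invented assumptions: the data-generating model (Y = X + noise, true mean of Y equal to 0, true slope equal to 1), the response probabilities and the function name `simulate` are hypothetical and are not taken from the ALSPAC analysis. X is always observed and Y is observed only for respondents.

```python
import math
import random
from statistics import mean


def logistic(z):
    """Map a linear predictor to a response probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(mechanism, n=20000, seed=1):
    """Complete-case mean of Y and slope of Y on X under one NRM.

    Hypothetical model: X ~ N(0, 1), Y = X + N(0, 1),
    so the full-data mean of Y is 0 and the slope is 1.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)          # always-observed covariate
        y = x + rng.gauss(0, 1)      # outcome, possibly missing
        if mechanism == "MCAR":
            p_respond = 0.6                        # ignores X and Y
        elif mechanism == "MAR":
            p_respond = logistic(0.5 + 1.5 * x)    # observed X only
        elif mechanism == "MNAR":
            p_respond = logistic(0.5 + 1.5 * y)    # the missing Y itself
        else:
            raise ValueError("unknown mechanism: " + mechanism)
        if rng.random() < p_respond:               # keep complete cases only
            xs.append(x)
            ys.append(y)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return {"n_complete": len(xs), "mean_y": my, "slope": sxy / sxx}


for mechanism in ("MCAR", "MAR", "MNAR"):
    print(mechanism, simulate(mechanism))
```

In this setup the complete cases recover both quantities under MCAR. Under MAR driven by X, the unadjusted mean of Y is off but the slope conditional on X is still recovered. Under MNAR both drift away from their true values.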

8 Directed Acyclic Graph (DAG)
[DAG nodes: X, R, C]
R independent ⇒ data MCAR

9 MAR data
[DAG nodes: Y, X, R, C]
R indirectly related to Y through X and C

10 Methods for MAR data
Complete cases analysis/Listwise deletion
Weighting
  Weighting classes, post-stratification
(Single) imputation methods
  e.g. regression, hot-deck/nearest-neighbour
Multiple imputation methods
  e.g. Norm, MICE
Semiparametric estimators
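
As a rough illustration of the weighting-classes idea, the sketch below weights each respondent by the inverse of the response rate in their class. The function name `weighting_class_mean`, the class labels and the toy values are all hypothetical. The approach assumes non-respondents resemble respondents within the same class, which is a MAR assumption given the class variable.

```python
from collections import defaultdict


def weighting_class_mean(records):
    """Weighted mean of y over respondents.

    records: iterable of (class_label, y), with y set to None
    for non-respondents. Each respondent gets the weight
    (class size) / (number of respondents in the class).
    """
    records = list(records)
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for label, y in records:
        n_total[label] += 1
        if y is not None:
            n_resp[label] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for label, y in records:
        if y is not None:
            weight = n_total[label] / n_resp[label]  # inverse response rate
            weighted_sum += weight * y
            weight_total += weight
    return weighted_sum / weight_total


# Toy data: class A responds fully, class B only half the time.
toy = [("A", 10.0)] * 4 + [("B", 20.0)] * 2 + [("B", None)] * 2
complete_case = sum(y for _, y in toy if y is not None) / 6
print("complete-case mean:", complete_case)
print("weighting-class mean:", weighting_class_mean(toy))
```

The complete-case mean is pulled towards class A because class B is under-represented among respondents. The weights restore each class to its share of the full sample.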

11 Imputation in practice: pitfalls [1]
Omitting the outcome
Imputing non-normal variables
MAR completely implausible
Convergence of iterative procedures
[1] Sterne et al. (2008) British Medical Journal

12 Complex methods
Analysis model
  e.g. Ordinal logistic regression
Imputation model: Missing given Observed
ALL assume MAR data
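
The split between an imputation model and an analysis model can be sketched as follows. Everything here is a simplified assumption: the imputation model is a linear regression of Y on an always-observed X, the analysis model is just the mean of Y (not the ordinal logistic regression of the example), and the names `fit_line`, `rubin_pool` and `multiply_impute_mean` are hypothetical. Proper multiple imputation, as in Norm or MICE, also draws the imputation-model parameters from their posterior; this sketch holds them fixed, so it understates the between-imputation variance.

```python
import math
import random
from statistics import mean, variance


def fit_line(xs, ys):
    """Least-squares intercept, slope and residual SD of y on x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(rss / (len(xs) - 2))


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance."""
    m = len(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (m - 1)
    return mean(estimates), within + (1 + 1 / m) * between


def multiply_impute_mean(xs, ys, m=20, seed=0):
    """Impute missing y (None) given x, m times, and pool the mean of y."""
    rng = random.Random(seed)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    a, b, sd = fit_line([x for x, _ in observed], [y for _, y in observed])
    estimates, variances = [], []
    for _ in range(m):
        # Imputation model: missing Y given observed X, plus residual noise.
        completed = [y if y is not None else a + b * x + rng.gauss(0, sd)
                     for x, y in zip(xs, ys)]
        # Analysis model: mean of Y and its sampling variance.
        estimates.append(mean(completed))
        variances.append(variance(completed) / len(completed))
    return rubin_pool(estimates, variances)


# Hypothetical MAR data: Y = X + noise, response depends on X only.
rng = random.Random(42)
xs = [rng.gauss(0, 1) for _ in range(5000)]
ys = [x + rng.gauss(0, 1) for x in xs]
ys = [y if rng.random() < 1 / (1 + math.exp(-(0.5 + 1.5 * x))) else None
      for x, y in zip(xs, ys)]
print("complete-case mean:", mean(y for y in ys if y is not None))
print("pooled (estimate, variance):", multiply_impute_mean(xs, ys))
```

Because response in this toy data depends only on X, and X is in the imputation model, the pooled mean should sit close to the true value of 0 while the complete-case mean does not.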

13 MAR data in reality
[DAG nodes: ?, Y, X, R, C]
Unknown factors drive non-response
  …correlated with model predictors
  …but not with Y

14 Why is this important?
Weakness of MAR: How do we know?
Central problem: missing data is missing!
MAR is a “leap of faith”

15 MNAR data
[DAG nodes: ?, Y, X, R, C]
Unknowns directly correlated with Y?

16 Physical activity example
[DAG nodes: ?, Mood, Phys Act, R, (BMI, Sex, Age)]
NRM is mother-driven (child age 11)
  Child must wear actigraph for 3 days
  Mother must assess her child’s mood

17 ALSPAC ‘Blitz’
Co-ordinated by Family Liaison Unit
4 tranches: Nov 2007-May 2008
Target 5000 teenagers not in last 2 waves
Mini-clinic for difficult to persuade

18 Proposed analysis
MAR is context dependent
Risky behaviours (Glyn Lewis, et al)
  Outcomes: Cannabis use, sexual practices, etc
  Risk factors: mental health, sensation seeking, etc
Basic analysis:
  Compare follow-up with main sample
  Still differences after adjustment?

19 Unit non-response
100% follow-up rate unlikely!
Directly model NRM
  Continuum of non-response
  Hard to contact less like main sample
Weighting scheme (Alho 1990; Wood et al. 2006)
Lower bound for MNAR bias

20 Item non-response
Parallel qualitative post
Items: questions on risky behaviours
What mechanisms drive non-response?
Test hypotheses from this project
