Handling Missing Data on ALSPAC Paul Clarke (CMPO, University of Bristol) ALSPAC Social Science User Group meeting 21 May 2008.


1 Handling Missing Data on ALSPAC Paul Clarke (CMPO, University of Bristol) ALSPAC Social Science User Group meeting 21 May 2008

2 Outline
What causes missing data?
Types of missing data
Methods for missing data: quick overview
ALSPAC Blitz on non-respondents
Investigating MNAR data in ALSPAC

3 Example ALSPAC analysis
At age 11
Outcome: Mood (ordinal, 3 categories)
– Depressive symptoms, maternally rated
Main exposure: Physical activity (score)
– Measured on actigraph, 3 days
Adjustment:
– BMI (score)
– Sex, Age at screening
Ordinal logistic regression
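As a rough illustration of the analysis model named on this slide, the sketch below computes category probabilities for a 3-category ordinal outcome under a proportional-odds (ordinal logistic) model, using the common parameterisation P(Y <= k) = logistic(cutpoint_k - eta). The coefficients, cutpoints and covariate values are hypothetical, chosen only to make the example run; they are not ALSPAC estimates, and the slides do not state which parameterisation was used.

```python
import math

def ordinal_probs(eta, cutpoints):
    """Category probabilities under a proportional-odds model.

    Assumes P(Y <= k) = logistic(cutpoints[k] - eta), where eta is the
    linear predictor and cutpoints are in increasing order.
    """
    def logistic(z):
        return 1.0 / (1.0 + math.exp(-z))

    # Cumulative probabilities for each category; the last is always 1.
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for lower, upper in zip(cumulative, cumulative[1:]):
        probs.append(upper - lower)
    return probs

# Hypothetical coefficients and covariate values, for illustration only.
beta = {"activity": -0.3, "bmi": 0.1, "male": -0.2, "age": 0.05}
child = {"activity": 1.0, "bmi": 0.5, "male": 1.0, "age": 0.0}
eta = sum(beta[name] * child[name] for name in beta)

# Two cutpoints give three mood categories.
probs = ordinal_probs(eta, cutpoints=[0.0, 1.5])
print([round(p, 3) for p in probs])
```

A larger linear predictor moves probability towards the highest category, which is the sense in which a single coefficient per covariate describes the whole ordinal outcome.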

4 Missing Value (MV) pattern [1]
[Table of MV patterns. Columns: Sex, Mood, BMI, Physical Activity, Age. Entries as extracted, in order: "??" 1204; "???" 5397; "????"; Monotone 90%; 858 "?"; Non-monotone 95%]
[1] All MV patterns < 200 cases ignored
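A table like this can be produced by tabulating which variables are missing for each case and then checking whether the patterns are monotone, that is, whether the sets of missing variables can be nested inside one another. The sketch below is a minimal version on made-up records. The variable names follow the slide, but the data are hypothetical, and treating "?" as the mark for a missing variable is an assumption about the original table.

```python
from collections import Counter

def missing_patterns(rows, variables):
    """Count rows by missingness pattern ('?' = missing, '.' = observed)."""
    counts = Counter()
    for row in rows:
        pattern = tuple("?" if row[v] is None else "." for v in variables)
        counts[pattern] += 1
    return counts

def is_monotone(patterns):
    """True if the missing-variable sets form a nested chain."""
    missing_sets = sorted(
        ({i for i, mark in enumerate(p) if mark == "?"} for p in patterns),
        key=len,
    )
    # After sorting by size, each set must be contained in the next one.
    return all(a <= b for a, b in zip(missing_sets, missing_sets[1:]))

variables = ["sex", "age", "bmi", "mood", "activity"]

# Hypothetical records; None marks a missing value.
rows = [
    {"sex": "F", "age": 11.2, "bmi": 0.3, "mood": 1, "activity": 5.1},
    {"sex": "M", "age": 11.0, "bmi": -0.1, "mood": 2, "activity": None},
    {"sex": "F", "age": 11.1, "bmi": 0.8, "mood": 1, "activity": None},
    {"sex": "M", "age": 11.4, "bmi": 0.2, "mood": None, "activity": None},
    {"sex": "F", "age": 11.3, "bmi": None, "mood": 3, "activity": 4.2},
]

counts = missing_patterns(rows, variables)
for pattern, n in counts.most_common():
    print(" ".join(pattern), n)
print("monotone:", is_monotone(counts))
```

The last record (BMI missing but activity observed) is what breaks monotonicity here; dropping it leaves patterns that nest.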

5 What causes missing data?
[Diagram; labels: Non-contact, Refusal, Letter, Telephone calls, Interviewer visits, Interviewer effectiveness, Incentive for participant, Loyalty, Fail to attend clinic, Incomplete questionnaire, Parent characteristics, Parent & child characteristics, Follow-up]

6 Non-response mechanism(s) - NRM
Result of processes leading to:
– Refusal to answer questions (item)
– Refusal to participate (unit)
– No contact (unit)
– Longitudinal-specific: attrition & drop-out

7 Rubin's definitions [1]
Missing Completely At Random (MCAR)
– NRM independent of observed (and missing) variables
Missing At Random (MAR)
– NRM depends only on observed variables
Missing Not At Random (MNAR)
– NRM depends on missing variables too
[1] Little & Rubin (2002) Statistical Analysis with Missing Data
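The three definitions can be made concrete with a small simulation. The sketch below is a toy set-up, not an ALSPAC analysis: X is always observed, Y = X + noise has true mean 0, and the probability of responding on Y is constant (MCAR), a function of X only (MAR), or a function of Y itself (MNAR). The response functions, sample size and seed are arbitrary assumptions.

```python
import math
import random

def simulate(mechanism, n=20000, seed=1):
    """Return the complete-case mean of Y under one non-response mechanism.

    X ~ N(0, 1) is always observed; Y = X + N(0, 1) has true mean 0.
    """
    rng = random.Random(seed)
    observed_y = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = x + rng.gauss(0.0, 1.0)
        if mechanism == "MCAR":
            p_respond = 0.5                        # ignores X and Y
        elif mechanism == "MAR":
            p_respond = 1 / (1 + math.exp(2 * x))  # depends on observed X only
        else:  # "MNAR"
            p_respond = 1 / (1 + math.exp(2 * y))  # depends on Y itself
        if rng.random() < p_respond:
            observed_y.append(y)
    return sum(observed_y) / len(observed_y)

means = {m: simulate(m) for m in ("MCAR", "MAR", "MNAR")}
for mechanism, value in means.items():
    print(mechanism, round(value, 3))
```

In this set-up the complete-case mean of Y is close to the true value of 0 only under MCAR. Under MAR the unadjusted mean is also biased, but the bias can in principle be removed by using X, because X fully explains response. Under MNAR no observed variable does that job.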

8 Directed Acyclic Graph (DAG): MCAR data
[DAG with nodes Y, X, R, C]
R independent of the data

9 MAR data
[DAG with nodes Y, X, R, C]
R indirectly related to Y through X and C

10 Methods for MAR data
Complete cases analysis/Listwise deletion
Weighting
– Weighting classes, post-stratification
(Single) imputation methods
– e.g. regression, hot-deck/nearest-neighbour
Multiple imputation methods
– e.g. Norm, MICE
Semiparametric estimators
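Of the methods listed, weighting classes is the easiest to show in a few lines. In the sketch below each respondent is weighted by the inverse of the response rate in their class, so that respondents stand in for the non-respondents of the same class. The records, the class variable and the outcome values are all hypothetical, and the approach only corrects bias if response is unrelated to the outcome within classes (a MAR-type assumption).

```python
def weighting_class_mean(records, class_key, value_key):
    """Weighting-class estimate of a mean.

    Each respondent gets weight (number sampled in class) /
    (number responding in class), i.e. 1 / class response rate.
    """
    sampled, responded = {}, {}
    for rec in records:
        c = rec[class_key]
        sampled[c] = sampled.get(c, 0) + 1
        if rec[value_key] is not None:
            responded[c] = responded.get(c, 0) + 1

    weighted_sum = total_weight = 0.0
    for rec in records:
        if rec[value_key] is None:
            continue  # non-respondents contribute only through the weights
        weight = sampled[rec[class_key]] / responded[rec[class_key]]
        weighted_sum += weight * rec[value_key]
        total_weight += weight
    return weighted_sum / total_weight

# Hypothetical data: boys respond half the time, girls always respond.
records = (
    [{"sex": "boy", "score": v} for v in (10, 12, None, None)]
    + [{"sex": "girl", "score": v} for v in (20, 20, 22, 22)]
)

respondents = [r["score"] for r in records if r["score"] is not None]
naive = sum(respondents) / len(respondents)
weighted = weighting_class_mean(records, "sex", "score")
print(round(naive, 3), round(weighted, 3))
```

The unweighted respondent mean over-represents girls because they respond more often; the weighted estimate restores each class to its share of the original sample.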

11 Imputation in practice: pitfalls [1]
Omitting the outcome
Imputing non-normal variables
MAR completely implausible
Convergence of iterative procedures
[1] Sterne et al. (2008) British Medical Journal

12 Complex methods
Analysis model
– e.g. Ordinal logistic regression
Imputation model: Missing given Observed
ALL assume MAR data
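With multiple imputation, the analysis model is fitted once per imputed data set and the results are then combined. The slides do not spell out the combining step; the sketch below uses the standard rules from Rubin (pooled estimate = mean of the estimates, total variance = within-imputation variance + (1 + 1/m) x between-imputation variance). The five estimates and variances are made-up numbers, for illustration only.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates and their squared standard errors."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(variances)        # average sampling variance
    between = variance(estimates)   # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total

# Hypothetical coefficient estimates from m = 5 imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.010, 0.010, 0.010, 0.010]

pooled, total = pool_rubin(estimates, variances)
print(round(pooled, 4), round(total, 4))
```

The between-imputation term is what single imputation leaves out: it carries the extra uncertainty due to the values being imputed rather than observed.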

13 MAR data in reality
[DAG with nodes Y, X, R, C and an unknown "?" node]
Unknown factors drive non-response
…correlated with model predictors
…but not with Y

14 Why is this important?
Weakness of MAR: How do we know?
Central problem: missing data is missing!
MAR is a leap of faith

15 MNAR data
[DAG with nodes Y, X, R, C and unknown "?" nodes]
Unknowns directly correlated with Y

16 Physical activity example
[DAG with nodes Mood, Phys Act, R, and BMI, Sex, Age, plus an unknown "?" node]
NRM is mother-driven (child age 11)
Child must wear actigraph for 3 days
Mother must assess her child's mood

17 ALSPAC Blitz
Co-ordinated by Family Liaison Unit
4 tranches: Nov 2007-May 2008
Target 5000 teenagers not in last 2 waves
Mini-clinic for difficult to persuade

18 Proposed analysis
MAR is context dependent
Risky behaviours (Glyn Lewis, et al)
– Outcomes: Cannabis use, sexual practices, etc
– Risk factors: mental health, sensation seeking, etc
Basic analysis:
– Compare follow-up with main sample
– Still differences after adjustment?

19 Unit non-response
100% follow-up rate unlikely!
Directly model NRM
Continuum of non-response
– Hard to contact less like main sample
– Weighting scheme (Alho 1990; Wood et al)
Lower bound for MNAR bias

20 Item non-response
Parallel qualitative post
Items: questions on risky behaviours
What mechanisms drive non-response?
Test hypotheses from this project

