Presentation transcript: "Handling Missing Data on ALSPAC"
1 Handling Missing Data on ALSPAC
  Paul Clarke (CMPO, University of Bristol)
  ALSPAC Social Science User Group meeting
  21 May 2008
2 Outline
  What causes missing data?
  ‘Types’ of missing data
  Methods for missing data: quick overview
  ALSPAC ‘Blitz’ on non-respondents
  Investigating MNAR data in ALSPAC
3 Example ‘ALSPAC’ analysis
  At age 11
  Outcome: Mood (ordinal, 3 categories)
    Depressive symptoms, maternally rated
  Main exposure: Physical activity (score)
    Measured on actigraph, 3 days
  Adjustment:
    BMI (score)
    Sex, Age at screening
  Ordinal logistic regression
5 What causes missing data?
  [Diagram; labels: Interviewer effectiveness; Incentive for participant; Loyalty; Letter; Telephone calls; Interviewer visits; Incomplete questionnaire; Refusal; Follow-up; Non-contact; Fail to attend clinic; Parent characteristics; Parent & child characteristics]
6 Result of processes leading to:
  Refusal to answer questions (item)
  Refusal to participate (unit)
  No contact (unit)
  Longitudinal-specific: attrition & drop-out
  Non-response mechanism(s) - NRM
7 Rubin’s definitions¹
  Missing Completely At Random (MCAR)
    NRM independent of both observed and missing variables
  Missing At Random (MAR)
    NRM depends only on observed variables
  Missing Not At Random (MNAR)
    NRM depends on missing variables too
  ¹ Little & Rubin (2002) Statistical Analysis with Missing Data
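The three mechanisms on slide 7 can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not in the talk: a fully observed covariate X, an outcome Y = X + noise that is subject to non-response, and logistic response probabilities (the function names `expit` and `complete_case_means` are hypothetical). The true mean of Y is 0, so any systematic departure of the complete-case mean from 0 reflects the non-response mechanism.

```python
import math
import random


def expit(z):
    """Logistic function, used here as an assumed response probability."""
    return 1.0 / (1.0 + math.exp(-z))


def complete_case_means(n=20000, seed=0):
    """Mean of Y among respondents under MCAR, MAR and MNAR non-response."""
    rng = random.Random(seed)
    sums = {"MCAR": 0.0, "MAR": 0.0, "MNAR": 0.0}
    counts = {"MCAR": 0, "MAR": 0, "MNAR": 0}
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)      # covariate, always observed
        y = x + rng.gauss(0.0, 1.0)  # outcome, may be missing
        p_respond = {
            "MCAR": 0.5,       # unrelated to any variable
            "MAR": expit(x),   # depends only on the observed X
            "MNAR": expit(y),  # depends on the possibly missing Y itself
        }
        for mechanism, p in p_respond.items():
            if rng.random() < p:
                sums[mechanism] += y
                counts[mechanism] += 1
    return {m: sums[m] / counts[m] for m in sums}


means = complete_case_means()
for mechanism, value in means.items():
    print(f"{mechanism}: complete-case mean of Y = {value:.3f}")
```

Under these assumptions the complete-case mean stays close to 0 for MCAR and is shifted upwards for MAR and MNAR. The MAR shift could in principle be removed by using X (for example by weighting or imputation), whereas the MNAR shift cannot be corrected from the observed data alone.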
8 Directed Acyclic Graph (DAG)
  [DAG; nodes: X, R, C]
  R independent of data
  MCAR
9 MAR data
  [DAG; nodes: Y, X, R, C]
  R indirectly related to Y through X and C
10 Methods for MAR data
  Complete cases analysis/Listwise deletion
  Weighting
    Weighting classes, post-stratification
  (Single) imputation methods
    e.g. regression, hot-deck/nearest-neighbour
  Multiple imputation methods
    e.g. Norm, MICE
  Semiparametric estimators
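As an illustration of the weighting-class idea listed on slide 10, the sketch below weights each respondent by the inverse of the response rate in their class, so that respondents stand in for the non-respondents who share their class. The data are made up for illustration and the function names are hypothetical; the adjustment is only as good as the assumption that non-response is MAR given the class.

```python
from collections import defaultdict


def weighting_class_mean(records):
    """Weighted mean of the outcome; records are (class_label, outcome or None)."""
    n_total = defaultdict(int)
    n_resp = defaultdict(int)
    for label, y in records:
        n_total[label] += 1
        if y is not None:
            n_resp[label] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for label, y in records:
        if y is None:
            continue  # non-respondents carry no outcome information
        weight = n_total[label] / n_resp[label]  # inverse response rate in class
        weighted_sum += weight * y
        weight_total += weight
    return weighted_sum / weight_total


def complete_case_mean(records):
    """Unweighted mean among respondents, for comparison."""
    observed = [y for _, y in records if y is not None]
    return sum(observed) / len(observed)


# Made-up data: class "F" responds less often than class "M".
records = [
    ("F", 2.0), ("F", 4.0), ("F", None), ("F", None),
    ("M", 10.0), ("M", 12.0), ("M", 14.0), ("M", None),
]

print(complete_case_mean(records))    # complete cases over-represent class "M"
print(weighting_class_mean(records))  # classes restored to their full-sample shares
```

In this toy data the class means are 3 and 12 and the classes are equally sized, so the weighted estimate is their simple average, while the complete-case mean is pulled towards the class with the higher response rate.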
11 Imputation in practice: pitfalls¹
  Omitting the outcome
  Imputing non-normal variables
  MAR completely implausible
  Convergence of iterative procedures
  ¹ Sterne et al. (2008) British Medical Journal
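The first pitfall on slide 11, omitting the outcome from the imputation model, can be sketched with a simulation. The setup is an assumption for illustration only: Y = X + noise, half of the exposure values X missing completely at random, and a single stochastic regression imputation rather than full multiple imputation (no parameter uncertainty is drawn). All function names are hypothetical.

```python
import random


def ols_fit(xs, ys):
    """Intercept, slope and residual SD from a simple regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, (rss / (n - 2)) ** 0.5


def slopes_after_imputation(n=20000, seed=1):
    """Slope of Y on X after imputing X without and with the outcome Y."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]          # true slope is 1
    missing = [rng.random() < 0.5 for _ in range(n)]    # X missing, MCAR

    x_obs = [xi for xi, m in zip(x, missing) if not m]
    y_obs = [yi for yi, m in zip(y, missing) if not m]

    # Imputation ignoring Y: draw from the marginal distribution of observed X.
    mean_x = sum(x_obs) / len(x_obs)
    sd_x = (sum((v - mean_x) ** 2 for v in x_obs) / (len(x_obs) - 1)) ** 0.5
    x_without = [rng.gauss(mean_x, sd_x) if m else xi
                 for xi, m in zip(x, missing)]

    # Imputation using Y: regress X on Y in the complete cases, then draw.
    a, b, resid_sd = ols_fit(y_obs, x_obs)
    x_with = [a + b * yi + rng.gauss(0.0, resid_sd) if m else xi
              for xi, yi, m in zip(x, y, missing)]

    return ols_fit(x_without, y)[1], ols_fit(x_with, y)[1]


slope_without, slope_with = slopes_after_imputation()
print(f"slope, outcome omitted from imputation:  {slope_without:.2f}")
print(f"slope, outcome included in imputation:   {slope_with:.2f}")
```

When the outcome is left out, the imputed exposure values are unrelated to Y, so the estimated association is diluted towards zero; including the outcome in the imputation model preserves the association in this setup.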
12 Complex methods
  Analysis model
    e.g. Ordinal logistic regression
  Imputation model: Missing given Observed
  ALL assume MAR data
13 MAR data in reality
  [DAG; nodes: ?, Y, X, R, C]
  Unknown factors drive non-response
  …correlated with model predictors
  …but not with Y
14 Why is this important?
  Weakness of MAR: How do we know?
  Central problem: missing data is missing!
  MAR is a “leap of faith”
15 MNAR data
  [DAG; nodes: ?, Y, X, R, C]
  Unknowns directly correlated with Y?
16 Physical activity example
  [DAG; nodes: ?, Mood, Phys Act, R, (BMI, Sex, Age)]
  NRM is mother-driven (child age 11)
  Child must wear actigraph for 3 days
  Mother must assess her child’s mood
17 ALSPAC ‘Blitz’
  Co-ordinated by Family Liaison Unit
  4 tranches: Nov 2007-May 2008
  Target 5000 teenagers not in last 2 waves
  Mini-clinic for difficult to persuade
18 Proposed analysis
  MAR is context dependent
  Risky behaviours (Glyn Lewis, et al)
    Outcomes: Cannabis use, sexual practices, etc
    Risk factors: mental health, sensation seeking, etc
  Basic analysis:
    Compare follow-up with main sample
    Still differences after adjustment?
19 Unit non-response
  100% follow-up rate unlikely!
  Directly model NRM
  Continuum of non-response
  Hard to contact less like main sample
  Weighting scheme (Alho 1990; Wood et al. 2006)
  Lower bound for MNAR bias
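One simple way to see how follow-up data could inform a bound on the bias is sketched below. It is not the weighting scheme of Alho (1990) or Wood et al. (2006); it is a basic follow-up weighting illustration under the stated assumption that people recovered at follow-up stand in for all initial non-respondents. The data and the function name `followup_weighted_mean` are hypothetical.

```python
def followup_weighted_mean(main_y, followup_y, n_nonrespondents):
    """Mean outcome with follow-up respondents weighted up to represent
    all initial non-respondents (assumption: they are representative of them)."""
    w_follow = n_nonrespondents / len(followup_y)  # each stands for this many
    weighted_sum = sum(main_y) + w_follow * sum(followup_y)
    weight_total = len(main_y) + w_follow * len(followup_y)
    return weighted_sum / weight_total


# Made-up binary outcomes (1 = reports the behaviour), for illustration only.
main_y = [0, 0, 0, 1]     # main-sample respondents
followup_y = [1, 0]       # recovered from the initial non-respondents
n_nonrespondents = 4      # initial non-respondents, of whom 2 were recovered

naive = sum(main_y) / len(main_y)
adjusted = followup_weighted_mean(main_y, followup_y, n_nonrespondents)
print(f"main sample only:   {naive:.3f}")
print(f"follow-up weighted: {adjusted:.3f}")
print(f"implied bias:       {naive - adjusted:.3f}")
```

If those who remain uncontacted are even less like the main sample than the people recovered at follow-up, as the continuum-of-non-response idea suggests, the implied bias computed this way would understate the true bias in magnitude, which is the sense in which it acts as a lower bound.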
20 Item non-response
  Parallel qualitative post
  Items: questions on risky behaviours
  What mechanisms drive non-response?
  Test hypotheses from this project