5. What causes missing data?
- Interviewer effectiveness
- Incentive for participant
- Loyalty
- Letter
- Telephone calls
- Interviewer visits
- Incomplete questionnaire
- Refusal
- Follow-up
- Non-contact
- Fail to attend clinic
- Parent characteristics
- Parent & child characteristics
6. Result of processes leading to:
- Refusal to answer questions (item)
- Refusal to participate (unit)
- No contact (unit)
- Longitudinal-specific: attrition & drop-out
Non-response mechanism(s): NRM
7. Rubin’s definitions [1]
- Missing Completely At Random (MCAR): NRM is independent of both observed and missing variables
- Missing At Random (MAR): NRM depends only on observed variables
- Missing Not At Random (MNAR): NRM depends on missing variables too
[1] Little & Rubin (2002) Statistical Analysis with Missing Data
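In symbols, the three definitions can be written as below. The notation is an assumption made for illustration and is not taken from the slides: $R$ is the response indicator, and $Y_{\text{obs}}$ and $Y_{\text{mis}}$ are the observed and missing parts of the data.

```latex
\begin{align*}
\text{MCAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R) \\
\text{MAR:}  \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) = P(R \mid Y_{\text{obs}}) \\
\text{MNAR:} \quad & P(R \mid Y_{\text{obs}}, Y_{\text{mis}}) \text{ still depends on } Y_{\text{mis}} \text{ after conditioning on } Y_{\text{obs}}
\end{align*}
```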
8. Directed Acyclic Graph (DAG)
[Diagram nodes: X, R, C]
- R independent of the data (MCAR)
9. MAR data
[Diagram nodes: Y, X, R, C]
- R indirectly related to Y through X and C
10. Methods for MAR data
- Complete cases analysis / listwise deletion
- Weighting: weighting classes, post-stratification
- (Single) imputation methods, e.g. regression, hot-deck/nearest-neighbour
- Multiple imputation methods, e.g. Norm, MICE
- Semiparametric estimators
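As a rough illustration of why weighting classes can help under MAR, the sketch below simulates data in which response depends only on an observed binary covariate, then compares the complete-case mean with a weighting-class estimate. The variable names, effect sizes, and response rates are all invented for the illustration; none of them come from the study data.

```python
import random
from statistics import mean


def simulate_mar(n=20000, seed=0):
    """Simulate (x, y, responded) rows where response depends only on x (MAR)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0       # observed covariate
        y = 10 + 5 * x + rng.gauss(0, 1)         # outcome, higher when x = 1
        p_respond = 0.3 if x == 1 else 0.9       # hypothetical response rates
        responded = rng.random() < p_respond
        rows.append((x, y, responded))
    return rows


def complete_case_mean(rows):
    """Mean of y among respondents only (listwise deletion)."""
    return mean(y for _, y, responded in rows if responded)


def weighting_class_mean(rows):
    """Weight each respondent by 1 / (response rate in their x class)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cls in sorted({x for x, _, _ in rows}):
        members = [(y, responded) for x, y, responded in rows if x == cls]
        observed = [y for y, responded in members if responded]
        if not observed:
            raise ValueError(f"no respondents in class {cls}")
        weight = len(members) / len(observed)    # inverse response rate
        weighted_sum += weight * sum(observed)
        weight_total += weight * len(observed)
    return weighted_sum / weight_total


rows = simulate_mar()
full_mean = mean(y for _, y, _ in rows)          # unknowable in real data
print(f"full-sample mean:      {full_mean:.2f}")
print(f"complete-case mean:    {complete_case_mean(rows):.2f}")
print(f"weighting-class mean:  {weighting_class_mean(rows):.2f}")
```

In this setup the complete-case mean is pulled towards the group that responds more often, while the weighted estimate sits close to the full-sample mean. That only works because the simulated non-response is driven entirely by the observed covariate.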
11. Imputation in practice: pitfalls [1]
- Omitting the outcome
- Imputing non-normal variables
- MAR completely implausible
- Convergence of iterative procedures
[1] Sterne et al. (2008) British Medical Journal
12. Complex methods
- Analysis model, e.g. ordinal logistic regression
- Imputation model: Missing given Observed
- ALL assume MAR data
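When the imputation model is used to generate several completed datasets (multiple imputation), the analysis model is fitted to each one and the results are combined. Below is a minimal sketch of the usual pooling step, Rubin's rules. The function name and the example numbers are hypothetical; in practice the per-dataset estimates and variances would come from whichever analysis model is being fitted.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine per-imputation estimates and their variances with Rubin's rules.

    estimates: one point estimate per imputed dataset
    variances: the matching squared standard errors
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)                 # average of the m estimates
    within = mean(variances)                 # average within-imputation variance
    between = variance(estimates)            # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between   # total variance of the pooled estimate
    return pooled, total


# Hypothetical results from three imputed datasets
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(pooled, total)
```

The between-imputation term is what carries the extra uncertainty due to the missing data. Like the imputation model itself, the pooled result is only as good as the MAR assumption behind it.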
13. MAR data in reality
[Diagram nodes: ?, Y, X, R, C]
- Unknown factors drive non-response…
- …correlated with model predictors…
- …but not with Y
14. Why is this important?
- Weakness of MAR: How do we know?
- Central problem: missing data is missing!
- MAR is a “leap of faith”
15. MNAR data
[Diagram nodes: ?, Y, X, R, C]
- Unknowns directly correlated with Y?
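To see why MNAR is harder to handle, the sketch below simulates data in which, within each level of an observed covariate, people with higher outcome values are less likely to respond. Weighting on the covariate then fails to recover the full-sample mean. All names and numbers are invented for illustration.

```python
import random
from statistics import mean


def simulate_mnar(n=20000, seed=1):
    """Simulate (x, y, responded) rows where response depends on y itself."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        noise = rng.gauss(0, 1)
        y = 10 + 5 * x + noise
        # Within each x class, above-average y (noise > 0) responds less often.
        p_respond = 0.3 if noise > 0 else 0.9    # hypothetical response rates
        responded = rng.random() < p_respond
        rows.append((x, y, responded))
    return rows


def weighting_class_mean(rows):
    """Weight each respondent by 1 / (response rate in their x class)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for cls in sorted({x for x, _, _ in rows}):
        members = [(y, responded) for x, y, responded in rows if x == cls]
        observed = [y for y, responded in members if responded]
        if not observed:
            raise ValueError(f"no respondents in class {cls}")
        weight = len(members) / len(observed)
        weighted_sum += weight * sum(observed)
        weight_total += weight * len(observed)
    return weighted_sum / weight_total


rows = simulate_mnar()
full_mean = mean(y for _, y, _ in rows)          # unknowable in real data
print(f"full-sample mean:      {full_mean:.2f}")
print(f"weighting-class mean:  {weighting_class_mean(rows):.2f}")
```

Here the response rate is the same in both covariate classes, so nothing in the observed data signals a problem, yet the weighted estimate is still too low. This is the sense in which MAR cannot be checked from the observed data alone.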
16. Physical activity example
[Diagram nodes: ?, Mood, Phys Act, R, (BMI, Sex, Age)]
- NRM is mother-driven (child age 11)
- Child must wear actigraph for 3 days
- Mother must assess her child’s mood
17. ALSPAC ‘Blitz’
- Co-ordinated by Family Liaison Unit
- 4 tranches: Nov 2007-May 2008
- Target 5000 teenagers not in last 2 waves
- Mini-clinic for difficult to persuade
18. Proposed analysis
- MAR is context dependent
- Risky behaviours (Glyn Lewis et al.)
  - Outcomes: cannabis use, sexual practices, etc.
  - Risk factors: mental health, sensation seeking, etc.
- Basic analysis:
  - Compare follow-up with main sample
  - Still differences after adjustment?
19. Unit non-response
- 100% follow-up rate unlikely!
- Directly model NRM
- Continuum of non-response
- Hard to contact less like main sample
- Weighting scheme (Alho 1990; Wood et al. 2006)
- Lower bound for MNAR bias
20. Item non-response
- Parallel qualitative post
- Items: questions on risky behaviours
- What mechanisms drive non-response?
- Test hypotheses from this project