Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI)
Laura O’Sullivan
Statistics New Zealand
IAOS, Vietnam, October 2014
Outline
  The Integrated Data Infrastructure (IDI)
  Terminology
  IDI linking
  Near-exact and non-exact
  Selecting cut-offs
  Quality
  Clerical review
  Linking at Statistics New Zealand and at the Australian Bureau of Statistics
Integrated Data Infrastructure (IDI)
  [Diagram of IDI data sources: Person-centred data, Business data, Education, Tax, Migration & movements, Student loans & allowances, Benefits, Health & safety, Justice, Families & households]
Terminology
  Data integration (also known as record linkage)
  Deterministic linking
  Probabilistic linking (Fellegi-Sunter theory)
  Weights
    Reflect how likely it is that two records are from the same person
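To make the idea of weights concrete, here is a minimal sketch of how Fellegi-Sunter weights are commonly computed. In that theory each field has an m-probability (the chance the field agrees when two records are a true match) and a u-probability (the chance it agrees for a non-match), and the pair weight is a sum of log ratios. The field names and the m and u values below are hypothetical and are not taken from the IDI.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative values only,
# not the ones used in the IDI.
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def agreement_weight(m: float, u: float) -> float:
    """Weight added when a field agrees: log2(m / u)."""
    return math.log2(m / u)


def disagreement_weight(m: float, u: float) -> float:
    """Weight added when a field disagrees: log2((1 - m) / (1 - u))."""
    return math.log2((1 - m) / (1 - u))


def pair_weight(agreements: dict) -> float:
    """Sum the field weights for one record pair.

    `agreements` maps a field name to True (agrees) or False (disagrees).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELD_PROBS[field]
        if agrees:
            total += agreement_weight(m, u)
        else:
            total += disagreement_weight(m, u)
    return total
```

A pair that agrees on every field gets a higher total weight than one that disagrees on last name, which is the ordering the cut-offs later rely on.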
Cut-offs
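As a rough illustration of what a cut-off does, the sketch below applies the standard Fellegi-Sunter decision rule: pairs with a weight at or above an upper cut-off are linked, pairs below a lower cut-off are not linked, and pairs in between are candidates for clerical review. The function name and the cut-off values in the usage note are hypothetical; the cut-offs actually used in the IDI are not given here.

```python
def classify_pair(weight: float, lower_cutoff: float, upper_cutoff: float) -> str:
    """Classify a record pair by comparing its total weight to two cut-offs.

    Setting lower_cutoff equal to upper_cutoff gives a single cut-off
    with no clerical-review band.
    """
    if lower_cutoff > upper_cutoff:
        raise ValueError("lower_cutoff must not exceed upper_cutoff")
    if weight >= upper_cutoff:
        return "link"
    if weight >= lower_cutoff:
        return "clerical review"
    return "non-link"
```

For example, with hypothetical cut-offs of 10 and 20, a pair weighing 25 is linked, a pair weighing 15 goes to review, and a pair weighing 5 is left unlinked.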
Quality
                True matches       Non-matches
  Linked        True positives     False positives
  Unlinked      False negatives    True negatives
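The four cells of this table can be turned into summary rates. The sketch below computes two that are common in record linkage: the share of links that are wrong, and the share of true matches that were missed. The class and method names and the example counts are hypothetical, and terminology varies between organisations, so the docstrings spell out each denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkOutcomes:
    true_positives: int   # linked, and a true match
    false_positives: int  # linked, but not a match
    false_negatives: int  # unlinked, but a true match
    true_negatives: int   # unlinked, and not a match

    def false_link_rate(self) -> float:
        """False positives as a share of all links made."""
        linked = self.true_positives + self.false_positives
        return self.false_positives / linked if linked else 0.0

    def missed_match_rate(self) -> float:
        """False negatives as a share of all true matches."""
        true_matches = self.true_positives + self.false_negatives
        return self.false_negatives / true_matches if true_matches else 0.0


# Hypothetical counts, for illustration only.
example = LinkOutcomes(true_positives=90, false_positives=10,
                       false_negatives=30, true_negatives=870)
```

With these made-up counts, 10 of the 100 links are false and 30 of the 120 true matches are missed.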
Near-exact and non-exact
  First name and last name agreement
    Data  Insert   Delete  Replace  Double   Single   Swap    Append  Truncate
    A     Robert   Robert  Robert   Robert   Robbert  Robert  Kat     Katie
    B     Robiert  Robrt   Rovert   Roobert  Robert   Robret  Katie   Kat
  Date of birth agreement
    Data  Replace     Swap        Transpose
    A     04/08/1982  02/08/1982  02/08/1982
    B     04/02/1982  20/08/1982  08/02/1982
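One way to detect the kinds of small differences shown in these examples is an edit distance that counts insertions, deletions, replacements, and swaps of adjacent characters, plus separate checks for truncation and for day and month being transposed. The sketch below uses the optimal string alignment distance for this. It is an illustration only, not the comparison rules used in the IDI, and all function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insert, delete, replace,
    and swap of two adjacent characters each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # delete
                          d[i][j - 1] + 1,         # insert
                          d[i - 1][j - 1] + cost)  # replace or keep
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[rows - 1][cols - 1]


def is_truncation(a: str, b: str) -> bool:
    """True if one non-empty name is a strict prefix of the other."""
    shorter, longer = sorted((a, b), key=len)
    return bool(shorter) and shorter != longer and longer.startswith(shorter)


def day_month_transposed(a: str, b: str) -> bool:
    """True if two DD/MM/YYYY dates differ only by swapped day and month."""
    day_a, month_a, year_a = a.split("/")
    day_b, month_b, year_b = b.split("/")
    return (year_a == year_b and day_a == month_b
            and month_a == day_b and day_a != month_a)
```

Applied to the slide's examples, every Robert variant is one edit away from its partner, Kat and Katie are caught by the truncation check, and 02/08/1982 against 08/02/1982 is caught by the transposition check.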
Selecting the cut-off
Quality in the IDI
  False positive rates
    Sample from non-exact links
    Assume near-exact links are true matches
    Use proportional sampling
  Non-exact rates
  Monitoring
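The false positive approach described here can be sketched in two steps: allocate a clerical sample across groups of non-exact links in proportion to their size, then scale the error rate found in the sample up to all links, treating near-exact links as correct. The grouping into strata, the function names, and all counts below are hypothetical. The slide does not say what the sampling is proportional to, so this is one plausible reading.

```python
def proportional_allocation(stratum_sizes: dict, sample_size: int) -> dict:
    """Split a sample across strata in proportion to stratum size.

    Rounding means the allocations may not sum exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    if total == 0:
        raise ValueError("no links to sample from")
    return {name: round(sample_size * size / total)
            for name, size in stratum_sizes.items()}


def estimate_false_positive_rate(near_exact_links: int,
                                 non_exact_links: int,
                                 sample_reviewed: int,
                                 sample_false: int) -> float:
    """Estimate false positives as a share of all links.

    Assumes every near-exact link is a true match, and that the sample of
    non-exact links was drawn proportionally, so the pooled sample error
    rate can be applied to all non-exact links.
    """
    if sample_reviewed == 0:
        raise ValueError("no links were reviewed")
    non_exact_error_rate = sample_false / sample_reviewed
    estimated_false_links = non_exact_error_rate * non_exact_links
    return estimated_false_links / (near_exact_links + non_exact_links)


# Hypothetical strata of non-exact links, for illustration only.
allocation = proportional_allocation(
    {"band 1": 6000, "band 2": 3000, "band 3": 1000}, sample_size=500)
rate = estimate_false_positive_rate(
    near_exact_links=90_000, non_exact_links=10_000,
    sample_reviewed=500, sample_false=25)
```

With these made-up numbers the sample is split 300, 150, and 50 across the three bands, and 25 false links in 500 reviewed gives an estimated 500 false links among 100,000, or 0.5 percent.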
Clerical review
  A link with two first names matching and different last name
    Dataset  First names  Last names  Date of birth  Sex
    A        Mary Louise  Brown       04/11/1984     2
    B        Mary Lou     Hughes      04/11/1984     2
  A link with unique identifiers and missing name information in one dataset
    Dataset  Identifier  First names  Last names  Date of birth  Sex
    A        12345       Owen         Keyes       06/01/1951     1
    B                                             /01/1951       1
  A link with missing name information and without unique identifiers
    Dataset  First names    Last names  Date of birth  Sex
    A        Holly Jessica  Gordon      01/05/1940     2
    B        Holly                      01/05/1940     2
Statistics New Zealand and the Australian Bureau of Statistics
  Statistics New Zealand
    Census to the Post-enumeration survey (PES)
    Linking the longitudinal census
  Australian Bureau of Statistics
    Linking projects using name and address
    Census data enhancement project
Thank you for listening
Questions