Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI) Laura O’Sullivan Statistics New Zealand


1 Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI). Laura O’Sullivan, Statistics New Zealand, laura.o’sullivan@stats.govt.nz. IAOS Vietnam, October 2014

2 Outline: The Integrated Data Infrastructure (IDI); Terminology; IDI linking; Near-exact and non-exact; Selecting cut-offs; Quality; Clerical review; Linking at Statistics New Zealand and at the Australian Bureau of Statistics

3 Integrated Data Infrastructure (IDI): Business data; Education; Tax; Migration & movements; Student loans & allowances; Benefits; Person-centred data; Health & safety; Justice; Families & households

4 Terminology: Data integration (also known as record linkage); Deterministic linking; Probabilistic linking (Fellegi-Sunter theory); Weights, which indicate how likely it is that two records are from the same person
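As a minimal sketch of how Fellegi-Sunter weights are usually computed: each compared field adds log2(m/u) to the pair's weight when it agrees and log2((1-m)/(1-u)) when it disagrees, where m is the probability that the field agrees among true matches and u the probability that it agrees among non-matches. The field names and the m and u values below are illustrative assumptions, not IDI parameters.

```python
import math

# Hypothetical (m, u) probabilities per field; illustrative only, not IDI values.
# m = P(field agrees | the two records are a true match)
# u = P(field agrees | the two records are not a match)
FIELD_PROBS = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
    "sex": (0.99, 0.5),
}


def field_weight(agrees, m, u):
    """Fellegi-Sunter agreement or disagreement weight for one field."""
    if agrees:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def pair_weight(agreement):
    """Total weight for a record pair.

    `agreement` maps each field name to True (agrees) or False (disagrees).
    """
    return sum(
        field_weight(agreement[field], m, u)
        for field, (m, u) in FIELD_PROBS.items()
    )


all_agree = {field: True for field in FIELD_PROBS}
last_name_differs = dict(all_agree, last_name=False)

print(round(pair_weight(all_agree), 2))
print(round(pair_weight(last_name_differs), 2))
```

A field that rarely agrees by chance (small u, such as date of birth here) contributes a large positive weight when it agrees, while a field such as sex contributes very little.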

5 Cut-offs

6 Quality
            True matches      Non-matches
Linked      True positives    False positives
Unlinked    False negatives   True negatives
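The four cells on the quality slide can be turned into summary rates. The sketch below assumes the definitions common in record linkage, where the false positive rate is the share of links that are not true matches and the false negative rate is the share of true matches left unlinked; the slides do not state which definitions the IDI uses, and the counts are made up for illustration.

```python
def link_quality(tp, fp, fn, tn):
    """Summary rates from the linked/unlinked by match/non-match table.

    Assumed definitions (not confirmed by the slides):
      false positive rate = false positives / all links
      false negative rate = false negatives / all true matches
    `tn` is accepted for completeness but is not used by these two rates.
    """
    linked = tp + fp
    true_matches = tp + fn
    return {
        "false_positive_rate": fp / linked if linked else 0.0,
        "false_negative_rate": fn / true_matches if true_matches else 0.0,
    }


# Hypothetical counts for illustration.
rates = link_quality(tp=950, fp=50, fn=100, tn=8900)
print(rates)
```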

7 Near-exact and non-exact
First name and last name agreement
Data  Insert   Delete  Replace  Double   Single   Swap    Append  Truncate
A     Robert   Robert  Robert   Robert   Robbert  Robert  Kat     Katie
B     Robiert  Robrt   Rovert   Roobert  Robert   Robret  Katie   Kat
Date of birth agreement
Data  Replace     Swap        Transpose
A     04/08/1982  02/08/1982  02/08/1982
B     04/02/1982  20/08/1982  08/02/1982
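One possible way to recognise the kinds of name difference listed on this slide is sketched below. The function name, the labels, and the order in which the checks run are assumptions made for illustration; the slides do not describe how the IDI implements name agreement.

```python
def _extra_char(longer, shorter):
    """Index of the single extra character in `longer`, or None."""
    for i in range(len(longer)):
        if longer[:i] + longer[i + 1:] == shorter:
            return i
    return None


def _is_doubled(s, i):
    """True if the character at position i repeats one of its neighbours."""
    return (i > 0 and s[i] == s[i - 1]) or (i + 1 < len(s) and s[i] == s[i + 1])


def classify_difference(a, b):
    """Label how name `b` differs from name `a` (case-insensitive).

    Single-character changes are checked before append/truncate, so a
    one-letter suffix is reported as an insert or delete.
    """
    a, b = a.strip().lower(), b.strip().lower()
    if a == b:
        return "exact"
    if len(b) == len(a) + 1:
        i = _extra_char(b, a)
        if i is not None:
            return "double" if _is_doubled(b, i) else "insert"
    if len(a) == len(b) + 1:
        i = _extra_char(a, b)
        if i is not None:
            return "single" if _is_doubled(a, i) else "delete"
    if len(a) == len(b):
        diffs = [i for i in range(len(a)) if a[i] != b[i]]
        if len(diffs) == 1:
            return "replace"
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and a[diffs[0]] == b[diffs[1]] and a[diffs[1]] == b[diffs[0]]):
            return "swap"
    if b.startswith(a):
        return "append"
    if a.startswith(b):
        return "truncate"
    return "other"


# The name pairs from the slide (dataset A, dataset B).
pairs = [("Robert", "Robiert"), ("Robert", "Robrt"), ("Robert", "Rovert"),
         ("Robert", "Roobert"), ("Robbert", "Robert"), ("Robert", "Robret"),
         ("Kat", "Katie"), ("Katie", "Kat")]
for a, b in pairs:
    print(a, b, classify_difference(a, b))
```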

8 Selecting the cut-off
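The transcript does not record how the cut-off is chosen. One common approach, sketched here purely as an assumption, is to clerically review a sample of record pairs, estimate the false positive rate among pairs at or above each candidate cut-off, and keep the lowest cut-off that meets a target rate. The function name, the weights, and the target are hypothetical.

```python
def choose_cutoff(reviewed, candidates, max_fp_rate):
    """Lowest candidate cut-off whose links meet the target false positive rate.

    reviewed:    list of (weight, is_true_match) pairs from clerical review
    candidates:  iterable of cut-off weights to try
    max_fp_rate: highest acceptable share of links that are not true matches
    Returns (cutoff, estimated_fp_rate), or None if no candidate qualifies.
    """
    for cutoff in sorted(candidates):
        linked = [is_match for weight, is_match in reviewed if weight >= cutoff]
        if not linked:
            continue  # nothing would be linked at this cut-off
        fp_rate = linked.count(False) / len(linked)
        if fp_rate <= max_fp_rate:
            return cutoff, fp_rate
    return None


# Hypothetical reviewed sample: (pair weight, reviewer says true match?).
reviewed = [(25, True), (22, True), (18, True), (15, False),
            (14, True), (10, False), (8, False)]

print(choose_cutoff(reviewed, candidates=[5, 10, 15, 20], max_fp_rate=0.25))
```

Lowering the cut-off links more pairs and so reduces false negatives, at the cost of more false positives; the target rate expresses where that trade-off is set.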

9 Quality in the IDI: False positive rates; Sample from non-exact links; Assume near-exact links are true matches; Use proportional sampling; Non-exact rates; Monitoring
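The sampling approach on this slide can be sketched as follows: the review sample is spread across strata of non-exact links in proportion to their size, the false positives found in each stratum are scaled up, and near-exact links are assumed to contribute none. What the strata are (here, hypothetical weight bands), together with every count below, is an assumption for illustration.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split the review sample across strata in proportion to stratum size.

    Rounding means the allocations may not sum exactly to `total_sample`.
    """
    total = sum(stratum_sizes.values())
    return {
        name: round(total_sample * size / total)
        for name, size in stratum_sizes.items()
    }


def estimate_fp_rate(n_near_exact, stratum_sizes, sample_results):
    """Estimated false positive rate across all links.

    sample_results maps stratum -> (n_reviewed, n_false_positive).
    Near-exact links are assumed to be true matches, so they add to the
    denominator but not to the estimated false positives.
    """
    est_false = sum(
        stratum_sizes[name] * n_false / n_reviewed
        for name, (n_reviewed, n_false) in sample_results.items()
    )
    n_links = n_near_exact + sum(stratum_sizes.values())
    return est_false / n_links


# Hypothetical numbers of non-exact links in three weight bands.
non_exact = {"low": 2000, "mid": 3000, "high": 5000}
allocation = proportional_allocation(non_exact, total_sample=500)
print(allocation)

# Hypothetical review outcomes: (pairs reviewed, false positives found).
results = {"low": (100, 20), "mid": (150, 15), "high": (250, 5)}
fp_rate = estimate_fp_rate(n_near_exact=90000, stratum_sizes=non_exact,
                           sample_results=results)
print(fp_rate)
```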

10 Clerical review

Dataset  First names  Last names  Date of birth  Sex
A        Mary Louise  Brown       04/11/1984     2
B        Mary Lou     Hughes      04/11/1984     2
A link with two first names matching and a different last name

Dataset  Identifier  First names  Last names  Date of birth  Sex
A        12345       Owen         Keyes       06/01/1951     1
B        12345       -            -           06/01/1951     1
A link with unique identifiers and missing name information in one dataset

Dataset  First names    Last names  Date of birth  Sex
A        Holly Jessica  Gordon      01/05/1940     2
B        Holly          (blank)     01/05/1940     2
A link with missing name information and without unique identifiers

11 Statistics New Zealand and the Australian Bureau of Statistics. Statistics New Zealand: Census to the Post-enumeration survey (PES); Linking the longitudinal census. Australian Bureau of Statistics: Linking projects using name and address; Census data enhancement project

12 Thank you for listening. Questions? laura.o’sullivan@stats.govt.nz

