Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI) Laura O’Sullivan Statistics New Zealand

Presentation transcript:

Linking, selecting cut-offs, and examining quality in the Integrated Data Infrastructure (IDI) Laura O’Sullivan Statistics New Zealand IAOS Vietnam October 2014

Outline
- The Integrated Data Infrastructure (IDI)
- Terminology
- IDI linking
- Near-exact and non-exact
- Selecting cut-offs
- Quality
- Clerical review
- Linking at Statistics New Zealand and at the Australian Bureau of Statistics

Integrated Data Infrastructure (IDI)
Diagram labels: Person-centred data; Business data; Education; Tax; Migration & movements; Student loans & allowances; Benefits; Health & safety; Justice; Families & households

Terminology
- Data integration (also known as record linkage)
- Deterministic linking
- Probabilistic linking (Fellegi-Sunter theory)
- Weights: indicate how likely it is that two records are from the same person
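The idea of a weight can be made concrete with a minimal sketch of Fellegi-Sunter agreement and disagreement weights. The field names and the m- and u-probabilities below are hypothetical values chosen for illustration; they are not taken from the IDI. Note that the weight is a log-likelihood ratio, so a higher value means stronger evidence of a match rather than a probability in itself.

```python
import math

# Hypothetical m- and u-probabilities, for illustration only (not IDI values).
# m = P(field agrees | the two records belong to the same person)
# u = P(field agrees | the two records belong to different people)
M_U = {
    "first_name": (0.95, 0.01),
    "last_name": (0.95, 0.005),
    "date_of_birth": (0.98, 0.001),
}


def field_weight(field, agrees):
    """Log2 likelihood ratio contributed by one compared field."""
    m, u = M_U[field]
    if agrees:
        return math.log2(m / u)            # agreement weight (positive when m > u)
    return math.log2((1 - m) / (1 - u))    # disagreement weight (negative when m > u)


def pair_weight(agreement):
    """Total weight for a record pair: the sum of its field weights.

    Summing assumes the fields agree or disagree independently of each other.
    """
    return sum(field_weight(f, a) for f, a in agreement.items())


all_agree = pair_weight(
    {"first_name": True, "last_name": True, "date_of_birth": True}
)
name_only = pair_weight(
    {"first_name": True, "last_name": True, "date_of_birth": False}
)
```

In this sketch a pair that agrees on every field scores higher than one that disagrees on date of birth, which is the ordering a cut-off is later applied to.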

Cut-offs
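As a rough illustration of how cut-offs act on weights, the sketch below classifies a record pair using a lower and an upper cut-off, as in the general Fellegi-Sunter framework. The function name, the labels, and the numeric cut-off values are assumptions made for this example; the slide does not state which values or how many cut-offs the IDI uses.

```python
def classify(weight, lower, upper):
    """Assign a record pair to a class from its total weight.

    Assumed convention: pairs at or above `upper` are linked, pairs below
    `lower` are not linked, and pairs in between go to clerical review.
    Setting lower == upper gives a single cut-off with no review band.
    """
    if lower > upper:
        raise ValueError("lower cut-off must not exceed upper cut-off")
    if weight >= upper:
        return "link"
    if weight < lower:
        return "non-link"
    return "clerical review"


# Hypothetical cut-offs on a log2 weight scale.
decisions = [classify(w, lower=5.0, upper=15.0) for w in (24.2, 9.7, -3.1)]
```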

Quality

         | True matches    | Non matches
Linked   | True positives  | False positives
Unlinked | False negatives | True negatives
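The four cells of the quality table can be turned into rates. The sketch below assumes one common record-linkage convention: the false positive rate is the share of links that are wrong, and the false negative rate is the share of true matches that were left unlinked. These definitions and the counts are assumptions for illustration; the slide does not give formulas or figures.

```python
def link_quality(tp, fp, fn, tn):
    """Summary rates from the linked/unlinked by match/non-match table.

    Assumed definitions (not stated in the source):
      false positive rate = FP / (TP + FP), the share of links that are wrong
      false negative rate = FN / (TP + FN), the share of true matches missed
    The true negative count is accepted for completeness but is not used
    under these definitions.
    """
    linked = tp + fp
    true_matches = tp + fn
    return {
        "false_positive_rate": fp / linked if linked else 0.0,
        "false_negative_rate": fn / true_matches if true_matches else 0.0,
    }


# Hypothetical counts.
rates = link_quality(tp=90, fp=10, fn=5, tn=895)
```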

Near-exact and non-exact

First name and Last name agreement

Data | Insert  | Delete | Replace | Double  | Single  | Swap   | Append | Truncate
A    | Robert  | Robert | Robert  | Robert  | Robbert | Robert | Kat    | Katie
B    | Robiert | Robrt  | Rovert  | Roobert | Robert  | Robret | Katie  | Kat

Date of birth agreement

Data | Replace    | Swap       | Transpose
A    | 04/08/1982 | 02/08/1982 | 02/08/1982
B    | 04/02/1982 | 20/08/1982 | 08/02/1982
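One way to detect the kinds of name and date differences shown on this slide is an edit distance that counts insertions, deletions, replacements, and swaps of adjacent characters (the optimal string alignment distance), plus a separate check for transposed day and month. This is a sketch of a generic comparator, not the comparison rule used in the IDI; the function names are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two strings.

    Counts single-character insertions, deletions, replacements, and
    swaps of two adjacent characters, each at a cost of 1.
    """
    prev2 = None
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # delete from a
                cur[j - 1] + 1,      # insert into a
                prev[j - 1] + cost,  # replace (or keep when equal)
            )
            # Swap of two adjacent characters, e.g. "er" versus "re".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                cur[j] = min(cur[j], prev2[j - 2] + 1)
        prev2, prev = prev, cur
    return prev[len(b)]


def day_month_transposed(date_a, date_b):
    """True when two dd/mm/yyyy dates differ only by exchanged day and month."""
    day_a, month_a, year_a = date_a.split("/")
    day_b, month_b, year_b = date_b.split("/")
    return (
        year_a == year_b
        and day_a == month_b
        and month_a == day_b
        and day_a != month_a
    )
```

Applied to the slide's examples, each of the single-character name errors (Robiert, Robrt, Rovert, Roobert, Robret against Robert) is at distance 1, while Kat against Katie is at distance 2, so a distance threshold alone would treat a shortened name differently from a typing slip.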

Selecting the cut-off

Quality in the IDI
- False positive rates
- Sample from non-exact links
- Assume near-exact links are true matches
- Use proportional sampling
- Non-exact rates
- Monitoring
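The approach listed here (sample only the non-exact links, treat near-exact links as true matches, and sample proportionally) can be sketched as follows. The strata, their sizes, the sample size, and the review results are all hypothetical; the slide does not say how the non-exact links are stratified or how large the samples are.

```python
def proportional_allocation(stratum_sizes, sample_size):
    """Allocate a clerical sample across strata in proportion to their size.

    Rounding means the allocations may not sum exactly to sample_size.
    """
    total = sum(stratum_sizes.values())
    return {
        name: round(sample_size * size / total)
        for name, size in stratum_sizes.items()
    }


def estimate_false_positive_rate(near_exact_links, stratum_sizes, reviewed):
    """Estimate the share of all links that are false positives.

    near_exact_links: count of near-exact links, assumed to contain no errors.
    stratum_sizes:    non-exact links per stratum.
    reviewed:         stratum -> (links sampled, links judged false on review).
    """
    estimated_false = sum(
        stratum_sizes[name] * false / sampled
        for name, (sampled, false) in reviewed.items()
    )
    total_links = near_exact_links + sum(stratum_sizes.values())
    return estimated_false / total_links


# Hypothetical strata of non-exact links, e.g. bands of link weight.
sizes = {"high": 6000, "medium": 3000, "low": 1000}
allocation = proportional_allocation(sizes, sample_size=500)
review = {"high": (300, 3), "medium": (150, 6), "low": (50, 10)}
rate = estimate_false_positive_rate(90000, sizes, review)
```

Because near-exact links are assumed correct, the estimate is a lower bound in practice: any errors among near-exact links would go uncounted.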

Clerical review

A link with two first names matching and different last name

Dataset | First names | Last names | Date of birth | Sex
A       | Mary Louise | Brown      | 04/11/1984    | 2
B       | Mary Lou    | Hughes     | 04/11/1984    | 2

A link with unique identifiers and missing name information in one dataset

Dataset | Identifier | First names | Last names | Date of birth | Sex
A       | 12345      | Owen        | Keyes      | 06/01/1951    | 1
B       |            |             |            | /01/1951      | 1

A link with missing name information and without unique identifiers

Dataset | First names   | Last names | Date of birth | Sex
A       | Holly Jessica | Gordon     | 01/05/1940    | 2
B       | Holly         |            | 01/05/1940    | 2

Statistics New Zealand and the Australian Bureau of Statistics

Statistics New Zealand
- Census to the Post-enumeration survey (PES)
- Linking the longitudinal census

Australian Bureau of Statistics
- Linking projects using name and address
- Census data enhancement project

Thank you for listening

Questions