1
Methods for Data-Integration
Ton de Waal 14 March, 2017
2
Overview of project
Project “Estimation methods for the integration of administrative sources”
Aim of the project: identifying and presenting statistical methods for the integration of administrative data into a statistical production system
April 2016 – (end of) March 2017
Part of ESS.VIP Admin Project (Work package 2 “Statistical methods”, Lot 1 “Methodological support”)
Sogeti is main contractor
Experts from ISTAT, Statistics Netherlands and University of Southampton
3
Overview of project
Task 1: Specify usages of admin data
Task 2: Identification and description of statistical tasks where using estimation methods can be envisaged in order to integrate administrative sources
Task 3: Comprehensive identification and enumeration of possible estimation methods that could be used for cases identified in Task 2
Task 4: Literature review presenting actual examples for types of usage and tasks identified in Task 2 or 3
Task 5: Methods description
Task 6 & 7: Final presentation and report
4
Task 1: Usages of admin data: Direct
Direct tabulation: admin data used to produce statistics without resorting to any statistical data
  Exploiting only one administrative data source
  Exploiting multiple administrative data sources
Substitution and supplementation for direct collection: admin data directly used as input but not sufficient for achieving all objectives
  Split-population approach: the population is split into two or more parts. Admin data are used for units where these data are of sufficient quality, and statistical sources are used for the remainder of the units
  Split data approach: administrative data are used to provide some of the variables for all population units
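The split-population approach can be illustrated with a minimal sketch. The field names (`admin`, `survey`, `admin_reliable`) and the quality flag are hypothetical, and for simplicity every unit is assumed to have a statistical value available; in practice the statistical part would usually be a weighted sample.

```python
def split_population_total(units, quality_ok):
    """Total of a target variable: admin value where it is deemed
    reliable, statistical (survey) value for the remaining units."""
    total = 0.0
    for unit in units:
        source = "admin" if quality_ok(unit) else "survey"
        total += unit[source]
    return total


# Hypothetical units; 'admin_reliable' is an assumed quality indicator.
units = [
    {"id": 1, "admin": 100.0, "survey": 98.0, "admin_reliable": True},
    {"id": 2, "admin": 0.0, "survey": 55.0, "admin_reliable": False},
    {"id": 3, "admin": 40.0, "survey": 42.0, "admin_reliable": True},
]

total = split_population_total(units, lambda u: u["admin_reliable"])
print(total)  # 195.0 (100 from admin, 55 from survey, 40 from admin)
```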
5
Task 1: Usages of admin data: Indirect
Creation and maintenance of registers and survey frames
  Identification of frame units and their connections to population elements
  Identification of classification and auxiliary variables
Editing and imputation
  Construction of edit rules
  Construction of models to find errors in data
  Auxiliary data to construct imputation models
Indirect estimation
  Creation of population benchmarks to be used for calibration
  Use administrative data in a predictive setting
  Estimation where administrative and statistical data are used on an equal footing
Data validation/confrontation
  Validation of survey estimates and/or other administrative data sources
  Address quality issues
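As an illustration of population benchmarks used for calibration, the sketch below applies post-stratification, one of the simplest calibration techniques: survey weights are rescaled so that the weighted counts per stratum reproduce counts taken from an administrative source. The strata, weights and benchmark counts are invented for the example.

```python
from collections import defaultdict


def poststratify(sample, benchmarks):
    """Rescale weights so the weighted count in each stratum equals
    the benchmark count for that stratum."""
    weighted = defaultdict(float)
    for rec in sample:
        weighted[rec["stratum"]] += rec["weight"]
    return [
        rec["weight"] * benchmarks[rec["stratum"]] / weighted[rec["stratum"]]
        for rec in sample
    ]


# Hypothetical survey records with design weights and a 0/1 target variable y.
sample = [
    {"stratum": "young", "weight": 10.0, "y": 1.0},
    {"stratum": "young", "weight": 10.0, "y": 0.0},
    {"stratum": "old", "weight": 20.0, "y": 1.0},
]
# Hypothetical population counts per stratum from an administrative source.
benchmarks = {"young": 30.0, "old": 15.0}

weights = poststratify(sample, benchmarks)
estimate = sum(w * rec["y"] for w, rec in zip(weights, sample))
print(weights)   # [15.0, 15.0, 15.0]
print(estimate)  # 30.0
```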
6
Task 2: Possible statistical tasks
We have matched statistical tasks to usages by means of GSBPM
Statistical tasks for using integrated administrative data
I. Data editing and imputation
II. Creation of joint statistical microdata
  a) Data linkage: Identification of the set of unique units residing in multiple datasets
  b) Statistical matching: Inference of joint distribution based on marginal observations
III. Alignment of statistical data
  a) Alignment of units: Harmonization of relevant units, creation of target statistical units
  b) Alignment of measurements: Harmonization of variables, derivation of target variables
IV. Multisource estimation at aggregated level
  a) Population size estimation: multiple lists with imperfect coverage of target population
  b) Univalent estimation: numerically consistent estimation of common variables
  c) Coherent estimation: aggregates that relate to each other
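For population size estimation from lists with imperfect coverage, a minimal sketch with two lists is the dual-system (Lincoln-Petersen) estimator. It is shown here only as an illustration, under the usual assumptions that the lists are independent, linkage between them is error-free, and neither list contains units outside the target population. The identifiers are invented.

```python
def dual_system_estimate(list_a, list_b):
    """Dual-system estimate of population size: n_a * n_b / m,
    where m is the number of units found on both lists."""
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("no overlap between lists; estimator undefined")
    return len(a) * len(b) / overlap


# Hypothetical unit identifiers appearing on two incomplete lists.
list_a = {1, 2, 3, 4, 5, 6}
list_b = {4, 5, 6, 7, 8, 9, 10, 11}

n_hat = dual_system_estimate(list_a, list_b)
print(n_hat)  # 16.0 (6 * 8 / 3)
```

The union of the two lists contains only 11 units, so the estimate of 16 implies about 5 units missed by both lists.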
7
Task 3: Possible estimation methods
I. Data editing and imputation
  Most methods usually applied for surveys can also be applied for admin data
  There are editing methods developed specifically for data obtained through an integration process (micro-integration)
II. Creation of joint statistical microdata
  Identification of the set of unique units residing in multiple datasets (data linkage, including probabilistic record linkage)
  Inference of joint distribution based on marginal observations (statistical matching)
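Probabilistic record linkage is commonly formulated in the Fellegi-Sunter framework, where each compared field contributes a log-likelihood-ratio weight depending on whether the two records agree on it. The sketch below shows only this scoring step. The field names and the m- and u-probabilities (probability of agreement among true matches and among non-matches) are assumed values for illustration; in practice they are estimated from the data.

```python
import math


def match_weight(agreements, m_probs, u_probs):
    """Sum of Fellegi-Sunter field weights for one record pair.

    agreements: field -> True if the two records agree on that field.
    m_probs[field]: P(agree | pair is a true match).
    u_probs[field]: P(agree | pair is not a match).
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_probs[field], u_probs[field]
        if agrees:
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Assumed (not estimated) probabilities for three hypothetical fields.
m_probs = {"surname": 0.95, "birth_year": 0.90, "postcode": 0.80}
u_probs = {"surname": 0.01, "birth_year": 0.05, "postcode": 0.02}

w_all = match_weight(
    {"surname": True, "birth_year": True, "postcode": True}, m_probs, u_probs
)
w_none = match_weight(
    {"surname": False, "birth_year": False, "postcode": False}, m_probs, u_probs
)
print(w_all > 0, w_none < 0)  # True True
```

Pairs with a high total weight are classified as links, pairs with a low weight as non-links, and pairs in between are typically sent for clerical review; the thresholds are a separate choice not shown here.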
8
Task 3: Possible estimation methods
III. Alignment of statistical data
  a) Alignment of units
  b) Alignment of measurements: recently, latent variable models have been proposed
IV. Multisource estimation at aggregated level
  a) Population size estimation: multiple lists with imperfect coverage of population
  b) Univalent estimation: numerically consistent estimation of common variables
    Obtaining univalent estimates at cross-sectional level
    Obtaining univalent estimates at longitudinal level
  c) Coherent estimation: aggregates that relate to each other in terms of accounting equations
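To make the idea of aggregates tied together by accounting equations concrete, the sketch below reconciles component estimates with a total by pro-rata adjustment, so that the components sum to the total. This is only the simplest possible illustration, not one of the methods described in the project; it assumes the total is treated as fixed, and the figures are invented.

```python
def prorate(components, total):
    """Scale component estimates proportionally so they sum to 'total'."""
    current = sum(components.values())
    if current == 0:
        raise ValueError("components sum to zero; cannot prorate")
    return {name: value * total / current for name, value in components.items()}


# Hypothetical regional estimates from one source and a national total
# from another source; the accounting equation is north + south = total.
components = {"north": 40.0, "south": 60.0}
total = 110.0

adjusted = prorate(components, total)
print(adjusted)  # {'north': 44.0, 'south': 66.0}
```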
9
Task 4: Examples at NSIs
We have focused on those examples that offer the most interesting information
We have given actual examples in NSIs for
  Direct tabulation
  Split data approach
  Indirect estimation
  Data validation
10
Task 4: Examples at NSIs
Direct tabulation: Use of probabilistic record linkage for statistics on victims and injured people of road accidents (Istat)
Split data approach: Creation of a social policy simulation database by means of statistical matching (Statistics Canada)
Indirect estimation: Use of repeated weighting and macro-integration for the construction of the Dutch Population and Housing Census (Statistics Netherlands)
Data validation: Estimation of classification errors in admin and survey variables on home ownership (Statistics Netherlands)
11
Task 5: Methods description
We will describe
  Editing and imputation methods, including micro-integration
  Methods for creation of joint microdata, including probabilistic linkage and statistical matching
  Methods for alignment of statistical data, including latent variable models
  Methods for multisource estimation at aggregated level, including multiple lists with imperfect coverage of population, and methods for obtaining univalent estimates (at cross-sectional level and at longitudinal level)
12
Thank you
Thank you for your attention
Any questions or comments?