Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat. Grazia Di Bella, Simone Ambroselli.


1 Towards a more efficient system of administrative data management and quality evaluation to support statistics production in Istat
Grazia Di Bella, Simone Ambroselli (Istat, Italy)
European Conference on Quality in Official Statistics (Q2014), Vienna, 2–5 June 2014

2 Prologue: the use of administrative data increases
Over the last decade, a growing use of administrative data (AD) for statistical purposes has become common to most NSIs. At Istat, the number of administrative data sets acquired for statistical use rose from 90 in 2009 to 230 in 2013. Many statistical processes already use them, and others are planning to redesign their production processes in this direction.
Q2014, Vienna

3 AD Management strategy for efficiency and quality [1]
Action: central level coordination. A dedicated office, ADA (Administrative Data Acquisition and integration), belongs to the Censuses and Statistical Registers Directorate. It is responsible for:
- acquiring AD;
- storing AD;
- integrating AD (Integrated System of Microdata, SIM);
- evaluating AD quality;
- making AD and their metadata available to internal statistics producers.

4 AD Management strategy for efficiency and quality [2]
Advantages. Central coordination makes it possible to:
- better ensure compliance with the legislation on data confidentiality;
- optimize timeliness and efficiency in acquiring AD and in making them accessible to users within the institute;
- unify common data treatments;
- provide a common description of AD quality through a Quality Report Card;
- facilitate the management of relationships with AD providers;
- activate the feedback needed to improve AD quality, in collaboration with AD producers.

5 AD Management strategy for efficiency and quality [3]
ADA functions (diagram): acquisition procedures; Integrated System of Microdata Repository (SIM); AD quality evaluation; statistical processes using AD; dissemination to statistics users.

6 ADA centralized functions: acquisition procedures
A. Collection of AD requirements from statistics producers
B. Formulation of AD requests for each AD holder and each AD source
C. AD acquisition
Related GSBPM (Generic Statistical Business Process Model) sub-processes: 1.1 Identify data needs (considering the potential of AD); 1.5 Check data availability; 2.3 Design collection; 3.1 Build collection instrument; 4.3 Run collection; 4.4 Finalize collection.


8 Integrated System of Microdata Repository (SIM)
Definition: a repository of integrated administrative microdata that supports the statistical production processes.
Goals:
- make AD accessible in a uniform way to users within the institute;
- avoid duplicate work.

9 ADA centralized functions: SIM
D. Formal Concept Analysis / identification of objects and relations
E. Loading data into tables
F. AD integration
G. Recoding
H. Dissemination to Istat statistics producers
Related GSBPM (Generic Statistical Business Process Model) sub-processes: 5.1 Integrate data; 5.2 Classify and code.
Diagram elements: SIM; statistical processes using AD; dissemination to statistics users.

10 AD Integration
Integration is the process of linking objects recorded in different sources: individuals, economic units and places (in progress). Each object entering the SIM is identified by a unique ID number that is stable over time. A suitable integration strategy and set of algorithms are chosen according to the linking variable(s) available. The data integration process feeds the development of the DBs for integration of each subsystem. These DBs for integration are microdata warehouses. They guarantee a unified view of each object under analysis, showing the information about it available in the different sources.
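A minimal sketch of how each object can receive a unique ID that stays stable over time. It assumes the simplest possible strategy: deterministic (exact-match) linkage on a single linking variable, such as a personal tax code for individuals. `SimIdRegistry` and `link_source` are hypothetical names. The SIM chooses its strategies and algorithms according to the linking variables available, and this sketch does not reproduce them.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SimIdRegistry:
    """Hypothetical registry of SIM IDs, keyed by a normalized linking variable."""
    next_id: int = 1
    ids_by_key: dict = field(default_factory=dict)

    @staticmethod
    def normalize(key: str) -> str:
        # Minimal cleaning before exact matching: trim blanks, upper-case.
        return key.strip().upper()

    def link(self, key: str) -> int:
        """Return the SIM ID already assigned to this key, or assign the next one."""
        norm = self.normalize(key)
        if norm not in self.ids_by_key:
            self.ids_by_key[norm] = self.next_id
            self.next_id += 1
        return self.ids_by_key[norm]


def link_source(registry, source_id, records, key_field):
    """Attach a SIM ID to every record of one AD source.

    Returns (source_id, serial_number, sim_id) triples; records whose linking
    variable is missing or blank stay unlinked (sim_id is None).
    """
    links = []
    for serial, record in enumerate(records, start=1):
        key: Optional[str] = record.get(key_field)
        sim_id = registry.link(key) if key and key.strip() else None
        links.append((source_id, serial, sim_id))
    return links


if __name__ == "__main__":
    registry = SimIdRegistry()
    print(link_source(registry, "TAX", [{"cf": "abc1"}, {"cf": "XYZ9"}], "cf"))
    # [('TAX', 1, 1), ('TAX', 2, 2)]
    print(link_source(registry, "PENS", [{"cf": " ABC1 "}, {"cf": None}], "cf"))
    # [('PENS', 1, 1), ('PENS', 2, None)]
```

The registry persists across sources and deliveries. A key that reappears later, here `" ABC1 "` in a second source, therefore maps back to the ID already assigned, and that is what keeps the ID stable.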

11 Diagram: AD sources (ADS 1, ADS 2, ..., ADS N) go through the ETL process of AD and through identification and integration, producing the DBs for integration of each subsystem (inside the SIM border). These support the development of the thematic DBs (physical and virtual structures) used by statistical processes: statistical registers and statistical information systems.

12 SIM: the DBs for integration
Structure:
- the ID of the sources;
- the serial number of the object internal to each source in which it is recognized;
- the ID of the object in the integration subsystem;
- the variables used for integration, for all the sources in which the object is present;
- the kind of record linkage used to enter the object in the DB;
- the time reference for the validity of the linkage.
The unique ID creates a spider-web structure of relationships. It guarantees that every source is connected with the DB for integration and, at the same time, with all the other sources in the same integration subsystem.
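The structure listed above can be sketched as one row type plus a lookup. The field names, the exclusive `valid_to` convention and the helper `related_records` are assumptions made for illustration. The lookup shows the spider-web property. Starting from one record of one source, the shared SIM ID leads to the same object's records in every other source of the subsystem, restricted to links valid on the reference date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class IntegrationLink:
    """One (hypothetical) row of a DB for integration."""
    source_id: str                   # ID of the AD source
    serial_number: int               # record number internal to that source
    sim_id: int                      # ID of the object in the integration subsystem
    linking_values: tuple            # values of the variables used for integration
    linkage_type: str                # kind of record linkage, e.g. "deterministic"
    valid_from: date                 # start of the linkage validity period
    valid_to: Optional[date] = None  # exclusive end; None means still valid

    def valid_on(self, day: date) -> bool:
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def related_records(links, source_id, serial_number, on):
    """Records of the same object in the other sources, found via the shared SIM ID."""
    valid = [link for link in links if link.valid_on(on)]
    sim_ids = {link.sim_id for link in valid
               if link.source_id == source_id and link.serial_number == serial_number}
    return sorted((link.source_id, link.serial_number) for link in valid
                  if link.sim_id in sim_ids and link.source_id != source_id)


if __name__ == "__main__":
    links = [
        IntegrationLink("TAX", 1, 100, ("ABC1",), "deterministic", date(2013, 1, 1)),
        IntegrationLink("PENS", 7, 100, ("ABC1",), "deterministic", date(2013, 1, 1)),
        IntegrationLink("EMP", 3, 100, ("ABC1", "1970"), "probabilistic",
                        date(2012, 1, 1), date(2013, 1, 1)),
    ]
    print(related_records(links, "TAX", 1, date(2013, 6, 1)))  # [('PENS', 7)]
```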

13 SIM: the subsystems
- SIM Individuals: INDIVIDUAL ID
- SIM Economic units: ECONOMIC UNIT ID
- SIM Relationships among individuals: INDIVIDUAL ID – FAMILY ID
- SIM Relationships among economic units: ECONOMIC UNIT ID – LOCAL UNITS ID
- SIM Relationships between individuals and economic units: INDIVIDUAL ID – ECONOMIC UNIT ID
- SIM Places of individuals: INDIVIDUAL ID – INDIVIDUAL PLACES ID
- SIM Places of economic units: ECONOMIC UNIT ID – LOCAL UNITS ID


15 AD quality evaluation in the statistical production process
Diagram elements: input data (survey data, register data, administrative data); data treatment (transformation function); statistical output; producer-oriented and user-oriented statistics quality; Quality Report Card for Administrative Data.
"AD quality" is considered in relation to the reuse of AD for statistical purposes, bearing in mind that AD are not primarily produced for statistical purposes.

16 Quality Report Card for Administrative data (QRCA): objectives
- Usability analysis (for Istat potential users): assess AD quality as an input to the statistical production process, in terms of its potential usability.
- AD monitoring function (for Istat current users): monitor AD for two main reasons. a) Regulatory changes may introduce discontinuities with significant impacts on statistics production. b) Before AD enter the statistical production process, an analysis must check for any unexpected loss of quality.
- Data supply monitoring function (for the AD acquisition process): check that AD comply with the requests and support the data loading process. Where appropriate, define alerts/warnings to optimize the timing of data acquisition and release.
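The data supply monitoring function could be sketched as a comparison between each delivery and its request, with two alert levels. Everything here is an assumption made for illustration: the `AdRequest` fields, the 10% size tolerance, and the ALERT/WARNING split. The record-count test is only a crude proxy for the discontinuities that the AD monitoring function looks for.

```python
from dataclasses import dataclass


@dataclass
class AdRequest:
    """Hypothetical summary of what was requested from an AD holder."""
    source_id: str
    variables: frozenset      # variable names requested
    expected_records: int     # e.g. the size of the previous delivery
    tolerance: float = 0.10   # relative size change that triggers a warning


def check_delivery(request, columns, n_records):
    """Compare one delivery with its request; return (level, message) pairs.

    ALERT: the delivery does not meet the request (loading should wait).
    WARNING: the delivery can be loaded but needs a closer look.
    """
    issues = []
    delivered = set(columns)
    missing = request.variables - delivered
    if missing:
        issues.append(("ALERT", f"missing requested variables: {sorted(missing)}"))
    extra = delivered - request.variables
    if extra:
        issues.append(("WARNING", f"variables not requested: {sorted(extra)}"))
    if request.expected_records > 0:
        change = (n_records - request.expected_records) / request.expected_records
        if abs(change) > request.tolerance:
            # A large jump may signal, for instance, a regulatory change in the source.
            issues.append(("WARNING", f"record count changed by {change:+.1%}"))
    return issues
```

For example, a delivery that drops a requested variable and shrinks from 1000 to 800 records returns one ALERT for the missing variable and one WARNING for the -20.0% size change.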

17 AD quality framework
The AD quality framework follows a hierarchical, multidimensional approach. It covers both issues directly connected with AD quality and information on the AD management process, with the aim of improving the quality and usability of AD for statistics. The framework adopted is based on the one originally defined by Statistics Netherlands [1], later developed within the international BLUE-ETS project, WP4 [2].
[1] Daas et al. (2009). Checklist for the Quality evaluation of Administrative Data Sources. Discussion paper 09042, Statistics Netherlands.
[2] Daas et al. (2011). Report on methods preferred for the quality indicators of administrative data sources. Deliverable 4.2 of Workpackage 4 of the BLUE-ETS project. CBS (Netherlands), SSB (Norway), Istat (Italy), SCB (Sweden).

18 AD quality framework for supplied AD

19 QRCA and the ADA process
- Adapting the quality framework and the QRCA to the Istat ADA process
- Implementing the QRCA through interoperability among ADA processes
To achieve adequate efficiency and timeliness, Istat is planning a system that automates AD quality evaluation as far as possible. Following the OECD core principles for metadata management, the strategy aims to exploit all the metadata available from production processes that use AD. In other words, it seeks to make metadata “active” to the greatest extent possible in support of QRCA production.
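One way to make the hierarchy machine-readable, so that metadata can drive QRCA production, is a nested mapping from hyperdimension to dimension to indicator. This sketch is illustrative only. It lists just the dimensions and indicators mentioned in these slides, not the full Statistics Netherlands / BLUE-ETS framework. Each boolean flag records whether the following slides describe the indicator as derivable automatically.

```python
# Hypothetical, partial encoding of the AD quality framework:
# hyperdimension -> dimension -> {indicator: derivable automatically?}
QRCA_FRAMEWORK = {
    "Source": {
        "Relevance": {
            "statistical uses of the source": True,       # acquisition metadata
            "compliance with Istat requirements": False,  # users' questionnaire
        },
    },
    "Metadata": {
        "Clarity": {"definitions from free-form metadata": False},
    },
    "Data": {
        "Technical checks": {
            "readability": True,
            "convertibility": True,
            "compliance with AD requested": True,
        },
        "Integrability": {"linking variable quality": True, "record linkage quality": True},
        "Completeness": {"coverage against statistical registers": True},
        "Time-related": {"dynamics of objects": True},
        "Accuracy": {"consistency checks": False},  # rules defined externally
    },
}


def indicators(framework, automatable):
    """Flatten the hierarchy into (hyperdimension, dimension, indicator) paths,
    keeping only indicators whose automation flag equals `automatable`."""
    return [
        (hyper, dim, ind)
        for hyper, dims in framework.items()
        for dim, inds in dims.items()
        for ind, auto in inds.items()
        if auto is automatable
    ]
```

Calling `indicators(QRCA_FRAMEWORK, False)` lists the indicators that still need external input, such as the users' questionnaire or expert-defined rules.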

20 Implementation of the Source hyperdimension
The Source hyperdimension quality indicators are implemented using all the information gathered in the AD acquisition procedures. For now this information is not managed in a fully automated way. Istat is moving in this direction by storing and organizing it according to the AD quality framework, so that it can be reused in quality assessment. For the Relevance quality dimension, a specific report for each main AD source is being finalized. It will describe all the statistical uses of the source, derived automatically from the acquisition procedure metadata. It will also report the source's compliance with Istat requirements in terms of quality, timeliness and content, derived from a very short questionnaire that could be submitted to AD source users.
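The automatically derived part of the Relevance report, namely the statistical uses of each source, can be sketched as a grouping over acquisition metadata. The record layout (`source`, `process`, `year`) is an assumption. The compliance part, which comes from the users' questionnaire, is not modelled here.

```python
from collections import defaultdict


def relevance_report(acquisition_requests):
    """Per AD source and year, the sorted list of statistical processes using it.

    `acquisition_requests` is a hypothetical list of dicts with keys
    "source", "process" and "year", as acquisition metadata might store them.
    """
    uses = defaultdict(lambda: defaultdict(set))
    for req in acquisition_requests:
        # A set removes repeated requests by the same process in the same year.
        uses[req["source"]][req["year"]].add(req["process"])
    return {
        source: {year: sorted(processes) for year, processes in sorted(by_year.items())}
        for source, by_year in uses.items()
    }
```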

21 Implementation of the Metadata hyperdimension [1]
The Clarity quality dimension considers metadata that should be available from the data source holder. To this end, a procedure is in place that should allow metadata to be acquired together with the data, although strong collaboration with the AD holder is needed. In most cases, definitions are deduced from the free-form metadata available. To make the system more efficient, these definitions could be shared among AD source users in the QRCA. In addition, phase D, "Formal Concept Analysis / identification of objects and relations", applied to the ADS, and the subsequent loading of data into the relational database, can make it possible to identify automatically the set of objects/entities to be evaluated.

22 Implementation of the Metadata hyperdimension [2]
For Comparability, it should be possible to define a bridge with two parts. The first links the statistical units with the corresponding administrative units; the second links the output statistical variables with the administrative variables used in the production process. Following the strategy of interoperability among systems, the new Istat Unified Metadata System (SUM) [8] could support QRCA production. Traceability is one of the SUM objectives. For processes that already use AD, it should be pursued by retaining process metadata, starting from the reference metadata that describe the content and quality of the statistical data Istat produces and disseminates through the I.Stat dissemination system.
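A bridge of this kind could be stored as a simple table. This sketch uses hypothetical field names and shows two possible uses: tracing an output variable back to the AD variables behind it, and listing the entries where administrative and statistical units differ. It does not reflect the actual SUM data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeEntry:
    """Hypothetical bridge row linking one output variable to one AD variable."""
    output_variable: str    # statistical variable disseminated
    statistical_unit: str   # unit the output refers to
    source_id: str          # AD source used
    ad_variable: str        # administrative variable used
    ad_unit: str            # unit the AD variable refers to


def trace(bridge, output_variable):
    """Traceability: the (source, AD variable, AD unit) triples behind an output variable."""
    return sorted({(e.source_id, e.ad_variable, e.ad_unit)
                   for e in bridge if e.output_variable == output_variable})


def unit_mismatches(bridge):
    """Entries whose administrative unit differs from the statistical unit,
    i.e. where a unit conversion should be documented for Comparability."""
    return [e for e in bridge if e.ad_unit != e.statistical_unit]
```

Suppose an entry maps the output variable "employees" (unit: enterprise) to a social security variable counted per employer. `unit_mismatches` would flag that entry as one needing a documented conversion from administrative to statistical units.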

23 Implementation of the Data hyperdimension
ETL process metadata may support the Technical checks dimension (dataset readability, convertibility and compliance with the AD requested). The AD integration process is documented in the SIM, so its metadata can be reused for the Integrability quality dimension. They can feed indicators describing the quality of the linking variables and, more generally, of the record linkage procedures. Objects in the SIM carry a unique ID number that is stable over time. This makes it possible to implement Objects alignment and Comparability indicators, by comparing AD from different sources in the SIM or by comparing AD with the main statistical registers. In the latter case, coverage indicators of the Completeness quality dimension may also be implemented where possible. Linking the same object in a dataset over time through the SIM ID number may yield Dynamics of objects indicators in the Time-related dimension.
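Once every object carries a SIM ID, several of these Data hyperdimension indicators reduce to set operations on ID collections. The definitions below are simple illustrative ratios, assumed for this sketch. They are not the indicator formulas of the BLUE-ETS deliverable. The inputs are assumed to be SIM IDs of linked records only, except in `linking_rate`, where `None` marks an unlinked record.

```python
import math


def coverage_indicators(ad_ids, register_ids):
    """Completeness: compare the SIM IDs in an AD source with a statistical register."""
    ad, reg = set(ad_ids), set(register_ids)
    return {
        # share of register units found in the AD source
        "coverage": len(ad & reg) / len(reg) if reg else math.nan,
        # share of register units missing from the AD source
        "under_coverage": len(reg - ad) / len(reg) if reg else math.nan,
        # share of AD units not found in the register
        "over_coverage": len(ad - reg) / len(ad) if ad else math.nan,
    }


def linking_rate(sim_ids):
    """Integrability: share of source records that received a SIM ID."""
    ids = list(sim_ids)
    return sum(i is not None for i in ids) / len(ids) if ids else math.nan


def dynamics_of_objects(ids_before, ids_after):
    """Time-related: objects entering, leaving and staying between two deliveries."""
    before, after = set(ids_before), set(ids_after)
    return {
        "births": len(after - before),
        "deaths": len(before - after),
        "stayers": len(before & after),
    }
```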

24 Epilogue
Many indicators may be implemented automatically! Others cannot be derived through interoperability. This is the case for:
- consistency-check indicators (Accuracy dimension), which require external intervention to define the check rules;
- the evaluation of AD source Relevance, concerning compliance with Istat requirements in terms of quality, timeliness and content.
AD source users may provide this information insofar as they have an interest in sharing information on the AD they use. To this end, Istat is setting up "AD source users groups" for the most important data source holders.
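Once the check rules have been defined externally, computing the consistency-check indicators is itself easy to automate. The sketch below assumes that each rule is supplied as a named predicate and reports the share of records failing each rule. The two example rules and the record fields (`age`, `income`) are hypothetical; they stand in for the rules domain experts would actually define.

```python
from typing import Callable, Dict, List, Tuple

# A rule is a name plus a predicate that returns True when a record passes.
Rule = Tuple[str, Callable[[dict], bool]]

# Hypothetical rules, of the kind a domain expert would have to define.
EXAMPLE_RULES: List[Rule] = [
    ("age between 0 and 120",
     lambda r: r.get("age") is not None and 0 <= r["age"] <= 120),
    ("income not negative",
     lambda r: (r.get("income") or 0) >= 0),
]


def failure_rates(records, rules):
    """Accuracy: share of records failing each consistency rule."""
    n = len(records)
    result: Dict[str, float] = {}
    for name, passes in rules:
        failures = sum(1 for r in records if not passes(r))
        result[name] = failures / n if n else float("nan")
    return result
```

Keeping the rules as data separate from the computation reflects the split described above. The rule definitions need human input, while applying them to each delivery can run alongside the automatic indicators.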

25 THANKS!

