Some considerations on developing a DWH for SBS estimates. Orietta Luzi – Mauro Masselli, Istat (Italy), March 2013.


The rationale of DWH
– To make complete use of all the information (survey and administrative data) available on the entire target population.
– To build a platform in which data and processes are integrated (from capturing to integrating data, from checking data to estimating results and disseminating estimates).
– The advantages (elimination of sampling errors on one side, process integration and standardization on the other) exceed the disadvantages due to increased non-sampling errors and the partial loss of control over administrative data.

Goals
First step: to establish a common set of estimates (micro/macro) shared by SBS and NA on the observed economy.
Second step: integration of other business surveys (structural: ICT, R&D, external trade, … and STS).
Implications:
– Revision of the sampling designs of SBS surveys
– Revision of production processes

Business Register
The BR has a central role as selection list and “frame”.
The target population is identified with all the enterprises listed in the Business Register.
For each unit the BR contains two kinds of variables:
– classification variables (NACE, legal status, splits and joins, current status, etc.)
– content variables (e.g. the total number of persons employed, subtotals for different kinds of workers, labour costs, an estimate of turnover, …)
We assume that the classification variables, the variable “persons employed” and the implicit binary variable “existence of business” are themselves target variables, and call them Z; they are kept as they are in the BR and do not enter any data treatment procedure.

Target variables
The target variables can be divided into two groups:
– a set of “basic variables” X*, needed for the estimates required by the SBS EU Regulation and for NA estimates;
– the remaining variables Y*, needed only for NA, to be estimated conditionally on the first set.

Sources
– Administrative sources: tax files, balance sheets, social security data on workers, fiscal authority survey
– SBS surveys at the moment; other structural business surveys in the near future

Administrative data
How to assess the quality? Some results from the ESSnet Admin Data.
Essentially:
– Definitions: how close they are to the SBS ones
– Data analysis
» on overlapping data sets
» to identify biases
– Analysis of distributions
– Models of the relationships between data sources

Administrative data
Advantages: costs, completeness
Disadvantages: stability over time – data can be changed by internal decisions of the producing administration
» Operational definitions
» Data indicators from overlapping data sets
Agreements with producers
Redesign of sample surveys

From the collected variables to the target ones
For each enterprise, some of the X* variables may exist in one or more of the S sources, in different combinations depending on size, social security rules, fiscal status, etc.
Only for the sampled responding units do we have a complete set of target variables, and these are set equal to X*.
The variables $A_i$ reported in source i may coincide with or approximate the corresponding X*; in the second case it may be possible to “correct” some of them, obtaining a more precise set $X_i$ that “estimates” X*.
[Table: percentage of businesses by number of sources (one row per number of sources, plus “no source” and total); the values are truncated in the transcript.]

From $A_i$ to $X_i$
$x_{ij} = a_{ij}$ in case of “good” fit
$x_{ij} = f(a_{ij}, \ldots)$ otherwise

The matrix X

The matrix X* by establishing a hierarchy between sources

Columns of the matrix: BR (Z & ID codes); group of businesses; number of X* variables; SBS survey (all the X*); Source 2; Source 3; Source 4; Source 5.

Group of businesses    Number of X* variables
M1                     K1 = K (all)
M2                     K2 < K
M3                     K3 < K
M4                     K4 < K
M5                     K5 < K
…                      …
Mm                     Km < K
No source              0

Macro-operators
– Establishing the target population: list from the Business Register and variables Z.
– Establishing the target variables X*: reconciliation between NA and SBS operational definitions.
– Establishing $A_i$ … $A_S$ (collected variables): analysis of the data and definitions of the different sources $A_i$ with respect to the definitions of X*; the purpose is to evaluate the similarity of definitions in order (i) to establish a hierarchy between the sources and (ii) to identify the corrections to the variables A.
– From variables A to variables X: correction of A where necessary and possible; the variables $a_{ij}$ are transformed into $x_{ij}$ by a “function”: $x_{ij} = a_{ij}$ in case of “good” fit, or $x_{ij} = f(a_{ij}, \ldots)$ in case of correction.
– Establishing the variables $X_i$: outlier detection, selective editing.
– Establishing the variables X*: hierarchy between sources/variables.

Donor methods
– Randomly
– By models, e.g. the projection estimator
– By calculating a new variable to be used as a distance between donor and recipient
– Latent variable models
In all the methods we can use ex ante domains, or identify the appropriate variables to build the donor domains.
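The distance-based option can be sketched as a nearest-neighbour donor search inside ex ante domains. This is only a minimal illustration: the field names (`domain`, `persons_employed`, `turnover`), the choice of persons employed as the distance variable and the sample records are assumptions made for the example, not part of the presentation.

```python
from typing import Optional


def nearest_donor_value(recipient: dict, donors: list,
                        distance_var: str, target_var: str) -> Optional[float]:
    """Return the donor value of target_var for the closest donor in the same domain."""
    # Only units in the recipient's domain with an observed target value can donate.
    candidates = [
        d for d in donors
        if d["domain"] == recipient["domain"] and d.get(target_var) is not None
    ]
    if not candidates:
        return None  # no donor available in this domain
    # The closest donor minimises the absolute difference on the distance variable.
    donor = min(candidates,
                key=lambda d: abs(d[distance_var] - recipient[distance_var]))
    return donor[target_var]


# Hypothetical records: two donors in domain "C10", one in "G47".
donors = [
    {"domain": "C10", "persons_employed": 5, "turnover": 400.0},
    {"domain": "C10", "persons_employed": 12, "turnover": 950.0},
    {"domain": "G47", "persons_employed": 11, "turnover": 300.0},
]
recipient = {"domain": "C10", "persons_employed": 10, "turnover": None}
imputed_turnover = nearest_donor_value(recipient, donors,
                                       "persons_employed", "turnover")
```

Here the donor with 12 persons employed is the closest one within the recipient's domain; the unit in the other domain is never considered, even though its size is closer.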

Establishing coherence: modify data of source i using data of source j
– Change some variables $X_i$
– Check the impact on the other variables $X_i$
– Modify other variables $X_i$
– Assess $X_i$
E&I rules
Outlier detection and removal

A simplified example
Source i: persons employed, turnover, value added, labour costs, intermediate costs (» services)
Value added / persons employed?
BR: persons employed and labour costs
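One possible reading of this example is a plausibility check on value added per person employed, with the number of persons employed taken from the BR. The sketch below follows that reading; the field names, the acceptance bounds and the sample units are all hypothetical.

```python
from typing import Optional


def value_added_per_person(value_added: float,
                           persons_employed: int) -> Optional[float]:
    """Ratio used as a plausibility indicator; undefined without employment."""
    if persons_employed <= 0:
        return None
    return value_added / persons_employed


def flag_suspicious(units: list, lower: float, upper: float) -> list:
    """Return the ids of units whose ratio is undefined or outside [lower, upper]."""
    flagged = []
    for unit in units:
        ratio = value_added_per_person(unit["value_added"],
                                       unit["br_persons_employed"])
        if ratio is None or not (lower <= ratio <= upper):
            flagged.append(unit["id"])
    return flagged


# Hypothetical units: value added from source i, persons employed from the BR.
units = [
    {"id": "A", "value_added": 300.0, "br_persons_employed": 10},
    {"id": "B", "value_added": 5000.0, "br_persons_employed": 2},
    {"id": "C", "value_added": 80.0, "br_persons_employed": 0},
]
flagged = flag_suspicious(units, lower=5.0, upper=200.0)
```

Flagged units would then go to the coherence step, where either the source value or its relation to the BR data is reviewed.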

Sources hierarchy
Ex ante, based on how close the definitions of source i are to the SBS ones:
» BR / social security data
» SBS sample survey
» Balance sheets
» Fiscal authority survey
» Tax files
Previous and current data analysis
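Applying an ex ante hierarchy amounts to taking, for each unit and variable, the value from the first source in the ranking that reports it. The sketch below assumes that the order in which the sources are listed above is the priority order; the source keys and the sample values are hypothetical.

```python
from typing import Optional, Tuple

# Assumed priority order, taken from the order of the list in the slide.
SOURCE_HIERARCHY = [
    "br_social_security",
    "sbs_survey",
    "balance_sheets",
    "fiscal_authority_survey",
    "tax_files",
]


def select_by_hierarchy(values_by_source: dict,
                        hierarchy: list = SOURCE_HIERARCHY
                        ) -> Tuple[Optional[float], Optional[str]]:
    """Return (value, source) from the highest-ranked source that has a value."""
    for source in hierarchy:
        value = values_by_source.get(source)
        if value is not None:
            return value, source
    return None, None  # the "no source" case


# Hypothetical unit: labour costs reported by two administrative sources only.
labour_costs = {"tax_files": 118.0, "balance_sheets": 120.0, "sbs_survey": None}
value, source = select_by_hierarchy(labour_costs)
```

Returning the source name together with the value keeps track of where each cell of the matrix X* comes from, which is useful for later editing and quality indicators.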

Correct A data to obtain X data
$x_{i,k} = f(a_{i,k}, a_{i,m}, \ldots)$
– By data analysis on overlapping data sets
– By definitions
– Other considerations
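As an illustration of a correction derived from overlapping data sets, the sketch below takes the simplest possible f: a single ratio between survey and administrative totals, estimated on the units observed in both. The choice of a ratio model and all the figures are assumptions made for the example; the presentation does not specify the form of f.

```python
def ratio_correction(overlap: list) -> float:
    """Estimate a correction ratio from (admin value, survey value) pairs."""
    total_admin = sum(a for a, _ in overlap)
    total_survey = sum(x for _, x in overlap)
    if total_admin == 0:
        raise ValueError("administrative total is zero; ratio undefined")
    return total_survey / total_admin


def correct(admin_values: list, ratio: float) -> list:
    """Apply x = ratio * a to units observed only in the administrative source."""
    return [ratio * a for a in admin_values]


# Hypothetical overlap: units present in both the admin source and the survey.
overlap = [(100.0, 90.0), (200.0, 180.0), (50.0, 45.0)]
ratio = ratio_correction(overlap)          # 315 / 350
corrected = correct([400.0, 10.0], ratio)  # units without survey data
```

A richer f could use several administrative variables at once, as the formula with $a_{i,k}$ and $a_{i,m}$ suggests, for instance through a regression fitted on the same overlap.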

How to fill in the matrix X* to obtain the matrix X**
Except for group M1 (survey respondents), in all other cases the number of available X* variables is smaller than K, the number of target variables needed. To obtain the estimates we can consider two options:
– a massive imputation of missing values at micro level
– an estimation of the missing X* at macro level
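The two options can be compared on a toy domain. The sketch assumes that persons employed is known for every unit (from the BR) and uses one ratio per domain both to impute at micro level and to estimate at macro level; the ratio model, the field names and the data are illustrative assumptions.

```python
def domain_ratio(units: list) -> float:
    """Turnover per person employed, computed on units with observed turnover."""
    observed = [u for u in units if u["turnover"] is not None]
    total_employed = sum(u["persons_employed"] for u in observed)
    if total_employed == 0:
        raise ValueError("no observed employment in the domain")
    return sum(u["turnover"] for u in observed) / total_employed


def micro_total(units: list) -> float:
    """Option 1: impute each missing value, then sum the completed micro data."""
    ratio = domain_ratio(units)
    return sum(
        u["turnover"] if u["turnover"] is not None
        else ratio * u["persons_employed"]
        for u in units
    )


def macro_total(units: list) -> float:
    """Option 2: estimate the domain total directly, without completing units."""
    return domain_ratio(units) * sum(u["persons_employed"] for u in units)


# Hypothetical domain with one unit lacking turnover.
units = [
    {"persons_employed": 10, "turnover": 1000.0},
    {"persons_employed": 5, "turnover": 400.0},
    {"persons_employed": 5, "turnover": None},
]
```

With a single deterministic ratio per domain the two totals coincide, so the choice matters mainly when completed micro data are needed downstream or when the imputation model differs from the estimation model.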

Massive imputation: micro approach
[Flow diagram. Inputs: BR, survey, admin sources. Boxes: micro data treatment in the single sources; micro integration of Z, A(1), A(2), …, A(S); calculating X(1), …, X(S); micro integration of Z, X(1), X(2), …, X(S); selection of X*, E&I, coherence among different sources; micro data Z, X*; massive imputation; E&I and coherence among different sources on imputed units; micro data Z, X**, Y*; SBS estimation; micro NA treatment; estimation of variables Y*; NA estimates.]

Macro approach
[Flow diagram. Inputs: BR, survey, admin sources. Boxes: micro data treatment in the single sources; micro integration of Z, A(1), A(2), …, A(S); calculating X(1), …, X(S); micro integration of Z, X(1), X(2), …, X(S); selection of X*, outlier detection; micro data Z, X*; summing up by domains, inconsistencies clean-up; domain D estimates X**, Y*; SBS estimates; estimation of variables Y*; NA estimates.]

Cross-sectional and longitudinal approach
At the moment, the cross-sectional approach is used. However, the longitudinal approach has significant features:
– using “variations” is the logic adopted by NA estimation procedures
– we have “more information” to deal with
Implication: all the functions for data control and imputation could be developed considering both cross-sectional and longitudinal “rules”.
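To make the distinction concrete, the sketch below pairs one cross-sectional rule (checked within a single year) with one longitudinal rule (checked on the variation between two years). Both rules, the 50% threshold and the field names are hypothetical examples, not rules taken from the presentation.

```python
def cross_sectional_ok(current: dict) -> bool:
    """Within-year rule: labour costs non-negative, value added not above turnover."""
    return (current["labour_costs"] >= 0
            and current["value_added"] <= current["turnover"])


def longitudinal_ok(current: dict, previous: dict,
                    max_relative_change: float = 0.5) -> bool:
    """Between-year rule: relative change in turnover within the threshold."""
    if previous["turnover"] == 0:
        # Without a base value, only an unchanged zero passes the rule.
        return current["turnover"] == 0
    change = abs(current["turnover"] - previous["turnover"]) / previous["turnover"]
    return change <= max_relative_change


# Hypothetical records for the same enterprise in two consecutive years.
previous = {"turnover": 1000.0, "value_added": 300.0, "labour_costs": 200.0}
current = {"turnover": 2000.0, "value_added": 350.0, "labour_costs": 210.0}
passes_cross_sectional = cross_sectional_ok(current)
passes_longitudinal = longitudinal_ok(current, previous)
```

The record is internally consistent for the current year but doubles its turnover, so only the longitudinal rule flags it; this is the kind of additional information the longitudinal approach brings.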

Metadata
Generally speaking, we can roughly divide them into three broad sets:
– metadata needed to manage the data
– information related to processes and procedures
– the wider documentation related to the different topics in developing the DWH
Sustainability: different tools for managing