Tomaž Špeh, Rudi Seljak Statistical Office of the Republic of Slovenia


Developing and using common tools for processing statistical micro data at SORS. Tomaž Špeh, Rudi Seljak, Statistical Office of the Republic of Slovenia

Content of the presentation: Problem description; Design and implementation guidelines; Example: common tools for statistical micro data editing; Other common tools for processing statistical data; Current state, future plans and lessons learned; Discussion.

Problem description. Our statistical IT landscapes are becoming more and more complex: they contain systems developed over decades, built from components and technologies of multiple vendors. Current and future statistical business needs require flexible and agile IT support. Statistical micro data processing has always been one of the most important parts of the statistical process; however, due to the complexity of the related methodology, generally acceptable solutions are rare. The main aim is to create a modular platform that enables statisticians to design the process and to process statistical micro data. The presentation covers the current state, the experience gained in developing and using common tools for processing statistical micro data, and how the common tools are used in the data editing process.

Data processing target workflow. Statistician: request for data processing. Methodologist: define statistical methodology. Statistician: choose and connect building blocks (from the repository of building blocks). Statistician: process.

Design and implementation guidelines. Standardization: the solution enforces standards. Flexibility: incremental development, agile maintenance. Reuse: build processes from reusable, well defined and well tested building blocks; layered architecture model. Resources: reduction of resources, less manual work. Reliability: manual work is error prone. Separation of design and execution. Process quality and product quality. Platform/technology independent solutions. GSBPM aware.
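The reuse and design/execution guidelines above can be illustrated with a minimal Python sketch. It is not the SORS implementation (which is built on SAS); the names `REPOSITORY`, `building_block`, `PROCESS_DESIGN` and `execute`, and the two example blocks, are hypothetical. The process design is plain data that names blocks from a repository, and a separate generic function executes it.

```python
# Hypothetical repository of reusable building blocks, keyed by name.
REPOSITORY = {}

def building_block(name):
    """Decorator that registers a function in the repository under `name`."""
    def register(func):
        REPOSITORY[name] = func
        return func
    return register

@building_block("drop_missing")
def drop_missing(rows, variable):
    # Keep only rows where the variable is present.
    return [r for r in rows if r.get(variable) is not None]

@building_block("scale")
def scale(rows, variable, factor):
    # Return new rows with the variable multiplied by a factor.
    return [{**r, variable: r[variable] * factor} for r in rows]

# Design: an ordered list of block names and parameters (illustrative values).
PROCESS_DESIGN = [
    {"block": "drop_missing", "params": {"variable": "turnover"}},
    {"block": "scale", "params": {"variable": "turnover", "factor": 1000}},
]

def execute(design, rows):
    """Execution: run each designed step in order using the repository."""
    for step in design:
        rows = REPOSITORY[step["block"]](rows, **step["params"])
    return rows

result = execute(PROCESS_DESIGN, [{"turnover": 2}, {"turnover": None}, {"turnover": 5}])
```

Because the design is only data, it could be changed or stored without touching the code of the blocks, which is the point of separating design from execution.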

Streamlining statistical micro data editing processing. [Figure: Scheme of target data flow at SORS. Process stages shown: statistical survey planning, frame and sample, data collection, data integration, data preparation, micro editing, statistical processing, aggregations, seasonal adjustment and macro data editing, preparation of dissemination tables, dissemination, creation of statistical register. Data stores shown: input database, statistical register, data warehouse (micro, macro), dissemination server, archives 1 to 4 (official archives 2, 3: data and processes). Other elements: sources (primary data from observation units, secondary data), metadata editing, documentation editing (according to templates), KLASJE, METIS, firewall. Outputs and users: electronic releases, printed publications, macrodata standard tables, latest data, tailor-made tables, international reporting, microdata for researchers; customers (Slovenia, EU ...), organizations.]

Creation of the input database. Data integration, followed by data editing, is the crucial part of the process. The quality of the final data depends to a large extent on this part of the process. [Figure: field survey data, other SORS data and admin data; "…" marks missing data and "X" marks inconsistent data.]

Data editing. Main steps in the process: (1) logical checks for a particular data set; (2) outlier detection for a particular data source; (3) corrections and missing data imputation for a particular data source; (4) data integration and derived variables calculation; (5) logical checks on integrated data; (6) additional corrections in the particular data source. (Diagram label: Integration.)

From the implementation point of view: metadata driven, content independent and reusable building blocks. [Diagram: each data source goes through checks, then corrections and imputations; integration produces the integrated database, which again goes through checks, then corrections and imputations.]
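As an illustration of a metadata driven, content independent building block, here is a minimal Python sketch of a checks engine. It is only a sketch under assumptions: the table layout (`check_id`, `variable`, `op`, `value`), the variable names and the sample records are hypothetical, and the actual SORS tools are SAS macros. The engine knows nothing about the survey; everything survey specific sits in the metadata rows.

```python
import operator

# Hypothetical metadata table: one row per logical check.
CHECKS = [
    {"check_id": "C1", "variable": "age", "op": ">=", "value": 0},
    {"check_id": "C2", "variable": "age", "op": "<=", "value": 120},
    {"check_id": "C3", "variable": "income", "op": ">=", "value": 0},
]

# Comparison operators the engine understands.
OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def run_checks(records, checks):
    """Return (record index, check_id) for every failed check.

    Missing values (None) are skipped here; in this sketch they are
    left to the imputation step.
    """
    failures = []
    for i, rec in enumerate(records):
        for chk in checks:
            val = rec.get(chk["variable"])
            if val is None:
                continue
            if not OPS[chk["op"]](val, chk["value"]):
                failures.append((i, chk["check_id"]))
    return failures

records = [
    {"age": 34, "income": 1200},
    {"age": -2, "income": 800},
    {"age": 130, "income": None},
]
failures = run_checks(records, CHECKS)
```

Running the same engine on another survey would only require a different `CHECKS` table.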

Checks – metadata table Processing

Corrections – metadata table. Individual data corrections: heating costs, inter-household cash transfer. Systematic data corrections: apples produced for own consumption.
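The difference between the two kinds of corrections above can be sketched in Python. All field names, values and the correction rule are hypothetical: the systematic rule assumes, purely for illustration, that implausibly large quantities were reported in the wrong unit and divides them by a factor. An individual correction targets one unit; a systematic correction applies one rule to every unit that matches.

```python
import copy

def apply_corrections(records, individual, systematic):
    """Apply both correction tables to a copy of the data and log every change."""
    corrected = copy.deepcopy(records)
    by_id = {rec["unit_id"]: rec for rec in corrected}
    log = []
    # Individual corrections: one unit, one variable, one new value.
    for row in individual:
        rec = by_id[row["unit_id"]]
        old = rec[row["variable"]]
        rec[row["variable"]] = row["new_value"]
        log.append((row["unit_id"], row["variable"], old, row["new_value"], "individual"))
    # Systematic corrections: the same rule for every unit above the threshold.
    for row in systematic:
        for rec in corrected:
            old = rec.get(row["variable"])
            if old is not None and old > row["threshold"]:
                new = old / row["divisor"]
                rec[row["variable"]] = new
                log.append((rec["unit_id"], row["variable"], old, new, "systematic"))
    return corrected, log

records = [
    {"unit_id": 1, "heating_costs": 9999, "apples_own_consumption": 0.3},
    {"unit_id": 2, "heating_costs": 450, "apples_own_consumption": 250.0},
]
# Hypothetical metadata tables.
individual = [{"unit_id": 1, "variable": "heating_costs", "new_value": 999}]
systematic = [{"variable": "apples_own_consumption", "threshold": 100, "divisor": 1000}]

corrected, log = apply_corrections(records, individual, systematic)
```

The log keeps the old value, the new value and the type of correction for each change, so the original data remain traceable.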

Imputations – metadata table. 24 different methods with different parameterizations can be used at the moment (hot-deck, regression, logical imputations, etc.). Example shown: mortgage installment, with its parameterization.
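How a metadata table can select an imputation method and its parameterization is sketched below in Python. This is an assumption-laden illustration, not the SORS or Banff implementation: only two simplified methods are shown (a deterministic sequential hot-deck and a logical imputation), and the table layout, the `flag_` prefix and all variable names are hypothetical.

```python
def hot_deck(records, variable, class_var):
    """Sequential hot-deck: fill a missing value with the last observed
    value from the same imputation class (here: same class_var value)."""
    last_seen = {}
    for rec in records:
        key = rec[class_var]
        if rec[variable] is None:
            if key in last_seen:
                rec[variable] = last_seen[key]
                rec["flag_" + variable] = "imputed:hot_deck"
        else:
            last_seen[key] = rec[variable]

def logical(records, variable, condition_var, condition_value, fill):
    """Logical imputation: the missing value follows from another variable."""
    for rec in records:
        if rec[variable] is None and rec[condition_var] == condition_value:
            rec[variable] = fill
            rec["flag_" + variable] = "imputed:logical"

METHODS = {"hot_deck": hot_deck, "logical": logical}

# Hypothetical metadata table: method, target variable, parameterization.
IMPUTATIONS = [
    {"method": "logical", "variable": "mortgage_installment",
     "params": {"condition_var": "has_mortgage", "condition_value": False, "fill": 0}},
    {"method": "hot_deck", "variable": "mortgage_installment",
     "params": {"class_var": "region"}},
]

def run_imputations(records, table):
    """Apply the imputation rows in table order, modifying records in place."""
    for row in table:
        METHODS[row["method"]](records, row["variable"], **row["params"])

records = [
    {"region": "A", "has_mortgage": True, "mortgage_installment": 300},
    {"region": "A", "has_mortgage": True, "mortgage_installment": None},
    {"region": "B", "has_mortgage": False, "mortgage_installment": None},
]
run_imputations(records, IMPUTATIONS)
```

Adding a further method would mean registering one more function in `METHODS`; the table rows decide which method and parameters apply to which variable.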

Data editing – software application. The application is designed as a Metadata Driven System (MDD). All the information referring to a specific survey execution is provided through the metadata tables. Currently the application uses the following software: SAS + Banff as stored procedures; SAS EG as standard interface; ORACLE (metadata repository); ORACLE (data storage). Plan: technology and platform independent environment for the automated execution of processes; other development environments.

Existing MDD tools: logical checks, corrections, imputations, aggregation, standard error estimation, tabulation.

Planned MDD tools: sampling, weighting, calculation of quality indicators, disclosure control, macro editing.

Current state – main advantages. The subject-matter personnel can run the process independently of the IT sector. All information about the data processing is transparently available through the metadata tables. The process can be easily adjusted for the different executions of the survey. Every change of the data in the process is systematically flagged, which makes the calculation of quality indicators and the production of the quality report easier.
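To show why systematic flagging simplifies quality indicators, here is a minimal Python sketch. The status values (`original`, `corrected`, `imputed`) and the sample data are hypothetical; the idea is only that once every value carries a status flag, indicators such as an imputation rate reduce to counting flags.

```python
from collections import Counter

def flag_rates(flags):
    """Share of values per status flag for one variable."""
    counts = Counter(flags)
    total = len(flags)
    return {status: n / total for status, n in counts.items()}

# Hypothetical status flags for one variable across eight units.
flags = ["original", "original", "imputed", "corrected",
         "original", "imputed", "original", "original"]
rates = flag_rates(flags)
```

Here `rates["imputed"]` is the unweighted imputation rate for the variable; a weighted version would sum survey weights instead of counting units.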

Management of the system. [Diagram: an ad hoc procedure for preparing input data; input data; micro data database; standard SAS application; production metadata repository (surveys and instances); edited data; output data; an ad hoc procedure for preparing output data.] All the processes are programmed as SAS (+Banff) macros. The user can run and control the processes through the stored process interface.

Implementation approach. Design and development of the editing system: an automated data editing system based on predefined checks (Fellegi, Holt); complex implementation and maintenance; statisticians want more control over edited data; a semi-automated system where statisticians control checks and edit rules through interactive metadata. Development of an improved system for metadata management. Development of an architecture for supporting the execution of statistical business processes. Solutions for visualisation of the workflow.

Challenges for the future. Improve the procedure of metadata management; at the moment there is a high risk of syntax errors. Improve the management of the system, especially the running of different parts of the process. Develop additional MDD building blocks. Platform independent components and integration. Enable separation of process design from process execution.

Conclusions. The current solution for processing statistical micro data has proven to work satisfactorily for several micro data collections (Population Census, Agriculture Census, EU SILC, SES). Development of an improved system for metadata management is the next phase. During the development of the applications, other (international) solutions will be considered: sharing IT tools and building blocks; harmonizing IT infrastructure; metadata definitions and their exchange.