Better data quality through global data and metadata sharing Agne Bikauskaite and Håkan Linden Eurostat European Conference on Quality in Official Statistics (Q2014) Vienna, 3-5 June 2014
Outline: 1. Context; 2. A data sharing model; 3. The necessary preconditions; 4. Implementing Eurostat's data sharing strategy; 5. Conclusions and outlook
Context: General objectives. Reduce the reporting burden on NSIs; make more efficient use of resources in International Organisations (IOs); ensure high quality and consistency of official statistics data; improve global data exchange and dissemination.
A data sharing model: European statistics, from national to Eurostat. EU Member State; data validation; Eurostat.
A data sharing model: Eurostat as international hub for European statistics. Data providers: EU countries; OECD countries (non-EU countries only); other countries (non-OECD countries only). International organisations: Eurostat - ECB; OECD; IMF, UN, WB, ILO, BIS, other IOs. Recipients: users.
The necessary preconditions: internationally agreed technical and statistical standards; internationally agreed data structures; maintenance agreements; internationally agreed data validation; streamlined data exchange processes.
Statistical Data and Metadata Exchange (SDMX), ISO IS 17369: It consists of technical and statistical standards, guidelines, an IT service infrastructure and IT tools. SDMX provides: technical/statistical standards; new exchange modes (hubs); clear rules and responsibilities.
SDMX describes the data and metadata exchange. Elements shown: organisation scheme; concepts; code lists; concept schemes; provision agreement; DSDs; maintainer; SDMX Registry.
Describing the data exchange Who? What? When? Who? Where? How? What?
Content-Oriented Guidelines: cross-domain concepts and code lists; statistical subject-matter domains; metadata common vocabulary; recommendations to harmonise implementations. Interoperability between Organisation 1, Organisation 2 and Organisation 3.
Implementing Eurostat's data sharing strategy: Standardisation of structural metadata. Code lists describe dimensions in data tables, giving a meaning to the data. Code lists are based on: official statistical classifications such as NACE, NUTS, ISCO, etc.; the ESS and SDMX Content-Oriented Guidelines; domain-specific codifications. A standard code list (SCL) is a code list that has already been harmonised. Standard code lists should be used all along the statistical business process: data design, collection, aggregation, dissemination, exchange, archiving.
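To illustrate how a code list gives meaning to the dimensions of a data table, here is a minimal Python sketch. The code lists `CL_GEO` and `CL_SEX`, their contents, and the dimension names `geo` and `sex` are hypothetical and heavily shortened; they are not taken from the presentation or from any released ESS standard code list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodeList:
    """A named mapping from codes to human-readable labels."""
    identifier: str
    codes: dict


# Hypothetical, heavily shortened code lists for illustration only.
CL_GEO = CodeList("CL_GEO", {"AT": "Austria", "SE": "Sweden"})
CL_SEX = CodeList("CL_SEX", {"F": "Female", "M": "Male", "T": "Total"})

# Each dimension of the (hypothetical) data table is tied to one code list.
DIMENSIONS = {"geo": CL_GEO, "sex": CL_SEX}


def invalid_dimensions(row):
    """Return the names of dimensions whose value is not in its code list."""
    return [
        name
        for name, code_list in DIMENSIONS.items()
        if row.get(name) not in code_list.codes
    ]


def describe(row):
    """Translate the codes of a valid row into their labels."""
    return {name: DIMENSIONS[name].codes[row[name]] for name in DIMENSIONS}
```

A row such as `{"geo": "AT", "sex": "F"}` passes the check, while a row with an unknown `geo` code is reported under the `geo` dimension. Reusing the same code list at collection, validation and dissemination is what keeps the codes comparable across the business process.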
Implementing Eurostat's data sharing strategy: Recommendations for the SCL creation. Recommended rules for the ESS and SDMX, with comments:
Input: official information.
Coding: ESS allows A-Z, 0-9, "-" and "_"; SDMX allows A-Z, 0-9 and "_". In SDMX "-" (dash) is not allowed, to avoid confusion with the operator "minus".
Codes starting with a letter: with some exceptions.
Meaningful coding: less homogeneity in coding in SDMX, due to the involvement of several different partners.
Aggregates are possible.
To be used all along the statistical business process.
May be referenced by several statistical concepts.
Based on clear guidelines.
Maintenance agency: ESS: Eurostat Unit B5; SDMX: Statistical Working Group (SWG).
Versioning system: in future registries.
Generic concept: in SDMX there is a special CL for generic codes; in the ESS generic codes are implemented in each SCL when needed.
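The coding rules on this slide can be expressed as a small check. The sketch below is an assumption-laden reading of the slide, not an official validator: it takes the character sets as listed (ESS: A-Z, 0-9, "-", "_"; SDMX: A-Z, 0-9, "_") and applies "codes start with a letter" strictly, even though the slide notes that exceptions exist. The function name `is_valid_code` is hypothetical.

```python
import re

# Patterns derived from the slide's coding rules. The leading-letter rule
# is applied strictly here; the slide mentions exceptions that this sketch
# does not model.
CODE_PATTERNS = {
    "ESS": re.compile(r"[A-Z][A-Z0-9_-]*"),
    "SDMX": re.compile(r"[A-Z][A-Z0-9_]*"),
}


def is_valid_code(code, standard="SDMX"):
    """Check a code against the character rules of the given standard."""
    try:
        pattern = CODE_PATTERNS[standard]
    except KeyError:
        raise ValueError(f"unknown standard: {standard!r}") from None
    return pattern.fullmatch(code) is not None
```

Under these assumptions a code containing a dash is accepted for the ESS but rejected for SDMX, which is the difference the comment column of the slide points out.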
Implementing Eurostat's data sharing strategy: SDMX standards into ESS structural metadata. Improving data quality requires comparability and clarity: use identical SCLs in the ESS and in SDMX; transpose the SDMX guidelines into the ESS code lists; adapt the ESS standard codes into the SDMX DSDs.
Implementing Eurostat's data sharing strategy: Overview of the ESS SCLs. 504 ESS CLs; 194 ESS SCLs released in Ramon; 12 fully SDMX compliant; 110 SDMX compliant (except Generic codes).
Implementing Eurostat's data sharing strategy: Standardisation of Reference Metadata. ESMS: Euro SDMX Metadata Structure; ESQRS: ESS Standard for Quality Reports Structure; EPMS: Eurostat Process Metadata Structure.
Implementing Eurostat's Reference metadata sharing strategy: Over 30 Eurostat domains are in various phases of ESS Reference metadata standardisation. This concerns about 35% of all eligible Eurostat processes. Domains shown: WASTE (end of life vehicles, packaging, electronic waste) WINE FARM STRUCTURE MIP STATISTICS HICP/ Compliance monitoring EHIS (Education, health and social protection) R&D (CIS 2012) Annual crops PRAG ESAW AES (Education, Science and Culture) LCI (Labour Cost Index) INFOSOC (Information Society) BUSINESS REGISTER HICP LFS-Q, LFS-A EU-SILC FATS STS (Short Term Statistics) WASTE AEI (Pesticides) EDUCAT JVC (Job Vacancy Stats) PRODCOM EXTERNAL TRADE (3rd countries) COSAEA URBANREG R&D TOURISM PERMANENT CROPS CENSUS HOUSING PRICES HPS
Implementing Eurostat's data sharing strategy The Eurostat established methodology
Implementing Eurostat's data sharing strategy in ESS
Implementing Eurostat's data sharing strategy: Development of the technical infrastructure. Key components: SDMX Registries (the Euro-SDMX Registry, the Global SDMX Registry); SDMX Reference Infrastructure (SDMX-RI).
Implementing Eurostat's data sharing strategy: What is the Euro-SDMX Registry (SER)? Eurostat's implementation of the SDMX Registry specifications as published by the SDMX initiative (sdmx.org). Based on SDMX 2.1 (as published in April 2011). Also capable of importing and exporting SDMX 2.0 artefacts. Allows browsing, searching, editing and subscribing to artefacts. Advanced access control mechanism for distributed maintenance of artefacts, which also controls their visibility.
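As a rough illustration of how artefacts in an SDMX 2.1 registry can be addressed, the sketch below builds a structure query URL following the path pattern of the SDMX 2.1 RESTful web services guidelines (`resource/agencyID/resourceID/version`, with `all`, `all` and `latest` as defaults). It is an assumption that a given registry exposes this interface; the base URL is a placeholder, and the helper name `structure_url` and the example identifiers are hypothetical.

```python
from urllib.parse import quote

# Placeholder endpoint, not the address of any real registry.
BASE_URL = "https://registry.example.org/rest"


def structure_url(resource, agency="all", resource_id="all",
                  version="latest", base_url=BASE_URL):
    """Build an SDMX 2.1 style structure query URL.

    resource is the artefact type, for example "codelist",
    "conceptscheme" or "datastructure".
    """
    parts = [resource, agency, resource_id, version]
    # Percent-encode each path segment so unexpected characters stay safe.
    path = "/".join(quote(part, safe="") for part in parts)
    return base_url.rstrip("/") + "/" + path
```

For example, `structure_url("codelist", "ESTAT", "CL_GEO")` addresses the latest version of a code list with the illustrative identifier `CL_GEO` maintained by agency `ESTAT`, and `structure_url("datastructure")` addresses all data structure definitions from all agencies.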
Home page: most recent items; access to the content of the Registry by type; access to the content of the Registry by text search; access to the content of the Registry by advanced search.
Conclusions: International data co-operation improves the production of accurate, comparable and coherent statistics; SDMX promotes an incremental movement toward the data and metadata sharing model; the increasing use of SDMX-based statistical standards improves the quality of the underlying statistical processes; the SDMX technical standards pave the way for simplified exchange and dissemination processes, which also helps to improve timeliness and accessibility; statistical integration needs to go hand-in-hand with technical integration and standardisation.
Outlook: Much more global data and metadata sharing in the years to come; common data validation and processing procedures are required (from structural validation to content information validation); better metadata-driven statistics production systems: the use of standards throughout the processes in combination with common metadata registries; better harmonised international reference metadata frameworks and templates; broadening the scope of SDMX (versioning of codes, disabling of dimensions, other formats like CSV, flat files etc.); interoperability between information models (GSIM, SDMX, DDI etc.).