
Standardisation within the ESS: SDMX present and future
Marco Pellegrino, Eurostat, Statistical Office of the European Union
Luxembourg, October 2015

Outline
- Evolution of SDMX
- Standards integration: examples
- Opportunities and challenges
- All good standards change

SDMX provides:
- A model to describe statistical data and metadata
- A standard for automated communication from machine to machine
- A technology supporting standardised IT tools
- A common language for statistics

Statisticians agree to use a common description for data and metadata. The data exchange process is then driven by this common description, and data descriptions are made available to everybody who wants to understand and reuse the data.
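To make "the exchange is driven by a common description" concrete, here is a minimal sketch in which a Data Structure Definition (DSD) lists the dimensions of a series key and the code list each dimension draws from; both sides parse and check keys against the same object. The DSD, code list and concept IDs (`DSD_GDP_EXAMPLE`, `CL_FREQ`, `CL_AREA`, `FREQ`, `REF_AREA`) are hypothetical, and the classes are a simplification, not the SDMX Information Model itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Codelist:
    id: str
    codes: frozenset[str]


@dataclass(frozen=True)
class Dimension:
    concept: str        # concept identifier, e.g. "REF_AREA"
    codelist: Codelist  # the only values this dimension may take


@dataclass(frozen=True)
class DataStructureDefinition:
    id: str
    dimensions: tuple[Dimension, ...]  # order defines the series key layout

    def parse_key(self, key: str) -> dict[str, str]:
        """Split a dot-separated series key into {concept: code}, checking each code."""
        values = key.split(".")
        if len(values) != len(self.dimensions):
            raise ValueError(
                f"{self.id}: expected {len(self.dimensions)} key parts, got {len(values)}"
            )
        parsed = {}
        for dim, value in zip(self.dimensions, values):
            if value not in dim.codelist.codes:
                raise ValueError(f"{value!r} is not in codelist {dim.codelist.id}")
            parsed[dim.concept] = value
        return parsed


# Hypothetical DSD shared by data provider and data receiver.
DSD_GDP = DataStructureDefinition(
    id="DSD_GDP_EXAMPLE",
    dimensions=(
        Dimension("FREQ", Codelist("CL_FREQ", frozenset({"A", "Q"}))),
        Dimension("REF_AREA", Codelist("CL_AREA", frozenset({"DE", "FR", "LU"}))),
    ),
)
```

For example, `DSD_GDP.parse_key("A.LU")` yields `{"FREQ": "A", "REF_AREA": "LU"}`, while `"A.XX"` is rejected because `XX` is not in `CL_AREA`. Because the structure is data rather than code, publishing it lets anyone interpret the key in the same way.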

Why do we need a model?
- To define and describe statistical processes in a coherent way
- To standardise process terminology
- To compare and benchmark processes within and between organisations
- To identify synergies between processes
- To inform decisions on systems architectures and organisation of resources

The SDMX components
- Describe statistics in a standard way: objects and their relationships (Data Structure Definition (DSD), Concepts, Code Lists)
- Central management and standard access: SDMX Registry, SDMX Web Services
- Cross-Domain Concepts, Cross-Domain Code Lists, Statistical Domains, Metadata Common Vocabulary
- Push: the provider generates a file and sends it to the receiver
- Pull: the provider opens a web service to its data, and the receiver downloads regularly
- Hub: a special case of pull, in which the receiver downloads on end-user request
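In the pull (and hub) modes the receiver asks the provider's web service for data. The sketch below only composes such a request URL, assuming the SDMX 2.1 RESTful pattern `/data/{flowRef}/{key}/{providerRef}`, where the key holds one position per dimension separated by dots, alternatives joined with `+`, and an empty position meaning "all codes". The base URL and dataflow ID are hypothetical, no HTTP call is made, and later versions of the SDMX REST API use a different path layout.

```python
from urllib.parse import urlencode


def build_key(dimension_order: list[str], selection: dict[str, list[str]]) -> str:
    """Build a series key: one position per dimension, separated by '.'.

    Several codes for one dimension are joined with '+'; an empty position
    acts as a wildcard (all codes of that dimension).
    """
    return ".".join("+".join(selection.get(dim, [])) for dim in dimension_order)


def data_query_url(base_url: str, flow_ref: str, key: str,
                   provider_ref: str = "all", **params: str) -> str:
    """Compose a pull request following the /data/{flowRef}/{key}/{providerRef} pattern."""
    url = f"{base_url.rstrip('/')}/data/{flow_ref}/{key}/{provider_ref}"
    return f"{url}?{urlencode(params)}" if params else url
```

With a hypothetical service, `build_key(["FREQ", "REF_AREA", "NA_ITEM"], {"FREQ": ["A"], "REF_AREA": ["DE", "FR"]})` gives `A.DE+FR.`, and `data_query_url("https://sdmx.example.org/rest", "EXAMPLE_FLOW", key, startPeriod="2010")` gives `https://sdmx.example.org/rest/data/EXAMPLE_FLOW/A.DE+FR./all?startPeriod=2010`. A hub would issue the same kind of request, only triggered by an end user instead of a schedule.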

Broadening the scope of SDMX: metadata-driven systems
The same information is needed for exchange between the different steps of a statistical production process. Using SDMX throughout the process, in combination with a metadata registry (central storage of definitions, classifications, etc.), makes it more efficient and coherent to implement changes, e.g. in definitions.
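A minimal sketch of why a central registry makes changes easier to implement: every production step resolves a definition from one store, and a change is published as a new version rather than an in-place edit. The `MetadataRegistry` class, the agency `ESTAT` and the code list `CL_AREA` are illustrative assumptions; this is not the interface of the SDMX Registry.

```python
from dataclasses import dataclass, field
from typing import Any


def _version_key(version: str) -> tuple[int, ...]:
    """Turn '1.10' into (1, 10) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class MetadataRegistry:
    """Hypothetical central store of versioned definitions (code lists, concepts, DSDs)."""
    _items: dict[tuple[str, str, str], Any] = field(default_factory=dict)

    def register(self, agency: str, item_id: str, version: str, content: Any) -> None:
        key = (agency, item_id, version)
        if key in self._items:
            # A published version is never overwritten: changes get a new version.
            raise ValueError(f"{key} already registered")
        self._items[key] = content

    def resolve(self, agency: str, item_id: str, version: str = "latest") -> Any:
        if version != "latest":
            return self._items[(agency, item_id, version)]
        versions = [v for (a, i, v) in self._items if (a, i) == (agency, item_id)]
        if not versions:
            raise KeyError((agency, item_id))
        return self._items[(agency, item_id, max(versions, key=_version_key))]
```

After registering `CL_AREA` version `1.0` with `["DE", "FR"]` and version `1.1` with `["DE", "FR", "LU"]`, every step asking for the latest version picks up `1.1`, while a step pinned to `1.0` keeps working unchanged.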

Broadening the scope of SDMX
- A standard metadata layer for the description and use of data and metadata throughout the process

GSBPM and SDMX: towards a more complete picture

SDMX and standards integration
SDMX promotes an incremental movement towards a data and metadata sharing model that supports the production of comparable and accurate statistics. The increasing use of SDMX:
a) improves the quality of the statistical process;
b) enables simplified exchange and dissemination processes, improving timeliness and accessibility.
Statistical integration goes hand in hand with technical integration and standardisation.

Building bridges… not walls

Building bridges

Building bridges: SDMX and Linked Open Data
Linked Open Data is based on RDF (Resource Description Framework), a family of specifications published by the W3C that allows machine-actionable, semantically rich linking of things found on the Web.
The main RDF vocabulary for statistical data is the Data Cube Vocabulary, a simplified version of the SDMX model covering data structures.
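The correspondence between the two models can be sketched for a single observation: the SDMX data set becomes a `qb:DataSet` linked to its structure with `qb:structure`, and each observation becomes a `qb:Observation` carrying its dimension values and the `obsValue` measure. The prefixes are the ones used by the W3C Data Cube Recommendation and the SDMX-RDF vocabularies; all `example.org` IRIs are hypothetical, and plain string formatting is used for brevity, so a production publisher would rather rely on an RDF library that escapes literals.

```python
PREFIXES = """\
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#> .
@prefix sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
"""


def observation_to_turtle(dataset_iri: str, dsd_iri: str, obs_iri: str,
                          dimensions: dict[str, str], obs_value: str) -> str:
    """Serialise one SDMX observation as a qb:Observation in Turtle.

    `dimensions` maps a prefixed dimension property to an already formatted
    Turtle term, e.g. {"sdmx-dimension:refArea": "<http://example.org/geo/LU>"}.
    """
    lines = [
        f"<{dataset_iri}> a qb:DataSet ;",
        f"    qb:structure <{dsd_iri}> .",  # the SDMX DSD becomes a qb:DataStructureDefinition
        "",
        f"<{obs_iri}> a qb:Observation ;",
        f"    qb:dataSet <{dataset_iri}> ;",
    ]
    lines += [f"    {prop} {term} ;" for prop, term in dimensions.items()]
    lines.append(f'    sdmx-measure:obsValue "{obs_value}"^^xsd:decimal .')
    return PREFIXES + "\n" + "\n".join(lines) + "\n"
```

Calling it with `{"sdmx-dimension:refArea": "<http://example.org/geo/LU>", "sdmx-dimension:refPeriod": '"2014"^^xsd:gYear'}` and an illustrative value `"100.0"` produces one data set description plus one observation, each dimension of the SDMX series key turning into one property of the observation.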

[Diagram: SDMX Data Structure Definition, RDF Data Cube Vocabulary, SDMX data set structured by dimensionality]

The RDF Data Cube Vocabulary: W3C Recommendation, 16 January 2014

The 5-star deployment scheme for Linked Open Data
★ Make your stuff available on the Web (whatever format) under an open license.
★★ Make it available as structured data (e.g. Excel instead of an image scan of a table).
★★★ Use non-proprietary formats (e.g. CSV instead of Excel).
★★★★ Use URIs to denote things, so that people can point at your stuff.
★★★★★ Link your data to other data to provide context.

The Data Cube Vocabulary
- The Data Cube Vocabulary (DCV) is a W3C Recommendation and has gained some momentum.
- Data producers using SDMX can also publish in the DCV.
- As with any other RDF publication, the applications processing the RDF must understand the DCV data model to make sense of the data.
- Therefore, applications wishing to process any additional information added to the DCV triples need to understand the model of the attached data.

The SDMX perspective
If you are using SDMX today (GESMES or XML), what does this mean?
- Most Data Cube implementation today is being done by organisations that don't use SDMX-ML.
- Statistical organisations show an increasing interest in RDF, and there is a need to integrate the Data Cube as an alternative query and delivery channel, sourced originally from existing SDMX-based systems.

SDMX and RDF: Scenario 1
[Diagram: a Statistical Dissemination System produces an RDF file either through a Data Cube Writer plugged into the SDMX Writer Interface (using the SDMX Component Architecture) or through an SDMX-ML to RDF Transformer applied to an SDMX-ML file]

Scenario 1: publish RDF triples as flat files
- Publish to a server exposed to the Web
- Package the triples in a meaningful way using named graphs:
  - data by data set
  - structures (all in one file, or code lists and concepts in one file and DSDs in another)
Considerations:
- The files need to be kept up to date (either republished as a replacement or as an incremental update)
- A simple approach, but not easily queryable (discovery and linking tools typically work with SPARQL endpoints)
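The "package by named graph" step can be sketched with N-Quads, the line-based format in which each statement carries the graph it belongs to as a fourth term. The function below groups quads by graph and serialises each group as one flat file's content; republishing a graph as a replacement then amounts to overwriting that one file. The graph layout (one graph per data set, one graph for structures) follows the slide, while all IRIs are hypothetical.

```python
from collections import defaultdict

Quad = tuple[str, str, str, str]  # subject, predicate, object, graph (N-Triples terms)


def package_by_graph(quads: list[Quad]) -> dict[str, str]:
    """Group quads by named graph and serialise each group as an N-Quads document.

    Each returned document can be written to its own flat file; republishing a
    graph as a full replacement simply overwrites that file.
    """
    groups = defaultdict(list)
    for quad in quads:
        groups[quad[3]].append(quad)
    return {
        graph: "".join(f"{s} {p} {o} {g} .\n" for s, p, o, g in group)
        for graph, group in groups.items()
    }


QB_DATASET = "<http://purl.org/linked-data/cube#dataSet>"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
DATA_GRAPH = "<http://example.org/graph/data/gdp>"    # hypothetical: one graph per data set
STRUCT_GRAPH = "<http://example.org/graph/structures>"  # hypothetical: structures kept apart

EXAMPLE_QUADS = [
    ("<http://example.org/obs/1>", QB_DATASET, "<http://example.org/ds/gdp>", DATA_GRAPH),
    ("<http://example.org/obs/2>", QB_DATASET, "<http://example.org/ds/gdp>", DATA_GRAPH),
    ("<http://example.org/dsd/gdp>", RDF_TYPE,
     "<http://purl.org/linked-data/cube#DataStructureDefinition>", STRUCT_GRAPH),
]
```

`package_by_graph(EXAMPLE_QUADS)` returns two documents, one with the two observation statements and one with the structure statement, mirroring the "data by data set, structures separately" packaging.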

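SDMX and RDF: Scenario 2
[Diagram: a Statistical Dissemination System feeds a triple store (Data Cube) served by an RDF service with a SPARQL interface, either through a Data Cube Writer plugged into the SDMX Writer Interface or through an SDMX-ML file to RDF Transformer applied to an SDMX-ML file]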

Scenario 2: populate a SPARQL endpoint
- Deploy the RDF triples in a "triple store", a dedicated database system that natively understands SPARQL queries
- Supported by many RDF tools, some of them supporting a variety of RDF serialisations (RDF/XML, Turtle, N-Triples)
- Data could be updated at the level of the dataflow
Considerations:
- Good support for linking (the reason for LOD)
- Good support for cross-dataflow queries, i.e. over data with some common dimensions
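A cross-dataflow query is where a SPARQL endpoint pays off: observations from two data sets can be joined on a dimension they share. The function below only builds the query text; it assumes both data sets use the SDMX-RDF dimension properties `sdmx-dimension:refArea` and `sdmx-dimension:refPeriod`, with the period published as an `xsd:gYear` literal (many publishers use period URIs instead). The data set IRIs are hypothetical, and the query can be sent with any SPARQL client.

```python
def cross_dataflow_query(dataset_a: str, dataset_b: str, year: str) -> str:
    """Build a SPARQL query joining two Data Cube data sets on a shared dimension.

    Observations from both data sets are matched on sdmx-dimension:refArea for
    the same reference period, so one query spans two dataflows.
    """
    return f"""\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?area ?valueA ?valueB
WHERE {{
  ?obsA qb:dataSet <{dataset_a}> ;
        sdmx-dimension:refArea ?area ;
        sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;
        sdmx-measure:obsValue ?valueA .
  ?obsB qb:dataSet <{dataset_b}> ;
        sdmx-dimension:refArea ?area ;
        sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;
        sdmx-measure:obsValue ?valueB .
}}
ORDER BY ?area
"""
```

Because `?area` appears in both patterns, the endpoint returns only areas present in both data sets, which is exactly the "data with some common dimensions" case.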

Considerations
- If RDF is treated as a completely separate syntax, the burden of data management is doubled.
- If it is treated as a delivery format (just another data writer), it is relatively easy to implement: there is an up-front cost for tools development, but ongoing maintenance is low.
- The benefits of RDF-based technology are then realised in a cost-effective manner.
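"Just another data writer" can be illustrated with a small writer interface: the dissemination routine depends only on the interface, so adding RDF output means adding one writer class, not a second data management chain. `DataWriter`, `CsvLikeWriter` and `DataCubeWriter` are hypothetical names, not the SDMX Writer Interface of any existing toolkit, and the RDF writer emits only the `qb:dataSet` link and an untyped `obsValue` literal for brevity.

```python
import io
from typing import Protocol, TextIO

Key = dict[str, str]           # dimension -> code, e.g. {"FREQ": "A", "REF_AREA": "LU"}
Observations = dict[str, str]  # period -> value


class DataWriter(Protocol):
    """Hypothetical writer interface: the dissemination system only talks to this."""

    def write_series(self, key: Key, observations: Observations, out: TextIO) -> None: ...


class CsvLikeWriter:
    """Flat tabular output: one line per observation."""

    def write_series(self, key: Key, observations: Observations, out: TextIO) -> None:
        prefix = ",".join(key.values())
        for period, value in observations.items():
            out.write(f"{prefix},{period},{value}\n")


class DataCubeWriter:
    """N-Triples output linking each observation to its data set and value."""

    QB = "http://purl.org/linked-data/cube#"
    MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"

    def __init__(self, dataset_iri: str) -> None:
        self.dataset_iri = dataset_iri

    def write_series(self, key: Key, observations: Observations, out: TextIO) -> None:
        series_id = ".".join(key.values())
        for period, value in observations.items():
            obs = f"<{self.dataset_iri}/{series_id}/{period}>"
            out.write(f"{obs} <{self.QB}dataSet> <{self.dataset_iri}> .\n")
            out.write(f'{obs} <{self.MEASURE}> "{value}" .\n')


def disseminate(series: list[tuple[Key, Observations]], writer: DataWriter) -> str:
    """The same export routine serves every format; only the writer changes."""
    out = io.StringIO()
    for key, observations in series:
        writer.write_series(key, observations, out)
    return out.getvalue()
```

The up-front cost sits in writing `DataCubeWriter` once; afterwards the data are managed in one place and every format, RDF included, is derived from them.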

Building bridges: data validation
- "Technical" validation, covered by SDMX today:
  - format check (SDMX-ML)
  - codes exist (SDMX DSD)
  - codes used correctly (dataflow and constraint)
- "Statistical domain" validation, not yet covered by SDMX (VTL):
  - value checks
  - time series checks
  - revisions
  - validation expressions
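The three technical checks can be sketched for a single observation record. Here the format check is only a stand-in (required dimensions present, numeric `OBS_VALUE`) for validating a message against the SDMX-ML schema; "codes exist" compares each code with the DSD code list, and "codes used correctly" compares it with the subset allowed by a dataflow constraint. The dimension names, codes and the EU-only constraint are illustrative assumptions.

```python
def validate_record(record: dict[str, str],
                    codelists: dict[str, set[str]],
                    constraint: dict[str, set[str]]) -> list[str]:
    """Return technical validation errors for one observation record (empty list = valid).

    codelists  -- codes defined in the DSD for each dimension ("codes exist")
    constraint -- subset of codes allowed by the dataflow ("codes used correctly")
    """
    errors = []
    # 1. Format check: a stand-in for validating the message against the SDMX-ML schema.
    missing = [dim for dim in codelists if dim not in record]
    if missing:
        errors.append(f"format: missing dimensions {missing}")
    try:
        float(record.get("OBS_VALUE", ""))
    except ValueError:
        errors.append("format: OBS_VALUE is not numeric")
    # 2. and 3. Code checks against the DSD code lists and the dataflow constraint.
    for dim, codes in codelists.items():
        value = record.get(dim)
        if value is None:
            continue
        if value not in codes:
            errors.append(f"code: {dim}={value} is not in the codelist")
        elif dim in constraint and value not in constraint[dim]:
            errors.append(f"constraint: {dim}={value} is not allowed in this dataflow")
    return errors


# Illustrative DSD code lists and a dataflow restricted to a few EU countries.
CODELISTS = {"FREQ": {"A", "Q", "M"}, "REF_AREA": {"DE", "FR", "LU", "US"}}
EU_ONLY = {"REF_AREA": {"DE", "FR", "LU"}}
```

A record with `REF_AREA=US` passes the code list check (the code exists in the DSD) but fails the constraint, which is the difference between "codes exist" and "codes used correctly".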

VTL: Validation and Transformation Language
A standard language for defining validation and transformation rules:
- Validation (now)
- Transformation (partially now, to be enriched at a later stage)
Main goals:
- Define and preserve validation and transformation rules
- Exchange and share rules
- Apply rules in industrialised processes
- Apply to several standards (e.g. SDMX, DDI, GSIM) thanks to a generic information model
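The idea behind VTL, rules defined separately from the data and applied the same way wherever they are used, can be sketched in Python. This is not VTL syntax: it is a hypothetical stand-in covering the statistical-domain checks named earlier (a value check, a time series check and a revision check), with thresholds and error codes invented for illustration. It also assumes period labels sort chronologically as strings, which holds for formats such as `2014` or `2014-Q1`.

```python
from dataclasses import dataclass
from typing import Callable

Series = dict[str, float]                      # period -> value
Check = Callable[[Series, Series], list[str]]  # (current, previous) -> failing periods


@dataclass(frozen=True)
class Rule:
    name: str
    error_code: str
    check: Check


def non_negative(current: Series, previous: Series) -> list[str]:
    """Value check: observations must not be negative."""
    return [p for p, v in current.items() if v < 0]


def max_growth(limit: float) -> Check:
    """Time series check: period-on-period change must stay within `limit`."""
    def check(current: Series, previous: Series) -> list[str]:
        periods = sorted(current)
        return [p for before, p in zip(periods, periods[1:])
                if current[before] != 0 and abs(current[p] / current[before] - 1) > limit]
    return check


def max_revision(limit: float) -> Check:
    """Revision check: change against the previously transmitted value must stay within `limit`."""
    def check(current: Series, previous: Series) -> list[str]:
        return [p for p, v in current.items()
                if previous.get(p) and abs(v / previous[p] - 1) > limit]
    return check


# The rule set is kept apart from the data it is applied to (illustrative thresholds).
RULESET = (
    Rule("non_negative", "E01", non_negative),
    Rule("growth_within_20pct", "W01", max_growth(0.20)),
    Rule("revision_within_5pct", "W02", max_revision(0.05)),
)


def apply_rules(ruleset, current: Series, previous: Series) -> list[tuple[str, str]]:
    """Return (error_code, period) pairs for every failed check."""
    return [(rule.error_code, period)
            for rule in ruleset
            for period in rule.check(current, previous)]
```

With `current = {"2012": 100.0, "2013": 130.0, "2014": 131.0}` and `previous = {"2012": 100.0, "2013": 120.0}`, the 2013 value is flagged both for its 30% growth and for its revision of more than 5% against the earlier transmission.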

DDI: The Data Documentation Initiative
DDI is split into two branches:
- DDI-Codebook (DDI-C): a lightweight version of the standard, intended primarily to document simple survey data.
- DDI-Lifecycle (DDI-L or DDI 3+): designed to document and manage data across the entire life cycle, from conceptualisation to data publication and analysis and beyond. DDI-L is currently being evaluated in several statistical organisations across the world.
The DDI Lifecycle standard provides a data model for describing surveys in a very detailed fashion using XML. This can support many parts of the survey management process, particularly for household surveys, e.g. exchange between question banks and data collection applications, generation of collection instruments, …

DDI: the DDI data lifecycle model

Building bridges: SDMX and DDI
SDMX can provide:
- Metadata describing the structure of dimensional data
- Stand-alone metadata sets ("reference metadata")
- Formats for dimensional data
- A model of data reporting and dissemination
- Standard registry interfaces, providing a catalogue of resources
- Guidelines for deploying standard web services
- A way of describing statistical processes
DDI Lifecycle can provide a very detailed set of metadata, covering:
- Surveys and the processing of microdata
- The structure of data files, including hierarchical files and complex relationships
- Archiving of data files and their metadata
- Tabulation and processing of data into tables
- The link between microdata variables and the resulting aggregates

SDMX and DDI: similarities and differences
- Both standards use a similar model for identifiable, versionable and maintainable artefacts.
- Both standards use "schemes" as packages for lists of items, and XML "schemas".
- Both standards are designed to support reuse.
- DDI has much more detailed metadata at the level of the study domain and provides more complete descriptions of data processing.
- SDMX provides more architectural components to support registration, reporting/collection and exchange, and has a solid information model.
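The shared "identifiable, versionable, maintainable" pattern can be sketched as a small class hierarchy: an identifiable artefact has an ID, a versionable one adds a version, and a maintainable one adds the agency responsible for it, which together identify it unambiguously. The classes are a simplification loosely modelled on the SDMX Information Model, and the URN follows the SDMX 2.1 structure-URN pattern for a code list; DDI identifies its items with an analogous agency, ID and version triple but uses its own URN syntax, not shown here. The agency and code list IDs are illustrative.

```python
from dataclasses import dataclass, replace
from typing import ClassVar


@dataclass(frozen=True)
class IdentifiableArtefact:
    id: str


@dataclass(frozen=True)
class VersionableArtefact(IdentifiableArtefact):
    version: str = "1.0"


@dataclass(frozen=True)
class MaintainableArtefact(VersionableArtefact):
    agency_id: str = "SDMX"
    PACKAGE: ClassVar[str] = "base"
    CLASS_NAME: ClassVar[str] = "MaintainableArtefact"

    def urn(self) -> str:
        """Agency + ID + version identify a maintainable artefact unambiguously."""
        return (f"urn:sdmx:org.sdmx.infomodel.{self.PACKAGE}.{self.CLASS_NAME}="
                f"{self.agency_id}:{self.id}({self.version})")


@dataclass(frozen=True)
class Codelist(MaintainableArtefact):
    codes: tuple[str, ...] = ()
    PACKAGE: ClassVar[str] = "codelist"
    CLASS_NAME: ClassVar[str] = "Codelist"
```

For example, `Codelist(id="CL_AREA", version="1.0", agency_id="ESTAT", codes=("DE", "FR"))` has the URN `urn:sdmx:org.sdmx.infomodel.codelist.Codelist=ESTAT:CL_AREA(1.0)`. Because the objects are immutable, adding a code means creating a new version with `dataclasses.replace(...)`, which is what makes safe reuse across organisations possible.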


Other relevant standards
[Diagram: GSIM as the conceptual model; DDI, SDMX and geospatial standards as implementation standards]

Opportunities and challenges
- SDMX is interacting well with other standards (GSIM, DDI, RDF Linked Open Data, JSON), and this complementarity opens up new perspectives for the innovation of statistical processes.
- Common data validation and processing procedures are required (from structural validation to content).
- Better metadata-driven statistical production systems, using standards throughout the processes in combination with a metadata registry.
- Better maintenance and development of SDMX (e.g. support for use cases, new functions, more formats) that exploits the wealth of its Information Model.

All good standards change
[Timeline: September 2004, November 2005, April 2011; Version 1.0 (GESMES/TS); Version 2.0 (SDMX-EDI, SDMX-ML, SDMX Registry)]
Too much change may discourage adoption.
But… not giving users the functionalities they want would also discourage adoption.

Where do we want SDMX to be in 2020?
“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“–so long as I get SOMEWHERE,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”
(Alice’s Adventures in Wonderland, Chapter 6)

Where are we?
- Dramatic changes in the environment of official statistics producers (e.g. the data deluge)
- Modernisation of statistical information systems seen as a question of survival for the official statistics sector
- Standardisation viewed as a key enabler for modernisation
- Standards-based industrialisation of statistical production

SDMX 2020
Main challenges for the years to come:
- Strengthening implementation
- Facilitating data consumption
- Supporting statistical process innovation
- Enhancing communication
- Investing in training and capacity-building
Action plan: SWG/TWG work plan

Thanks for your attention!
SDMX present and future
“If you are not sure where you are going, you will finish someplace else.”