The Data Cube Vocabulary: Deploying SDMX as RDF from Existing Systems

This paper has been amended from the January 2014 version by adding a scenario: SDMX web service returning RDF.
Chris Nelson
Zurich, November 2014
SDMX Technical Working Group

- Presentation Scope
- Revised Document Content and Organisation
- Overview of RDF and the RDF Data Cube Vocabulary (mainly for newcomers to the TWG)
- Use Case Scenarios
- Recommendations for use with SDMX

Presentation Scope
How RDF, the set of technology standards that underpins the Web of Linked Data, can provide a supplemental dissemination format for statistical agencies familiar with SDMX, while avoiding:
- Inconsistencies between parallel dissemination chains (e.g. in SDMX and in RDF format) from the same organization with different updates, or from different entities re-disseminating the same data;
- Too many manipulations of the data;
- A set of domain-specific, tailor-made software tools, which could be more efficiently replaced by one standardized and generic set of tools for working with SDMX-based RDF data and structures.
This document describes some deployment scenarios for organizations that disseminate SDMX-ML today, and discusses how best to expose their SDMX data and metadata as RDF using the Data Cube Vocabulary.

Revised Document
Document Organisation
- Overview
- Characterization of the RDF Standards ✓
- The Intended Capabilities of the Data Cube Vocabulary ✓
- Existing Data Cube Implementations
- RDF and Other Standards (DDI, SDMX, GSIM) ✓
- The Data Cube Vocabulary: Deploying SDMX as RDF from Existing Systems
- Additional Considerations for Publishing RDF
- Conclusions
- Annex 1 – RDF Issues (Issue, Remarks)
Slide legend: "( = added)"; "- presentation concentrates on this"

Revised Document
Feedback:
- Scenario B (SDMX Web Service masquerades as a SPARQL endpoint) was discarded in the June 2013 version because it was not considered to fit the culture of the LOD community. As this scenario is now being trialled by an NSI, it has been reintroduced.
- The possibility of streaming RDF data to a file or triple store using the SDMX Open Source is included in the relevant scenarios.

What is LOD (linked open data)
In 2006 Tim Berners-Lee outlined four principles of Linked Data:
- Use URIs to identify things
- Use HTTP URIs so that these things can be referred to and looked up (dereferenced) by people and user agents
- Provide useful information about the thing when its URI is dereferenced, using standard formats such as RDF/XML
- Include links to other, related URIs in the exposed data to improve discovery of other related information on the Web
Slide reproduced from METIS 2013 WP 10
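The second and third principles (look up an HTTP URI and get back a standard RDF format) can be sketched as an HTTP request with content negotiation. This is a minimal illustration, not part of the original slides: the URI `http://stats.example.org/codelist/CL_FREQ` and the helper name `build_dereference_request` are hypothetical, and the request is only prepared, never sent.

```python
from urllib.request import Request

# Registered media types for common RDF serialisations.
RDF_MEDIA_TYPES = {
    "rdfxml": "application/rdf+xml",
    "turtle": "text/turtle",
    "ntriples": "application/n-triples",
}


def build_dereference_request(uri: str, preferred: str = "rdfxml") -> Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF description of `uri`."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES[preferred]})


# Hypothetical URI for a code list published by a statistical agency.
request = build_dereference_request(
    "http://stats.example.org/codelist/CL_FREQ", preferred="turtle"
)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

A server that follows the principles would answer such a request with a description of the code list, including links to related URIs.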

RDF Primer: What is SKOS “Simple Knowledge Organization System”
Provides a model for expressing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, taxonomies, folksonomies, and other similar types of controlled vocabulary.
Slide reproduced from METIS 2013 WP 10

RDF Primer: What is an RDF Triple
A standard knowledge model for making statements about resources, based on a few elegant ideas:
- Everything has a URI.
- Knowledge is a series of Subject-Verb-Object statements (triples).
[Diagram: example triple about "Joe Smith"]
Slide reproduced from METIS 2013 WP 10
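As a minimal sketch of the subject-verb-object idea, the following represents triples as plain tuples and serialises them as N-Triples. The subject URI under `example.org` is an assumption made for illustration; the FOAF and RDF property URIs are the published ones. A production system would normally use an RDF library rather than string formatting.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class Triple(NamedTuple):
    subject: str           # URI of the thing being described
    predicate: str         # URI of the "verb"
    obj: str               # URI, or literal text when `literal` is True
    literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialise one triple as a single N-Triples line."""
    if t.literal:
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = '"' + escaped + '"'
    else:
        obj = "<" + t.obj + ">"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# Hypothetical URI identifying the person.
joe = "http://example.org/people/joe-smith"
triples = [
    Triple(joe, RDF_TYPE, FOAF + "Person"),
    Triple(joe, FOAF + "name", "Joe Smith", literal=True),
]
for t in triples:
    print(to_ntriples(t))
```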

First: New Terms
- SKOS – Simple Knowledge Organisation System
- XKOS – extension to SKOS developed by FAO, INSEE, BLS, GESIS, and others to better support statistical classifications. Used for concepts
- SCOVO – Statistical Core Vocabulary; it has limitations, so it is extended by the Data Cube Vocabulary
- Turtle – Terse RDF Triple Language
- SPARQL – SPARQL Protocol and RDF Query Language
- FOAF – Friend of a Friend; used for organisations

Data Cube Requirements/Design
- DataCube uses a subset of the overall SDMX information model
  - It is only concerned with discovery and processing of data on the Web
  - As a result, it is a very simplified model
- Use of the COG (Content-Oriented Guidelines) is mandatory in DataCube
  - If a concept exists, you are required to use it!
  - The burden of mapping to the COG is assumed by the data producer/publisher
- Several existing RDF namespaces have been used (Dublin Core, FOAF, SKOS, SCOVO, etc.)
  - This is best practice in the Linked Data world

The Data Cube Vocabulary

RDF Data Cube Vocabulary
[Diagram labels: SDMX Data Set; "structured by"; SDMX Data Structure Definition; dimensionality; RDF Data Cube Vocabulary]

Data Cube: An Observation
- DataCube is a W3C Recommendation and has gained some momentum
- Data producers using SDMX can also publish in the Data Cube Vocabulary (DCV)
- As with any other RDF publication, the applications processing the RDF must understand the DCV data model to make sense of the data
- Likewise, applications wishing to process any additional information added to the DCV triples need to understand the model of the attached data
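To make "understanding the DCV data model" concrete, here is a hedged sketch of the minimal shape of one observation in Turtle. The `qb:`, `sdmx-dimension:` and `sdmx-measure:` terms are taken from the published Data Cube Vocabulary namespaces; the `ex:` namespace, the identifiers and the function name are hypothetical. As a simplification, the reference period is written as a plain string literal, whereas real publications often use a URI or a typed literal.

```python
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "sdmx-dimension": "http://purl.org/linked-data/sdmx/2009/dimension#",
    "sdmx-measure": "http://purl.org/linked-data/sdmx/2009/measure#",
    "ex": "http://stats.example.org/",  # hypothetical publisher namespace
}


def observation_turtle(obs_id: str, dataset_id: str, ref_area: str,
                       ref_period: str, value: float) -> str:
    """Return a small Turtle document describing one qb:Observation."""
    header = "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items())
    body = (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset_id} ;\n"
        f"    sdmx-dimension:refArea ex:{ref_area} ;\n"
        f'    sdmx-dimension:refPeriod "{ref_period}" ;\n'
        f"    sdmx-measure:obsValue {value} .\n"
    )
    return header + "\n\n" + body


turtle = observation_turtle("obs-CH-2013", "EX_FLOW", "CH", "2013", 8.1)
print(turtle)
```

An application that knows the DCV can pick out the data set, the dimension values and the measure from these predicates; anything attached beyond them requires knowledge of the additional model.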

The SDMX Perspective
- If you are using SDMX today (GESMES or XML), what does this mean?
- Most DataCube implementation today is being done by organizations that don’t use SDMX-ML
- Statistical organisations are increasingly interested in RDF, and there is a need to integrate DataCube as an alternative query and delivery mechanism for data sourced from existing SDMX-based systems

SDMX to RDF Publishing Scenarios
- ✓ Publishing RDF triples as flat files to a Web Server
- ✓ Populating a SPARQL endpoint with triples
- The enhancement of an SDMX-conformant RESTful web service to support RDF
  - SDMX Web Service masquerades as a SPARQL endpoint
  - Or ?? SDMX Web Service outputs RDF: user system queries using SDMX web services

SDMX and RDF: Scenario ✓
[Diagram labels: Statistical Dissemination System; Either: Using SDMX Component Architecture, SDMX Writer Interface, Data Cube Writer, JENA; Or: SDMX-ML File, SDMX-ML to RDF Transformer; output: RDF File]

Scenario ✓: Publish RDF triples as flat files
- Publish to a server exposed to the web
- Packaged in a meaningful way using named graphs, e.g.
  - Data by data set
  - Structures all in one file, or code lists and concepts in one file and DSDs in another file
Observations
- Needs to be kept up to date: either republish as a replacement or as an incremental update
- Simple approach
- But not easily queryable
- Discovery and linking tools typically work with SPARQL endpoints
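Packaging "data by data set" with named graphs could look like the sketch below, which writes one N-Quads document per data set, with every statement assigned to that data set's graph. The base URL, the `graph/` path and the file naming are assumptions for illustration only; objects are restricted to URIs to keep the example short.

```python
from typing import Dict, List, Tuple

UriTriple = Tuple[str, str, str]  # (subject, predicate, object), all URIs


def package_by_dataset(triples_by_dataset: Dict[str, List[UriTriple]],
                       base: str = "http://stats.example.org/") -> Dict[str, str]:
    """Return {file name: N-Quads content}, one file and one named graph per data set."""
    files = {}
    for dataset_id, triples in sorted(triples_by_dataset.items()):
        graph = f"{base}graph/{dataset_id}"  # hypothetical graph naming scheme
        lines = [f"<{s}> <{p}> <{o}> <{graph}> ." for s, p, o in triples]
        files[f"{dataset_id}.nq"] = "\n".join(lines) + "\n"
    return files


QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
obs = "http://stats.example.org/dataset/EX_FLOW/obs/CH-2013"
files = package_by_dataset({
    "EX_FLOW": [
        (obs, RDF_TYPE, QB + "Observation"),
        (obs, QB + "dataSet", "http://stats.example.org/dataset/EX_FLOW"),
    ],
})
for name, content in files.items():
    print(name)
    print(content)
```

Republishing "as a replacement" then amounts to regenerating one file; an incremental update would append new statements to the same graph.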

SDMX and RDF: Scenario ✓
[Diagram labels: Statistical Dissemination System; Either: Using SDMX Component Architecture, SDMX Writer Interface, Data Cube Writer, JENA; Or: SDMX-ML File, SDMX-ML File to RDF Transformer; Triple Store (DataCube); RDF Service; SPARQL]

Use Case: Populate a SPARQL endpoint
- Deploy RDF triples in a “triple store”
  - Dedicated database system that natively understands SPARQL queries
  - Supported by many RDF tools
  - Some support a variety of RDF serialisations: RDF/XML, Turtle, N-Triples
- Data could be updated at the level of the dataflow
Observations
- Good support for linking (the reason for LOD)
- Good support for cross-dataflow queries
  - Data with some common dimensions
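A cross-dataflow query against a triple store might be sketched as follows: the query selects observations for one reference area regardless of which data set they belong to, and the helper builds the GET URL defined by the SPARQL protocol (the query text goes in the `query` parameter). The endpoint URL and the area URI are hypothetical, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Observations for one area across all data sets that share the common dimensions.
QUERY = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

SELECT ?dataset ?period ?value
WHERE {
  ?obs a qb:Observation ;
       qb:dataSet ?dataset ;
       sdmx-dimension:refArea <http://stats.example.org/CH> ;
       sdmx-dimension:refPeriod ?period ;
       sdmx-measure:obsValue ?value .
}
ORDER BY ?dataset ?period
"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the URL for a SPARQL protocol query sent via HTTP GET."""
    return endpoint + "?" + urlencode({"query": query})


# Hypothetical endpoint address.
url = sparql_get_url("http://stats.example.org/sparql", QUERY)
print(url)
```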

(Reintroduced) SDMX and RDF: Scenario 3
[Diagram labels: Statistical Dissemination System; Either: DataCube Writer; Or: SDMX-ML, SDMX-ML to RDF Transformer; SDMX REST Query; Web Service; RDF Service; SPARQL]

(Reintroduced) SDMX and RDF: Scenario 3
[Diagram labels: Statistical Dissemination System; Either: DataCube Writer; Or: SDMX-ML, SDMX-ML to RDF Transformer; SDMX REST Query]

Use Case: SDMX Web Service as a SPARQL endpoint
- Enable an SDMX system to act as a SPARQL endpoint
  - Database system with a web service that transposes a SPARQL query to an SDMX query
Observations
- Data are kept up to date
- Web service could extend support to many RDF tools
- Some support a variety of RDF serialisations: RDF/XML, Turtle, N-Triples
BUT
- No support for linking (major reason for LOD)
- No native SDMX support for cross-dataflow queries
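The transposition step can be sketched under a strong simplifying assumption: the dimension filters have already been extracted from the SPARQL query (parsing SPARQL is out of scope here), and the target is an SDMX REST data query of the form `/data/{flow}/{key}`, where the key lists dimension values in DSD order separated by dots, an empty position is a wildcard and `+` separates alternatives. The entry point, the dataflow `EX_FLOW` and the function name are hypothetical.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def sdmx_data_url(entry_point: str, flow: str, dimension_order: List[str],
                  filters: Dict[str, List[str]],
                  start_period: Optional[str] = None,
                  end_period: Optional[str] = None) -> str:
    """Build an SDMX REST data query URL from per-dimension value filters."""
    # One position per dimension, in DSD order; unfiltered dimensions stay empty.
    key = ".".join("+".join(filters.get(dim, [])) for dim in dimension_order)
    url = f"{entry_point}/data/{flow}/{key}"
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    return url + "?" + urlencode(params) if params else url


# Filters as they might be derived from triple patterns on dimension properties.
url = sdmx_data_url(
    "https://ws.example.org/rest", "EX_FLOW",
    dimension_order=["FREQ", "REF_AREA", "INDICATOR"],
    filters={"FREQ": ["A"], "REF_AREA": ["CH", "DE"]},
    start_period="2010",
)
print(url)
```

Because each SDMX data query addresses a single dataflow, a SPARQL query spanning several dataflows has no direct equivalent, which is the limitation noted above.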

Considerations
- If RDF is treated as a completely separate syntax, then the burden of data management is doubled
- If it is treated as a delivery format (just another data writer), then it is relatively easy to implement
  - Up-front cost for tools development
  - Low ongoing maintenance
- The benefits of RDF-based technology are realized in a cost-effective manner
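The "just another data writer" idea can be illustrated with a small writer interface that feeds the same observations to several output formats. All class and function names below are hypothetical and do not reproduce the interfaces of any SDMX software; `DelimitedWriter` is only a stand-in for an existing writer such as SDMX-ML, and the RDF writer emits N-Triples using published Data Cube Vocabulary terms.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, List

Observation = Dict[str, str]  # e.g. {"REF_AREA": "CH", "TIME_PERIOD": "2013", "OBS_VALUE": "8.1"}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
OBS_VALUE = "http://purl.org/linked-data/sdmx/2009/measure#obsValue"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"
BASE = "http://stats.example.org/"  # hypothetical base URL


class DataWriter(ABC):
    """Common interface: the dissemination system does not care about the syntax."""

    @abstractmethod
    def write_observation(self, dataset_id: str, obs: Observation) -> None: ...

    @abstractmethod
    def result(self) -> str: ...


class DelimitedWriter(DataWriter):
    """Stand-in for an existing writer (for example SDMX-ML)."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def write_observation(self, dataset_id: str, obs: Observation) -> None:
        fields = [dataset_id, obs["REF_AREA"], obs["TIME_PERIOD"], obs["OBS_VALUE"]]
        self._lines.append("|".join(fields))

    def result(self) -> str:
        return "\n".join(self._lines)


class DataCubeWriter(DataWriter):
    """Writes the same observations as Data Cube Vocabulary triples (N-Triples)."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def write_observation(self, dataset_id: str, obs: Observation) -> None:
        dataset = f"{BASE}dataset/{dataset_id}"
        subject = f"{dataset}/obs/{obs['REF_AREA']}-{obs['TIME_PERIOD']}"
        value = obs["OBS_VALUE"]
        self._lines += [
            f"<{subject}> <{RDF_TYPE}> <{QB}Observation> .",
            f"<{subject}> <{QB}dataSet> <{dataset}> .",
            f'<{subject}> <{OBS_VALUE}> "{value}"^^<{XSD_DECIMAL}> .',
        ]

    def result(self) -> str:
        return "\n".join(self._lines)


def disseminate(dataset_id: str, observations: Iterable[Observation],
                writers: List[DataWriter]) -> None:
    """Feed every observation to every writer, so all formats come from one source."""
    for obs in observations:
        for writer in writers:
            writer.write_observation(dataset_id, obs)


delimited, rdf = DelimitedWriter(), DataCubeWriter()
data = [
    {"REF_AREA": "CH", "TIME_PERIOD": "2013", "OBS_VALUE": "8.1"},
    {"REF_AREA": "DE", "TIME_PERIOD": "2013", "OBS_VALUE": "5.2"},
]
disseminate("EX_FLOW", data, [delimited, rdf])
print(rdf.result())
```

Since both outputs are produced in the same pass over the same data, the two dissemination chains cannot drift apart, which is the point of treating RDF as a delivery format.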

Note on Identifiers
- Retaining a mechanism that allows the same object to be expressed in different syntaxes is deemed important for SDMX
- All identifiers in RDF are URIs, and ideally these should be resolvable URLs
- Structure identifiers in SDMX are URNs, and these can be transformed to URIs
- A data set can be identified using the data set Id and a timestamp (in ISO format)
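One possible transformation is sketched below. It assumes the usual SDMX URN layout (`urn:sdmx:org.sdmx.infomodel.<package>.<class>=<agency>:<id>(<version>)`, optionally followed by `.<item>`) and maps each part to a path segment. The base URLs and the path layout are hypothetical choices, not something the slides prescribe.

```python
import re
from datetime import datetime, timezone

URN_PATTERN = re.compile(
    r"^urn:sdmx:org\.sdmx\.infomodel\."
    r"(?P<package>[a-z]+)\.(?P<cls>[A-Za-z]+)="
    r"(?P<agency>[^:]+):(?P<id>[^()]+)\((?P<version>[^)]+)\)"
    r"(?:\.(?P<item>.+))?$"
)


def urn_to_uri(urn: str, base: str = "http://stats.example.org/structure") -> str:
    """Map an SDMX structure URN to an HTTP URI (hypothetical path layout)."""
    m = URN_PATTERN.match(urn)
    if m is None:
        raise ValueError(f"not a recognised SDMX URN: {urn}")
    parts = [m["package"], m["cls"], m["agency"], m["id"], m["version"]]
    if m["item"]:
        parts.append(m["item"])
    return base + "/" + "/".join(parts)


def dataset_uri(dataset_id: str, timestamp: datetime,
                base: str = "http://stats.example.org/data") -> str:
    """Identify a data set by its Id plus an ISO 8601 timestamp in UTC."""
    ts = timestamp.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return f"{base}/{dataset_id}/{ts}"


print(urn_to_uri("urn:sdmx:org.sdmx.infomodel.codelist.Codelist=SDMX:CL_FREQ(1.0)"))
print(urn_to_uri("urn:sdmx:org.sdmx.infomodel.codelist.Code=SDMX:CL_FREQ(1.0).A"))
print(dataset_uri("EX_FLOW", datetime(2014, 11, 3, 9, 30, tzinfo=timezone.utc)))
```

Because the mapping is mechanical, the same object keeps a recognisable identity whether it is expressed in SDMX-ML or in RDF.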

Related Work
- One of the SDMX ESSNet projects has created a formal ontology out of the Metadata Common Vocabulary
  - This work is really a first draft, but it is very promising
  - It would combine well with DataCube to help people find the data they are looking for
- SKOS (Simple Knowledge Organization System) extensions (XKOS) enable support for statistical classifications as part of DDI-RDF
- There is work to describe microdata in RDF, based on DDI

Recommendation for SDMX Systems
- Publish data and structures (DSD, Concepts, Code Lists) as RDF
  - Preferably publish in a triple store
- Keep update mechanisms in current systems
  - Add new data versions or replace existing versions wholesale
- Linked data (third-party links to the data) can still work if URLs do not change
  - Base URLs on the SDMX URN (for structures) and on the data set Id plus timestamp (for data)
