SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION


SISAI STATISTICAL INFORMATION SYSTEMS ARCHITECTURE AND INTEGRATION WORKING GROUP
3rd MEETING, 13-14 MAY 2013
ITEM 2.4: Integrating Statistical Data with Semantic Web

From SDMX to RDF Data Cube Vocabulary: Integrating Statistical Data with Semantic Web
Monica Scannapieco, Italian National Institute of Statistics (Istat)
Joint work with: Raffaella M. Aracri, Andrea Pagano, Laura Tosco, Luca Valentino
Monica Scannapieco, 3rd SISAI Meeting, 13-14 May 2013, Luxembourg

Open Data & Linked Open Data
Open: format that allows «usage, reuse and redistribution»
Linked Open: standard models and formats intended for data integration on the Web
Current situation: moving beyond the «first open, then linked» pattern
  Data are expressed directly as Linked Open Data (LOD)
Example 1: US data.gov, the public administration (PA) open data portal; migration to LOD in progress
Example 2: DBpedia & Wikidata
  DBpedia: data extraction from Wikipedia infoboxes
  Wikidata: structured database, aiming to feed Wikipedia infoboxes

Linked Open Data and Semantic Interoperability
[Figure: Semantic Web Stack, with layers labeled Knowledge, Semantic, and Format and syntax]
Linked Open Data
  Data represented by means of RDF (Resource Description Framework) languages
  Interconnected => Semantic Interoperability

Background: RDF, RDFS, OWL
Resource Description Framework (RDF)
  Language for representing information about resources in the World Wide Web
  W3C Recommendation, 10 February 2004
RDF Schema (RDFS)
  Language for RDF vocabulary sharing
  RDFS became a W3C Recommendation on 10 February 2004 (as part of a wider revision of RDF)
OWL
  Language for publishing and sharing ontologies on the World Wide Web
  Developed as a vocabulary extension of RDF and derived from the DAML+OIL Web Ontology Language
  OWL: W3C Recommendation, 10 February 2004
  OWL 2: W3C Recommendation, 11 December 2012 (Second Edition)
  OWL 2 adds new functionality with respect to OWL 1. Some of the new features are syntactic sugar (e.g., disjoint union of classes), while others offer new expressivity, including: keys; property chains; richer datatypes and data ranges; qualified cardinality restrictions; asymmetric, reflexive, and disjoint properties; and enhanced annotation capabilities

Background: SPARQL and RDF Data Cube Vocabulary
SPARQL (SPARQL Protocol and RDF Query Language)
  A query language for RDF data with a syntax similar to SQL, plus a communication protocol based on HTTP
  A SPARQL client sends queries over an RDF graph to a SPARQL endpoint
  SPARQL allows "graph pattern matching" on RDF data
  W3C Recommendation, 15 January 2008
RDF Data Cube (RDF QB)
  W3C Working Draft of 12 March 2013
  Built on the SDMX information model
  Focused only on the publication of multi-dimensional data on the Web
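As a minimal sketch of the HTTP side of SPARQL, the snippet below builds the GET URL that a client would send to an endpoint, passing the query text in the `query` parameter. The endpoint URL and the generic query are assumptions made for illustration; they do not come from the slides, and no request is actually sent.

```python
from urllib.parse import urlencode

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A generic graph pattern: every triple matches, so this lists ten arbitrary triples.
QUERY = "SELECT ?s ?p ?o\nWHERE { ?s ?p ?o }\nLIMIT 10"


def build_query_url(endpoint: str, query: str) -> str:
    """Return the GET URL for a SPARQL query, URL-encoding the query text."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

A real client would then fetch this URL (for example with `urllib.request.urlopen`) and ask for a result format through the HTTP `Accept` header.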

RDF Basics
Resource Description Framework (RDF)
  Allows data and metadata to be represented through assertions, called triples
  A triple: <subject> <property> <object>
  A resource is uniquely identified by a URI
  A resource <property> relates two other resources, <subject> and <object>
  A property can also relate a resource to a «literal», i.e., a purely symbolic expression such as a number or a string
  The set of triples forms the RDF graph
Example:
  <http://spcdata.digitpa.gov.it/Organization/Istat> <http://www.geonames.org/ontology#locatedIn> <http://spcdata.digitpa.gov.it/Municipality/H501>
[Figure: RDF graph in which the node http://spcdata.digitpa.gov.it/Organization/Istat is linked to the node http://spcdata.digitpa.gov.it/Municipality/H501 by the edge http://www.geonames.org/ontology#locatedIn]
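The triple model can be sketched in a few lines of plain Python. The example below stores the triple from the slide as a tuple, serializes it as an N-Triples line, and runs a simple pattern match in which `None` plays the role of a variable. The function names are illustrative, and this is only a toy model of what an RDF library or triple store does.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# The example triple from the slide, as (subject, property, object) URIs.
GRAPH = [
    (
        "http://spcdata.digitpa.gov.it/Organization/Istat",
        "http://www.geonames.org/ontology#locatedIn",
        "http://spcdata.digitpa.gov.it/Municipality/H501",
    ),
]


def to_ntriples(triple: Triple) -> str:
    """Serialize one URI-only triple as an N-Triples line."""
    s, p, o = triple
    return f"<{s}> <{p}> <{o}> ."


def match(
    graph: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield the triples that match the pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


# Which resources are related by locatedIn?
located = list(match(GRAPH, p="http://www.geonames.org/ontology#locatedIn"))
print(to_ntriples(located[0]))
```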

Problem
[Figure: Istat Information System architecture. Labels: Istat Information System, Building Blocks (BB1, BB2, ..., BBn), Metadata Management System, Structural Metadata, Reference Metadata, Enhanced SDMX, SDMX Web Service Provider, SEP, Web GUI, Excel plug-in, tablets/smartphones, data on the Web]
How can we "integrate" the Istat data with data on the Web that are already interconnected with each other?

Solution
Translator
SDMX:
  <structure:Concept id="REF_AREA">
    <structure:Name xml:lang="en">Geographical reference area</structure:Name>
  </structure:Concept>

Transformation from SDMX to RDF-QB
[Figure: transformation of SDMX constructs into RDF QB. Labels: SDMX DSD (code list; concept; dimension, attribute, measure) to RDF QB; SDMX data file to RDF data set]

Analysis of the Technological Environments
R environment
  Package RSDMX: not complete and not actively maintained
Java environment
  Apache Jena: framework for reading, processing and writing RDF data, running SPARQL queries, and more
  SDMX input not covered
MIMAS Project (http://mimasld.wordpress.com/2011/11/25/final-product-post-esds-the-sdmx-to-rdf-process/)
  Transformation of data using XSLT

Technological Choice and Design
Use of XSLT transformations
Transformation engine: Saxon (Home Edition)
  Supports XSLT 2.0, XQuery 1.0, XPath 2.0
  Available for both Java and .NET
Main differences from MIMAS
  Level of generalization: our translator is generic, while MIMAS provides ad hoc transformations for specific datasets
  Generation of separate files for SDMX data, SDMX DSDs, and code lists

Example of Mapping
SDMX DSD:
  <structure:Concept id="REF_AREA">
    <structure:Name xml:lang="en">Geographical reference area</structure:Name>
  </structure:Concept>
RDF QB:
  <rdf:Description rdf:nodeID="REF_AREA">
    <qb:dimension rdf:resource="http://dati.istat.it/output/code/territory"/>
    <dc:language>en</dc:language>
    <rdf:type rdf:resource="http://purl.org/linked-data/sdmx#DimensionProperty"/>
    <rdf:type rdf:resource="http://purl.org/linked-data/sdmx#CodedProperty"/>
    <sdmx:codeList rdf:resource="http://dati.istat.it/output/code/territory"/>
    <rdfs:range rdf:resource="http://dati.istat.it/output/code/territory"/>
    <rdfs:label xml:lang="en">Geographical reference area</rdfs:label>
  </rdf:Description>
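One part of this mapping, from the SDMX concept name to an `rdfs:label` statement, can be sketched in Python with the standard library. This is an illustrative reading of the mapping, not the translator itself (which is written in XSLT). The namespace bound to the `structure` prefix is an assumption (the SDMX-ML 2.0 structure namespace), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the "structure" prefix (SDMX-ML 2.0); adjust to the actual input.
STRUCTURE_NS = "http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The concept from the slide, with the namespace declaration added so it parses alone.
SDMX_CONCEPT = f"""
<structure:Concept xmlns:structure="{STRUCTURE_NS}" id="REF_AREA">
  <structure:Name xml:lang="en">Geographical reference area</structure:Name>
</structure:Concept>
"""


def concept_labels(xml_text):
    """Return one (node id, rdfs:label URI, label text, language) row per Name element."""
    concept = ET.fromstring(xml_text.strip())
    node_id = concept.get("id")
    rows = []
    for name in concept.findall(f"{{{STRUCTURE_NS}}}Name"):
        rows.append((node_id, RDFS_LABEL, name.text, name.get(XML_LANG)))
    return rows


labels = concept_labels(SDMX_CONCEPT)
print(labels)
```

Each row corresponds to one `rdfs:label` element with an `xml:lang` attribute in the RDF output; the other properties of the description (type, code list, range) would be produced by similar rules.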

Example of XSL File: Definition of Transformation Rules
Reading from SDMX: Concept and Codelist
  <xsl:for-each select="doc($dsd)//*[local-name()='KeyFamilies']/structure:KeyFamily/structure:Components/*[@conceptRef]">
    <xsl:variable name="conceptRef" select="./@conceptRef"/>
    <xsl:variable name="codeList" select="./@codelist"/>
    <xsl:variable name="codeListName" select="doc($dsd)/*[1]/*[2]/*[name()='registry:CodeLists']/*[@id=$codeList]/structure:Name[@xml:lang=$language]"/>
Writing in RDF QB: Concept and Codelist
    <xsl:element name="rdf:Description">
      <xsl:attribute name="rdf:nodeID" select="$conceptRef"/>
      <xsl:element name="qb:dimension">
        <xsl:attribute name="rdf:resource" select="concat($IstatRoot,'/code/',$codeListName)"/>
      </xsl:element>
      <!-- the slide excerpt ends here; closing tags added for well-formedness -->
    </xsl:element>
  </xsl:for-each>
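The rule above looks up the code list name in the DSD once per component. A hedged Python sketch of the same lookup, done through an index built in a single pass over the DSD, is shown below. The DSD fragment is a simplified, hypothetical stand-in (no namespaces, invented ids), and the root URI is an assumption inferred from the example output; only the URI pattern `root/code/name` comes from the stylesheet.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, simplified DSD fragment: element names and ids are illustrative only.
DSD = """
<CodeLists>
  <CodeList id="CL_AREA"><Name xml:lang="en">territory</Name></CodeList>
  <CodeList id="CL_FREQ"><Name xml:lang="en">frequency</Name></CodeList>
</CodeLists>
"""


def build_codelist_index(dsd_xml, language):
    """Scan the DSD once and map each code list id to its name in the given language."""
    index = {}
    for codelist in ET.fromstring(dsd_xml.strip()).iter("CodeList"):
        for name in codelist.findall("Name"):
            if name.get(XML_LANG) == language:
                index[codelist.get("id")] = name.text
    return index


def dimension_uri(root_uri, index, codelist_id):
    """Build the qb:dimension resource URI as root + '/code/' + code list name."""
    return f"{root_uri}/code/{index[codelist_id]}"


index = build_codelist_index(DSD, "en")
# Assumed value for $IstatRoot, inferred from the example output.
print(dimension_uri("http://dati.istat.it/output", index, "CL_AREA"))
```

Building the index once and reusing it for every component is one way to avoid repeated scans of the DSD document.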

Syntactic and Semantic Validation
Syntactic validation: validation of the RDF files by creating the RDF triples and their graph representation
  Used the free software validator (http://www.w3.org/RDF/Validator/)
  Also verified compliance with Turtle (the Eurostat format for DSDs) by converting from RDF/XML to Turtle (.ttl)
  Software: any23 (http://any23.apache.org/)
Semantic validation: checks that the data model is a «valid» RDF QB model
  Used OpenLink Virtuoso (http://download.openlinksw.com/)

Semantic Validation
Query:
  SELECT obs_id, ref_area, obs_value, time_period, territoryLabel
  WHERE typeofWaste=9 AND Time=2001-01-01
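The query on the slide is written in an SQL-like shorthand. A sketch of how it might look as SPARQL over RDF QB observations is given below, generated from Python. The `ex:` prefix and every property name under it are hypothetical, since the slides do not give the actual dimension URIs; only `qb:Observation` and `rdfs:label` are standard terms.

```python
# Hypothetical ex: prefix; the real dimension and measure URIs are not given in the slides.
PREFIXES = (
    "PREFIX qb: <http://purl.org/linked-data/cube#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX ex: <http://example.org/dimension/>\n"
)


def build_observation_query(type_of_waste: int, time_period: str) -> str:
    """Render the slide's selection as a SPARQL graph pattern with one filter."""
    return PREFIXES + f"""SELECT ?obs ?refArea ?obsValue ?timePeriod ?territoryLabel
WHERE {{
  ?obs a qb:Observation ;
       ex:refArea ?refArea ;
       ex:obsValue ?obsValue ;
       ex:typeOfWaste {type_of_waste} ;
       ex:timePeriod ?timePeriod .
  ?refArea rdfs:label ?territoryLabel .
  FILTER (str(?timePeriod) = "{time_period}")
}}"""


query = build_observation_query(9, "2001-01-01")
print(query)
```

The selection conditions of the shorthand become a fixed object value (`ex:typeOfWaste 9`) and a `FILTER` on the time period, while the label is reached by following the reference area resource.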

Test Case and Performance
Execution time (min, sec) by input size

  Data input size (KB)   Base           Optimized_1    Optimized_2
  1.338                  6m 7,9s        2m 17,71s      5,6s
  2.674                  12m 32,86s     4m 36,933s     7,87s
  4.009                  18m 42,248s    7m 28,945s     11,003s
  5.345                  23m 49,127s    9m 12,627s     13,968s

Optimized 1: minimize DSD access
Optimized 2: in-memory access of the DSD representation
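To make the gain concrete, the reported times can be converted to seconds and compared. The sketch below parses the durations as written on the slide (decimal comma) and computes the speedup of Optimized_2 over Base for each input size; the helper name is illustrative.

```python
import re

# Matches durations such as "6m 7,9s" or "5,6s" (minutes optional, decimal comma).
_DURATION = re.compile(r"(?:(\d+)m\s*)?(\d+(?:,\d+)?)s")


def parse_duration(text):
    """Convert a duration string from the table into seconds."""
    m = _DURATION.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes = int(m.group(1) or 0)
    seconds = float(m.group(2).replace(",", "."))
    return 60 * minutes + seconds


# Times copied from the table, in order of increasing input size.
BASE = ["6m 7,9s", "12m 32,86s", "18m 42,248s", "23m 49,127s"]
OPTIMIZED_2 = ["5,6s", "7,87s", "11,003s", "13,968s"]

speedups = [parse_duration(b) / parse_duration(o) for b, o in zip(BASE, OPTIMIZED_2)]
for base, speedup in zip(BASE, speedups):
    print(f"{base:>12}  ->  {speedup:.1f}x faster with Optimized_2")
```

On these figures the in-memory variant is between roughly 65 and 102 times faster than the base version.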

Test Case and Performance
[Figure: execution time in milliseconds (log scale) versus input size in KB]

Conclusions and Future Developments
Prototype development
  SDMX data format: Compact
Possible extensions to other SDMX formats (i.e., Generic, Cross-Sectional)
Extension of the transformation rules for the constructs
Integration with the Istat Single Exit Point (SEP)

Discussion: RDF Publication Architectures
[Figure: three publication options placed on two axes, query complexity (Low to High) and data model (Low to High)]
Dedicated SPARQL EndPoint with triple
  (+) Linking with other sources
  (-) Redundancy with respect to SDMX
  (-) Deployment
RDF in flat file
  (+) Easy to publish
  (-) Not queryable
RDF via SDMX
  (+) Recovery of SDMX investment (SEP)
  (-) Cross-format translation during the query phase
  (-) Linking with other sources