PlanetData D2.7 Recommendations for contextual data publishing. Ioan Toma. © Copyright 2015 STI INNSBRUCK, www.sti-innsbruck.at

© Copyright 2015 STI INNSBRUCK PlanetData D2.7 Recommendations for contextual data publishing Ioan Toma

Attribution (slide 2)
Some slides from
– Steffen Stadtmueller, Tobias Käfer, Andreas Harth; PlanetData D2.7 Recommendations for contextual data publishing
– Thank you!

Agenda (slide 3)
Terminology and Motivation
Overview
Recommendations

What is data quality? (slide 4)
PlanetData D2.1 deliverable
Quality metrics: (1) accessibility, (2) interoperability and understandability, (3) timeliness, (4) openness, (5) verifiability, (6) consistency, (7) completeness, (8) conciseness, (9) structuredness, (10) relevancy, (11) validity, (12) reputation
The quality metrics also imply a set of best practices:
– interlink datasets
– provide provenance and licensing metadata
– use widely deployed vocabularies
– use dereferenceable URIs for proprietary vocabulary terms
– map proprietary vocabularies to other vocabularies
– provide dataset-level metadata
– refer to additional access methods
PlanetData D2.4 deliverable
General adherence to these best practices in the LOD Cloud

What is context? (slide 5)
PlanetData D2.3 deliverable
“information that can help data consumers to improve their services”
3 context dimensions:
– Spatial: information relating to (geo)spatial aspects of the entities described in the data (e.g., the shape of a product), or to geospatial aspects of the data itself (e.g., the location where a dataset was developed)
– Temporal: information relating to time, dates or time periods
– Social: information about people and the relations between people

Motivation (slide 6)
Quality of data is an increasingly important issue in the Web of Data
The presence of context information (temporal, spatial, social) can by itself increase the quality of a dataset
Which recommendations should be followed for contextual data publishing?

Overview (slide 7)
Crawled 1.5 billion triples from the LOD cloud (BTC2014, the Billion Triple Challenge 2014 crawl)
Extraction of 3 data subsets from the BTC2014 snapshot according to context dimensions, i.e. (1) Spatial, (2) Temporal and (3) Social
Evaluation of data quality metrics (D2.1 + D2.4) for each of the datasets
Comparison with results of D2.4, i.e., metrics for a general LOD sample from the same time period
Deduction of recommendations from the results

BTC 2014 Snapshot (slide 8)
BTC Snapshot as of April 11, 2014
Crawl started February 20, 2014
Tool: adapted LDSpider
Document: all triples from the same location (URI)
PLD: Pay Level Domain, used to define the border of a dataset (host location)

Crawl               Metric: number of   Value
BTC 2014 Snapshot   Quads               1,533,623,743
                    Documents           14,233,739
                    PLDs                21,817
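To illustrate how a pay-level domain can delimit a dataset, here is a minimal Python sketch. It is not the method used for the crawl: the function name `naive_pld` is hypothetical, and the code assumes the PLD is simply the last two labels of the host name. That assumption fails for multi-label public suffixes such as co.uk, where a Public Suffix List lookup would be needed.

```python
from urllib.parse import urlparse


def naive_pld(uri: str) -> str:
    """Approximate the pay-level domain of a URI.

    Assumption: the PLD is the last two labels of the host name.
    This is wrong for suffixes like "co.uk"; a real implementation
    would consult the Public Suffix List.
    """
    host = urlparse(uri).hostname or ""
    labels = host.split(".")
    # Hosts with fewer than two labels (e.g. "localhost") are returned as-is.
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


if __name__ == "__main__":
    for uri in ("http://www.sti-innsbruck.at/about",
                "http://dbpedia.org/resource/Innsbruck"):
        print(uri, "->", naive_pld(uri))
```

Grouping documents by this value gives one bucket per dataset, in the sense of "dataset" used on this slide.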

Contextual Data Subsets (slide 9)
Approach:
– Identify context indicating vocabulary terms
– Manual cleanup
– Extraction of all documents (and their triples) that contain an indicating term
Indicating terms:
– Temporal: Properties used in conjunction with “temporal-typed” objects
  Temporal properties from etc.
– Spatial: Classes used to type spatial entities (+ closure over rdfs:subClassOf and owl:equivalentClass)
  Classes from etc.
– Social: Classes used to type social entities (+ closure over rdfs:subClassOf and owl:equivalentClass)
  Classes from FOAF, schema.org, DBpedia, GoodRelations, etc.
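The class-based extraction described on this slide can be sketched as follows. This is an illustration under stated assumptions, not the tooling used for the deliverable: quads are modelled as plain `(subject, predicate, object, document)` tuples held in a list, and the function names `class_closure` and `extract_subset` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def class_closure(seeds, quads):
    """Expand seed classes with their subclasses and equivalent classes."""
    closure = set(seeds)
    changed = True
    while changed:  # repeat until no new class is discovered
        changed = False
        for s, p, o, _doc in quads:
            if p == SUBCLASS_OF and o in closure and s not in closure:
                closure.add(s)
                changed = True
            elif p == EQUIVALENT_CLASS:
                # Equivalence holds in both directions.
                if o in closure and s not in closure:
                    closure.add(s)
                    changed = True
                if s in closure and o not in closure:
                    closure.add(o)
                    changed = True
    return closure


def extract_subset(quads, seeds):
    """Keep every quad of each document that types an entity with an indicating class."""
    classes = class_closure(seeds, quads)
    docs = {doc for _s, p, o, doc in quads if p == RDF_TYPE and o in classes}
    return [q for q in quads if q[3] in docs]


if __name__ == "__main__":
    quads = [
        ("ex:alice", RDF_TYPE, "ex:Researcher", "doc1"),
        ("ex:alice", "ex:name", "Alice", "doc1"),
        ("ex:Researcher", SUBCLASS_OF, FOAF_PERSON, "doc2"),
        ("ex:x", "ex:p", "ex:y", "doc3"),
    ]
    for quad in extract_subset(quads, {FOAF_PERSON}):
        print(quad)
```

The temporal subset would need a different test (properties whose objects are temporally typed), which this sketch does not cover.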

Contextual Data Subsets (slide 10)

Recommendation (slide 11)
The general recommendation is to follow the best practices as detailed in PlanetData D2.1 (see slide 4, “What is data quality?”)
Specific recommendations for providers of datasets with contextual information, as a guide on how to follow the best practices, are as follows:

Recommendation for Providing Links to Other Datasets (slide 12)
A considerable part of the datasets are not interlinked
Use owl:sameAs to establish identity links between contextual data items
– owl:sameAs is the most used predicate to establish an outlink
Promising datasets to find link targets (most used so far)
– General: DBpedia
– Spatial: GeoNames
– Social: DBLP (L3S), Semantic Web Dogfood
– Temporal: no identity links for the entity directly possible; use Freebase and DBpedia for the annotated entity
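As a small illustration of an identity link, the sketch below serialises one owl:sameAs statement as an N-Triples line. The helper name `same_as` and the example.org URI are hypothetical; only the owl:sameAs predicate and the choice of DBpedia as a link target come from the slide.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as(local_uri: str, target_uri: str) -> str:
    """Return an N-Triples line stating that both URIs identify the same entity."""
    return f"<{local_uri}> <{OWL_SAME_AS}> <{target_uri}> ."


if __name__ == "__main__":
    # Hypothetical local entity linked to its DBpedia counterpart.
    print(same_as("http://example.org/place/innsbruck",
                  "http://dbpedia.org/resource/Innsbruck"))
```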

Recommendation for Providing Provenance Data (slide 13)
Use widely deployed vocabularies to express provenance information to ensure compatibility with existing tools and foster integration capabilities:
– Dublin Core (specifically for temporal information)
– MetaVocab (specifically for social information)
– Cert Ontology
For a thorough provisioning of provenance information we recommend W3C PROVenance Interchange
– Covers many aspects
– Most used for general purposes
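A possible shape for such provenance statements, combining Dublin Core terms with one W3C PROV property, is sketched below. The function name `provenance_triples` and all example.org URIs are hypothetical, and the choice of `dcterms:creator`, `dcterms:created` and `prov:wasDerivedFrom` is an assumption about which terms a provider might pick, not a selection made by the deliverable.

```python
from datetime import date

DCT = "http://purl.org/dc/terms/"
PROV = "http://www.w3.org/ns/prov#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def provenance_triples(dataset: str, creator: str, created: date, source: str) -> list:
    """Return N-Triples lines describing who created a dataset, when, and from what."""
    return [
        f"<{dataset}> <{DCT}creator> <{creator}> .",
        # The creation date is written as an ISO 8601 xsd:date literal.
        f'<{dataset}> <{DCT}created> "{created.isoformat()}"^^<{XSD}date> .',
        f"<{dataset}> <{PROV}wasDerivedFrom> <{source}> .",
    ]


if __name__ == "__main__":
    for line in provenance_triples("http://example.org/dataset",
                                   "http://example.org/people/alice",
                                   date(2014, 4, 11),
                                   "http://example.org/source"):
        print(line)
```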

Recommendation for Providing License Data (slide 14)
Most used predicates to indicate license information are
– cc:license
– dc:license
Employ the most used predicates, as agents are more likely to search for them
Use machine-readable OKFN-conformant or Creative Commons licenses
– Already prevalent in contextual data subsets
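From the consumer side, the reason for using the common predicates can be shown with a short lookup sketch. It assumes triples are plain `(subject, predicate, object)` tuples, that "dc:license" expands to the Dublin Core terms namespace, and that `find_licenses` is a hypothetical helper name.

```python
CC_LICENSE = "http://creativecommons.org/ns#license"
# Assumption: "dc:license" on the slide refers to the DCMI terms namespace.
DCT_LICENSE = "http://purl.org/dc/terms/license"
LICENSE_PREDICATES = (CC_LICENSE, DCT_LICENSE)


def find_licenses(triples):
    """Map each subject to the license URIs attached via a well-known predicate."""
    found = {}
    for s, p, o in triples:
        if p in LICENSE_PREDICATES:
            found.setdefault(s, []).append(o)
    return found


if __name__ == "__main__":
    triples = [
        ("http://example.org/dataset", CC_LICENSE,
         "http://creativecommons.org/licenses/by/4.0/"),
        ("http://example.org/dataset", "http://purl.org/dc/terms/title", "Demo"),
    ]
    print(find_licenses(triples))
```

A license stated with a less common predicate would be missed by an agent written this way, which is the point of the recommendation.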

Recommendation for Using Terms from Widely Deployed Vocabularies (slide 15)
The use of well-known vocabularies should always be preferred, to allow agents to interpret the data without mappings to other vocabularies
For temporal information Dublin Core is commonly employed (use in conjunction with ISO 8601 encoded literals)
For spatial information WGS84 for points, NeoGeo for polygons
For social information FOAF and CON
If the well-known vocabularies are insufficient, use vocab.cc and LODstats to find vocabularies in the long tail
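The temporal and spatial conventions named here can be illustrated with a minimal sketch: one helper checks that a date literal is ISO 8601 encoded, and another emits a point using the WGS84 `lat` and `long` properties. The helper names, the example.org URI, the use of plain (untyped) coordinate literals and the coordinate values are illustrative assumptions.

```python
from datetime import date

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


def is_iso8601_date(text: str) -> bool:
    """Return True if the text parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def wgs84_point(subject: str, lat: float, lon: float) -> list:
    """Return N-Triples lines placing the subject at a WGS84 point."""
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates outside the WGS84 range")
    return [
        f'<{subject}> <{GEO}lat> "{lat}" .',
        f'<{subject}> <{GEO}long> "{lon}" .',
    ]


if __name__ == "__main__":
    print(is_iso8601_date("2014-04-11"))
    print(is_iso8601_date("11/04/2014"))
    # Approximate, illustrative coordinates.
    for line in wgs84_point("http://example.org/place/innsbruck", 47.27, 11.39):
        print(line)
```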

Recommendation for Dereferenceability of Proprietary Vocabulary Terms (slide 16)
Clear requirement from the Linked Data principles, which should always be adhered to when publishing linked data
Provide RDF descriptions of vocabulary terms via HTTP, independently of the presence of context information
Datasets with context information already follow this recommendation more often than the average on the LOD cloud
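One way a publisher might check this for their own terms is sketched below with Python's standard library. The function names, the set of accepted media types and the example.org vocabulary URI are assumptions made for illustration; the D2.1 metric may define dereferenceability differently.

```python
import urllib.request
from urllib.parse import urldefrag

RDF_MEDIA_TYPES = ("text/turtle", "application/rdf+xml", "application/n-triples")


def build_rdf_request(term_uri: str) -> urllib.request.Request:
    """Build a GET request for a vocabulary term that asks for RDF."""
    # The fragment (e.g. "#Term") is never sent to the server, so drop it.
    url = urldefrag(term_uri).url
    return urllib.request.Request(url, headers={"Accept": ", ".join(RDF_MEDIA_TYPES)})


def is_dereferenceable(term_uri: str, timeout: float = 10.0) -> bool:
    """Return True if the term URI resolves to a document served as RDF."""
    try:
        with urllib.request.urlopen(build_rdf_request(term_uri), timeout=timeout) as resp:
            return resp.headers.get_content_type() in RDF_MEDIA_TYPES
    except OSError:  # covers HTTP errors, DNS failures and timeouts
        return False


if __name__ == "__main__":
    request = build_rdf_request("http://example.org/vocab#Term")  # hypothetical term
    print(request.full_url, request.get_header("Accept"))
```

Calling `is_dereferenceable` performs a network request, so it is left out of the demonstration at the bottom.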

Recommendations for Mapping of Proprietary Vocabularies to Others (slide 17)
Reuse of existing vocabularies should always be preferred over a proprietary vocabulary
If a proprietary vocabulary is necessary, map its terms to well-known and often used vocabularies to increase integration capabilities:
– General purpose: Dublin Core, SKOS
– Temporal: Dublin Core
– Spatial: WGS84
– Social: FOAF
Use subclass and subproperty relations to establish links
Note: if an equivalent class or equivalent property could be employed, the direct use of the target term should be preferred!

Recommendation for Providing Dataset-level Metadata (slide 18)
Only very few datasets make use of VoID information, so automated agents should not rely on the presence of such information
Our analysis shows that the well-known URI mechanism is used more often
However, we recommend (additionally) providing a backlink to the VoID file, so that agents can rely solely on link traversal to identify all relevant information
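Both discovery routes can be sketched in a few lines. The well-known location `/.well-known/void` is the one registered for VoID; expressing the backlink with `void:inDataset` is one option described in the VoID note and is an assumption about what is meant here. The helper names and example.org URIs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

VOID_IN_DATASET = "http://rdfs.org/ns/void#inDataset"


def well_known_void(uri: str) -> str:
    """Return the well-known VoID location for the host serving the given URI."""
    parts = urlsplit(uri)
    # Keep scheme and host; replace path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/.well-known/void", "", ""))


def void_backlink(document_uri: str, dataset_uri: str) -> str:
    """Return an N-Triples line linking a document back to its VoID dataset."""
    return f"<{document_uri}> <{VOID_IN_DATASET}> <{dataset_uri}> ."


if __name__ == "__main__":
    print(well_known_void("http://example.org/data/alice?x=1#me"))
    print(void_backlink("http://example.org/data/alice",
                        "http://example.org/void.ttl#dataset"))
```

With the backlink embedded in every document, an agent that starts from any resource can reach the dataset description without guessing the well-known URI.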

Recommendation for Referring to Additional Access Methods (slide 19)
Whether providing an additional access method is sensible depends mostly on the size and structure of the dataset, not on the presence of context information.
If an alternative access method is offered, we recommend providing a machine-readable link to the access method in a VoID description.
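A VoID description that advertises access methods might look like the output of the sketch below, which uses the VoID properties `void:sparqlEndpoint` and `void:dataDump`. The function name `void_description` and the example.org URIs are hypothetical, and the Turtle is assembled by plain string formatting rather than an RDF library.

```python
def void_description(dataset: str, sparql_endpoint: str = None, data_dump: str = None) -> str:
    """Return a Turtle snippet describing a dataset and its optional access methods."""
    statements = ["a void:Dataset"]
    if sparql_endpoint:
        statements.append(f"void:sparqlEndpoint <{sparql_endpoint}>")
    if data_dump:
        statements.append(f"void:dataDump <{data_dump}>")
    # Predicate-object pairs for one subject are separated by ";" in Turtle.
    body = " ;\n    ".join(statements)
    return f"@prefix void: <http://rdfs.org/ns/void#> .\n\n<{dataset}> {body} ."


if __name__ == "__main__":
    print(void_description("http://example.org/dataset",
                           sparql_endpoint="http://example.org/sparql",
                           data_dump="http://example.org/dump.nt.gz"))
```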