Data Sets, Vocabularies and Tools Pablo N. Mendes Freie Universität Berlin 1st year review Luxembourg, December 2011 11/02/11.

Presentation transcript:

Work Plan View WP4 (partners: FUB, KIT, UPM)
Tasks:
Task 4.1 Assembly and maintenance of the PlanetData data set catalogue
Task 4.2 Community-driven creation and maintenance of vocabularies
Task 4.3 Development of best practices for providing self-describing data
Task 4.4 Assembly and maintenance of a catalogue of data provisioning tools
Deliverables:
D4.1 Assembly and maintenance of the PlanetData data set catalogue
D4.2 Best practices on how to provide self-describing data
D4.3 PlanetData data sets, vocabularies and provisioning tools catalogue and access portal
D4.4 Data quality benchmark dataset
D4.5 PlanetData data sets, vocabularies and provisioning tools catalogue and access portal

Work Plan View WP5 (partners: EPFL, KIT)
Tasks:
Task 5.1 Assembly and maintenance of PlanetData technology catalogue
Task 5.2 Development of best practices of large-scale data management infrastructures
Deliverables:
D5.1 PlanetData data management tools catalogue and access portal
D5.2 Best practices on how to deploy tools on large-scale infrastructures
D5.3 PlanetData data management tools catalogue and access portal

Summary
WP4: Assembly and maintenance of the PlanetData data set, vocabularies and tools catalogue; community-driven creation and maintenance of vocabularies; development of best practices.
WP5: Assembly and maintenance of the PlanetData technology catalogue; best practices for large-scale data management infrastructure.

Tasks
Task 4.1: Assembly and maintenance of the PlanetData data set catalogue (Leader: FUB) (M1 – M48)
Task 4.2: Community-driven creation and maintenance of vocabularies (Leader: KIT) (M1 – M48)
Task 4.3: Development of best practices for providing self-describing data (Leader: KIT) (M1 – M24)
Task 4.4: Assembly and maintenance of a catalogue of data provisioning tools (Leader: UPM) (M1 – M48)

Deliverables in Year 1
D4.1: Data Sets Catalog, Vocabularies Catalog
D5.1: Data Management Tools Catalog

Data Sets Catalog Where to maintain the catalog? How to catalog? What to catalog? How to provide access for humans and machines? How to organize a community around the catalog?

Repository: TheDataHub.org
Maintained by the Open Knowledge Foundation (OKF) and the worldwide open data community
Widely used catalog: as of Dec 1st 2011 it has 2418 datasets, 314 of them LOD
Features of the portal: tagging, rating, feedback, discussions, groups

Cataloguing Process
PlanetData editor: collected a list of new datasets (49 new entries); updated existing entries (537 edits)
Crowdsourcing by data providers and third parties: public call for action on mailing lists and the OKFN blog; supported the community contributions
Quality assurance
Tools to support cataloguing (validator, auto-complete)
Joint work with LATC

Catalog Metadata QuickRef
What? package name, title, url, tag:lod, topic, shortname, format-*
Who? author || maintainer, published by, producer, provenance metadata, license
When? version, last updated
Why? package description
Where to find? example URI, downloads/dumps, SPARQL endpoint
How much? triples, links:* (outlinks), namespace (inlinks), vocab mappings

How are datasets described? Catalog Metadata Resources: example URIs SPARQL endpoint RDF Dumps Sitemaps, VoID files

Cataloguing process overview

Catalog Entry Validator Checks levels of metadata completeness Step-by-step annotation instructions Already checks some quality indicators e.g. availability, provenance, access methods
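A level-based completeness check of this kind could look roughly like the sketch below. The field names loosely follow the metadata quick reference, but the grouping into four levels, the key spellings (`example_uri`, `dump_url`, and so on) and the function `validate_entry` are assumptions made for illustration, not the actual validator.

```python
from typing import Dict, List, Tuple

# Hypothetical completeness levels. Each requirement is a list of
# alternative keys: the requirement is met if any one of them is filled in.
LEVELS: List[Tuple[str, List[List[str]]]] = [
    ("identification", [["name"], ["title"], ["url"]]),
    ("provenance", [["author", "maintainer"], ["license"]]),
    ("access", [["example_uri"], ["sparql_endpoint", "dump_url"]]),
    ("statistics", [["triples"], ["links"]]),
]


def _present(entry: Dict, key: str) -> bool:
    """A field counts as present when it exists and is not empty."""
    return entry.get(key) not in (None, "", [], {})


def validate_entry(entry: Dict) -> Tuple[List[str], List[str]]:
    """Return (levels passed in order, step-by-step instructions for the rest)."""
    passed: List[str] = []
    todo: List[str] = []
    blocked = False  # a level only counts if all earlier levels passed
    for level, requirements in LEVELS:
        missing = [
            " or ".join(alternatives)
            for alternatives in requirements
            if not any(_present(entry, key) for key in alternatives)
        ]
        if missing:
            blocked = True
            todo.extend(f"[{level}] add {item}" for item in missing)
        elif not blocked:
            passed.append(level)
    return passed, todo
```

Returning the missing fields as ordered instructions mirrors the step-by-step annotation guidance on the slide: an editor fixes the first level that fails before moving on.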

CKAN Entry Validator (2)

Auto-completion scripts For the entries that pass the validator, we can auto-complete metadata with information such as: Number of triples Links to other sources Vocabularies used Quality indicators
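As a rough illustration of what such auto-completion might compute, the sketch below derives a triple count, the vocabularies used and the outgoing links from an N-Triples dump. The regular expression is a simplified parser, and the function names, the `dataset_namespace` parameter and the rule for what counts as a link (same-namespace subject, foreign-namespace object, `rdf:type` excluded) are assumptions for this example rather than the scripts used in the project.

```python
import re
from collections import Counter
from urllib.parse import urlparse

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Simplified N-Triples pattern: subject (IRI or blank node), predicate IRI,
# then everything up to the final dot as the object.
TRIPLE = re.compile(r"^\s*(<[^>]*>|_:\S+)\s+<([^>]*)>\s+(.+?)\s*\.\s*$")


def namespace(iri: str) -> str:
    """Cut an IRI after its last '#' or '/' to approximate the vocabulary namespace."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def autocomplete(lines, dataset_namespace: str) -> dict:
    """Derive catalogue metadata (triples, vocabularies, links) from a dump."""
    triples = 0
    vocabularies = Counter()
    outlinks = Counter()
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = TRIPLE.match(line)
        if match is None:
            continue  # skip lines the simplified pattern cannot read
        subject, predicate, obj = match.groups()
        triples += 1
        vocabularies[namespace(predicate)] += 1
        is_iri_object = obj.startswith("<") and obj.endswith(">")
        if (
            predicate != RDF_TYPE
            and subject.startswith("<" + dataset_namespace)
            and is_iri_object
            and not obj[1:-1].startswith(dataset_namespace)
        ):
            # Group outgoing links by the host of the target resource.
            outlinks[urlparse(obj[1:-1]).netloc] += 1
    return {
        "triples": triples,
        "vocabularies": sorted(vocabularies),
        "links": dict(outlinks),
    }
```

A production script would use a real RDF parser and map target hosts to catalogue entries; the regular expression here only keeps the example self-contained.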

Catalog Access Portal
For machines: CKAN API (continuously improved by OKFN); VoID descriptions for the LOD group (will be continuously improved in cooperation with LATC)
For humans: LOD Cloud Diagram; State of the LOD Cloud report

Access for machines
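Machine access through the CKAN API might be sketched as follows. The example assumes the portal exposes CKAN's Action API (`/api/3/action/package_show`) with its usual `success`/`result` JSON envelope; the API version available on the portal at the time of the talk may have differed, and the helper names and the base URL in the usage note are illustrative only.

```python
import json
from urllib.parse import urlencode


def package_show_url(base_url: str, dataset_id: str) -> str:
    """Build the Action API URL for one catalogue entry (assumed API layout)."""
    query = urlencode({"id": dataset_id})
    return f"{base_url.rstrip('/')}/api/3/action/package_show?{query}"


def parse_package(response_text: str) -> dict:
    """Extract a few catalogue fields from a package_show JSON response."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an error: %r" % payload.get("error"))
    result = payload["result"]
    return {
        "name": result.get("name"),
        "title": result.get("title"),
        "tags": [tag["name"] for tag in result.get("tags", [])],
        # Resources carry the access points: dumps, SPARQL endpoints, examples.
        "resources": [
            (res.get("format"), res.get("url"))
            for res in result.get("resources", [])
        ],
    }
```

To fetch a live entry, the URL from `package_show_url("https://example.org", "dbpedia")` could be passed to `urllib.request.urlopen` and the decoded body handed to `parse_package`; the host is a placeholder.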

LOD Cloud Diagram

LOD Cloud Diagram (zoom in)

State of the LOD Cloud: triples by domain, links by domain
Domain | # of datasets | Triples | % | (Out-)Links | %
Media | 25 | 1,841,852,… | …% | 50,440,… | …%
Geographic | 31 | 6,145,532,… | …% | 35,812,… | …%
Government | 49 | 13,315,009,… | …% | 19,343,… | …%
Publications | 87 | 2,950,720,… | …% | 139,925,… | …%
Cross-domain | 41 | 4,184,635,… | …% | 63,183,… | …%
Life sciences | 41 | 3,036,336,… | …% | 191,844,… | …%
User-generated content | 20 | 134,127,… | …% | 3,449,… | …%
Total | 295 | 31,634,213,770 | | 503,998,829 |

State of the LOD Cloud (2)
SPARQL Endpoint: 68.14%
RDF Dumps: 39.66%
Provide provenance: %
Provide licensing: 17.84%
vocabulary use:

Vocabularies Catalog
Based on the BTC Dataset (2.1 billion triples)
Shows vocabulary usage in practice
Executed on a 54-node Hadoop cluster
Access portal: searchable, URI lookup, top usage statistics
Hosted at
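The usage statistics behind such a catalogue can be pictured as a map step that emits one count per property or class occurrence and a reduce step that sums them. The sketch below runs that idea in memory on a handful of statements; the real computation ran on Hadoop over the BTC data, and the function names and the `(dataset, predicate, object)` input shape are assumptions for illustration.

```python
from collections import Counter, defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def map_statement(dataset, predicate, obj):
    """Map step: emit ((dataset, kind, term), 1) for each vocabulary term used."""
    yield (dataset, "property", predicate), 1
    if predicate == RDF_TYPE:
        # The object of rdf:type is the class being instantiated.
        yield (dataset, "class", obj), 1


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per key."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return totals


def top_terms(statements, kind, n=3):
    """Return the n most used terms of one kind ('class' or 'property') per dataset."""
    pairs = (pair for stmt in statements for pair in map_statement(*stmt))
    totals = reduce_counts(pairs)
    per_dataset = defaultdict(Counter)
    for (dataset, term_kind, term), count in totals.items():
        if term_kind == kind:
            per_dataset[dataset][term] = count
    return {ds: counts.most_common(n) for ds, counts in per_dataset.items()}
```

Because the reduce step only sums counts per key, the same two functions would distribute naturally across cluster nodes, which is what makes this shape of computation a fit for Hadoop.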

Top Classes per Dataset

Top Properties per Dataset

Vocabularies Catalog: vocab.cc search query results; vocab.cc URI lookup results

Tools Catalog
Initial focus on tools from the consortium
Currently 15 tools
Entry for Global Sensor Networks (GSN)
Available from planet-data.eu

Tools Description
Textual description (What is it?), Documentation, Publications, Requirements, License, Contact person/mailing list, Organization, Events
Tags: Produce, Publish, Consume, Provisioning

Names of Tools in the Catalog
CumulusRDF, D2R, DBpedia Spotlight, GSN (Global Sensor Networks), Geometry2RDF, LDIF, LDSpider (Linked Data Spider), LarKC (Large Knowledge Collider), MonetDB, NOR2O, R2O&ODEMapster, OKKAM, Pubby, R2R, S2O, Silk

Tools Catalog
Related: LATC Tools Catalog (11 tools); 5 tools in both, 10 new tools in PlanetData
Proposal for next year:
Join catalogs at linkeddata.org
Jointly maintain catalog until LATC finishes
Build a community so that people can add their own tools
Afterwards PlanetData takes over and maintains the catalog for another 2 years