Google Dataset Search Evaluation

Presentation transcript:

Google Dataset Search Evaluation CEOS WGISS-46, October 23, 2018 André Twele, Christian Strobl and Katrin Molch

Google Dataset Search – some quick facts
Launched on 5 September 2018.
Think of a "Google Scholar for data".
Main aim: facilitating the discoverability of datasets from thousands of repositories across the web.
The initial release mainly covers the environmental and social sciences, government data, and data from news organizations.
Relies on dataset providers to embed structured (meta)data into their web sites, using schema.org Dataset markup or equivalent structures (W3C DCAT).
Formats: JSON-LD, RDFa 1.1 or Microdata syntax.
https://toolbox.google.com/datasetsearch
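As an illustration of the kind of markup a dataset provider would embed, the following is a minimal sketch that builds a schema.org Dataset description and wraps it in a JSON-LD script tag. The helper names (`dataset_jsonld`, `as_script_tag`) and all values (title, URL, organisation, dates) are placeholders invented for this example; only the schema.org property names are taken from the vocabulary.

```python
import json


def dataset_jsonld(name, description, url, publisher, temporal=None, spatial=None):
    """Build a schema.org Dataset description as a JSON-LD dictionary.

    All arguments are supplied by the caller; nothing here is specific
    to any real catalogue.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "creator": {"@type": "Organization", "name": publisher},
    }
    # Optional properties are only emitted when a value is available.
    if temporal:
        record["temporalCoverage"] = temporal
    if spatial:
        record["spatialCoverage"] = {"@type": "Place", "name": spatial}
    return record


def as_script_tag(record):
    """Serialise the record the way it would be embedded in an HTML page."""
    body = json.dumps(record, indent=2, ensure_ascii=False)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values, not a real dataset.
example = dataset_jsonld(
    name="Example elevation tiles",
    description="Placeholder description of an example dataset.",
    url="https://example.org/datasets/elevation-tiles",
    publisher="Example Data Centre",
    temporal="2018-01-01/2018-12-31",
)
print(as_script_tag(example))
```

The printed tag would go into the landing page of the dataset. Because no `spatial` value is passed here, the output carries no `spatialCoverage` at all.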

Evaluation – first impressions
The quality of search results varies, probably because Google’s search engine retrieved them from different portals.
For many search results, only a “Description” field (equivalent to gmd:abstract in ISO) is shown.

Evaluation – first impressions
Only included for some search results: “Dataset published, created or updated”, “Dataset provided by”, “Time period covered”, etc.
Information on spatial properties, geographical coverage, etc. is rarely shown.
Dataset Search can detect if a dataset is present in more than one repository.
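How Dataset Search identifies replicas internally is not described in these slides. Purely as an illustration of the idea, the sketch below groups records from different repositories by a normalised title. The record layout (`title`, `source`), the function names and the sample hosts are assumptions made up for this example, and title matching is a stand-in technique, not Google's method.

```python
import re
from collections import defaultdict


def normalise_title(title):
    """Lower-case a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def group_replicas(records):
    """Return {normalised title: sorted sources} for titles seen in more than one record."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalise_title(rec["title"])].append(rec["source"])
    return {key: sorted(sources) for key, sources in groups.items() if len(sources) > 1}


# Made-up sample records; the hosts are placeholders.
sample = [
    {"title": "Digital Elevation Model (DEM) Tiles - Global", "source": "portal-a.example"},
    {"title": "Digital elevation model (DEM) tiles, global", "source": "portal-b.example"},
    {"title": "Land cover map", "source": "portal-a.example"},
]
print(group_replicas(sample))
```

A real system would need stronger evidence than titles alone (persistent identifiers, descriptions, download links), since unrelated datasets can share a title.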

Evaluation – DLR EOC Catalogue
The DLR EOC Catalogue currently contains 184 entries (ISO 19115/19139).
From a snapshot of 20 catalogue entries, 18 entries were discoverable through Google Dataset Search.
However, DLR currently does not provide metadata which can be directly processed by Google’s search engine (schema.org/DCAT).
So how did DLR’s catalogue content make it to Google Dataset Search?
www.europeandataportal.eu, www.geoportal.rlp.de, www.geoportal.hessen.de, geo.spacebel.be, …

Evaluation – Example “path” of metadata from its origin
[Diagram labels: Indexing of schema.org / JSON-LD markup; Harvesting of DCAT-AP records; Data Cleaning, Replica Identification, Scholar Linking, Knowledge Graph Reconciliation; Harvesting of ISO 19115/19139 records; (Enriched) Metadata Index; Ranked Results; Catalogue]

Testing tool for validating URLs/code snippets for structured data
Test URL: https://www.europeandataportal.eu/data/en/dataset/f4d4079a-ada3-41d0-ba95-630ba232e147 (“SRTM X-SAR - Digital Elevation Model (DEM) Tiles - Global”)
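A rough local counterpart to such a check can be sketched with the Python standard library: parse a page's HTML, collect the JSON-LD script blocks and report those typed as a schema.org Dataset. This is not the Google testing tool and performs no validation of the properties; the class and function names and the sample page are assumptions for this example, and JSON-LD `@graph` containers are not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed content of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script content may arrive in several chunks, so it is buffered.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Malformed JSON-LD is skipped silently in this sketch.


def find_datasets(html):
    """Return all JSON-LD objects in the page whose @type includes "Dataset"."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    found = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if not isinstance(item, dict):
                continue
            types = item.get("@type")
            if types == "Dataset" or (isinstance(types, list) and "Dataset" in types):
                found.append(item)
    return found


# Made-up page content for demonstration.
page = """<html><head>
<script>var unrelated = 1;</script>
<script type="application/ld+json">
{"@context": "https://schema.org/", "@type": "Dataset", "name": "Example dataset"}
</script>
</head><body></body></html>"""
print([d["name"] for d in find_datasets(page)])
```

To check a live page, the HTML would first have to be downloaded (for example with `urllib.request`) and passed to `find_datasets`.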

Evaluation – Other European Catalogues
Selected ESA and EUMETSAT collections have all been retrievable through Google Dataset Search.
Most frequent metadata sources: geo.spacebel.be, fedeo.esa.int, cmr.earthdata.nasa.gov, data.nasa.gov, www.europeandataportal.eu.
Ranking of search results is sometimes questionable: e.g. single value-added datasets are ranked higher than the original collection/dataset series.
The level of detail for individual search results is quite heterogeneous.
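A tally of metadata sources like the one above can be produced by counting the host names of the result URLs. The sketch below assumes the result URLs have already been collected by hand into a list; the function name and the sample URLs are placeholders, not actual search results.

```python
from collections import Counter
from urllib.parse import urlparse


def count_sources(result_urls):
    """Count how often each host name occurs in a list of result URLs."""
    hosts = (urlparse(url).hostname for url in result_urls)
    # URLs without a recognisable host (e.g. missing scheme) are ignored.
    return Counter(host for host in hosts if host)


# Placeholder URLs for demonstration only.
sample_urls = [
    "https://catalogue-a.example/record/1",
    "https://catalogue-a.example/record/2",
    "https://catalogue-b.example/dataset/xyz",
    "not a url",
]
for host, n in count_sources(sample_urls).most_common():
    print(host, n)
```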

Conclusions
Completeness and quality of search results strongly depend on the structured metadata that dataset providers add to their sites.
Google's adoption of open standards for describing structured data (schema.org, DCAT, JSON-LD) will further encourage their usage.
The availability of metadata replicas in several catalogues indexed by Google sometimes makes it hard to retrieve the original dataset from its portal.
Metadata properties/attributes tend to get “lost in translation” as a result of the different schema transformations or portals through which they are offered.
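The “lost in translation” effect can be illustrated with a toy chain of schema mappings. In the sketch below, a record passes through two field crosswalks, and every field without a counterpart in a crosswalk is dropped. The field names on the ISO side are simplified labels, both mapping tables are deliberately incomplete, and the record is invented; none of this reproduces the actual crosswalks used by any portal.

```python
def translate(record, mapping):
    """Rename the fields listed in `mapping`; fields not listed are dropped."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def lost_fields(record, mapping):
    """Return the sorted field names of `record` that `mapping` cannot carry over."""
    return sorted(key for key in record if key not in mapping)


# Hypothetical, deliberately incomplete crosswalks.
ISO_TO_DCAT = {
    "title": "dct:title",
    "abstract": "dct:description",
    "temporalExtent": "dct:temporal",
    "boundingBox": "dct:spatial",
}
DCAT_TO_SCHEMA = {
    "dct:title": "name",
    "dct:description": "description",
    "dct:temporal": "temporalCoverage",
}

# Invented source record with simplified ISO-style field labels.
iso_record = {
    "title": "Example collection",
    "abstract": "Placeholder abstract.",
    "temporalExtent": "2018-01-01/2018-12-31",
    "boundingBox": "-180 -90 180 90",
    "lineage": "Placeholder processing history.",
}

dcat_record = translate(iso_record, ISO_TO_DCAT)
schema_record = translate(dcat_record, DCAT_TO_SCHEMA)

print("lost in step 1:", lost_fields(iso_record, ISO_TO_DCAT))
print("lost in step 2:", lost_fields(dcat_record, DCAT_TO_SCHEMA))
print("fields remaining:", sorted(schema_record))
```

With these toy tables, the lineage disappears in the first step and the spatial extent in the second, so the final record keeps only a name, a description and a temporal coverage.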

Links / References
https://toolbox.google.com/datasetsearch (Dataset Search portal)
https://search.google.com/structured-data/testing-tool (Testing tool)
https://developers.google.com/search/docs/data-types/dataset (Google developer guide on dataset discovery)
https://productforums.google.com/forum/#!topic/webmasters/nPq4BW6iPIA (Google FAQ on structured data)
https://schema.org/Dataset (schema.org Dataset markup)
https://www.w3.org/TR/vocab-dcat/ (Data Catalog Vocabulary, DCAT)

Thank you very much for your attention! André Twele DLR Earth Observation Center Andre.Twele@dlr.de