Google Dataset Search Evaluation

1 Google Dataset Search Evaluation
CEOS WGISS-46, October 23, 2018
André Twele, Christian Strobl and Katrin Molch

2 Google Dataset Search – some quick facts
Launched on 5 September 2018.
Think of it as a "Google Scholar for data".
Main aim: facilitating the discoverability of datasets from thousands of repositories across the web.
The initial release mainly covers the environmental and social sciences, government data, and data from news organizations.
Relies on dataset providers to embed structured (meta)data into their websites, using schema.org Dataset or equivalent structures (W3C DCAT) for markup.
Formats: JSON-LD, RDFa 1.1 or Microdata syntax.
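To illustrate the kind of markup providers are expected to embed, the following Python sketch builds a minimal schema.org Dataset description and wraps it in a JSON-LD script element for a landing page. The property names (name, description, url, temporalCoverage, provider, spatialCoverage) are schema.org vocabulary; the helper name dataset_jsonld and all sample values are hypothetical, and only name and description are treated as mandatory here.

```python
import json


def dataset_jsonld(name, description, url, temporal=None, provider=None, bbox=None):
    """Return an HTML <script> element carrying a schema.org Dataset as JSON-LD.

    Hypothetical helper: name, description and url are always emitted,
    the remaining properties only when a value is supplied.
    """
    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    if temporal:
        # ISO 8601 interval, e.g. "2000-02-11/2000-02-22"
        record["temporalCoverage"] = temporal
    if provider:
        record["provider"] = {"@type": "Organization", "name": provider}
    if bbox:
        # schema.org GeoShape box: lower corner then upper corner, "south west north east"
        south, west, north, east = bbox
        record["spatialCoverage"] = {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        }
    body = json.dumps(record, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(dataset_jsonld(
    "Example DEM tiles",
    "Example abstract describing the dataset.",
    "https://example.org/datasets/example-dem",
    temporal="2000-02-11/2000-02-22",
    provider="Example Organization",
    bbox=(-60, -180, 60, 180),
))
```

The optional properties are kept separate on purpose: a page that supplies only name and description still yields valid markup, but a search result built from it has little more than the description to show.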

3 Evaluation – first impressions
Quality of search results differs, probably because of the different portals from which Google's search engine retrieved them. For many search results, only a "Description" field (equivalent to gmd:abstract in ISO) is shown.

4 Evaluation – first impressions
Included only for some search results: "Dataset published, created or updated", "Dataset provided by", "Time period covered", etc.
Information on spatial properties, geographical coverage, etc. is rarely shown.
Dataset Search can detect if a dataset is present in more than one repository.
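The sparse result pages described above are what one would expect when the underlying markup carries little beyond a description. The sketch below checks a schema.org record for the properties that, by assumption, back these display fields. The label-to-property mapping DISPLAY_FIELDS is hypothetical and inferred from the field names; it is not taken from Google documentation.

```python
# Hypothetical mapping from labels seen on Dataset Search result pages to the
# schema.org properties assumed to feed them (an assumption, not Google's spec).
DISPLAY_FIELDS = {
    "Dataset published": "datePublished",
    "Dataset created": "dateCreated",
    "Dataset updated": "dateModified",
    "Dataset provided by": "provider",
    "Time period covered": "temporalCoverage",
    "Geographical coverage": "spatialCoverage",
}


def missing_display_fields(record):
    """List the display labels whose backing property is absent or empty."""
    return [label for label, prop in DISPLAY_FIELDS.items() if not record.get(prop)]


# A record that carries only a description, as observed for many results.
record = {"@type": "Dataset", "name": "Example", "description": "Abstract only."}
print(missing_display_fields(record))
```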

5 Evaluation – first impressions

6 Evaluation – DLR EOC Catalogue
DLR EOC Catalogue currently contains 184 entries (ISO 19115/19139).
From a snapshot of 20 catalogue entries, 18 were discoverable through Google Dataset Search.
However, DLR currently does not provide metadata that can be directly processed by Google's search engine (schema.org/DCAT).
…so how did DLR's catalogue content make it into Google Dataset Search?
geo.spacebel.be
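Because Google does not consume ISO 19115/19139 directly, some intermediary (the slide points to geo.spacebel.be) has to re-publish the records in a form it does index. The sketch below shows what such a crosswalk might look like for three ISO 19139 elements, using Python's standard ElementTree. It is an illustration only, not the transformation geo.spacebel.be or any other portal actually applies; the function name and sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
IDENT = "gmd:identificationInfo/gmd:MD_DataIdentification"


def iso19139_to_schemaorg(xml_text):
    """Map a few ISO 19139 elements onto a schema.org Dataset dict (sketch)."""
    root = ET.fromstring(xml_text)  # root element is gmd:MD_Metadata

    def text(path):
        el = root.find(path + "/gco:CharacterString", NS)
        return el.text.strip() if el is not None and el.text else None

    record = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "identifier": text("gmd:fileIdentifier"),
        "name": text(IDENT + "/gmd:citation/gmd:CI_Citation/gmd:title"),
        "description": text(IDENT + "/gmd:abstract"),  # the gmd:abstract -> "Description" case
    }
    # Drop properties that the source record did not provide.
    return {k: v for k, v in record.items() if v is not None}


SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:fileIdentifier><gco:CharacterString>example-id-001</gco:CharacterString></gmd:fileIdentifier>
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:citation><gmd:CI_Citation>
      <gmd:title><gco:CharacterString>Example DEM</gco:CharacterString></gmd:title>
    </gmd:CI_Citation></gmd:citation>
    <gmd:abstract><gco:CharacterString>Example abstract.</gco:CharacterString></gmd:abstract>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

print(iso19139_to_schemaorg(SAMPLE))
```

Everything the crosswalk does not map (extents, lineage, contacts, etc.) simply does not reach the schema.org side, which is one way the "Description-only" results observed earlier can arise.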

7 Evaluation – Example “path” of metadata from its origin
[Diagram] Catalogue → Harvesting of ISO 19115/19139 records → Harvesting of DCAT-AP records → Indexing of schema.org / JSON-LD markup → Data Cleaning, Replica Identification, Scholar Linking, Knowledge Graph Reconciliation → (Enriched) Metadata Index → Ranked Results
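Of the stages in this path, replica identification is what lets Dataset Search recognise that the same dataset is listed in several repositories (slide 4). The presentation does not describe how Google does this; the sketch below uses a deliberately naive, hypothetical key (identifier if present, otherwise a normalized title) only to show the idea and why it is fragile. Portal names and records are made up.

```python
import re
from collections import defaultdict


def replica_key(record):
    """Hypothetical grouping key: the identifier if present, else a normalized title."""
    if record.get("identifier"):
        return "id:" + record["identifier"].strip().lower()
    title = re.sub(r"[^a-z0-9]+", " ", record.get("name", "").lower()).strip()
    return "title:" + title


def group_replicas(records):
    """Group records harvested from several portals that appear to describe one dataset."""
    groups = defaultdict(list)
    for rec in records:
        groups[replica_key(rec)].append(rec["portal"])
    return dict(groups)


harvested = [
    {"portal": "catalogue-a.example", "identifier": "EX-001", "name": "Example DEM"},
    {"portal": "catalogue-b.example", "identifier": "ex-001 ", "name": "Example DEM tiles"},
    {"portal": "catalogue-c.example", "name": "Example  DEM"},
]
print(group_replicas(harvested))
```

The third record is not merged with the first two: it lost its identifier along the way, so it falls back to a title key. A production system would need fuzzier matching, which is also where replicas can end up outranking the original record.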

8 Testing tool for validating URLs/code snippets for structured data
Test URL: ("SRTM X-SAR - Digital Elevation Model (DEM) Tiles - Global")
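As a rough offline approximation of such a check, and not a reimplementation of Google's testing tool, the sketch below extracts JSON-LD blocks from an HTML page with Python's standard html.parser and reports which Dataset entries lack name or description, the two properties Google's dataset guide lists as required. The class and function names are hypothetical; JSON-LD given as a list or @graph is not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> elements."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            # Invalid JSON raises here; a real validator would report it instead.
            self.blocks.append(json.loads("".join(self._buf)))
            self._in_jsonld = False


def check_datasets(html, required=("name", "description")):
    """Return {dataset name: [missing required properties]} for each Dataset found."""
    parser = JsonLdExtractor()
    parser.feed(html)
    report = {}
    for block in parser.blocks:
        # Only single-object blocks are inspected in this sketch.
        if isinstance(block, dict) and block.get("@type") == "Dataset":
            missing = [p for p in required if not block.get(p)]
            report[block.get("name", "<unnamed>")] = missing
    return report


page = ('<html><head><script type="application/ld+json">'
        '{"@type": "Dataset", "name": "Example DEM"}</script></head></html>')
print(check_datasets(page))
```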

9 Evaluation – Other European Catalogues
Selected ESA and EUMETSAT collections were all retrievable through Google Dataset Search.
Most frequent metadata sources: geo.spacebel.be, fedeo.esa.int, cmr.earthdata.nasa.gov, data.nasa.gov.
Ranking of search results is sometimes questionable: e.g. single value-added datasets are ranked higher than the original collection/dataset series.
Level of detail for individual search results is quite heterogeneous.

10 Conclusions
Completeness and quality of search results strongly depend on the structured metadata that dataset providers add to their sites.
Google's adoption of open standards for describing structured data (schema.org, DCAT, JSON-LD) will further encourage their use and boost their uptake.
The availability of metadata replicas in several catalogues indexed by Google sometimes makes it hard to retrieve the original dataset from its portal.
Metadata properties/attributes tend to get "lost in translation" as a result of the different schema transformations or portals through which they are offered.
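The "lost in translation" effect can be made concrete with a toy chain of crosswalks. The two mappings below are hypothetical and deliberately simplified (they are not GeoDCAT-AP or any portal's real mapping); the point is only that each step silently drops any field for which it has no target, and losses accumulate across hops.

```python
# Hypothetical, deliberately simplified crosswalks (not any portal's real mapping).
ISO_TO_DCAT = {
    "title": "dct:title",
    "abstract": "dct:description",
    "temporalExtent": "dct:temporal",
    # "lineage" has no entry here, so it is lost at this step.
}
DCAT_TO_SCHEMA = {
    "dct:title": "name",
    "dct:description": "description",
    # "dct:temporal" -> "temporalCoverage" omitted, as an incomplete mapping might do.
}


def translate(record, mapping):
    """Apply one crosswalk and report which source fields had no target."""
    out, lost = {}, []
    for key, value in record.items():
        if key in mapping:
            out[mapping[key]] = value
        else:
            lost.append(key)
    return out, lost


iso = {
    "title": "Example DEM",
    "abstract": "Example abstract.",
    "temporalExtent": "2000-02-11/2000-02-22",
    "lineage": "Processed from radar data.",
}
dcat, lost_1 = translate(iso, ISO_TO_DCAT)
schema, lost_2 = translate(dcat, DCAT_TO_SCHEMA)
print(lost_1, lost_2)   # ['lineage'] ['dct:temporal']
print(schema)           # {'name': 'Example DEM', 'description': 'Example abstract.'}
```

Only the title and abstract survive both hops, which matches the pattern of Description-only results noted in the evaluation.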

11 Links / References
Dataset Search portal
Structured data testing tool
Google developer guide on dataset discovery
Google FAQ on structured data (schema.org Dataset markup)
Data Catalog Vocabulary (DCAT)

12 Thank you very much for your attention!
André Twele DLR Earth Observation Center
