First they have to find it: Getting Government Data Discovered and Used Adapted from: John S. Erickson, Ph.D. Tetherless World Constellation Rensselaer.

1 First they have to find it: Getting Government Data Discovered and Used. Adapted from: John S. Erickson, Ph.D., Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, New York, USA. Twitter: @olyerickson #TWCRPI

2 Open Government Data Around the World: Starting with efforts in the US and UK, governments around the world have recognized the need to publish their critical data. [Chart: percent of total collection (from 1M+ datasets)]

3 Diverse Approaches to Open Gov't Data: Government data initiatives have taken many forms. GovData portals vary widely in how they help users discover and use relevant datasets. [Chart: percent of total catalogs (from 192 catalogs)]

4 Federated Discovery of Government Data: Stakeholders have seen the need for federated discovery across catalogs, especially from within major search engines including Bing, Google, Yahoo! and Yandex.

5 Government Data in the linked open data cloud (http://linkeddata.org/): Government data currently makes up over half of the cloud by size (~17B triples), with tens of thousands of links to other data (within and without).

6 Linked Data is Not Enough... Publishing open government data as Linked Data is not enough. For OGD to be useful, datasets must be published using metadata, markup standards and presentation that aid discovery and use.

8 Dataset Metadata for Discovery and Use: Recent work at TWC RPI demonstrates the value of applying emerging standards for uniformly describing government datasets and catalogs.

9 International Open Government Dataset Search: TWC's IOGDS application is an aggregated catalog of more than 1M datasets from over 192 dataset catalogs from governments at every level around the world. See: http://logd.tw.rpi.edu

10 International Open Government Dataset Search: Anticipates the W3C DCAT RDF vocabulary. Demos what a comprehensive federated catalog based on DCAT and an aggregation API might look like.

11 International Open Government Dataset Search: IOGDS is a multi-year effort based on downloading, scraping or accessing APIs, converting metadata to a proto-DCAT model, and publishing via endpoint and download. [Diagram: IOGDS Workflow. Labels: Catalogs; API, Download, Web; per-site scraper code, ad hoc code; IOGDS CSV; csv2rdf4lod automation.] See: http://logd.tw.rpi.edu
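
The conversion step described on this slide can be illustrated with a minimal sketch. It assumes a hypothetical CSV layout with `uri`, `title` and `description` columns and emits N-Triples using terms from the published W3C DCAT and Dublin Core vocabularies. It is not the csv2rdf4lod tooling that IOGDS uses, and the example.org URIs are placeholders.

```python
import csv
import io

# Namespaces from the published W3C DCAT and Dublin Core Terms vocabularies.
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def rows_to_ntriples(csv_text, catalog_uri):
    """Convert CSV rows into DCAT-style N-Triples lines.

    The column names (uri, title, description) are an assumed layout,
    not the actual IOGDS CSV format.
    """
    lines = [f"<{catalog_uri}> <{RDF_TYPE}> <{DCAT}Catalog> ."]
    for row in csv.DictReader(io.StringIO(csv_text)):
        ds = row["uri"]
        lines.append(f"<{ds}> <{RDF_TYPE}> <{DCAT}Dataset> .")
        lines.append(f"<{catalog_uri}> <{DCAT}dataset> <{ds}> .")
        lines.append(f"<{ds}> <{DCT}title> {literal(row['title'])} .")
        if row.get("description"):
            lines.append(f"<{ds}> <{DCT}description> {literal(row['description'])} .")
    return lines


if __name__ == "__main__":
    sample = "uri,title,description\nhttp://example.org/dataset/1,Air quality,Hourly readings\n"
    print("\n".join(rows_to_ntriples(sample, "http://example.org/catalog")))
```

Emitting one triple per line keeps the output easy to concatenate across many catalogs before loading it into a triple store.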

12 Schema.org: Semantic Markup for Discovery. TWC RPI has published dataset listings based on IOGDS using emerging microdata standards, especially the schema.org model endorsed by Bing, Google, Yahoo!, Yandex...

13 Schema.org datasets extension: TWC RPI's schema.org dataset extension will enable government dataset catalogs to be parsed and indexed more easily by the major search engines, which will help users find relevant datasets! TWC's dataset extension entered public discussion in June 2012.

14 Schema.org datasets extension: The schema.org datasets extension enables relevant datasets to be more easily discovered by a range of stakeholders including researchers, data journalists, bloggers and developers.
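
As an illustration of what dataset markup for search engines can look like, here is a minimal sketch that renders one dataset as HTML microdata. It uses the type and property names published on schema.org today (`Dataset`, `DataCatalog`, `includedInDataCatalog`); the 2012 draft discussed in these slides may have used different names, and the function and its parameters are hypothetical.

```python
from html import escape


def dataset_microdata(name, description, url, catalog_name, catalog_url):
    """Render one dataset as schema.org microdata (illustrative helper).

    Type and property names follow schema.org as published today;
    the original draft proposal may have differed.
    """
    return (
        '<div itemscope itemtype="http://schema.org/Dataset">\n'
        f'  <a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a>\n'
        f'  <p itemprop="description">{escape(description)}</p>\n'
        '  <div itemprop="includedInDataCatalog" itemscope '
        'itemtype="http://schema.org/DataCatalog">\n'
        f'    <a itemprop="url" href="{escape(catalog_url, quote=True)}">'
        f'<span itemprop="name">{escape(catalog_name)}</span></a>\n'
        '  </div>\n'
        '</div>'
    )


if __name__ == "__main__":
    print(dataset_microdata(
        "Air quality",
        "Hourly readings",
        "http://example.org/dataset/1",      # placeholder URL
        "Example catalog",
        "http://example.org/catalog",        # placeholder URL
    ))
```

Nesting the catalog inside the dataset element lets a crawler associate each dataset with the catalog that lists it from a single page.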

15 Schema.org datasets extension: “...we've reviewed the current datasets schema proposal in draft, and we are comfortable with the current state of things...” “...At this point, if the group would solidify on the dataset proposal, then Data.gov would support and use it.” ---Chris Musialek

16 CKAN Data Catalog Scheme & Protocol: API-based catalog federation is also possible. CKAN has announced a DCAT-based query/federation API that enables OAI-PMH-like harvesting and more.

17 Dataset extension to schema.org

18 Demo/links: http://www.w3.org/wiki/WebSchemas/Datasets http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals Good introduction (longer, with more context): http://www.slideshare.net/joshsh/semantic-markup-using-schemaorg

19 Examples of current schema.org results http://schema-creator.org/event.php http://schema-creator.org/product.php

20 To do… Get Google, Bing, Yahoo, … to crawl these pages. It might look like this: http://www.google.com/publicdata/directory

21 From Jim Hendler: Google is now building custom search engines that will pull down schema.org markup. Dan Brickley is working on one from the Dataset schema, not yet public. There's also an open govt data search, not much in it but it looks nice, at http://www.google.com/publicdata/directory

22 Retrieve all the logd datasets:
PREFIX dgtwc:
PREFIX conv:
PREFIX void:
PREFIX dcterms:
SELECT DISTINCT ?dataset ?catalog ?catalog_id ?title ?desc ?country ?homepage ?agency_id ?contributor_id
WHERE {
  ?dataset a conv:CatalogedDataset.
  ?dataset void:inDataset ?catalog.
  ?catalog dcterms:identifier ?catalog_id.
  ?dataset ?title.
  ?dataset dcterms:description ?desc.
  OPTIONAL { ?dataset dgtwc:catalog_country ?country. }
  OPTIONAL { ?dataset ?homepage. }
  OPTIONAL { ?dataset dgtwc:agency ?agency. ?agency dcterms:identifier ?agency_id. }
  OPTIONAL { ?dataset ?contributor. ?contributor dcterms:identifier ?contributor_id. }
  #?dataset dgtwc:catalog_country.
}
Courtesy: Josh Shinavier (RPI/TWC)
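
A query like this is typically sent to a SPARQL endpoint over HTTP. The sketch below, using only the Python standard library, shows one way to build such a request and flatten the JSON results, following the W3C SPARQL Protocol convention of a `query` parameter and the `application/sparql-results+json` media type. The endpoint URL and the short stand-in query are assumptions; the slides do not give the endpoint address.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def bindings_to_rows(results_json):
    """Flatten SPARQL JSON results into a list of plain dicts.

    Variables left unbound by OPTIONAL patterns are simply absent
    from the corresponding row.
    """
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


if __name__ == "__main__":
    # Hypothetical endpoint and a shortened stand-in query.
    request = build_request(
        "http://example.org/sparql",
        "SELECT ?dataset ?title WHERE { ?dataset "
        "<http://purl.org/dc/terms/title> ?title } LIMIT 10",
    )
    print(request.full_url)
    # To run against a real endpoint:
    # with urllib.request.urlopen(request) as response:
    #     rows = bindings_to_rows(response.read().decode("utf-8"))
```

Keeping request construction separate from result parsing makes both parts easy to check without network access.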

23 A large number of datasets: http://logd.tw.rpi.edu/schemaorg_dataset_extension http://www.google.com/webmasters/tools/richsnippets?url=http://logd.tw.rpi.edu/schemaorg_dataset_extension&view=

24 http://logd.tw.rpi.edu/page/international_dataset_catalog_search

25 Latest from Josh: Datasets-as-Linked-Data demo. The RDFa in the pages is not only correct w.r.t. schema.org but is also presented in such a way that an RDFa-aware Linked Data crawler can hop from datasets to catalogs, back again, into DBpedia, etc. while gathering the RDFa as linked RDF. Since we now have Datasets-ish RDFa markup in the main IOGDS dataset pages (i.e. the pages which the URIs of the datasets redirect to), we're pretty close to a completely integrated demo. What remains: (1) the current markup has some problems. We need to fix those; (2) we need markup for catalogs as well as datasets…
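
The "hopping" behaviour described here can be approximated with a very small sketch: collect the links that carry an RDFa `property` or `rel` attribute so a crawler knows which pages to visit next. This is a simplification built on the standard library's HTML parser, not a conformant RDFa processor, and the sample markup and example.org URLs are invented for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class RdfaLinkCollector(HTMLParser):
    """Collect (predicate, target URL) pairs from RDFa-annotated elements.

    Simplified: only looks at property/rel plus resource/href on the
    same element and ignores prefixes, vocab and subject chaining.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        predicate = attributes.get("property") or attributes.get("rel")
        target = attributes.get("resource") or attributes.get("href")
        if predicate and target:
            self.links.append((predicate, urljoin(self.base_url, target)))


def outgoing_links(html_text, base_url):
    """Return the annotated links a crawler could follow from this page."""
    collector = RdfaLinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


if __name__ == "__main__":
    # Invented sample page: a dataset pointing at its catalog and a DBpedia topic.
    page = """
    <div vocab="http://schema.org/" typeof="Dataset" resource="/dataset/1">
      <a property="includedInDataCatalog" href="/catalog/us">US catalog</a>
      <a property="about" href="http://dbpedia.org/resource/Air_pollution">Topic</a>
      <a href="/about">About this site</a>
    </div>
    """
    for predicate, url in outgoing_links(page, "http://example.org/dataset/1"):
        print(predicate, url)
```

Plain navigation links without an RDFa predicate are skipped, so the crawler only follows links that carry meaning in the markup.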

26 Needed (1) and (2): To fix (1), we need to change the LODSPeaKr templates that automatically generate those pages so that they comply with the model Josh developed. To fix (2), we'll work with Alvaro (Graves) to create LODSPeaKr-based automation to generate catalog pages efficiently. (2) presents more of a challenge than (1), since the IOGDS implementation of dataset details pages is already mostly correct. Still need Dan B. to assist with getting them found…

27 What we need: Willingness to adopt the dataset schema extension: we need lots of datasets to start showing up. We (TWC) will be pushing out some tools, more demos and how-tos, very soon. Wanna play? http://wiki.esipfed.org/index.php/DatasetSchema

