First they have to find it: Getting Government Data Discovered and Used Adapted from: John S. Erickson, Ph.D. Tetherless World Constellation Rensselaer.

Presentation transcript:

First they have to find it: Getting Government Data Discovered and Used Adapted from: John S. Erickson, Ph.D. Tetherless World Constellation Rensselaer Polytechnic Institute Troy, New York, USA #TWCRPI

Open Government Data Around the World: Starting with efforts in the US and UK, governments around the world have recognized the need to publish their critical data. [Chart: percent of total collection (from 1M+ datasets)]

Diverse Approaches to Open Gov't Data: Government data initiatives have taken many forms. GovData portals vary widely in how they help users discover and use relevant datasets. [Chart: percent of total catalogs (from 192 catalogs)]

Federated Discovery of Government Data: Stakeholders have seen the need for federated discovery across catalogs, especially from within major search engines including Bing, Google, Yahoo! and Yandex.

Government Data in the Linked Open Data Cloud: Government data currently makes up over ½ of the cloud by size (~17B triples), with tens of thousands of links to other data (within and without).

Linked Data is Not Enough... Publishing open government data as Linked Data is not enough. For OGD to be useful, datasets must be published using metadata, markup standards and presentation that aid discovery and use.

Dataset Metadata for Discovery and Use: Recent work at TWC RPI demonstrates the value of applying emerging standards for uniformly describing government datasets and catalogs.

International Open Government Dataset Search: TWC's IOGDS application is an aggregated catalog of more than 1M datasets from over 192 dataset catalogs from governments at every level around the world. See:

International Open Government Dataset Search: IOGDS anticipates the W3C DCAT RDF vocabulary and demonstrates what a comprehensive federated catalog based on DCAT and an aggregation API might look like.

International Open Government Dataset Search: IOGDS is a multi-year effort based on downloading, scraping or accessing APIs, converting metadata to a proto-DCAT model, and publishing via endpoint and download. [Diagram: IOGDS workflow. Labels: Catalogs; API, Download, Web; per-site scraper code, ad hoc code; CSV; csv2rdf4lod automation; IOGDS] See:
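The conversion step in that workflow can be illustrated with a minimal sketch. This is not the csv2rdf4lod tool or the actual IOGDS model; it only shows, under stated assumptions, how one row of scraped catalog CSV might become DCAT-style triples. The column names (`id`, `title`, `description`, `homepage`), the `BASE` IRI and the sample rows are hypothetical.

```python
import csv
import io

DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical base IRI used to mint dataset identifiers.
BASE = "http://example.org/dataset/"


def literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
                   .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def row_to_triples(row):
    """Turn one CSV row (assumed columns) into N-Triples lines."""
    subject = f"<{BASE}{row['id']}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{DCAT}Dataset> .",
        f"{subject} <{DCT}title> {literal(row['title'])} .",
    ]
    # Optional columns are emitted only when the scraper found a value.
    if row.get("description"):
        triples.append(
            f"{subject} <{DCT}description> {literal(row['description'])} .")
    if row.get("homepage"):
        triples.append(f"{subject} <{DCAT}landingPage> <{row['homepage']}> .")
    return triples


def convert(csv_text):
    """Convert a whole CSV document into a flat list of N-Triples lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [t for row in reader for t in row_to_triples(row)]


# Made-up sample rows, standing in for per-site scraper output.
SAMPLE = """id,title,description,homepage
42,Air Quality Index,"Daily ""AQI"" readings",http://example.org/aqi
43,Budget 2012,,
"""

if __name__ == "__main__":
    for line in convert(SAMPLE):
        print(line)
```

Keeping the scraper output as plain CSV and doing the RDF conversion in a separate step means each catalog only needs its own small scraper, while the mapping to the shared model lives in one place.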

Schema.org: Semantic Markup for Discovery. TWC RPI has published dataset listings based on IOGDS using emerging microdata standards, especially the schema.org model endorsed by Bing, Google, Yahoo!, Yandex...

Schema.org datasets extension: TWC RPI's schema.org dataset extension will enable government dataset catalogs to be parsed and indexed more easily by the major search engines, which will help users find relevant datasets! TWC's dataset extension entered public discussion in June 2012.

Schema.org datasets extension: The schema.org datasets extension enables relevant datasets to be more easily discovered by a range of stakeholders including researchers, data journalists, bloggers and developers.
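As a rough illustration of what such markup could look like on a dataset page, the sketch below renders a small HTML microdata fragment. It is not TWC's template code. The function name and sample values are hypothetical, and the property names (`name`, `description`, `url`, `includedInDataCatalog`) follow the schema.org vocabulary as currently published, which may differ from the 2012 draft discussed in these slides.

```python
from html import escape


def dataset_microdata(name, description, url, catalog_name, catalog_url):
    """Render a schema.org Dataset as an HTML microdata fragment.

    Hypothetical helper: the layout is illustrative, not a prescribed template.
    """
    return "\n".join([
        '<div itemscope itemtype="http://schema.org/Dataset">',
        f'  <a itemprop="url" href="{escape(url, quote=True)}">'
        f'<span itemprop="name">{escape(name)}</span></a>',
        f'  <p itemprop="description">{escape(description)}</p>',
        # The nested item links the dataset to the catalog that lists it.
        '  <div itemprop="includedInDataCatalog" itemscope '
        'itemtype="http://schema.org/DataCatalog">',
        f'    <a itemprop="url" href="{escape(catalog_url, quote=True)}">'
        f'<span itemprop="name">{escape(catalog_name)}</span></a>',
        '  </div>',
        '</div>',
    ])


if __name__ == "__main__":
    print(dataset_microdata(
        name="Air Quality Index",
        description="Daily readings by monitoring station.",
        url="http://example.org/dataset/42",
        catalog_name="Example City Open Data",
        catalog_url="http://example.org/catalog",
    ))
```

Because the markup sits in the same HTML that people read, a catalog can add it to existing page templates without publishing a separate machine-readable feed.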

Schema.org datasets extension: “...we've reviewed the current datasets schema proposal in draft, and we are comfortable with the current state of things...” “...At this point, if the group would solidify on the dataset proposal, then Data.gov would support and use it.” (Chris Musialek)

CKAN Data Catalog Scheme & Protocol: API-based catalog federation is also possible. CKAN has announced a DCAT-based query/federation API that enables OAI-PMH-like harvesting and more.
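A harvesting loop of that kind can be sketched as follows. This is an assumption-heavy illustration: it does not use the DCAT-based API mentioned above, but the `package_search` action of CKAN's action API, paging with its `rows` and `start` parameters. The base URL is hypothetical, and the `fetch` parameter exists only so the paging logic can be exercised without a network connection.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_json(url):
    """Fetch a URL and decode the JSON body (default network fetcher)."""
    with urlopen(url) as resp:
        return json.load(resp)


def harvest(base_url, fetch=fetch_json, page_size=100):
    """Yield minimal records for every dataset a CKAN site lists.

    Pages through package_search until the reported count is reached.
    """
    start = 0
    while True:
        query = urlencode({"rows": page_size, "start": start})
        payload = fetch(f"{base_url}/api/3/action/package_search?{query}")
        results = payload["result"]["results"]
        if not results:
            break
        for pkg in results:
            yield {"id": pkg["name"], "title": pkg.get("title", "")}
        start += len(results)
        if start >= payload["result"]["count"]:
            break


if __name__ == "__main__":
    # Hypothetical catalog URL; replace with a real CKAN site to try it.
    for record in harvest("http://catalog.example.org", page_size=50):
        print(record["id"], "-", record["title"])
```

An aggregator could run a loop like this per catalog and feed the records into the same conversion step it uses for scraped sites.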

Dataset extension to schema.org

Demo/ links OrgProposals Good introduction (longer/ with more context): markup-using-schemaorg

Examples of current schema.org results

To do… Get Google, Bing, Yahoo, … to crawl these pages. It might look like this: om/publicdata/direct ory

From Jim Hendler: Google is now building custom search engines that will pull down schema.org. Dan Brickley is working on one from the Dataset schema, not yet public. There's also an open govt data search (not much in it, but looks nice); it's at

Retrieve all the logd datasets:
PREFIX dgtwc:
PREFIX conv:
PREFIX void:
PREFIX dcterms:
SELECT DISTINCT ?dataset ?catalog ?catalog_id ?title ?desc ?country ?homepage ?agency_id ?contributor_id
WHERE {
  ?dataset a conv:CatalogedDataset.
  ?dataset void:inDataset ?catalog.
  ?catalog dcterms:identifier ?catalog_id.
  ?dataset ?title.
  ?dataset dcterms:description ?desc.
  OPTIONAL { ?dataset dgtwc:catalog_country ?country. }
  OPTIONAL { ?dataset ?homepage. }
  OPTIONAL { ?dataset dgtwc:agency ?agency. ?agency dcterms:identifier ?agency_id. }
  OPTIONAL { ?dataset ?contributor. ?contributor dcterms:identifier ?contributor_id. }
  # ?dataset dgtwc:catalog_country.
}
Courtesy: Josh Shinavier (RPI/TWC)
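One way a client might send a query like this and read the answer is sketched below. It relies only on the standard SPARQL protocol (an HTTP GET with a `query` parameter) and the SPARQL JSON results format. The endpoint URL is a placeholder, not the real IOGDS endpoint, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint; the real IOGDS endpoint address is not given here.
ENDPOINT = "http://example.org/sparql"


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(doc):
    """Flatten a SPARQL JSON results document into a list of dicts.

    Variables left unbound by OPTIONAL clauses come back as None.
    """
    variables = doc["head"]["vars"]
    return [
        {v: (b[v]["value"] if v in b else None) for v in variables}
        for b in doc["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query and return the flattened rows (needs network access)."""
    with urlopen(build_request(endpoint, query)) as resp:
        return rows_from_results(json.load(resp))


if __name__ == "__main__":
    # A made-up response, shaped like the SPARQL JSON results format.
    sample = {
        "head": {"vars": ["dataset", "title", "country"]},
        "results": {"bindings": [
            {"dataset": {"type": "uri", "value": "http://example.org/dataset/42"},
             "title": {"type": "literal", "value": "Air Quality Index"}},
        ]},
    }
    for row in rows_from_results(sample):
        print(row)
```

Mapping unbound variables to None keeps every row the same shape, which is convenient because most of the columns in the query above sit inside OPTIONAL clauses.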

A large number of datasets: n pets?url= et_extension&view=

al_dataset_catalog_search

Latest from Josh: Datasets-as-Linked-Data demo. The RDFa in the pages is not only correct w.r.t. schema.org but is also presented in such a way that an RDFa-aware Linked Data crawler can hop from datasets to catalogs, back again, into DBpedia, etc. while gathering the RDFa as linked RDF. Since we now have Datasets-ish RDFa markup in the main IOGDS dataset pages (i.e. the pages which the URIs of the datasets redirect to), we're pretty close to a completely integrated demo. What remains: (1) the current markup has some problems. We need to fix those; (2) we need markup for catalogs as well as datasets…
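The hopping behaviour described here can be approximated with a small sketch. It is deliberately not a conforming RDFa processor: it only pairs a `property` or `rel` attribute with a `resource` or `href` on the same element, so it will also pick up ordinary `rel` links such as stylesheets. The page contents, URLs and the `fetch` callable are all hypothetical stand-ins.

```python
from collections import deque
from html.parser import HTMLParser


class RDFaLinks(HTMLParser):
    """Collect (predicate, target) pairs from RDFa-style attributes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        predicate = a.get("property") or a.get("rel")
        target = a.get("resource") or a.get("href")
        if predicate and target:
            self.links.append((predicate, target))


def extract_links(html):
    """Return the (predicate, target) pairs found in one HTML page."""
    parser = RDFaLinks()
    parser.feed(html)
    return parser.links


def crawl(start, fetch, limit=50):
    """Breadth-first hop across linked pages, returning (page, predicate, target)."""
    seen, queue, edges = set(), deque([start]), []
    while queue and len(seen) < limit:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)  # fetch returns None for pages we cannot read
        if html is None:
            continue
        for predicate, target in extract_links(html):
            edges.append((url, predicate, target))
            if target not in seen:
                queue.append(target)
    return edges


# Made-up pages standing in for a dataset page and its catalog page.
PAGES = {
    "http://example.org/dataset/1":
        '<div typeof="schema:Dataset"><a property="schema:includedInDataCatalog" '
        'href="http://example.org/catalog/a">Catalog A</a></div>',
    "http://example.org/catalog/a":
        '<div typeof="schema:DataCatalog"><a property="schema:dataset" '
        'href="http://example.org/dataset/1">Dataset 1</a>'
        '<a rel="owl:sameAs" href="http://dbpedia.org/resource/X">X</a></div>',
}

if __name__ == "__main__":
    for edge in crawl("http://example.org/dataset/1", PAGES.get):
        print(edge)
```

The sample pages show why catalog markup matters as much as dataset markup: without the catalog page linking back, the crawler would reach the catalog and stop there.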

Needed (1) and (2): To fix (1), we need to change the LODSPeaKr templates that automatically generate those pages, to make them compliant with the model Josh developed. To fix (2), we'll work with Alvaro (Graves) to create LODSPeaKr-based automation that generates catalog pages efficiently. (2) presents more of a challenge than (1), since the IOGDS implementation of dataset details pages is already mostly correct. Still need Dan B. to assist with getting them found…

What we need: Willingness to adopt the dataset schema extension – we need lots of datasets to start showing up. We (TWC) will be pushing out some tools, more demos and how-tos, very soon. Wanna play?