Data.gov Wiki: A Semantic Web Approach to Government Data Li Ding, Dominic DiFranzo, Sarah Magidson, Jim Hendler Tetherless World Constellation Aug 7,

Data.gov Wiki: A Semantic Web Approach to Government Data Li Ding, Dominic DiFranzo, Sarah Magidson, Jim Hendler Tetherless World Constellation Aug 7, 2009

Government Data on the Web

Objectives
Investigate the role of the Semantic Web in producing, processing and utilizing government datasets
–To enrich the value of data via normalizing, linking and information extraction
–To realize the value of data via applications, especially visualization
–To support web developers via machine-friendly data access and web services

Semantic Web Architecture for Government Data
[Architecture diagram. Stages: Convert Data, Link & Enrich Data, View & Use Data]
–Data Processors (Web Services & Analyzers): SPARQL Web Service, XSLT Service, Diff Service, RSS Generator, SPARQL End Point, Link Annotator
–Data and formats: CSV, XSL, …, GOV data (RDF), RDF/XML, Linked Data
–Viewers: Google Viz, MIT Exhibit, RSS 1.0, tagCloud, Tabulator, …
Li Ding, Dominic DiFranzo, Sarah Magidson, and Jim Hendler · Tetherless World Constellation · Rensselaer Polytechnic Institute

Translate GOV data into RDF
Principle 1: Keep the translation minimal
–keep table structure
–skip parsing values, unique property namespace
Principle 2: Let the translation meet the Web
–RDF/XML as output
–Partition of big datasets, dereferenceable URIs
Principle 3: Make the translation extensible
–Property definitions updatable via Semantic MediaWiki
Principle 4: Preserve knowledge provenance
–Recording provenance metadata using DC and FOAF
Dominic
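The first two principles can be illustrated with a minimal Python sketch. This is not the project's converter: the base URI http://example.org/dataset/, the function name table_to_ntriples, the dataset id "demo" and the sample rows are all assumptions made up for illustration. Each table row becomes one subject, each column becomes a property in a namespace unique to the dataset, and cell values are emitted as plain string literals without any parsing. The output is N-Triples rather than RDF/XML, purely to keep the sketch short.

```python
import csv
import io

BASE = "http://example.org/dataset/"  # hypothetical base URI, not the project's real one


def escape_literal(value):
    """Escape a raw cell value for use inside an N-Triples string literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def table_to_ntriples(dataset_id, csv_text):
    """Translate a CSV table into N-Triples lines, one subject per row.

    The table structure is kept (row -> subject, column -> property),
    values are not parsed, and the property namespace is unique to the dataset.
    """
    prop_ns = f"{BASE}{dataset_id}/vocab/"
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{BASE}{dataset_id}/entry/{row_number}>"
        for column, cell in row.items():
            if column is None or not cell:
                continue  # surplus or empty cells yield no triple
            # Column names become property local names; spaces become underscores
            prop = f"<{prop_ns}{column.strip().lower().replace(' ', '_')}>"
            triples.append(f'{subject} {prop} "{escape_literal(cell)}" .')
    return triples


# Made-up sample table; the second row has an empty "Reading" cell
sample = "Site ID,Reading\nSITE001,42\nSITE002,\n"
lines = table_to_ntriples("demo", sample)
for line in lines:
    print(line)
```

Because every row gets its own URI under the dataset, a large table could be partitioned into several files by row range without changing any identifiers.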

Translated Dataset Statistics
data.gov hosts 432 Datasets:
–390 “Raw Data Catalog” and 41 “Tool Catalog”
–from 37 US government agencies
We have 16 translated RDF datasets
–13,532,385 table entries
–2,927,399,269 triples
–2,526 properties
data.gov mentioned 458 data access points (mainly tables)
–3 - RSS, ATOM
–248 - csv/txt
–46 - xml
–66 - xls (MS Excel)
–14 - kml or kmz
–22 - ESRI shape

Cloud of government data
[Figure: datasets grouped under the categories Budget, Population, Energy and Utilities, Geography and Environment]
–(#10) Residential Energy Consumption Survey
–(#401) Budget Authority and offsetting receipts
–(#403) Governmental Receipts
–(#402) Outlays and offsetting receipts
–(#249) 2006 Toxics Release Inventory
–(#90) ACS PUMS Housing
–(#191) 2005 Toxics Release Inventory
–(#91) ACS PUMS Population
–(#34) Worldwide M1+ Earthquakes past 7 days
–(#9) CASTNET Visibility
–(#397) 2007 Toxics Release Inventory
–(#8) CASTNET Ozone
–CASTNET sites

Issues in Data.gov
Duplicated datasets - Some datasets are part of another dataset.
–Dataset 140 (2005 Toxics Release Inventory data for the state of California (EPA)) is a subset of Dataset 191.
Formatting issues - The format of some datasets is not friendly to machine processing.
–Dataset 37 (Lower Colorado River Daily Average Water Elevations and Releases (US Bureau of Reclamation)).
–Dataset 335 (National Longitudinal Surveys (US Bureau of Labor Statistics)) tells you how to order data from the government.
Access point issues - Some access points are interactive web pages, which are not friendly to machine access.
–Dataset 330 (Local Area Unemployment Statistics (US Bureau of Labor Statistics)).
Sarah
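One way to flag the first issue, a dataset that is a subset of another, is to compare the rows of two tables as sets. The following is a minimal sketch, assuming both tables share the same columns in the same order; the function names and the sample rows are invented for illustration and are not taken from the actual Toxics Release Inventory data.

```python
import csv
import io


def rows_as_set(csv_text):
    """Read a CSV table into (header, set of row tuples)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return header, {tuple(row) for row in reader}


def is_subset_dataset(small_csv, large_csv):
    """Return True when every row of the smaller table also appears in the larger one."""
    small_header, small_rows = rows_as_set(small_csv)
    large_header, large_rows = rows_as_set(large_csv)
    if small_header != large_header:
        return False  # differing columns: this simple check does not apply
    return small_rows <= large_rows


# Made-up rows standing in for a state-level extract and a national table
national = "facility,state\nF1,CA\nF2,NY\nF3,CA\n"
california = "facility,state\nF1,CA\nF3,CA\n"
print(is_subset_dataset(california, national))
```

Exact row matching is deliberately strict; real datasets may differ in whitespace or column order and would need normalizing first.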

Demos
Visualization
–Tabulator
–Google Visualization (live)
–Exhibit (live)
Computation
–RSS generation
–TDB query (live)
Live Demos:
–
–
Dominic, Sarah

TODO List
More demos
–US Pollution Map
–US agency
–Earthquake in RPI Map
Getting more data linked
–Link properties
–Link instance data
More web services
–Gov data auto-completion
SPARQL integration for 2B triples
–TDB
–4Store
[Figure labels: (#9) CASTNET Visibility, (#8) CASTNET Ozone, CASTNET sites]

Sample SPARQL queries
List datasets:
–SELECT ?s ?o WHERE {?s ?o }
List all loaded documents:
–SELECT ?s ?o WHERE {?s }
List the description of an EPA site (integration):
–select ?s WHERE {?s "SHN418". }
List contributions of agency (count):
–PREFIX dgp92: SELECT ?ag count(*) WHERE { ?entry dgp92:agency ?ag. } GROUP BY ?ag ORDER BY ?ag
List agencies (distinct):
–PREFIX dgp401: SELECT distinct ?ag ?ag_code ?branch ?branch_code WHERE { ?entry dgp401:bureau_name ?ag; dgp401:bureau_code ?ag_code; dgp401:agency_name ?branch; dgp401:agency_code ?branch_code. } ORDER BY ?ag
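Queries like these are typically sent to a SPARQL endpoint over HTTP. The sketch below, in Python, builds the request URL for the agency count query using the SPARQL protocol's `query` parameter. The endpoint URL and the namespace IRI are placeholders (the transcript does not show the real ones), and the function names are hypothetical. The aggregate is written in standard SPARQL 1.1 form, `(COUNT(*) AS ?n)`, which differs from the `count(*)` shorthand used above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://example.org/sparql"                # placeholder endpoint, an assumption
DATASET_NS = "http://example.org/dataset/92/vocab/"   # placeholder namespace, an assumption


def count_by_agency_query(namespace):
    """Build a SPARQL 1.1 query that counts entries per agency."""
    return (
        f"PREFIX dgp92: <{namespace}>\n"
        "SELECT ?ag (COUNT(*) AS ?n)\n"
        "WHERE { ?entry dgp92:agency ?ag . }\n"
        "GROUP BY ?ag\n"
        "ORDER BY ?ag"
    )


def build_request_url(endpoint, query):
    """URL-encode the query into a GET request URL (SPARQL protocol 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


query = count_by_agency_query(DATASET_NS)
url = build_request_url(ENDPOINT, query)
print(url)

# Round-trip check: decoding the URL gives back the original query text
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == query)
```

Keeping the namespace as a parameter means the same helper can target another dataset's property namespace, such as the one behind the dgp401 prefix.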