Linked Open Government Data: What’s Next? Li Ding, James A. Hendler, and Deborah L. McGuinness


Linked Open Government Data: What’s Next?
Li Ding, James A. Hendler, and Deborah L. McGuinness
With thanks to the entire RPI Tetherless World LOGD team (logd.tw.rpi.edu), particularly John Erickson, Tim Lebo, Dominic DiFranzo, Alvaro Graves, Gregory Williams, Xian Li, James Michaelis, Jin Zheng, Zhenning Shangguan, Johanna Flores, and Evan Patton
Tetherless World Constellation, Rensselaer Polytechnic Institute
SemTech 2011, San Francisco, June 7, 2011

Outline
– Open Government Data
– Linked Open Government Data
– Challenges and Opportunities
– Future Directions

Open Government Data: Government data is already available and open on the Web, and it is growing. Let’s create mashups to expose more value.

Opening Government Data
“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama, Jan 2009)
“if people put data onto the web -- government data, scientific data, community data, whatever it is -- it will be used by other people to do wonderful things, in ways that they never could have imagined.” (Tim Berners-Lee, Feb 2010)
Linked Data and semantic technologies are key enablers!

International Open Government Data: A Great Opportunity
Deployment status (source: Data.gov)
– 13 other nations establishing open data
– 24 states now offering data sites
– 11 cities in America with open data
– 236 new applications from Data.gov datasets
– 258 data contacts in Federal agencies
– 308,650 datasets available on Data.gov
Open Government Data (OGD)
– A public asset (collected by government) with a large amount of high-value data and wide domain coverage
– An international mandate for government transparency, business applications, citizen participation, etc.

Challenges from Raw Open Government Data
– Data in proprietary formats
– Independent curators
– Distributed and unlinked data
– Limited participation
Example sources shown: smoke rate (Impacteen.org), policy coverage (NCI)

Linked Open Government Data
TWC: Tetherless World Constellation at Rensselaer Polytechnic Institute
LOGD: Linked Open Government Data
logd.tw.rpi.edu

Linked Data is Large and is Growing

The Tetherless World Constellation Linked Open Govt Data Portal
[Figure: portal workflow. Steps: Create, Convert, Enhance, Query/Access. Components: TWC LOGD, LOGD SPARQL Endpoint, Community Portal, Data.gov deployment. Output formats: RDF, RSS, JSON, XML, HTML, CSV, …]
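The Query/Access step on this slide goes through a SPARQL endpoint. As a minimal sketch, a client could form a SPARQL Protocol GET request as shown here. The endpoint URL is a hypothetical placeholder (not the portal's real address), and the function name and the generic query are illustrative assumptions.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint URL, used only for illustration.
ENDPOINT = "http://example.org/sparql"

# A generic query that lists any ten triples.
QUERY = """SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10"""


def build_sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query text in the 'query' parameter,
    as the SPARQL Protocol specifies for query-via-GET."""
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again recovers the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(decoded == QUERY)
```

The desired result format (for example JSON or XML, both listed on the slide) would normally be negotiated with an HTTP Accept header, which this sketch leaves out.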

Linked Open Government Data
A Linked Open Government Data (LOGD) ecosystem is a Linked Data-based system where stakeholders of different sizes and roles find, manage, archive, publish, reuse, integrate, mash up, and consume open government data in connection with online tools, services, and societies.

Moving data.gov to linked data (US)
– Third parties (like RPI) translate the government datasets into linked data formats
– US Data.gov hosts 6.4B RDF triples (5/21/2010) and acknowledges Semantic Web as a key technology for open government data

Government Data within the LD Cloud
Government data currently makes up over ½ of the cloud in size (~17B triples), with tens of thousands of links to other data (within and without)

TWC LOGD: 50+ Demos in Many Domains using Various Technologies
Technology: Semantic Web, Semantic CMS, Semantic Search, Social network, NLP, Mobile, Visualization, Provenance, …
Domain: Health, Finance, Politics, Society, Economy, …

Selected TWC Mashups

Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices
PopSciGrid with NIH/NCI & Northwestern
Aimed at conveying complex health-related information to consumers and health decision makers
– Diverse datasets from NIH
– Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
– Maintains provenance about data and manipulations
– Two-way communication: feeds users’ comments back to gov contacts

PopSciGrid Workflow
[Figure: workflow diagram. Labels: Convert, Enhance, Visualize, derive, create, Integrate, Ban coverage, Publish.]

The Abstract LOGD Workflow
[Figure: workflow diagram. Actors: Gov Agency, Developer, End User. Data: RAW OGD, LOGD, Mashed Data. Steps: Publish, Convert, Enhance, Integrate, Visualize. Other labels: Usability of LOGD, Interoperability, Scalability, Provenance.]
Mashup Workflow (Conventional OGD)
1. Publish
2. Mashup
3. Visualize
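The Convert and Enhance steps can be illustrated with a small sketch. It assumes a simple row-per-subject mapping from CSV to triples, which is one common approach and not necessarily the converter the TWC LOGD team used. The base URI, the function names, and the placeholder values in the sample CSV are all hypothetical.

```python
import csv
import io

BASE = "http://example.org/dataset/demo"  # hypothetical base URI


def convert_csv_to_triples(csv_text, base=BASE):
    """Convert step: one subject URI per row, one (s, p, o) triple per cell."""
    triples = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"{base}/row/{row_number}"
        for column, value in row.items():
            local_name = column.strip().lower().replace(" ", "_")
            predicate = f"{base}/vocab/{local_name}"
            triples.append((subject, predicate, value.strip()))
    return triples


def enhance(triples, numeric_predicates):
    """Enhance step: turn raw strings into floats for the selected predicates."""
    return [
        (s, p, float(o) if p in numeric_predicates else o)
        for s, p, o in triples
    ]


# Placeholder values only; these are not real statistics.
RAW = "State,Smoke Rate\nStateA,10.0\nStateB,20.0\n"

raw_triples = convert_csv_to_triples(RAW)
typed_triples = enhance(raw_triples, {f"{BASE}/vocab/smoke_rate"})
for triple in typed_triples:
    print(triple)
```

Keeping the raw conversion separate from the enhancement mirrors the split between Convert and Enhance in the workflow: the first step is mechanical, while the second encodes human decisions about types and meaning.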

Challenge: Interoperability
TBL’s 5-star Deployment Scheme for Linked Data
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Syntactic
– Extract entities from HTML tables
– Parse Excel tables
Semantic
– Does “Georgia” refer to a US state or a country?
– Is “2000” a calendar year, a fiscal year, or a dollar amount?
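The 5-star scheme can be read as a cumulative checklist in which each star presupposes the ones before it. The sketch below encodes that reading; the `Dataset` class and its field names are illustrative assumptions, not part of any published API.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of how a dataset is published."""
    on_web_open_license: bool      # star 1
    structured: bool               # star 2
    non_proprietary_format: bool   # star 3
    uses_uris: bool                # star 4
    linked_to_other_data: bool     # star 5


def star_rating(d: Dataset) -> int:
    """Count stars in order, stopping at the first unmet criterion."""
    criteria = (
        d.on_web_open_license,
        d.structured,
        d.non_proprietary_format,
        d.uses_uris,
        d.linked_to_other_data,
    )
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


scanned_table = Dataset(True, False, False, False, False)
excel_file = Dataset(True, True, False, False, False)
csv_file = Dataset(True, True, True, False, False)
linked_rdf = Dataset(True, True, True, True, True)

print(star_rating(scanned_table), star_rating(excel_file),
      star_rating(csv_file), star_rating(linked_rdf))
```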

Mashing up data from different countries

Even if not “rationalized” together
Build ontology mapping based on shared terms, e.g., “Economic”
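One lightweight way to propose mappings from shared terms is to compare the words in category labels from two catalogs. This is a sketch under assumptions: the labels below are hypothetical, and simple token overlap stands in for whatever matching the authors actually applied.

```python
import re

STOPWORDS = frozenset({"of", "the", "and", "by", "for"})


def tokens(label):
    """Lowercase a label and split it into alphabetic words."""
    return set(re.findall(r"[a-z]+", label.lower()))


def candidate_mappings(labels_a, labels_b, stopwords=STOPWORDS):
    """Return (label_a, label_b, shared_terms) for every pair that shares a term."""
    pairs = []
    for a in labels_a:
        for b in labels_b:
            shared = (tokens(a) & tokens(b)) - stopwords
            if shared:
                pairs.append((a, b, sorted(shared)))
    return pairs


# Hypothetical category labels from two national catalogs.
catalog_one = ["Economic Indicators", "Public Health"]
catalog_two = ["Economic Statistics", "Transport"]

mappings = candidate_mappings(catalog_one, catalog_two)
print(mappings)
```

The output is only a list of candidates. A person (or a stricter rule) would still need to confirm each pair before it is recorded as an ontology mapping.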

Enhance interoperability using Linked Data: drill down contextual knowledge
Identity: URI
Context
– Description: metadata, esp. type & datatype
– Mapping (linking identities)
Syntactic
– Common string name
– Common URI
Semantic
– Complex Object: attributes + context (siblings)
– Ontological Mapping: e.g., owl:sameAs
– Rule-based Mapping: e.g., mapping “Liter” to “Gallon”
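The two semantic mapping styles named above can be sketched briefly. The first function treats owl:sameAs statements as an equivalence relation and groups identifiers with a union-find structure; the second is a rule-based unit conversion from liters to US liquid gallons. The identifiers are hypothetical, and the choice of the US gallon is an assumption because the slide does not say which gallon is meant.

```python
# One US liquid gallon is defined as exactly 3.785411784 liters.
LITERS_PER_US_GALLON = 3.785411784


def liters_to_gallons(liters: float) -> float:
    """Rule-based mapping: convert a volume in liters to US liquid gallons."""
    return liters / LITERS_PER_US_GALLON


def same_as_groups(pairs):
    """Ontological mapping: group identifiers connected by owl:sameAs links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# Hypothetical identifiers for the same places in different datasets.
links = [
    ("ex:Georgia_state", "gov:GA"),
    ("gov:GA", "census:state_13"),
    ("ex:Georgia_country", "un:GEO"),
]

groups = same_as_groups(links)
print(sorted(sorted(g) for g in groups))
print(liters_to_gallons(LITERS_PER_US_GALLON))
```

Grouping identifiers this way keeps the US state and the country named “Georgia” apart as long as no sameAs link connects them, which is the disambiguation problem raised on the interoperability slide.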

Scalability factors in LOGD deployment
Large number of OGD datasets
– 6k+ Data.gov.uk
– 200k+ Data.gov
– 323k+ International OGD datasets
Non-trivial human workload: clean up syntax, enhance semantics, integrate datasets, visualize resulting data, …
Substantial computing workload: running time of complex tasks, memory and disk space, maintenance costs, …

International catalog

Scalability issues in the International Open Government Dataset Catalog
Social aspect: crawled 40+ different dataset catalogs from 19 countries (“non-trivial customized programming workload”)
Computing aspect: searching 323,304 datasets (“Complex SPARQL query got timeout”)
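One common way to reduce the risk of timeouts on large result sets is to split a query into ordered pages with LIMIT and OFFSET. The sketch below only generates the query strings. It is offered as a general mitigation, not as what the catalog actually did, and the class URI and function name are hypothetical.

```python
def paged_queries(select_body, order_by, page_size, total):
    """Yield one SPARQL query string per page of results.

    ORDER BY keeps the pages stable, so consecutive pages do not overlap.
    """
    for offset in range(0, total, page_size):
        yield (
            f"{select_body}\n"
            f"ORDER BY {order_by}\n"
            f"LIMIT {page_size}\n"
            f"OFFSET {offset}"
        )


# Hypothetical class URI; 323,304 is the dataset count quoted on the slide.
BODY = "SELECT ?dataset WHERE { ?dataset a <http://example.org/Dataset> }"

pages = list(paged_queries(BODY, "?dataset", page_size=100000, total=323304))
print(len(pages))
print(pages[-1])
```

Paging trades one long request for several shorter ones. Sorting the full result still costs the endpoint work on each page, so it helps most when the underlying pattern is cheap and the result set is large.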

Social Aspect: Distribute human workload to the right developers
Joint work with Alvaro Graves, PhD student at RPI
[Figure: chart with axes Domain Expertise and Application Development Expertise. Groups: Layman End Users; Software Engineers; Scientists, Experts; Genus Students; Knowledge Engineers. Tasks: Convert, Enhance, Combine, Visualize, Publish.]
1. Decompose workload into fine-grained jobs
2. Leverage a wider range of developers

Computing Aspect: fit computing power to LOGD deployment
Scale up for more government data
– Support collective incremental data processing
– Support large-scale data analysis: graph connectivity, complex pattern/hypothesis discovery
– Map repetitive developers’ workload to automated tools
– Reduce service maintenance costs
Scale down for a wider range of end-user apps
– Limited computing power, e.g., mobile devices
– End users’ cognitive constraints, e.g., screen size, executive summary

Provenance
– Provenance-aware frameworks are needed to support transparency, appropriate attribution, and ultimately trust of any kind of open data.
– Versioning and persistence are important factors for sustainable applications.
– Workflow provenance can help increase understanding and trust, since it can be used to explain the behavior and dependencies of intelligent systems.

Attribution in PopSciGrid demo
[Figure: provenance graph. Nodes: person, technology, dataset, agency, version, conversion. Edges: logd:uses_technology, logd:uses_dataset, dcterms:contributor, dcterms:publisher, void:subset. Example dataset: State-wise Tobacco Policy coverage stats.]
Example scenarios
– List direct/indirect contributors
– End users send feedback to curators
– Curators learn usage of datasets
– List demos by technology
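The first scenario, listing direct and indirect contributors, amounts to a graph traversal over attribution triples. The sketch below uses the predicate names from the slide, but the shape of the graph, the `ex:` identifiers, and the traversal rules are assumptions made for illustration.

```python
from collections import deque

# Predicates whose objects are credited parties.
CREDIT = {"dcterms:contributor", "dcterms:publisher"}

# Hypothetical attribution graph as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:demo", "logd:uses_technology", "ex:sparql"),
    ("ex:demo", "dcterms:contributor", "ex:bob"),
    ("ex:demo", "logd:uses_dataset", "ex:version1"),
    ("ex:dataset", "void:subset", "ex:version1"),
    ("ex:dataset", "dcterms:publisher", "ex:agency"),
    ("ex:version1", "dcterms:contributor", "ex:alice"),
]


def contributors(triples, start):
    """Return (direct, indirect) credited parties reachable from `start`.

    Traversal follows logd:uses_dataset forward and void:subset backward,
    that is, from a subset up to the dataset that contains it.
    """
    direct, indirect = set(), set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node and p in CREDIT:
                (direct if node == start else indirect).add(o)
            elif s == node and p == "logd:uses_dataset" and o not in seen:
                seen.add(o)
                queue.append(o)
            elif o == node and p == "void:subset" and s not in seen:
                seen.add(s)
                queue.append(s)
    return direct, indirect - direct


direct, indirect = contributors(TRIPLES, "ex:demo")
print(sorted(direct), sorted(indirect))
```

The same triples could answer the “list demos by technology” scenario by selecting subjects of logd:uses_technology for a given technology.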

TWC Semantic Water Quality Portal
Aimed at helping people investigate local water quality
– Diverse datasets, regulations, datatypes
– Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
– Maintains provenance about data and manipulations
– Exposes unexpected uses of data (and thus unexpected usage patterns)

Detailed View of Pollution

Provenance of regulations

Challenges Revisited
Interoperability
– Syntactic: Linked Data, RDF
– Semantic: ontology, evolving
Scalability (9.9 billion triples on the TWC LOGD)
– Effective social platform for task dispatching
– More automation, e.g., data cleaning and link detection
– Scalable tools, esp. SPARQL endpoint
Provenance
– Accountability: privacy, licensing, trust
– Credit / blame
– Replicate applications and transfer system-building knowledge
More issues
– Persistent data access for changing data
– …

Summary
Open government data is a key resource
– Many governments are releasing data, a growing number in structured form
Government (and general data) transparency comes through in the “mashing up” of data from many sites while maintaining (and exposing) provenance
– Key to linked data
While there has been tremendous progress, many challenges remain
– Trust, Provenance, Scaling, Interoperability, Archiving, Curation, …
The research agenda for linked government data is an important driving area for semantic technologies

Questions?
The work presented in this talk was primarily conducted at the Tetherless World Constellation at Rensselaer Polytechnic Institute.
Comments / Questions: [ dingl | dlm cs.rpi.edu
Events:
– Open Linked Govt. Data Symposium: submission deadline June
– TWC / Elsevier Hackathon: June
Reference: Li Ding, Timothy Lebo, John S. Erickson, Dominic DiFranzo, Gregory Todd Williams, Xian Li, James Michaelis, Alvaro Graves, Jin Guang Zheng, Zhenning Shangguan, Johanna Flores, Deborah L. McGuinness, and Jim Hendler. TWC LOGD: A Portal for Linked Open Government Data Ecosystems. Submitted to JWS, special issue on Semantic Web Challenge ’10.