Linked Open Government Data: What’s Next? Li Ding, James A. Hendler, and Deborah L. McGuinness With thanks to the entire RPI Tetherless World LOGD team:


1 Linked Open Government Data: What’s Next? Li Ding, James A. Hendler, and Deborah L. McGuinness. With thanks to the entire RPI Tetherless World LOGD team (logd.tw.rpi.edu), particularly John Erickson, Tim Lebo, Dominic DiFranzo, Alvaro Graves, Gregory Williams, Xian Li, James Michaelis, Jin Zheng, Zhenning Shangguan, Johanna Flores and Evan Patton. Tetherless World Constellation, Rensselaer Polytechnic Institute. SemTech 2011, San Francisco, June 7, 2011

2 Outline: Open Government Data; Linked Open Government Data; Challenges and Opportunities; Future Directions

3 Open Government Data: Government data is already available and open on the Web, and it is growing. Let’s create mashups to expose more value.

4 Opening Government Data
“Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” (President Obama, Jan 2009)
“if people put data onto the web -- government data, scientific data, community data, whatever it is -- it will be used by other people to do wonderful things, in ways that they never could have imagined.” (Tim Berners-Lee, Feb 2010)
Sources: http://www.whitehouse.gov/open, http://www.ted.com/talks/lang/eng/tim_berners_lee_the_year_open_data_went_worldwide.html
Linked Data and semantic technologies are key enablers!

5 International Open Government Data: A Great Opportunity
Open Government Data (OGD)
–A public asset (collected by government) with a large amount of high-value data and wide domain coverage
–An international mandate for government transparency, business applications, citizen participation, etc.
Deployment Status (source: Data.gov, http://www.data.gov/)
–13 other nations establishing open data
–24 states now offering data sites
–11 cities in America with open data
–236 new applications from Data.gov datasets
–258 data contacts in federal agencies
–308,650 datasets available on Data.gov

6 Challenges from Raw Open Government Data
–Data in proprietary formats
–Independent curators
–Distributed and unlinked data, e.g., smoking rate (Impacteen.org) and policy coverage (NCI)
–Limited participation

7 Linked Open Government Data (LOGD) at TWC, the Tetherless World Constellation at Rensselaer Polytechnic Institute: logd.tw.rpi.edu

8 Linked Data is Large and is Growing

9 The Tetherless World Constellation Linked Open Government Data Portal
[Diagram: TWC LOGD data is created (convert, enhance) and then queried/accessed through the LOGD SPARQL endpoint as RDF, RSS, JSON, XML, HTML, CSV, …; community portal; Data.gov deployment]
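The portal serves LOGD through a SPARQL endpoint in several formats. A minimal sketch of how a client might ask for one of those formats, assuming the standard W3C SPARQL 1.1 Protocol: the query goes in the `query` parameter of an HTTP GET and the result format is negotiated with the `Accept` header. The endpoint address and the helper `build_sparql_request` are hypothetical placeholders, and the request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder endpoint, not the portal's actual address.
ENDPOINT = "http://example.org/logd/sparql"

# Media types for SELECT results defined by the W3C SPARQL 1.1 result format specs.
RESULT_TYPES = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
}


def build_sparql_request(endpoint: str, query: str, fmt: str = "json") -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": RESULT_TYPES[fmt]})


query = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""
req = build_sparql_request(ENDPOINT, query, fmt="csv")
print(req.full_url)
print(req.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(req).read()
```

Content negotiation keeps one endpoint URL while letting each consumer (a spreadsheet, a JavaScript mashup, an RDF tool) pick the serialization it handles best.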

10 Linked Open Government Data: A Linked Open Government Data (LOGD) ecosystem is a Linked Data-based system where stakeholders of different sizes and roles find, manage, archive, publish, reuse, integrate, mash up, and consume open government data in connection with online tools, services and societies.

11 Moving Data.gov to Linked Data (US)
–Third parties (like RPI) translate the government datasets into Linked Data formats
–US Data.gov hosts 6.4B RDF triples
–5/21/2010: Data.gov acknowledges the Semantic Web as a key technology for open government data
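A sketch of the kind of translation such third parties perform: each CSV row becomes an RDF resource, and each non-empty cell becomes one N-Triples statement. The namespace `http://example.org/logd/`, the `entryN` naming, the helper names and the sample rows are all hypothetical placeholders, not the URI scheme, tooling or data TWC actually used; a production converter would also attach datatypes, provenance and links to other datasets.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace for converted data (not a real LOGD URI scheme).
BASE = "http://example.org/logd/"


def literal(value: str) -> str:
    """Serialize a plain string literal in N-Triples syntax."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def csv_to_ntriples(text: str, base: str = BASE) -> list[str]:
    """Turn each CSV row into a resource and each non-empty cell into one triple."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text)), start=1):
        subject = f"<{base}entry{i}>"
        for column, value in row.items():
            if not value:
                continue  # skip empty cells instead of emitting empty literals
            predicate = f"<{base}vocab/{quote(column.strip().replace(' ', '_'))}>"
            triples.append(f"{subject} {predicate} {literal(value.strip())} .")
    return triples


# Toy input with made-up values, not real statistics.
sample = "state,year,value\nGeorgia,2000,12.5\nOhio,2000,\n"
for triple in csv_to_ntriples(sample):
    print(triple)
```

Every value stays a plain string literal here; deciding whether "2000" is a year or an amount is exactly the enhancement step the later slides discuss.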

12 Government Data within the LD Cloud (http://linkeddata.org/): government data currently makes up over half of the cloud by size (~17B triples), with tens of thousands of links to other data (both within and outside the cloud)

13 TWC LOGD: 50+ Demos in Many Domains Using Various Technologies
Technologies: Semantic Web, semantic CMS, semantic search, social network, NLP, mobile, visualization, provenance, …
Domains: health, finance, politics, society, economy, …

14 Selected TWC Mashups

15 Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
PopSciGrid with NIH/NCI & Northwestern: aimed at conveying complex health-related information to consumers and health decision makers
–Diverse datasets from NIH
–Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
–Maintains provenance about data and manipulations
–Two-way communication: feeds users’ comments back to government contacts (e.g. %)

16 PopSciGrid Workflow [Diagram: convert, enhance, integrate (derive, create; e.g., ban coverage), visualize, publish]

17 The Abstract LOGD Workflow
[Diagram: a government agency releases raw OGD; developers convert it into LOGD, enhance and integrate it into mashed data, and publish and visualize the results for end users. Cross-cutting concerns: usability of LOGD, interoperability, scalability, provenance.]
For comparison, the conventional OGD mashup workflow: 1. Publish, 2. Mashup, 3. Visualize.

18 Challenge: Interoperability
Tim Berners-Lee’s 5-star deployment scheme for Linked Data:
★ make your stuff available on the Web (whatever format) under an open license
★★ make it available as structured data (e.g., Excel instead of an image scan of a table)
★★★ use non-proprietary formats (e.g., CSV instead of Excel)
★★★★ use URIs to identify things, so that people can point at your stuff
★★★★★ link your data to other data to provide context
Syntactic issues: extract entities from HTML tables; parse Excel tables.
Semantic issues: does “Georgia” refer to a US state or a country? Is “2000” a calendar year, a fiscal year or a dollar amount?
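The semantic questions above usually cannot be answered from a single cell; they need context. A minimal heuristic sketch, assuming tiny hand-made lookup tables: the other values in the same column vote on whether “Georgia” is a state or a country, and the column header decides what a bare “2000” means. The lookup sets, labels and function names are hypothetical illustrations, not part of the LOGD tooling.

```python
# Tiny hand-made lookup tables, for illustration only.
US_STATES = {"Georgia", "Ohio", "Texas", "Oregon"}
COUNTRIES = {"Georgia", "France", "Kenya", "Peru"}


def interpret_place(value: str, column_values: list[str]) -> str:
    """Guess whether a place name denotes a US state or a country, using its siblings."""
    kinds = [k for k, names in (("us_state", US_STATES), ("country", COUNTRIES)) if value in names]
    if len(kinds) != 2:
        return kinds[0] if kinds else "unknown"
    # Ambiguous name: let the unambiguous values in the same column vote.
    others = [v for v in column_values if v != value]
    states = sum(1 for v in others if v in US_STATES and v not in COUNTRIES)
    countries = sum(1 for v in others if v in COUNTRIES and v not in US_STATES)
    if states == countries:
        return "ambiguous"
    return "us_state" if states > countries else "country"


def interpret_number(value: str, header: str) -> str:
    """Use the column header (metadata) to decide what a bare number such as '2000' means."""
    h = header.lower()
    if "fiscal" in h or h.startswith("fy"):
        return f"fiscal year {value}"
    if "year" in h:
        return f"calendar year {value}"
    if "$" in h or "dollar" in h or "amount" in h:
        return f"USD amount {value}"
    return f"unknown {value}"


print(interpret_place("Georgia", ["Georgia", "Ohio", "Texas"]))
print(interpret_place("Georgia", ["Georgia", "France", "Peru"]))
print(interpret_number("2000", "Fiscal Year"), "|", interpret_number("2000", "Aid amount ($)"))
```

Once a value is resolved, publishing it as a URI (the fourth star) records the decision so that later consumers do not have to repeat the guess.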

19 Mashing up data from different countries http://data-gov.tw.rpi.edu/demo/USForeignAid/demo-1554.html

20 Even if not “rationalized” together: build an ontology mapping based on shared terms (e.g., “Economic”)
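A minimal sketch of mapping based on shared terms, assuming category labels from two countries’ datasets are compared word by word after dropping a few stopwords. The labels, stopword list and function names are invented for illustration. Exact word matching is deliberately naive: it links “Economic Assistance” to “Economic Development” but would miss “Economy”, which would need stemming or an ontology.

```python
import re

STOPWORDS = {"and", "of", "the", "for", "other"}


def terms(label: str) -> set[str]:
    """Lower-cased content words of a category label."""
    return {w for w in re.findall(r"[a-z]+", label.lower()) if w not in STOPWORDS}


def map_by_shared_terms(left: list[str], right: list[str]) -> dict[str, list[str]]:
    """Link each category in `left` to the categories in `right` sharing a term with it."""
    mapping: dict[str, list[str]] = {}
    for a in left:
        matches = [b for b in right if terms(a) & terms(b)]
        if matches:
            mapping[a] = matches
    return mapping


# Invented category labels standing in for two countries' aid classifications.
us_categories = ["Economic Assistance", "Military Assistance"]
other_categories = ["Economic Development", "Health", "Humanitarian Aid"]
print(map_by_shared_terms(us_categories, other_categories))
```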

21 Enhance interoperability using Linked Data: drill down into contextual knowledge
Identity: URI
Context
–Description: metadata, especially type & datatype
–Mapping (linking identities)
Syntactic
–Common string name
–Common URI
Semantic
–Complex object: attributes + context (siblings)
–Ontological mapping: e.g., owl:sameAs
–Rule-based mapping: e.g., mapping “Liter” to “Gallon”
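Two of the mappings above can be sketched directly. The owl:sameAs links are grouped into equivalence classes with a union-find structure, so every alias resolves to one representative URI, and a single rule converts liters to US gallons (1 US gallon is defined as 3.785411784 L). The URIs and the names `SameAsIndex` and `to_gallons` are hypothetical; a real system would read the owl:sameAs triples from the data and might delegate to an OWL or rule reasoner instead.

```python
# Hypothetical sketch: owl:sameAs closure via union-find, plus one unit-conversion rule.
LITERS_PER_US_GALLON = 3.785411784  # exact: the US liquid gallon is defined as 231 cubic inches


class SameAsIndex:
    """Groups URIs connected by owl:sameAs into equivalence classes."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, uri: str) -> str:
        """Return the representative URI of uri's equivalence class."""
        self.parent.setdefault(uri, uri)
        while self.parent[uri] != uri:
            self.parent[uri] = self.parent[self.parent[uri]]  # path halving
            uri = self.parent[uri]
        return uri

    def link(self, a: str, b: str) -> None:
        """Record the statement `a owl:sameAs b`."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller URI as representative (deterministic).
            self.parent[max(ra, rb)] = min(ra, rb)


def to_gallons(value: float, unit: str) -> float:
    """Rule-based mapping: express a volume in US gallons."""
    if unit == "liter":
        return value / LITERS_PER_US_GALLON
    if unit == "gallon":
        return value
    raise ValueError(f"no conversion rule for unit {unit!r}")


ids = SameAsIndex()
ids.link("http://example.org/geo/GA", "http://example.org/state/Georgia")
ids.link("http://example.org/state/Georgia", "http://example.org/dataset2/georgia")
print(ids.find("http://example.org/geo/GA"))
print(round(to_gallons(10.0, "liter"), 4))
```

Because owl:sameAs is symmetric and transitive, two links suffice to place all three URIs in one class, which is what lets data keyed by different identifiers be joined.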

22 Scalability factors in LOGD deployment
Large number of OGD datasets
–6k+ Data.gov.uk
–200k+ Data.gov
–323k+ international OGD datasets
Non-trivial human workload: clean up syntax, enhance semantics, integrate datasets, visualize resulting data, …
Substantial computing workload: running time of complex tasks, memory and disk space, maintenance costs, …

23 International catalog

24 Scalability issues in the International Open Government Dataset Catalog
Social aspect: crawled 40+ different dataset catalogs from 19 countries (“non-trivial customized programming workload”)
Computing aspect: searching 323,304 datasets (“complex SPARQL query got timeout”)
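One common workaround when a single large SPARQL query times out is to split it into pages. A sketch, assuming the total row count is known in advance (for example from a separate COUNT query), that the endpoint accepts ORDER BY with LIMIT/OFFSET, and that the data does not change between requests; ORDER BY is what keeps page boundaries stable. The helper name and the example pattern are hypothetical, and on some triple stores large OFFSET values are themselves slow, so this is one option rather than a general fix.

```python
def paged_queries(where: str, variables: list[str], page_size: int, total: int) -> list[str]:
    """Split one large SELECT into pages that can each finish within a timeout.

    ORDER BY fixes each row's position in the full result, so consecutive
    LIMIT/OFFSET pages cover it without overlap as long as the data is unchanged.
    """
    projection = " ".join(f"?{v}" for v in variables)
    return [
        f"SELECT {projection} WHERE {{ {where} }} "
        f"ORDER BY {projection} LIMIT {page_size} OFFSET {offset}"
        for offset in range(0, total, page_size)
    ]


# Hypothetical pattern over dataset titles; 323,304 is the catalog size quoted above.
pages = paged_queries("?d <http://purl.org/dc/terms/title> ?title .", ["d", "title"], 10_000, 323_304)
print(len(pages))
print(pages[-1])
```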

25 Social Aspect: Distribute human workload to the right developers
[Diagram: developers placed along two axes, domain expertise and application development expertise (layman end users, students, scientists and experts, knowledge engineers, software engineers, Genus), matched to the tasks convert, enhance, combine, visualize and publish.]
1. Decompose the workload into fine-grained jobs
2. Leverage a wider range of developers
Joint work with Alvaro Graves, PhD student at RPI

26 Computing Aspect: fit computing power to LOGD deployment
Scale up for more government data
–Support collective incremental data processing
–Support large-scale data analysis: graph connectivity, complex pattern/hypothesis discovery
–Map developers’ repetitive workload to automated tools
–Reduce service maintenance costs
Scale down for a wider range of end-user apps
–Limited computing power, e.g., mobile devices
–End users’ cognitive constraints, e.g., screen size, executive summary
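Incremental processing can be approximated by re-converting only the datasets whose source content changed since the last run. A sketch assuming each dataset has a string id and its raw bytes are available; SHA-256 digests from the standard `hashlib` module act as change detectors. The dataset ids and function names are hypothetical, and deleted datasets are not handled.

```python
import hashlib


def digest(content: bytes) -> str:
    """Fingerprint a dataset's raw content."""
    return hashlib.sha256(content).hexdigest()


def plan_incremental_run(
    sources: dict[str, bytes], seen: dict[str, str]
) -> tuple[list[str], dict[str, str]]:
    """Return dataset ids needing (re)conversion and the updated digest index.

    `seen` maps dataset id -> digest from the previous run; it is not modified.
    """
    todo: list[str] = []
    index = dict(seen)
    for dataset_id, content in sorted(sources.items()):
        h = digest(content)
        if seen.get(dataset_id) != h:
            todo.append(dataset_id)  # new or changed since the last run
            index[dataset_id] = h
    return todo, index


first, index = plan_incremental_run({"ds-a": b"v1", "ds-b": b"v1"}, {})
second, index = plan_incremental_run({"ds-a": b"v1", "ds-b": b"v2"}, index)
print(first, second)
```

Keeping the digest index alongside the converted output also gives a simple record of which source version each conversion came from, which ties into the provenance concerns on the next slides.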

27 Provenance
Provenance-aware frameworks are needed to support transparency, appropriate attribution, and ultimately trust in any kind of open data.
Versioning and persistence are important factors for sustainable applications.
Workflow provenance can help increase understanding and trust, since it can be used to explain the behavior and dependencies of intelligent systems.

28 Attribution in PopSciGrid
[Diagram: the “State-wise Tobacco Policy coverage stats” demo connected to person, technology, dataset, agency, version and conversion nodes through logd:uses_technology, logd:uses_dataset, dcterms:contributor, dcterms:publisher and void:subset.]
Example scenarios:
–List direct/indirect contributors
–End users send feedback to curators
–Curators learn usage of datasets
–List demos by technology
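The “list direct/indirect contributors” scenario amounts to a graph traversal. A sketch over a toy set of triples that reuses the property names from this slide (logd:uses_dataset, void:subset, dcterms:contributor, dcterms:publisher); the node identifiers, the edge directions and the function name are assumptions made for illustration, not the actual PopSciGrid provenance data.

```python
from collections import deque

# Toy provenance graph; identifiers and edge directions are illustrative assumptions.
TRIPLES = [
    ("demo:tobacco", "logd:uses_technology", "tech:sparql"),
    ("demo:tobacco", "dcterms:contributor", "person:bob"),
    ("demo:tobacco", "logd:uses_dataset", "dataset:policy"),
    ("dataset:policy", "dcterms:publisher", "agency:A"),
    ("dataset:policy", "void:subset", "dataset:policy_v1"),
    ("dataset:policy_v1", "dcterms:contributor", "person:alice"),
]

CREDIT = {"dcterms:contributor", "dcterms:publisher"}  # edges that give credit
FOLLOW = {"logd:uses_dataset", "void:subset"}  # edges leading to upstream data


def contributors(triples, start):
    """Breadth-first walk from `start`; split credited agents into direct and indirect."""
    out: dict[str, list[tuple[str, str]]] = {}
    for s, p, o in triples:
        out.setdefault(s, []).append((p, o))
    direct, indirect = set(), set()
    queue, visited = deque([(start, 0)]), {start}
    while queue:
        node, depth = queue.popleft()
        for p, o in out.get(node, []):
            if p in CREDIT:
                (direct if depth == 0 else indirect).add(o)
            elif p in FOLLOW and o not in visited:
                visited.add(o)
                queue.append((o, depth + 1))
    return direct, indirect


print(contributors(TRIPLES, "demo:tobacco"))
```

The same walk run in reverse (from a dataset to the demos that use it) would support the “curators learn usage of datasets” scenario.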

29 TWC Semantic Water Quality Portal
Aimed at helping people investigate local water quality
–Diverse datasets, regulations, datatypes
–Uses lightweight semantic technologies to produce mashups that make accessible data that would otherwise be difficult to view in perspective
–Maintains provenance about data and manipulations
–Exposes unexpected uses of data (and thus unexpected usage patterns)

30 Detailed View of Pollution

31 Provenance of regulations
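Provenance of regulations matters because the same measurement can be acceptable under one rule set and a violation under another. A sketch that reports, for every flagged measurement, which regulation and which source the limit came from. The contaminant name, regulation names, URLs, threshold values and function name are all hypothetical, not real regulatory limits or the portal’s actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    regulation: str  # which rule set the threshold comes from (its provenance)
    source_url: str  # where that rule set was obtained
    max_value: float


# Hypothetical thresholds for one contaminant; not real regulatory values.
LIMITS = {
    "contaminant_x": [
        Limit("Regulation A (federal)", "http://example.org/reg/a", 10.0),
        Limit("Regulation B (state)", "http://example.org/reg/b", 5.0),
    ]
}


def violations(contaminant: str, measured: float) -> list[Limit]:
    """Return every regulation the measurement exceeds, with where its limit came from."""
    return [lim for lim in LIMITS.get(contaminant, []) if measured > lim.max_value]


for lim in violations("contaminant_x", 7.0):
    print(f"exceeds {lim.regulation} limit {lim.max_value} (source: {lim.source_url})")
```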

32 Challenges Revisited
Interoperability
–Syntactic: Linked Data, RDF
–Semantic: ontology, evolving
Scalability (9.9 billion triples on the TWC LOGD)
–Effective social platform for task dispatching
–More automation, e.g., data cleaning and link detection
–Scalable tools, especially SPARQL endpoints
Provenance
–Accountability: privacy, licensing, trust
–Credit / blame
–Replicate applications and transfer system-building knowledge
More issues
–Persistent data access for changing data
–…

33 Summary
Open government data is a key resource
–Many governments are releasing data, a growing amount of it in structured form
Government (and general data) transparency comes through in the “mashing up” of data from many sites
Maintaining (and exposing) provenance
–Key to linked data
While there has been tremendous progress, many challenges remain
–Trust, provenance, scaling, interoperability, archiving, curation, …
The research agenda for linked government data is an important driving area for semantic technologies

34 Questions?
The work presented in this talk was primarily conducted at the Tetherless World Constellation at Rensselaer Polytechnic Institute.
Comments / Questions: [ dingl | dlm ] @ cs.rpi.edu
Events:
–Open Linked Govt. Data Symposium: submission deadline June 15, http://tw.rpi.edu/web/event/AAAI/2011/Fall_Symposium_OGK
–TWC / Elsevier Hackathon: June 27-28, http://tw.rpi.edu/web/event/TWCElsevierHackathonJune2011
Reference: Li Ding, Timothy Lebo, John S. Erickson, Dominic DiFranzo, Gregory Todd Williams, Xian Li, James Michaelis, Alvaro Graves, Jin Guang Zheng, Zhenning Shangguan, Johanna Flores, Deborah L. McGuinness and Jim Hendler. TWC LOGD: A Portal for Linked Open Government Data Ecosystems. Submitted to JWS, special issue on Semantic Web Challenge ’10.

