SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano, and Deborah L. McGuinness Tetherless World Constellation Rensselaer Polytechnic Institute Troy, NY, USA

Real-Life Motivating Example
In 2009, in Bristol County, Rhode Island, children became ill with symptoms such as diarrhea. The cause was found to be water polluted with E. coli, and citizens were asked to boil water until the issue was resolved.
Public concerns:
o “When did the contamination begin?”
o “How did this happen?”
o “How can we keep it from happening again?”

Challenges
1. Raw data come from multiple sources and in different formats, so they are difficult to integrate and query.
2. The semantics of the water quality data are not explicitly encoded in the data, so machines cannot process the data automatically.
3. The amount of data is large because of the large spatial region, long time span, and large number of pollutants and regulated limits, so analysis can be time-consuming and complex.

Semantics Can Help
1. Raw datasets can be represented in RDF, and ontologies enable integration of data between sources.
2. Ontologies also add meaning using OWL 2 datatype restrictions and ObjectIntersectionOf.
3. SPARQL CONSTRUCT and classification using Pellet allow automated reasoning over small subsets of the data for efficiency.

SemantAqua
Identifies point sources of water pollution, including water sites monitored by USGS and polluting facilities regulated by EPA.
Demonstrates the effectiveness of semantic web technologies in addressing the challenges faced by environmental informatics systems.
Enables citizens and scientists to better explore water-related information.
Generalized to the SemantEco framework for representing environmental and ecological data.

SemantAqua Workflow
[Figure: workflow diagram. Recoverable labels: Archive, CSV2RDF4LOD Enhance, derive, integrate, archive, Publish, CSV2RDF4LOD Direct, Visualize, Reason.]

System Architecture
[Figure: system architecture diagram. Recoverable labels: access, Virtuoso.]

Ontology
Core SemantEco ontology
o Extends existing best-practice ontologies, e.g. SWEET, OWL-Time.
o Includes terms for relevant pollution concepts.
o Can be used to conclude that “any water source that has a measurement outside of its allowable range” is a polluted water source.
[Figure: portion of the SemantEco and SemantAqua ontologies.]

Ontology
Regulation ontology
o Models the federal and state water quality regulations for drinking water sources.
o Can be used to define limits: for example, in California, “the limit for Arsenic is a measurement value of 0.01 mg/L”.
o Combined with the core ontology, we can infer that “any water source that contains 0.01 mg/L of Arsenic is a polluted water source.”
[Figure: portion of the California regulation ontology.]
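
The combination of a regulation limit with the core "polluted water source" rule can be sketched outside of OWL as a plain threshold check. This is only an illustration, not the portal's reasoner-based implementation: the data layout, the `Measurement` class, and the site names are hypothetical, and the sketch assumes a violation means a value strictly above the limit (mirroring the `xsd:minExclusive` test in the traditional SPARQL query shown later in the transcript). The only value taken from the slide is the California Arsenic limit of 0.01 mg/L.

```python
from dataclasses import dataclass

# Limits keyed by (state, characteristic). The Arsenic value is the one quoted
# on the slide; the dictionary layout itself is an assumption.
LIMITS_MG_PER_L = {("California", "Arsenic"): 0.01}


@dataclass(frozen=True)
class Measurement:
    site: str              # hypothetical site identifier
    characteristic: str    # e.g. "Arsenic"
    value_mg_per_l: float


def is_violation(m, state, limits=LIMITS_MG_PER_L):
    """True when the state regulates this characteristic and the value
    is strictly above the limit (assumed comparison)."""
    limit = limits.get((state, m.characteristic))
    return limit is not None and m.value_mg_per_l > limit


def polluted_sites(measurements, state, limits=LIMITS_MG_PER_L):
    """Core rule: a site is polluted if any of its measurements violates."""
    return {m.site for m in measurements if is_violation(m, state, limits)}


sample = [
    Measurement("site-A", "Arsenic", 0.02),   # above the limit
    Measurement("site-B", "Arsenic", 0.005),  # below the limit
    Measurement("site-C", "Lead", 5.0),       # no limit defined in this sketch
]
print(polluted_sites(sample, "California"))
```

If a value equal to the limit should also count as a violation, the comparison in `is_violation` would become `>=`.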

Data Integration
Adopts the data conversion and organization capabilities enabled by the TWC-LOGD portal:
o Linking to ontological terms: water:hasCharacteristic
o Linking to external data: “water:Arsenic” is linked to “dbpedia:Arsenic” using rdfs:seeAlso
To date, USGS and EPA data from 27 states have been encoded (still ongoing):
o 58,302 facilities, 586,902 water sites
o 31,823,304 measurements
o 3,096,127,859 triples
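
To make the two kinds of links concrete, the following sketch turns CSV rows into N-Triples lines: one triple links a measurement to an ontology term via water:hasCharacteristic, and one links that term to DBpedia via rdfs:seeAlso. It is not how csv2rdf4lod works internally. The `water` namespace IRI, the `measurement/<id>` IRI pattern, and the column names are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

WATER = "http://example.org/water#"          # hypothetical namespace IRI
DBPEDIA = "http://dbpedia.org/resource/"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def row_to_ntriples(row):
    """Emit two N-Triples lines for one CSV row (assumed columns: id, characteristic)."""
    name = quote(row["characteristic"])       # keep the IRI free of spaces
    measurement = f"<{WATER}measurement/{quote(row['id'])}>"
    term = f"<{WATER}{name}>"
    return [
        # Link to an ontological term.
        f"{measurement} <{WATER}hasCharacteristic> {term} .",
        # Link the term to external data.
        f"{term} <{RDFS_SEE_ALSO}> <{DBPEDIA}{name}> .",
    ]


def csv_to_ntriples(text):
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        lines.extend(row_to_ntriples(row))
    return lines


for line in csv_to_ntriples("id,characteristic\n1,Arsenic\n"):
    print(line)
```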

Provenance
Preserves provenance in the Proof Markup Language (PML).
Data-source-level provenance:
o The captured provenance data are used to support provenance-based queries.
Reasoning-level provenance:
o When a water source has been marked as polluted, the user can access supporting provenance data for the explanation, including the URLs of the source data, intermediate data, and converted data.
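
The idea of tracing a result back through converted, intermediate, and source data can be sketched as a walk over "derived from" links. This does not use PML or any PML library; the dictionary representation and the example URLs are hypothetical and stand in for whatever the provenance store actually records.

```python
def provenance_chain(artifact, derived_from):
    """Follow 'derived from' links from an artifact back to its original source.

    derived_from maps each artifact URL to the URL it was produced from.
    Returns the chain starting at the artifact and ending at the source.
    """
    chain = [artifact]
    seen = {artifact}
    while chain[-1] in derived_from:
        parent = derived_from[chain[-1]]
        if parent in seen:      # guard against accidental cycles
            break
        chain.append(parent)
        seen.add(parent)
    return chain


# Hypothetical URLs: converted data <- intermediate data <- source data.
links = {
    "http://example.org/converted/site-data.ttl": "http://example.org/intermediate/site-data.csv",
    "http://example.org/intermediate/site-data.csv": "http://example.org/source/site-data.txt",
}
for url in provenance_chain("http://example.org/converted/site-data.ttl", links):
    print(url)
```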

User Interface

Time Series Visualization
o Presents data as a time series so that the user can explore and analyze them.
[Figure: time series chart. Recoverable labels: “Violation, measured value: 110”; “Violation, measured value: 8664”; two further “Violation, measured value:” labels whose values are missing; “Limit value: 104”.]

Link to Health Domain

Data Reduction
Segment data into state-wide, per-agency graphs.
Integration and pagination are solved via a SPARQL 1.1 CONSTRUCT query:
    CONSTRUCT {
      ?site a water:WaterMeasurementSite .
      …
      ?measurement a water:WaterMeasurement .
      …
    }
    WHERE {
      {
        SELECT ?site WHERE {
          GRAPH { ?site a water:WaterMeasurementSite . }
        }
        LIMIT 10
      }
      …
    }
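
One way to page through a per-agency graph with this pattern is to generate the query text with a LIMIT and OFFSET on the inner SELECT. The sketch below only builds the query string; it does not contact an endpoint. The graph IRI is a parameter because the graph name is missing from the slide text, the `water` prefix IRI is hypothetical, and the ORDER BY clause is an addition of this sketch (not on the slide) so that successive pages do not overlap.

```python
def paged_construct(graph_iri, limit, offset=0):
    """Build a SPARQL 1.1 CONSTRUCT query that pages over sites in one graph."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    return f"""PREFIX water: <http://example.org/water#>
CONSTRUCT {{
  ?site a water:WaterMeasurementSite .
}}
WHERE {{
  {{
    SELECT ?site WHERE {{
      GRAPH <{graph_iri}> {{ ?site a water:WaterMeasurementSite . }}
    }}
    ORDER BY ?site
    LIMIT {limit} OFFSET {offset}
  }}
}}"""


# Third page of ten sites from a hypothetical state-wide, per-agency graph.
print(paged_construct("http://example.org/graph/usgs-ri", limit=10, offset=20))
```

Restricting the inner SELECT keeps each CONSTRUCT result small, which matches the slide's point about reasoning over small subsets of the data.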

Querying Simplification
Traditional SPARQL query:
    SELECT DISTINCT ?site ?lat ?lng
    WHERE {
      ?site a pol:MeasurementSite ;
            pol:hasMeasurement ?m ;
            geo:lat ?lat ;
            geo:long ?lng .
      ?m pol:hasCharacteristic ?c ;
         pol:hasValue ?v ;
         units:hasUnit ?u .
      ?r a owl:Class ;
         rdfs:subClassOf [ owl:onProperty pol:hasCharacteristic ; owl:hasValue ?c ] ;
         rdfs:subClassOf [ owl:onProperty units:hasUnit ; owl:hasValue ?u ] ;
         rdfs:subClassOf [ owl:onProperty pol:hasValue ; owl:someValuesFrom [ ?p ?l ] ] .
      FILTER( isLiteral(?l) && ((?l < ?v && str(?p) = xsd:minExclusive) || …))
    }
With OWL reasoning:
    SELECT DISTINCT ?site ?lat ?lng
    WHERE {
      ?site a pol:PollutedSite ;
            geo:lat ?lat ;
            geo:long ?lng .
    }

Transparency and Trust
Provenance information encoded using semantic web technology supports transparency and trust.
o SemantAqua provides detailed provenance information:
  - Original data, intermediate data, data source
o “What if” scenario: the user may trust data from certain authorities only.
  - The user can apply a stricter regulation from another state to a local water source.
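
The "what if" scenario can be sketched as re-running the same violation check with a different regulation set and an optional filter on trusted data sources. This is an illustration only, not the portal's mechanism (which relies on ontologies and reasoning). The California Arsenic limit is the value quoted in the regulation slide; the stricter limit, the site names, the "Other" source, and the measured values are hypothetical.

```python
CALIFORNIA = {"Arsenic": 0.01}   # mg/L, value quoted in the regulation slide
STRICTER = {"Arsenic": 0.005}    # mg/L, hypothetical stricter regulation


def violations(measurements, limits, trusted_sources=None):
    """Return sorted site names with at least one value strictly above its limit.

    When trusted_sources is given, measurements from other sources are ignored.
    """
    sites = set()
    for m in measurements:
        if trusted_sources is not None and m["source"] not in trusted_sources:
            continue
        limit = limits.get(m["characteristic"])
        if limit is not None and m["value"] > limit:
            sites.add(m["site"])
    return sorted(sites)


# Hypothetical measurements in mg/L.
data = [
    {"site": "site-1", "source": "USGS", "characteristic": "Arsenic", "value": 0.008},
    {"site": "site-2", "source": "EPA", "characteristic": "Arsenic", "value": 0.02},
    {"site": "site-3", "source": "Other", "characteristic": "Arsenic", "value": 0.5},
]
print(violations(data, CALIFORNIA))
print(violations(data, STRICTER, trusted_sources={"USGS", "EPA"}))
```

Swapping the `limits` argument corresponds to applying another state's regulation, and `trusted_sources` corresponds to trusting only certain authorities.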

Future Work
Expand SemantAqua to support all 50 states.
Add flood and weather information and its effect on water sources; regulations can be different under flood conditions.
Support reasoning over contaminants and their corresponding health effects.
Expand use of the SemantEco ontology to other environmental topics: soil quality, air quality (e.g. support data from EPA’s CASTNET).

Questions?