Applying Provenance Extensions to OPeNDAP Framework Patrick West, James Michaelis, Tim Lebo, Deborah L. McGuinness Rensselaer Polytechnic Institute Tetherless World Constellation

Motivation and Challenges
Proper data management hinges on recording and maintaining the “steps” applied to create data.
Consumers require methods to assess whether available data is fit for their usage.
– Was this dataset produced by a trustworthy source?
Producers are often expected to justify their efforts in generating new datasets.
– Who is using our data? What are they using it for? And why?
However, most current-generation data analysis and manipulation tools fail to capture the meta-information needed to address these needs.

Use Cases
A PROV pingback-enabled community collaborates to categorize the points in a LiDAR scan of Disneyland.
– A client accesses a data point from a LiDAR scan of Disneyland.
– The client categorizes the point as “water”, which is a new derivation of that point.
– The client pings back about this new derivation.
A researcher generates a data product using OPeNDAP and uses it in a derivation. Another researcher, visualizing that derivation, wishes to access the provenance of the data product.
– What were the original data sources? Can they use them?
A scientist wishes to discover any derivations of data sources they created.
OPeNDAP servers are widely used, but are rarely recognized.

Semantic Web Iterative Development Methodology

W3C PROV-O

Provenance Trace: Running of the BES

Visualization

Linked Data
Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods.
More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
The four rules of linked data are:
– Use URIs as names for things (human readable).
– Use HTTP URIs so that people can look up those names.
– When someone looks up a URI, provide useful information using the standards (RDF*, SPARQL).
– Include links to other URIs, so that they can discover more things.

RDF

    :BES_Plan rdf:type prov:Plan, prov:Collection ;
        prov:qualifiedInfluence [
            a prov:Influence ;
            prov:entity opendap:NC_Module ;
            prov:hadRole opendap:Read ;
            opendap:order 1
        ] ;
        prov:qualifiedInfluence [
            a prov:Influence ;
            prov:entity opendap:DAP_Module ;
            prov:hadRole opendap:Constrain ;
            opendap:order 2
        ] ;
        prov:qualifiedInfluence [
            a prov:Influence ;
            prov:entity opendap:ASCII_Module ;
            prov:hadRole opendap:Transmit ;
            opendap:order 3
        ] .

    :CA_OrangeCo_2011_ nc.ascii rdf:type prov:Entity ;
        prov:wasDerivedFrom :NC_File ;
        prov:wasGeneratedBy :BES_Process .

    :BES_Process rdf:type prov:Activity ;
        prov:qualifiedAssociation [
            a prov:Association ;
            prov:agent :BES_Agent ;
            prov:hadPlan :BES_Plan ;
            rdfs:comment "Execution of BES"
        ] .

    :BES_Agent rdf:type prov:Agent ;
        foaf:name "BES Server" .
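The plan description on this slide follows a regular pattern: one prov:qualifiedInfluence block per module, numbered in execution order. As a rough illustration only, the Python sketch below renders such a plan from an ordered list of modules. The function name plan_to_turtle is hypothetical, and the sketch assumes that opendap:order takes a plain integer, which the slide text does not confirm.

```python
# Hypothetical helper, not part of OPeNDAP: renders an ordered list of
# (module, role) pairs as a prov:Plan in Turtle, mirroring the slide.
# Assumption: opendap:order takes an integer literal.

STEPS = [
    ("NC_Module", "Read"),
    ("DAP_Module", "Constrain"),
    ("ASCII_Module", "Transmit"),
]


def plan_to_turtle(plan_name, steps):
    """Return Turtle text describing the plan and its ordered steps."""
    if not steps:
        raise ValueError("a plan needs at least one step")
    lines = [f":{plan_name} rdf:type prov:Plan, prov:Collection ;"]
    for order, (module, role) in enumerate(steps, start=1):
        # The last block closes the statement with '.', the others with ';'.
        terminator = "." if order == len(steps) else ";"
        lines += [
            "    prov:qualifiedInfluence [",
            "        a prov:Influence ;",
            f"        prov:entity opendap:{module} ;",
            f"        prov:hadRole opendap:{role} ;",
            f"        opendap:order {order}",
            f"    ] {terminator}",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(plan_to_turtle("BES_Plan", STEPS))
```

The output omits @prefix declarations, as the slide does; a Turtle parser would need them for the prov:, opendap:, and rdf: prefixes.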

The Response
Host: opendap.tw.rpi.edu
Client: coyote.example.com
C: GET
S: 200 OK
S: Link: rel=“
S: Link: rel=“
(CA_OrangeCo_2011_ ascii representation)
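The Link header targets and rel values did not survive in this transcript. As a hedged illustration of how a client might read such headers, the Python sketch below assumes the link relations defined in the W3C PROV-AQ note (prov#has_provenance and prov#pingback). The function name parse_link_headers and the example.org provenance URI are made up for this example; the pingback URI is the one shown on the "Pinging back" slide.

```python
import re

# Assumption: the server uses the PROV-AQ link relations under this namespace.
PROV_NS = "http://www.w3.org/ns/prov#"

# A link target in angle brackets followed by its ';'-separated parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]+)"?')


def parse_link_headers(values):
    """Map each rel value to the list of target URIs that carry it.

    Simplified on purpose: parameter values containing ';' or ',' are
    not handled.
    """
    links = {}
    for value in values:
        for target, params in LINK_RE.findall(value):
            match = REL_RE.search(params)
            if match is None:
                continue
            # A rel parameter may hold several space-separated relations.
            for rel in match.group(1).split():
                links.setdefault(rel, []).append(target)
    return links


# Hypothetical header values for illustration only.
SAMPLE = [
    '<http://example.org/provenance/1>; rel="http://www.w3.org/ns/prov#has_provenance"',
    '<http://opendap.tw.rpi.edu/disney/pingback>; rel="http://www.w3.org/ns/prov#pingback"',
]

if __name__ == "__main__":
    for rel, targets in parse_link_headers(SAMPLE).items():
        print(rel, targets)
```

With the two sample values, a client would look up PROV_NS + "has_provenance" to fetch the provenance record and PROV_NS + "pingback" to learn where to report new derivations.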

Pingback
Upstream providers can discover derivations of their own products.
Downstream providers can discover the lineage of their data products.

Pinging back
Host: opendap.tw.rpi.edu
Client: coyote.example.com
C: POST http://opendap.tw.rpi.edu/disney/pingback HTTP/1.1
C: Content-Type: text/uri-list
C:
C:
C:
S: 204 No Content
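The exchange above can be sketched from the client side in Python. This is a minimal illustration under stated assumptions, not the project's client code: build_pingback is a hypothetical helper, and the provenance URI in the body is invented, since the URIs sent on the slide were lost in this transcript. The request is only constructed here, not sent.

```python
from urllib.request import Request


def build_pingback(pingback_uri, provenance_uris):
    """Build (but do not send) a pingback POST request.

    The body is a text/uri-list document: one URI per line, with CRLF
    line endings.
    """
    body = "".join(uri + "\r\n" for uri in provenance_uris).encode("utf-8")
    return Request(
        pingback_uri,
        data=body,
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


# The pingback URI is taken from the slide; the provenance URI is hypothetical.
request = build_pingback(
    "http://opendap.tw.rpi.edu/disney/pingback",
    ["http://coyote.example.com/provenance/water-point"],
)

if __name__ == "__main__":
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"), end="")
```

Passing the request to urllib.request.urlopen would send it; per the slide, a successful pingback is answered with 204 No Content.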

Linking it Together
We don’t just want to link data product to data product.
We need information about:
– Datasets (DCAT, new W3C working group on datasets)
– People (FOAF)
– Software and Software Versions (DOAP)
– Organizations (FOAF)
– Publications and Presentations (BIBO)
– Visualizing data products (ToolMatch)

First attempt – after the fact
The first approach collected information from generating the response and built the provenance from it.
We developed a Reporter, called after the response is transmitted, to generate the provenance and push it to a repository.
Working after the fact, the Reporter does not have all the information, such as the ordering.
It wrote out a file to be ingested by the system, which takes time, so the provenance is not available right away.

Include Provenance Capture in BES Framework
In-time provenance collection, built into the BES framework.
Refactor parts of the BES to support the capture of provenance.
In addition to adding information to the response header, we might want to embed the provenance in the response object.
Make the provenance available immediately.
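The difference from the after-the-fact Reporter is that each step is recorded at the moment it runs, so the ordering is never lost. The Python sketch below illustrates that idea only. The BES is a C++ framework, and none of these names (ProvenanceRecorder, run_pipeline, the toy handlers) come from it; the module and role names are borrowed from the RDF slide.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecorder:
    """Collects one record per executed step, in execution order."""

    steps: list = field(default_factory=list)

    def record(self, module, role):
        self.steps.append(
            {
                "order": len(self.steps) + 1,
                "module": module,
                "role": role,
                "ended_at": datetime.now(timezone.utc).isoformat(),
            }
        )


def run_pipeline(modules, data, recorder):
    """Run each handler in turn and record it as soon as it finishes."""
    for name, role, handler in modules:
        data = handler(data)
        recorder.record(name, role)
    return data


# Toy stand-ins for read, constrain, and transmit steps (hypothetical).
MODULES = [
    ("NC_Module", "Read", lambda rows: list(rows)),
    ("DAP_Module", "Constrain", lambda rows: [r for r in rows if r >= 0]),
    ("ASCII_Module", "Transmit", lambda rows: ",".join(str(r) for r in rows)),
]

recorder = ProvenanceRecorder()
response = run_pipeline(MODULES, [3, -1, 7], recorder)

if __name__ == "__main__":
    print(response)
    for step in recorder.steps:
        print(step["order"], step["module"], step["role"])
```

Because the recorder is filled during execution, its contents are available as soon as the response is built, which is what would allow the provenance to be referenced in a response header or embedded in the response object.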

What’s Next?
Update select OPeNDAP modules to enable provenance logging during system executions.
Refactor the BES to incorporate provenance capture during execution.
Update the RDF Knowledge Store live, adding provenance records during OPeNDAP executions.

And we need your help!
We are trying to build the list of contributors to the OPeNDAP software.

Who’s Who?
Participants
– James Michaelis, DataONE Summer Intern and RPI PhD Student, Developer
– Patrick West, RPI Principal Software Engineer
– Tim Lebo, RPI PhD Student, Developer
Acknowledgements
– James Gallagher, OPeNDAP Lead Developer
– Nathan Potter, OPeNDAP Developer
– Peter Fox, RPI Professor
– Deborah L. McGuinness, RPI Professor
– Stephan Zednik, RPI Senior Software Engineer

More Information
Tetherless World GitHub Repository:
Tetherless World OPeNDAP Projects –
W3C Prov –
OPeNDAP – and
In-Progress Development –

Thanks!

Glossary
BIBO – Bibliographic Ontology
DCAT – Data Catalog Vocabulary
DOAP – Description of a Project Ontology
FOAF – Friend of a Friend Ontology
OPeNDAP – Open-source Project for a Network Data Access Protocol
PROV-O – The W3C Provenance Ontology
RPI/TWC – Rensselaer Polytechnic Institute / Tetherless World Constellation