Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1

Slides:



Advertisements
Similar presentations
Towards a Common Provenance Model for Research Publications Linyun Fu Xiaogang Ma Patrick West Stace Beaulieu.
Advertisements

® OGC Web Services Initiative, Phase 9 (OWS-9): Innovations Thread - OPeNDAP James Gallagher and Nathan Potter, OPeNDAP © 2012 Open Geospatial Consortium.
Complexity must become Linear or Decrease Smart data infrastructure: The sixth generation of mediation for data science Peter Fox 1
DCO-VIVO: A Collaborative Data Platform for the Deep Carbon Science Communities Han Wang 1 ( ), Yu Chen 1 Patrick West.
A Semantic Sommelier as an Ontology-powered Mobile Social Application and a Pedagogical Tool Deborah L. McGuinness and Evan W. Patton.
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Semantic Representation of Temporal Metadata in a Virtual Observatory Han Wang 1 Eric Rozell 1
Experiences Developing a User- centric Presentation of A Domain- enhanced Provenance Data Model Cynthia Chang 1, Stephan Zednik 1, Chris Lynnes 2, Peter.
Applying Semantics in Dataset Summarization for Solar Data Ingest Pipelines James Michaelis ( ), Deborah L. McGuinness
Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
ToolMatch: Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Patrick West 1 Nancy Hoebelheinrich.
Key integrating concepts Groups Formal Community Groups Ad-hoc special purpose/ interest groups Fine-grained access control and membership Linked All content.
Linking Disparate Datasets of the Earth Sciences with the SemantEco Annotator Session: Managing Ecological Data for Effective Use and Reuse Patrice Seyed.
Provenance-Aware Faceted Search Deborah L. McGuinness 1,2 Peter Fox 1 Cynthia Chang 1 Li Ding 1.
Beyond a Data Portal: A Collaborative Environment for the Deep Carbon Science Communities Han Wang, Yu Chen, Patrick West, John Erickson, Xiaogang Ma,
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
SemantAqua: A Semantically-Enabled Provenance-Aware Water Quality Portal Evan W. Patton, Ping Wang, Jin Guang Zheng, Timothy Lebo, Li Ding, Joanne Luciano,
Provenance Capture in Data Access And Data Manipulation Software Patrick West 1 Peter Fox
References: [1] [2] [3] Acknowledgments:
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Discovering accessibility, display, and manipulation of data in a data portal Nancy Hoebelheinrich Patrick West 2
A Semantically-Enabled Provenance- Aware Water Quality Portal Joint work with: Jin Guang Zheng, Ping Wang, Evan Patton, Timothy Lebo, Joanne Luciano Deborah.
Motivations and Challenges: Proper data management hinges on recording and maintaining “steps” applied to create data. Consumers require methods to assess.
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Local global disambiguation of terms and concepts The BCO-DMO metadata database uses controlled vocabularies to record many of the important pieces of.
Modeling and Representing National Climate Assessment Information using Linked Data Jin Guang Zheng 1 Curt Tilmes 2
NEON non-specialist use case; Science data reuse in a classroom Peter Fox Brian Wee Patrick West 1
Tetherless World Constellation Open Government Data Jim Hendler Tetherless World Professor of Computer and Cognitive Science Assistant Dean of Information.
DOAP – Description of a Project Ontology DOAP provides us with the ability to represent software, software projects, releases of software, licensing information,
Prof. Peter #twcrpi) Tetherless World Constellation Chair, Earth and Environmental Science/ Computer Science/ Cognitive.
Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.
Applying Provenance Extensions to OPeNDAP Framework Patrick West, James Michaelis, Tim Lebo, Deborah L. McGuinness Rensselaer Polytechnic Institute Tetherless.
Deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic.
ToolMatch Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Products Patrick West 1 Nancy Hoebelheinrich.
Resource Discovery for Extreme Scale Collaboration Benno Lee Patrick West 1 William Smith 2
Coding Provenance in Software and Matching Tools to Data OPeNDAP Provenance Project And ESIP ToolMatch Project Patrick West, Tetherless World Constellation.
TWC-SWQP: A Semantically-Enabled Provenance-Aware Water Quality Portal Ping Wang, Jin Guang Zheng, Linyun Fu, Evan W. Patton, Timothy Lebo, Li Ding, Joanne.
Knowledge Networks and Science Data Ecosystems December 7, 2012, AGU12 IN54A-02. Peter Fox (RPI/ Tetherless World Constellation and WHOI/AOP&E)
DCO-VIVO: A Collaborative Data Platform for the Deep Carbon Science Communities Han Wang 1 ( ), Yu Chen 1 Patrick West.
VIVO Conference 2013 Panel on VIVO Use-Cases for Collaborative Science: From Researcher Networks to Semantic User Interfaces for Data Patrick West – Tetherless.
References: [1] Lebo, T., Sahoo, S., McGuinness, D. L. (eds.), PROV-O: The PROV Ontology. Available via: [2]
Information Modeling and Semantic Web Application For National Climate Assessment Jin Guang Zheng 1 Curt Tilmes 2
Deepcarbon.net Xiaogang Ma, Patrick West, John Erickson, Stephan Zednik, Yu Chen, Han Wang, Hao Zhong, Peter Fox Tetherless World Constellation Rensselaer.
Semantic Similarity Computation and Concept Mapping in Earth and Environmental Science Jin Guang Zheng Xiaogang Ma Stephan.
A Semantic Web Approach for the Third Provenance Challenge Tetherless World Rensselaer Polytechnic Institute James Michaelis, Li Ding,
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
 Key integrating concepts  Groups  Formal Community Groups  Ad-hoc special purpose/ interest groups  Fine-grained access control and membership 
Determining Fitness-For-Use of Ontologies through Change Management, Versioning and Publication Best Practices Patrick West 1 Stephan.
Supported by ESIP Semantic Web Cluster A service based on community-built semantic web applications Provide users with the means to match their datasets.
Catalog/ ID Selected Logical Constraints (disjointness, inverse, …) Terms/ glossary Thesauri “narrower term” relation Formal is-a Frames (properties) Informal.
Prizms for Data Publication and Management May 9, 2014 Katie Chastain.
How Environmental Informatics is Preparing Us for the Era of Big Data AGU FM 2013 GC11F-01 December 09, 2013, MW 3001 Peter
Prizms for Data Publication and Management Katie Chastain May 9, 2014.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
Presenting Semantic Data Through “Instance Hubs” Using Authoritative URI Design Schemes Alexei Bulazel 1 ( ), Dominic Difranzo 1 (
Open Government Data Dominic DiFranzo PhD Student/Research Assistant Rensselaer Polytechnic Institute Tetherless World Constellation.
Social and Personal Factors in Semantic Infusion Projects Patrick West 1 Peter Fox 1 Deborah McGuinness 1,2
TWC Adoption* of RDA DTR and PIT in the Deep Carbon Observatory Data Portal Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox, & the.
Annotating and Embedding Provenance in Science Data Repositories to Enable Next Generation Science Applications Deborah L. McGuinness.
Poster: EGU Glossary: USGCRP – United States Global Change Research Program NCA – National Climate Assessment GCIS – Global Change Information.
Scaling the Wall: Experiences adapting a Semantic Web application to utilize social networks on mobile devices Evan W. Patton 1 ( ) &
Get the poster at Semantic Visualization Provenance Records:
Provenance Capture in Data Access And Data Manipulation Software
Xiaogang Ma, John Erickson, Patrick West, Stephan Zednik, Peter Fox,
Modeling Data Set Versioning Operations
ToolMatch Discovering What Tools can be used to Access, Manipulate, Transform, and Visualize Data Products Patrick West1 Nancy
Towards Executable Provenance Graphs for Reported Results in Research Publications Linyun Fu Xiaogang Ma Patrick West
Modeling Data Set Versioning Operations
OPeNDAP/Hyrax Interfaces
Presentation transcript:

Citation and Recognition of contributions using Semantic Provenance Knowledge Captured in the OPeNDAP Software Framework Patrick West 1 James Michaelis 1 Tim Lebo 1 Deborah L. McGuinness 1 Peter Fox 1 ( 1 Rensselaer Polytechnic Institute th St., Troy, NY, United Poster: IN31C-3738 Glossary: OPeNDAP - Open-source Project for a Network Data Access Protocol Provenance – information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness RPI – Rensselaer Polytechnic Institute TWC – Tetherless World Constellation at Rensselaer Polytechnic Institute Sponsors: Tetherless World Constellation Providing proper citation and attribution for published data, derived data products, and the software tools used to generate them, has always been an important aspect of scientific research. However, it is often the case that this type of detailed citation and attribution is lacking. This is in part because it often requires manual markup since dynamic generation of this type of provenance information is not typically done by the tools used to access, manipulate, transform and visualize data. In addition, the tools themselves lack the information needed to be properly cited themselves. The OPeNDAP Hyrax Software Framework is a tool that provides access to and the ability to constrain, manipulate and transform different types of data from different data formats into a common format, the DAP (Data Access Protocol), in order to derive new data products. A user, or another software client, specifies an HTTP URL in order to access a particular piece of data, and appropriately transform it to suit a specific purpose of use. The resulting data products, however, do not contain any information about what data was used to create it, or the software process used to generate it, let alone information that would allow the proper citation and attribution to down stream researchers and tool developers. We will present our approach to provenance capture in Hyrax including a mechanism that can be used to report back to the hosting site any derived products, such as publications and reports, using the W3C PROV recommendation pingback service. We will demonstrate our utilization of Semantic Web and Web standards, the development of an information model that extends the PROV model for provenance capture, and the development of the pingback service. We will present our findings, as well as our practices for providing provenance information, visualization of the provenance information, and the development of pingback services, to better enable scientists and tool developers to be recognized and properly cited for their contributions. Host: opendap.tw.rpi.eduClient: coyote.example.com C: GET S: 200 OK S: Link: rel=“ S: Link: rel=“ (CA_OrangeCo_2011_ ascii representation) Host: opendap.tw.rpi.eduClient: coyote.example.com C: POST HTTP/1.1http://opendap.tw.rpi.edu/disney/pingback C: Content-Type: text/uri-list C: C: C: S: 204 No Content Abstract Proper data management hinges on recording and maintaining “steps” applied to create data. Consumers require methods to assess whether available data is fit for their usage. Was this dataset produced by a trustworthy source? Producers are often expected to justify their efforts in generating new datasets. Who is using our data? What are they using it for? And why? HOWEVER, most current-generation data analysis and manipulation tools fail to capture appropriate meta-information to address these needs. Motivations and Challenges a PROV pingback-enabled community collaborates to categorize the points in a LiDAR scan of Disneyland. A client accesses a data point from a LiDAR scan of Disneyland The client categorizes the point as “water”, which is a new derivation of that point The client pings-back about this new derivation A researcher generates a data product using OPeNDAP and uses it in a derivation. Another researcher, visualizing that derivation, wishes to access the provenance of the data product. What were the original data sources? Can they use them? A scientist wishes to discover any derivations of data sources they created. OPeNDAP.org wishes to be recognized for providing access to, manipulation of, and transformation of derived data products. Use CasesW3C PROV Recommendation Simple Concepts and properties representing the concepts of the PROV Model Representing OPeNDAP provenance trace for use case 1 using PROV Model. OPeNDAP Back-End Server Design :BES_Plan rdf:type prov:Plan, prov:Collection; prov:qualifiedInfluence [ a prov:Influence; prov:entity opendap:NC_Module; prov:hadRole opendap:Read; opendap:order1; ]; prov:qualifiedInfluence [ a prov:Influence; prov:entity opendap:DAP_Module; prov:hadRole opendap:Constrain; opendap:order2; ]; prov:qualifiedInfluence [ a prov:Influence; prov:entity opendap:ASCII_Module; prov:hadRole opendap:Transmit; opendap:order3; ];. :CA_OrangeCo_2011_ nc.ascii rdf:type prov:Entity; prov:wasDerivedFrom :NC_File. prov:wasGeneratedBy :BES_Process;. :BES_Process rdf:type prov:Activity; prov:qualifiedAssociation [ a prov:Association; prov:agent :BES_Agent; prov:hadPlan :BES_Plan; rdfs:comment "Execution of BES ];. :BES_Agent rdf:type prov:Agent; foaf:name "BES Server". RDF Representation of provenance collected in first use case Initial request for dataPingback of derived dataMajor goal to provide visualizations of provenance All the way to seeing who implemented the code that was used to act on the data. Be able to see the software modules that acted on the data in producing the derived data. Initial visualization is of the provenance trace allowing users to get back to the original data, actions taken on the data, and the agents that performed those actions. Take Away With this implementation of provenance capture in OPeNDAP Hyrax we provide data providers and data users the provenance to support their work Providing ping-back service makes it much easier to provide citation and credit to those who deserve it Lots of triples are created, but easily cleaned up by determining what data products have been pinged We can implement provenance capture without modifying existing modules Visualization of provenance trace, agents who act on data products, information about software and software developers becomes easy with linked data principles. Acknowledgements: OPeNDAP.org for their support and being open source, especially James Gallagher and Nathan Potter. First attempt was to capture provenance after-the-fact. Didn’t have enough information in the end. Second attempt is to collect the information on-the-go but not forcing module developers to have to implement anything, but providing hooks to allow them to.