Provenance Capture in Data Access And Data Manipulation Software


ESIP Winter 2014

Patrick West1 (westp@rpi.edu), Peter Fox1 (pfox@cs.rpi.edu), Deborah L. McGuinness1 (dlm@cs.rpi.edu), James Gallagher2 (jgallagher@opendap.org), Dan Holloway2 (d.holloway@opendap.org), Nathan Potter2 (ndp@opendap.org)
(1 Tetherless World Constellation, Rensselaer Polytechnic Institute, Troy, NY, United States)
(2 OPeNDAP.org, Saunderstown, RI, United States)

W3C PROV-O Recommendation

DOAP – Description of a Project Ontology
DOAP provides us with the ability to represent software, software projects, releases of software, licensing information, where to download releases, where to find information about the software, and so on.

Abstract

Need: to trace back the origins of data products, whether they are images or charts in a report, data obtained from a sensor on an instrument, or a generated dataset referenced in a research paper, a government report on the environment, a publication, or a poster presentation.

Yet: most software applications that perform data access and manipulation keep only a limited history of the data, i.e. its provenance.

Scenario: A report being drafted for a government agency contains a figure showing multiple graphs and plots related to global climate. The graphs and plots are generated by an algorithm in an iPython Notebook, and the algorithm pulls data from four data sets in a data portal. That data is aggregated over the time dimension, constrained to a few parameters, accessed using a particular piece of data access software, and converted from one datatype to another (1). Today we are lucky to get a blob of text under the figure that says a couple of things about it, with a reference to a publication written a few years ago.
Thus: Data citation, data publishing information, licensing information, and provenance are all lacking in the derived data products.

What we really want:
- Trace the figure all the way back to the original datasets.
- Have access to the data integration and manipulation processes.
- See information about the researchers, the project, the funding agency, the award, and the organizations collaborating on the project.

OPeNDAP Hyrax Processing

    :dds_of_reading
        a prov:Entity;
        dcterms:format opendap:DataDDS;
        prov:wasGeneratedBy [
            a prov:Activity;
            prov:used <http://test.opendap.org/dap/data/h5/monday.h5>
            [ a vsto:Dataset, prov:Entity, toolmatch:DataCollection;
              toolmatch:hasAccessURL <http://test.opendap.org/dap/data/h5/monday.h5>; ];
            prov:used <http://test.opendap.org/dap/data/h5/tuesday.h5> [
            prov:wasAssociatedWith <opendapi:software/hdf5_handler/2.1.1>;
        .

    :constrained_dds
        a prov:Entity;
        prov:used :dds_of_reading;
        prov:wasAssociatedWith <opendapi:software/BES/3.12.0>;

    :aggregated_dds
        a prov:Entity;
        prov:used :constrained_dds;
        prov:wasAssociatedWith <opendapi:software/ncml_module/1.2.2>;

    :result
        a foaf:Document;
        nfo:fileName "thursday.h5";
        dcterms:format netcdf;
        prov:used :aggregated_dds;
        prov:wasAssociatedWith <opendapi:software/fileout_netcdf/1.2.1>;

Link to Science Domain
We show here how we can link to ontologies for a specific science domain, such as VSTO, and to other tools, such as ToolMatch.

Example DOAP Document

    <http://opendap.tw.rpi.edu/instances/software/BES>
        a doap:Project, prov:Entity;
        doap:name "OPeNDAP Back-End Server (BES)";
        doap:developer <http://tw.rpi.edu/instances/PatrickWest>;
        doap:developer <http://tw.rpi.edu/instances/DanHalloway>;
        doap:developer <http://tw.rpi.edu/instances/James_Gallagher>;
        doap:developer <http://tw.rpi.edu/instances/NathanPotter>;
        doap:homepage <http://opendap.org/download/hyrax?q=BES_software>;
        doap:vendor <http://tw.rpi.edu/instances/OPeNDAP>;
        doap:repository <http://opendap.tw.rpi.edu/instances/Repository>;
        doap:bug-database <http://scm.opendap.org/trac/>;
        doap:release <http://opendap.tw.rpi.edu/instances/software/BES/3.12.0>;
        doap:description "BES is a high-performance back-end server software framework that allows data providers more flexibility in providing end users views of their data.";
        doap:license <http://opendap.tw.rpi.edu/instances/License>;
        .

    <http://opendap.tw.rpi.edu/instances/software/BES/3.12.0>
        a doap:Version, prov:Entity;
        prov:specializationOf <http://opendap.tw.rpi.edu/instances/software/BES>;
        doap:name "BES-3.12.0";
        doap:revision "3.12.0";
        doap:download-page <http://opendap.org/download/hyrax/1.9>;
        doap:repository <http://scm.opendap.org/svn/tags/bes/3.12.0>;
        doap:created 2013-08-27;

    <http://opendap.tw.rpi.edu/instances/Repository>
        a doap:SVNRepository;
        doap:location <http://scm.opendap.org/svn/>;
        doap:browse <http://scm.opendap.org/svn/>;

    <http://opendap.tw.rpi.edu/instances/License>
        dc:description "This software is distributed under the GNU Lesser General Public License <http://www.gnu.org/licenses/gpl.html>";
        doap:name "GNU LESSER GENERAL PUBLIC LICENSE";
        rdfs:seeAlso <http://www.gnu.org/licenses/gpl.html>;

    <http://opendap.tw.rpi.edu/id/opendap/D9IH6677D3I6HDIHD36IHDI7DH>
        # The hash above is: HASH(config file, BES version that read it)
        a prov:Agent;
        prov:wasDerivedFrom
            <http://opendap.tw.rpi.edu/instances/software/hdf5_handler/2.1.1>,
            <http://opendap.tw.rpi.edu/instances/software/BES/3.12.0>,
            <http://opendap.tw.rpi.edu/instances/software/ncml_module/1.2.2/>,
            <http://opendap.tw.rpi.edu/instances/software/fileout_netcdf/1.2.1>;
        prov:wasDerivedFrom :config_file_hash;
        # b/c BES set it up:
        prov:wasAttributedTo <http://scm.opendap.org/svn/tags/bes/3.9.2>;

[Diagram labels: Scientific Data Requests; processing; Provenance Store; Gets back response; Provenance included in header; Ping-back service in header]

OPeNDAP Hyrax Response
Provides ping-back information

    200 OK
    S: Link: <http://opendap.tw.rpi.edu/instances/result/provenance>; rel="http://www.w3.org/ns/prov#has_provenance"
    S: Link: <http://opendap.tw.rpi.edu/instances/result/pingback>; rel="http://www.w3.org/ns/prov#pingback"

Input Requested
We’d love your feedback and your input. Specifically, your thoughts on:
- The model.
- Using PROV-O and DOAP.
- The use of provenance and ping-back in the response header.
- Would you like to review the model, design, or technology infrastructure?
- Would you like to collaborate on this project?
Any other questions, comments, and suggestions are welcome!

How to provide feedback: Scan the QR code on the poster. This will take you to a page with more information, including how you can contribute. Or email westp@rpi.edu with your questions, comments, and suggestions.

Current Proposal
Where it came from: The request that was made of the data access server.
The source data set(s): Any data set(s) that the resulting data product is derived from.
Processing that happened: Any software that did any sort of processing against the data, and any intermediate data.
The software version: This will vary for different servers (it might be one number or a list of numbers), but in the end it is one or more of the tried and true x.y.z version numbers along with names.
How to cite the dataset in a publication: Nominally a DOI or instructions.
Licensing information/restrictions: URL reference to a license.

Future Work
Modeling: In this presentation we’ve given a glimpse of the information modeling and ontology development we feel is necessary to complete this project. We need to finalize and review the model.
Coding: There will be coding to be done in the server software, in this case the OPeNDAP BES: creating the provenance trace, adding the ability to attach license and citation information, and creating the ping-back services.
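As a rough illustration of what the server-side coding might collect, the items listed under Current Proposal can be sketched as a small record type. This is only a sketch under stated assumptions: the class names, field names, the `missing_items` helper, and the placeholder request URL are hypothetical and do not come from the BES code base, and SHA-256 is an assumed choice, since the poster says only HASH(config file, BES version).

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SoftwareVersion:
    """A named piece of software and its x.y.z version (hypothetical type)."""
    name: str
    version: str

    def label(self) -> str:
        return f"{self.name}/{self.version}"


@dataclass
class ProvenanceRecord:
    """Hypothetical container for the items in the current proposal."""
    request_url: str                   # where it came from
    source_datasets: List[str]         # data sets the product is derived from
    processing: List[SoftwareVersion] = field(default_factory=list)
    citation: Optional[str] = None     # nominally a DOI or instructions
    license_url: Optional[str] = None  # URL reference to a license

    def missing_items(self) -> List[str]:
        """Return the names of proposal items that are still empty."""
        checks = [
            ("request", self.request_url),
            ("source data sets", self.source_datasets),
            ("processing", self.processing),
            ("citation", self.citation),
            ("license", self.license_url),
        ]
        return [name for name, value in checks if not value]


def configuration_id(config_text: str, server: SoftwareVersion) -> str:
    """Hash a config file together with the server version that read it.

    SHA-256 is an assumption; the poster does not name a hash function.
    """
    digest = hashlib.sha256()
    digest.update(config_text.encode("utf-8"))
    digest.update(b"\0")  # separator so the two inputs cannot run together
    digest.update(server.label().encode("utf-8"))
    return digest.hexdigest()


record = ProvenanceRecord(
    request_url="http://example.org/hypothetical/request",  # placeholder URL
    source_datasets=[
        "http://test.opendap.org/dap/data/h5/monday.h5",
        "http://test.opendap.org/dap/data/h5/tuesday.h5",
    ],
    processing=[
        SoftwareVersion("hdf5_handler", "2.1.1"),
        SoftwareVersion("BES", "3.12.0"),
        SoftwareVersion("ncml_module", "1.2.2"),
        SoftwareVersion("fileout_netcdf", "1.2.1"),
    ],
)
print(record.missing_items())  # citation and license have not been set
```

Keeping citation and license as optional fields mirrors the poster's point that these are often missing today; a server could use a check like `missing_items` to flag incomplete records before publishing a trace.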
Providing Services: In addition to provenance trace access, provide a prov:pingback service so that users can tell the data providers what they’ve done with the data (citation in a report or publication, for example).

Glossary:
DAP – OPeNDAP Data Access Protocol
DOAP – Description of a Project (https://github.com/edumbill/doap/wiki)
FOAF – Friend of a Friend (http://www.foaf-project.org/)
OWL – Web Ontology Language (http://www.w3.org/2002/07/owl#)
OPeNDAP – Open-source Project for a Network Data Access Protocol (http://opendap.org)
PROV-O – Provenance Ontology, W3C recommendation (http://www.w3.org/TR/prov-o/)
RDFS – Resource Description Framework Schema (http://www.w3.org/2000/01/rdf-schema#)
RPI/TWC – Rensselaer Polytechnic Institute / Tetherless World Constellation (http://tw.rpi.edu)

Acknowledgments: Stephan Zednik and Tim Lebo – Tetherless World Constellation
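
On the client side, the two Link headers shown in the OPeNDAP Hyrax Response could be consumed roughly as follows. This is a minimal sketch, not the project's implementation: the function names are hypothetical, the regular expression handles only the simple `<uri>; rel="..."` form shown on the poster, and the `text/uri-list` POST body follows the pingback convention described in the W3C PROV-AQ note, which the poster does not cite directly. The report provenance URL is a placeholder, and the request is built but never sent.

```python
import re
import urllib.request

PROV_HAS_PROVENANCE = "http://www.w3.org/ns/prov#has_provenance"
PROV_PINGBACK = "http://www.w3.org/ns/prov#pingback"

# Matches only the simple form shown on the poster: <target>; rel="relation"
_LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_link_headers(header_values):
    """Map each rel value to the list of target URIs found in Link headers."""
    links = {}
    for value in header_values:
        for target, rel in _LINK_RE.findall(value):
            links.setdefault(rel, []).append(target)
    return links


def build_pingback_request(pingback_uri, provenance_uris):
    """Build (but do not send) a POST telling the provider where
    provenance about downstream use of the data can be found."""
    body = "\r\n".join(provenance_uris) + "\r\n"
    return urllib.request.Request(
        pingback_uri,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/uri-list"},
        method="POST",
    )


# Header values copied from the poster's example response.
response_links = [
    '<http://opendap.tw.rpi.edu/instances/result/provenance>; '
    'rel="http://www.w3.org/ns/prov#has_provenance"',
    '<http://opendap.tw.rpi.edu/instances/result/pingback>; '
    'rel="http://www.w3.org/ns/prov#pingback"',
]

links = parse_link_headers(response_links)
request = build_pingback_request(
    links[PROV_PINGBACK][0],
    ["http://example.org/hypothetical/report/provenance"],  # placeholder URL
)
print(links[PROV_HAS_PROVENANCE][0])
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the ping-back; a production client would also need a full Link header parser that handles multiple comma-separated links and unquoted parameters.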