Towards a Common Provenance Model for Research Publications

Linyun Fu, Xiaogang Ma, Patrick West, Stace Beaulieu, Massimo Di Stefano, Peter Fox

Rensselaer Polytechnic Institute, th St., Troy, NY 12180, United States
Woods Hole Oceanographic Institution, 86 Water St., Woods Hole, MA 02543, United States

MOTIVATION

Results in research publications are often far removed from the underlying collection and analysis of data. The overarching goal of tracking provenance is to let readers understand the process the authors went through to produce the reported results from the collected data. Provenance describes the lineage of the source data and the processes that changed it on the way to the final results, so that readers can interpret the report content correctly. Provenance also lets readers evaluate the credibility of the reported results by examining the software used, the source data, and the responsible agents.

We are working towards using a provenance model to replicate the process from data transformation to reporting of results in research publications, and even to validate the scientific conclusions by allowing readers to adapt the experiments reported in the papers and carry out their own studies. General provenance ontologies such as PROV-O, the W3C standard adopted in 2013 and shown below, cannot record provenance in enough detail to repeat the described process and replicate the reported results.

PAST EXPERIENCE

We have done ontology specialization work based on PROV-O for two past projects, GCIS-IMSAP and ECOOP. GCIS-IMSAP models and captures provenance information for the recent National Climate Assessment (NCA) draft report of the US Global Change Research Program (USGCRP). A sample provenance sequence is shown below. ECOOP models and captures provenance information for the Ecosystem Status Report (ESR) of the Northeast Fisheries Science Center (NEFSC). A sample provenance sequence is shown below.
Instances of the classes "prov:Entity" and "prov:Activity" are shown in yellow and blue, respectively. For both projects, we directly used the "prov:Activity" class and its related properties in PROV-O to model the processes leading to data products. The resulting provenance graphs turned out to be too general to execute.

APPROACH

We define our provenance ontology for research publications, called PROV-PUB-O, by specializing the "activity" class and the "used" property in PROV-O so that the ontology is suitable for capturing executable provenance in research publications. The activities of interest in preparing a research paper are the changes made to data, which fall into three categories:

Physical changes, such as data download, copying, or sharing.
Syntactical changes, such as XML to JSON conversion.
Semantic changes, such as data analysis and transformation.

Each of these changes corresponds to a particular way of using data. The specialized ontology not only helps describe the provenance, but also enables the construction of executable provenance graphs that preserve the data product preparation process at a level detailed enough to be replicated.

Poster: IN31C-3739

Glossary:
PROV-O – The W3C Provenance Ontology
GCIS-IMSAP – Global Change Information System: Information Model and Semantic Application Prototypes
ECOOP – An INTEROP proposal for the management in the Northeast and California Current Large Marine Ecosystems

Sponsors:

Acknowledgements: We acknowledge Jin Zheng from RPI, Justin Goldstein, Brian Duggan and Steve Aulenbach from USGCRP, and Curt Tilmes from NASA for their support in modeling and capturing provenance in the GCIS-IMSAP project.
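Illustrative sketch: the three categories of data change described in the approach (physical, syntactical, semantic) can be pictured as typed, executable activities chained into a replayable sequence. The Python below is only a minimal illustration under stated assumptions. The names `ChangeKind`, `Activity`, and `replay`, and the toy XML data, are hypothetical and are not taken from PROV-PUB-O, whose actual class and property names the poster does not list.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class ChangeKind(Enum):
    """The three categories of data change named in the approach."""
    PHYSICAL = "physical"        # e.g. download, copy, share
    SYNTACTICAL = "syntactical"  # e.g. XML to JSON conversion
    SEMANTIC = "semantic"        # e.g. analysis, transformation


@dataclass
class Activity:
    """Hypothetical executable specialization of prov:Activity."""
    name: str
    kind: ChangeKind
    run: Callable[[str], str]  # maps the used entity to the generated entity


def copy_data(data: str) -> str:
    # Physical change: the data moves, its content stays the same.
    return data


def xml_to_json(data: str) -> str:
    # Syntactical change: same values, different serialization.
    root = ET.fromstring(data)
    return json.dumps([float(v.text) for v in root.findall("value")])


def mean(data: str) -> str:
    # Semantic change: derives new information from the values.
    values = json.loads(data)
    return json.dumps(sum(values) / len(values))


def replay(source: str, steps: List[Activity]) -> List[str]:
    """Re-execute a provenance chain, keeping every intermediate entity."""
    entities = [source]
    for step in steps:
        entities.append(step.run(entities[-1]))
    return entities


# Toy data and chain, invented for illustration only.
chain = [
    Activity("copy", ChangeKind.PHYSICAL, copy_data),
    Activity("convert", ChangeKind.SYNTACTICAL, xml_to_json),
    Activity("average", ChangeKind.SEMANTIC, mean),
]
source_xml = "<data><value>1.0</value><value>2.0</value><value>6.0</value></data>"
entities = replay(source_xml, chain)
```

In this sketch, what makes the chain replayable is that each activity carries both a category and a concrete operation. A graph built only from generic activity nodes records that a step happened but not how to run it again.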