Brief Introduction to Provenance: "As data becomes plentiful, verifiable truth becomes scarce"

Presentation transcript:

Brief Introduction to Provenance
"As data becomes plentiful, verifiable truth becomes scarce" (truth-economy.html)
For JISC KeepIt course on Digital Preservation Tools for Repository Managers, Module 3: Primer on preservation workflow, formats and characterisation
Westminster-Kingsway College, London, 2 March 2010

Provenance: example
The following excerpt and slides are taken with permission from Moreau, L., The Open Provenance Model: Towards inter-operability of Provenance Systems.
Example: the provenance of a bottle of wine includes:
- Grapes from which it is made
- Where those grapes grew
- Process in the wine's preparation
- How the wine was stored
- Between which parties the wine was transported, e.g. producer to distributor to retailer
- Where it was auctioned

Provenance Definition
Oxford English Dictionary:
- the fact of coming from some particular source or quarter; origin, derivation
- the history or pedigree of a work of art, manuscript, rare book, etc.;
- concretely, a record of the passage of an item through its various owners.
The provenance of a piece of data is the process that led to that piece of data.

The Science Lifecycle
[Diagram, adapted from David De Roure's slides. Labels: scientists; experimentation; Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...; Local Web Repositories; Digital Libraries; Virtual Learning Environment; Graduate Students; Undergraduate Students; Next Generation Researchers; Technical Reports; Reprints; Peer-Reviewed Journal & Conference Papers; Preprints & Metadata; Certified Experimental Results & Analyses]

[The Science Lifecycle diagram from the previous slide, repeated]
Finding the provenance of research outputs across all the systems the data transited through

Open Provenance Model (OPM)
- Allows us to express all the causes of an item
- Allows for process-oriented and dataflow-oriented views
- Based on a notion of annotated causality graph
Moreau, L., et al.: OPM v1.00 (Dec 2007), OPM v1.01 (Jul 2008), OPM v1.1 (Dec 2009)

OPM Requirements
- To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model.
- To allow developers to build and share tools that operate on such a provenance model.
- To define the model in a precise, technology-agnostic manner.
- To define bindings to XML/RDF separately.
- To support a digital representation of provenance for any thing, whether produced by computer systems or not.

OPM Serialisation
- OPM is an abstract data model to represent past execution and what causes data and processes to occur
- OPM can be serialised in different formats, referred to as technology bindings or serialisations:
  - OPM XML schema
  - OPM RDF schema
  - OPM OWL ontology
- Effort underway to ensure full equivalence of representations
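The idea of one abstract model with several serialisations can be sketched as follows. This is only an illustration under stated assumptions: the XML element and attribute names (`graph`, `edge`, `label`, `effect`, `cause`) and the `http://example.org/opm-sketch#` namespace are made up for this sketch and are not the actual OPM XML schema or OPM RDF vocabulary.

```python
import xml.etree.ElementTree as ET

# Abstract model: (effect, label, cause) statements. Node names are illustrative.
statements = [
    ("P", "used", "A1"),
    ("A2", "wasGeneratedBy", "P"),
]

def to_xml(statements):
    """Serialise to an ad hoc XML layout (hypothetical, NOT the OPM XML schema)."""
    root = ET.Element("graph")
    for effect, label, cause in statements:
        ET.SubElement(root, "edge", {"label": label, "effect": effect, "cause": cause})
    return ET.tostring(root, encoding="unicode")

# Hypothetical namespace, used only to form well-shaped IRIs in this sketch.
EX = "http://example.org/opm-sketch#"

def to_ntriples(statements):
    """Serialise the same statements as N-Triples-style lines."""
    return "\n".join(
        f"<{EX}{effect}> <{EX}{label}> <{EX}{cause}> ."
        for effect, label, cause in statements
    )

print(to_xml(statements))
print(to_ntriples(statements))
```

Both functions read the same list of statements, which is the point of keeping the model separate from its bindings: equivalence between formats can then be checked by round-tripping back to the abstract statements.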

Nodes
- Artifact (A): Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
- Process (P): Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
- Agent (Ag): Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.

Edges
- P used(R) A
- A wasGeneratedBy(R) P
- P wasControlledBy(R) Ag
- P2 wasTriggeredBy P1
- A2 wasDerivedFrom A1
Edge labels are in the past tense to express that they are used to describe past executions.
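The node and edge kinds above can be captured in a small data structure. The following is a minimal sketch, not an official OPM binding: the class names `Node` and `Edge`, the field names, and the `EDGE_SIGNATURES` table are assumptions introduced here for illustration. Edges are stored from effect to cause.

```python
from dataclasses import dataclass
from typing import Optional

# Node kinds from the Nodes slide.
ARTIFACT, PROCESS, AGENT = "artifact", "process", "agent"

# Edge label -> (kind of the effect node, kind of the cause node).
EDGE_SIGNATURES = {
    "used": (PROCESS, ARTIFACT),
    "wasGeneratedBy": (ARTIFACT, PROCESS),
    "wasControlledBy": (PROCESS, AGENT),
    "wasTriggeredBy": (PROCESS, PROCESS),
    "wasDerivedFrom": (ARTIFACT, ARTIFACT),
}

@dataclass(frozen=True)
class Node:
    name: str
    kind: str

@dataclass(frozen=True)
class Edge:
    label: str
    effect: Node
    cause: Node
    role: Optional[str] = None  # the (R) annotation, where the edge carries one

    def __post_init__(self):
        # Reject edges whose end points do not match the label's signature.
        expected = EDGE_SIGNATURES.get(self.label)
        if expected is None:
            raise ValueError(f"unknown edge label: {self.label}")
        if (self.effect.kind, self.cause.kind) != expected:
            raise ValueError(
                f"{self.label} must link {expected[0]} -> {expected[1]}, "
                f"got {self.effect.kind} -> {self.cause.kind}"
            )

p = Node("P", PROCESS)
a = Node("A", ARTIFACT)
edge = Edge("used", effect=p, cause=a, role="R")
print(edge.label, edge.role)
```

Validating in `__post_init__` means an ill-typed edge, such as an artifact "using" a process, fails at construction time rather than later during a query.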

Illustration
- Process used artifacts and generated artifacts
- Edge roles indicate the function of the artifact with respect to the process (akin to function parameters)
- Edges and nodes can be typed
- Causation chain: P was caused by A1 and A2; A3 and A4 were caused by P
- Does it mean that A3 and A4 were caused by A1 and A2?
[Diagram: process P (type=division) with edges used(divisor) and used(dividend) to artifacts A1 and A2, and edges wasGeneratedBy(rest) and wasGeneratedBy(quotient) from artifacts A3 and A4]
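The question on this slide can be explored with a small traversal. The sketch below is an illustration only: the assignment of roles to A1..A4 is an assumption (the flattened diagram does not say which artifact plays which role), and the helper names `upstream` and `asserted_derivations` are made up here. A hedged reading of OPM is that a used edge followed by a wasGeneratedBy edge only shows that an output may depend on an input; an actual derivation is stated with an explicit wasDerivedFrom edge.

```python
# Edges as (effect, label, role, cause) tuples, pointing from effect to cause.
# ASSUMPTION: which artifact has which role is chosen arbitrarily for this sketch.
edges = [
    ("P", "used", "dividend", "A1"),
    ("P", "used", "divisor", "A2"),
    ("A3", "wasGeneratedBy", "quotient", "P"),
    ("A4", "wasGeneratedBy", "rest", "P"),
]

def upstream(node, edges):
    """Return every node reachable from `node` by following edges towards causes."""
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for effect, _label, _role, cause in edges:
            if effect == current and cause not in seen:
                seen.add(cause)
                stack.append(cause)
    return seen

def asserted_derivations(node, edges):
    """Return only the artifacts `node` is explicitly asserted to derive from."""
    return {
        cause
        for effect, label, _role, cause in edges
        if effect == node and label == "wasDerivedFrom"
    }

print(sorted(upstream("A3", edges)))      # ['A1', 'A2', 'P']
print(asserted_derivations("A3", edges))  # set()
```

The first result lists candidate causes of A3; the second is empty because nothing in this graph asserts that A3 was derived from A1 or A2.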

Time Constraints
[Diagram: a process P with start T2 and end T5, controlled by an agent Ag (wasControlledBy(R)), uses at T3 (used(R)) an artifact that was generated at T1 (wasGeneratedBy(R)), and generates at T4 (wasGeneratedBy(R)) an artifact that is used at T6 (used(R))]
- T1<T3 (artifact must exist before being used)
- T2<T3 (process must have started before using artifacts)
- T3<T5 (process uses artifacts before it ends)
- T2<T4 (process must have started before generating artifacts)
- T4<T5 (process generates artifacts before it ends)
- T4<T6 (artifact must exist before being used)
- T2<T5 (process must have started before ending)
- No constraint between T3 and T4
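These constraints lend themselves to a mechanical check. The following is a minimal sketch, assuming timestamps are supplied as comparable numbers in a dictionary keyed by the labels T1..T6; the names `CONSTRAINTS` and `violated` are introduced here for illustration.

```python
# Each entry is (earlier, later, reason), taken from the list on the slide.
CONSTRAINTS = [
    ("T1", "T3", "artifact must exist before being used"),
    ("T2", "T3", "process must have started before using artifacts"),
    ("T3", "T5", "process uses artifacts before it ends"),
    ("T2", "T4", "process must have started before generating artifacts"),
    ("T4", "T5", "process generates artifacts before it ends"),
    ("T4", "T6", "artifact must exist before being used"),
    ("T2", "T5", "process must have started before ending"),
]

def violated(times):
    """Return a description of every constraint that `times` breaks."""
    return [
        f"{earlier}<{later} ({reason})"
        for earlier, later, reason in CONSTRAINTS
        if not times[earlier] < times[later]
    ]

consistent = {"T1": 1, "T2": 2, "T3": 3, "T4": 4, "T5": 5, "T6": 6}
# T3 and T4 swapped: still valid, since there is no constraint between them.
swapped = {"T1": 1, "T2": 2, "T3": 4, "T4": 3, "T5": 5, "T6": 6}
# The artifact is "used" at T3=6, after the process ended at T5=5.
late_use = {"T1": 1, "T2": 2, "T3": 6, "T4": 4, "T5": 5, "T6": 7}

print(violated(consistent))  # []
print(violated(swapped))     # []
print(violated(late_use))    # ['T3<T5 (process uses artifacts before it ends)']
```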

Dublin Core Profile (draft)
- To many people, provenance is primarily about attribution, citation and bibliographic information
- DC provides terms to relate resources to such information
- The DC profile aims to map the use of Dublin Core terms to OPM concepts and graph patterns
With Simon Miles and Joe Futrelle

DC to OPM example: dc:publisher
[Diagram labels: artifacts A1 and A2 (state=unpublished, state=published; wasSameResourceAs); process P (publish; used, wasGeneratedBy); agent Ag (person, name=Luc; wasActionOf)]

What have we learned about provenance?
- Provenance describes and records the results of processes on objects over time
- OPM represents provenance as XML
- OPM can be serialised in different formats: RDF, Semantic Web
- OPM is a work in progress
By working with an open standard model that can pass information as XML and in standard serialisation formats (e.g. RDF), it should be possible to build provenance services into repository environments.