Open Provenance Model Tutorial Session 2: OPM Overview and Semantics Luc Moreau University of Southampton.


Session 2: Aims
In this session, you will learn about:
– The Open Provenance Model
– The definition of its abstract model
– The inferences it supports
– Various efforts to provide OPM with a semantics

Session 2: Contents
– Requirements and non-requirements
– Definition of OPM
– Specialization of OPM with Profiles
– Formalizations of OPM

OPM (NON-)REQUIREMENTS

OPM Requirements
– To allow provenance information to be exchanged between systems, by means of a compatibility layer based on a shared provenance model
– To allow developers to build and share tools that operate on such a provenance model
– To define the model in a precise, technology-agnostic manner
– To define bindings to XML/RDF separately
– To support a digital representation of provenance for any “thing”, whether produced by computer systems or not

OPM Non-Requirements OPM does not specify the internal representations that systems have to adopt to store and manipulate provenance internally. OPM does not specify protocols to store such provenance information in provenance repositories. OPM does not specify protocols to query provenance repositories.

OPM Layered Model
– OPM Core
– OPM Essential Profiles: Collections, Attribution
– OPM Domain Specialization: Workflow, Web
– Technology Bindings: XML, RDF
– OPM Sig
– OPM based APIs: record, query

THE OPEN PROVENANCE MODEL (OPM)

Open Provenance Model
Allows us to express all the causes of an item, e.g., the provenance of a bottle of wine includes:
– Grapes from which it is made
– Where those grapes grew
– Process in the wine’s preparation
– How the wine was stored
– Between which parties the wine was transported, e.g. producer to distributor to retailer
– Where it was auctioned
Allows for process-oriented and dataflow-oriented views
Based on a notion of annotated causality graph

Nodes
– Artifact (A): Immutable piece of state, which may have a physical embodiment in a physical object, or a digital representation in a computer system.
– Process (P): Action or series of actions performed on or caused by artifacts, and resulting in new artifacts.
– Agent (Ag): Contextual entity acting as a catalyst of a process, enabling, facilitating, controlling, affecting its execution.

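Edges
Each edge points from an effect to its cause:
– used(R): process P → artifact A
– wasGeneratedBy(R): artifact A → process P
– wasControlledBy(R): process P → agent Ag
– wasTriggeredBy: process P2 → process P1
– wasDerivedFrom: artifact A2 → artifact A1
Edge labels are in the past tense to express that they are used to describe past executions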

Illustration
– A process “used” artifacts and “generated” artifacts
– Edge “roles” indicate the function of the artifact with respect to the process (akin to function parameters)
– Edges and nodes can be typed
– Causation chain: P was caused by A1 and A2; A3 and A4 were caused by P
– Does it mean that A3 and A4 were caused by A1 and A2?
[Figure: process P (type=division) with edges used(dividend) and used(divisor) to artifacts A1 and A2, and edges wasGeneratedBy(quotient) and wasGeneratedBy(rest) from artifacts A3 and A4.]
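The node and edge kinds introduced so far can be sketched as a small data structure. This is a minimal illustration in Python, not an official OPM binding: the names `Node`, `Edge` and `EDGE_ENDPOINTS` are hypothetical, and the assignment of the roles dividend, divisor, quotient and rest to A1..A4 is an assumption, since the slide does not say which artifact plays which role.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed (effect kind, cause kind) for each edge type.
# Every edge points from an effect to its cause.
EDGE_ENDPOINTS = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}

@dataclass(frozen=True)
class Node:
    id: str
    kind: str  # "artifact", "process" or "agent"

@dataclass(frozen=True)
class Edge:
    kind: str
    effect: Node
    cause: Node
    role: Optional[str] = None  # only meaningful for edges written with (R)

    def __post_init__(self):
        # Reject edges whose endpoints do not match the edge type.
        if (self.effect.kind, self.cause.kind) != EDGE_ENDPOINTS[self.kind]:
            raise ValueError(
                f"{self.kind} cannot link {self.effect.kind} to {self.cause.kind}"
            )

# Division example; which artifact plays which role is assumed here.
p = Node("P", "process")
a1, a2, a3, a4 = (Node(f"A{i}", "artifact") for i in range(1, 5))
graph = [
    Edge("used", p, a1, role="dividend"),
    Edge("used", p, a2, role="divisor"),
    Edge("wasGeneratedBy", a3, p, role="quotient"),
    Edge("wasGeneratedBy", a4, p, role="rest"),
]
```

Validating endpoints at construction time keeps ill-typed edges, such as an artifact "using" a process, out of the graph.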

Hierarchical Descriptions (1)
[Figure: process P with edges used(r1) and used(r2) to artifacts A1 and A2, and edges wasGeneratedBy(r3) and wasGeneratedBy(r4) from artifacts A3 and A4.]

Hierarchical Descriptions (2)
[Figure: drill down. Artifacts A1 to A4 with the same edges used(r1), used(r2), wasGeneratedBy(r3) and wasGeneratedBy(r4), now attached to two processes P1 and P2.]

Hierarchical Descriptions (3)
[Figure: the single-process graph (P) and the drill-down graph (P1, P2), both over artifacts A1 to A4.]
If these two graphs denote the same execution, it is not true that A4 was caused by A1; hence dependencies between artifacts need to be asserted explicitly

Explicit Data Derivations (1)
[Figure: the single-process graph (P) and the drill-down graph (P1, P2), both over artifacts A1 to A4, now with wasDerivedFrom edges between artifacts.]
If these two graphs denote the same execution, it is not true that A4 was caused by A1; hence dependencies between artifacts need to be asserted explicitly with wasDerivedFrom

Explicit Data Derivations (2)
Causation chain:
– P was caused by A1 and A2
– A3 and A4 were caused by P
– A3 was caused by A1 and A2
– A4 was caused by A1 and A2
[Figure: process P (type=division) with edges used(dividend), used(divisor), wasGeneratedBy(quotient) and wasGeneratedBy(rest) over artifacts A1 to A4, plus wasDerivedFrom edges between artifacts.]

Provenance of Physical Objects

Another Account of the Same Execution

Accounts
– Mechanism by which multiple descriptions of the same execution can co-exist in the same OPM graph
– Different accounts may be provided by different observers (or asserters)
– Accounts can overlap if they have some OPM subgraph in common
– An account can be a refinement of another, if it provides more details (support for hierarchical descriptions)
– Accounts may be conflicting!

Accounts
– An account is like a graph colouring
– Nodes/edges are asserted to belong to some accounts
[Figure legend: Bake execution; Bad Bake execution; Both executions.]
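The "graph colouring" view of accounts can be sketched by tagging every edge with the set of accounts that assert it. The following Python fragment is only an illustration: the edge contents, the account names `"bake"` and `"bad_bake"`, and the helper names `account_view` and `overlap` are all hypothetical.

```python
# Each edge (kind, effect, cause) is "coloured" with the accounts asserting it.
# The edges and account names are made up for illustration.
edges = {
    ("used", "bake", "flour"): {"bake", "bad_bake"},
    ("wasGeneratedBy", "cake", "bake"): {"bake"},
    ("wasGeneratedBy", "burnt_cake", "bake"): {"bad_bake"},
}

def account_view(edges, account):
    """Return the set of edges asserted to belong to the given account."""
    return {edge for edge, accounts in edges.items() if account in accounts}

def overlap(edges, first, second):
    """Return the edges two accounts have in common (empty set if none)."""
    return account_view(edges, first) & account_view(edges, second)
```

Here the two accounts overlap on the single `used` edge and otherwise give different descriptions of what the process generated.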

OPM SEMANTICS

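Completion Rules
– Equivalence: P2 wasTriggeredBy P1 holds if and only if there is an artifact A that was generated by P1 and used by P2
– A2 wasDerivedFrom A1 implies that there is a process P that generated A2 and used A1; the converse does not necessarily hold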
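One direction of the completion rules (introducing the implied artifact or process) can be sketched as follows. This is an illustrative Python fragment under simplifying assumptions: edges are plain `(kind, effect, cause)` tuples, roles are ignored, and the fresh node names `_A1`, `_P2`, ... are invented placeholders rather than anything prescribed by OPM.

```python
import itertools

def complete(edges):
    """Add the edges implied by the completion rules.

    edges: set of (kind, effect, cause) tuples.
    Returns a new set; the input is left unchanged.
    """
    completed = set(edges)
    fresh = itertools.count(1)
    for kind, effect, cause in sorted(edges):
        if kind == "wasTriggeredBy":
            # Introduce an artifact generated by the cause and used by the effect.
            artifact = f"_A{next(fresh)}"
            completed.add(("wasGeneratedBy", artifact, cause))
            completed.add(("used", effect, artifact))
        elif kind == "wasDerivedFrom":
            # Introduce a process that generated the effect and used the cause.
            process = f"_P{next(fresh)}"
            completed.add(("wasGeneratedBy", effect, process))
            completed.add(("used", process, cause))
    return completed
```

The opposite direction is deliberately left out for `wasDerivedFrom`: a process that used A1 and generated A2 does not by itself establish that A2 was derived from A1.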

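Inferences
– Transitivity of edges connecting an artifact
– Starred edge “was Caused by”
– What we can infer is defined by transitive closure
[Figure: chains of edges through artifacts A between nodes that are artifacts or processes (A/P1, A/P2), and the corresponding starred (*) edge.]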
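The idea that inferences are defined by a transitive closure can be illustrated on derivation edges alone. The sketch below is a deliberately naive fixed-point computation in Python; the function name `derived_from_star` is hypothetical and the input is assumed to be a collection of `(effect, cause)` pairs for wasDerivedFrom edges.

```python
def derived_from_star(derived):
    """Return the transitive closure of wasDerivedFrom pairs (effect, cause)."""
    closure = set(derived)
    changed = True
    while changed:
        changed = False
        for effect, middle in list(closure):
            for start, cause in list(closure):
                # Chain effect -> middle and middle -> cause into effect -> cause.
                if middle == start and (effect, cause) not in closure:
                    closure.add((effect, cause))
                    changed = True
    return closure
```

For example, from "A3 wasDerivedFrom A2" and "A2 wasDerivedFrom A1" the closure also contains the multi-step dependency of A3 on A1.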

WasTriggeredBy is not transitive
– By completion, there exists A12 generated by P1 and used by P2
– By completion, there exists A23 generated by P2 and used by P3
– A23 could have been generated before A12 was used
[Figure: processes P1, P2 and P3, with a starred (*) edge between P3 and P1.]

OPM Inferences

Valid OPM Graphs
– WasDerivedFrom* is acyclic within one account
  – Intuition: a data item cannot be derived from itself
  – Note: cycles may exist in multiple accounts
– An artifact can be generated by at most one process in a given account
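Both validity conditions can be checked mechanically for the edges of a single account. The following Python sketch assumes the account has already been projected out of the graph; the function name `is_valid_account` and the tuple layouts are illustrative choices, not part of OPM.

```python
from collections import defaultdict

def is_valid_account(derived, generated):
    """Check the two validity conditions for one account.

    derived:   iterable of (effect, cause) wasDerivedFrom pairs
    generated: iterable of (artifact, process) wasGeneratedBy pairs
    """
    # Condition 1: at most one generating process per artifact.
    generators = defaultdict(set)
    for artifact, process in generated:
        generators[artifact].add(process)
    if any(len(processes) > 1 for processes in generators.values()):
        return False

    # Condition 2: derivations contain no cycle (depth-first search).
    successors = defaultdict(list)
    for effect, cause in derived:
        successors[effect].append(cause)
    UNSEEN, IN_PROGRESS, DONE = 0, 1, 2
    state = defaultdict(int)

    def has_cycle(node):
        state[node] = IN_PROGRESS
        for nxt in successors[node]:
            if state[nxt] == IN_PROGRESS:
                return True
            if state[nxt] == UNSEEN and has_cycle(nxt):
                return True
        state[node] = DONE
        return False

    return not any(
        state[node] == UNSEEN and has_cycle(node) for node in list(successors)
    )
```

Checking acyclicity of the direct edges is enough, because the starred relation has a cycle exactly when the underlying edges do.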

Time Information
– Causality implies time ordering, but not the converse
– Time regarded as crucial information in the provenance of data (though time does not imply causality)
– The model specifies constraints that time information must satisfy with respect to causal dependencies

Time Constraints
[Figure: process P with start: T2 and end: T5; an artifact A generated at T1 (wasGeneratedBy(R)) and used by P at T3 (used(R)); an artifact A generated by P at T4 (wasGeneratedBy(R)) and used at T6 (used(R)); an agent Ag linked by wasControlledBy(R).]
– T1<T3 (artifact must exist before being used)
– T2<T3 (process must have started before using artifacts)
– T3<T5 (process uses artifacts before it ends)
– T2<T4 (process must have started before generating artifacts)
– T4<T5 (process generates artifacts before it ends)
– T4<T6 (artifact must exist before being used)
– T2<T5 (process must have started before ending)
– No constraint between T3 and T4
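The seven inequalities can be turned directly into a checker. The Python sketch below assumes the timestamps are supplied as a dictionary keyed by the labels T1..T6 used on the slide; the function name `violated_constraints` is hypothetical.

```python
# Pairs (earlier, later) taken from the slide; T3 and T4 are left unconstrained.
TIME_CONSTRAINTS = [
    ("T1", "T3"),  # artifact must exist before being used
    ("T2", "T3"),  # process must have started before using artifacts
    ("T3", "T5"),  # process uses artifacts before it ends
    ("T2", "T4"),  # process must have started before generating artifacts
    ("T4", "T5"),  # process generates artifacts before it ends
    ("T4", "T6"),  # artifact must exist before being used
    ("T2", "T5"),  # process must have started before ending
]

def violated_constraints(times):
    """Return the constraints 'Ti<Tj' that the given timestamps break."""
    return [
        f"{earlier}<{later}"
        for earlier, later in TIME_CONSTRAINTS
        if not times[earlier] < times[later]
    ]
```

Because nothing relates T3 and T4, a process may generate an output before it has used all of its inputs without violating any constraint.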

Annotations
– All OPM entities (edges, nodes, graphs, accounts) can be annotated
– All annotations should be addressable (allowing for annotations of annotations)
– Bindings to formalize how annotations can be serialized (standard in RDF, custom in XML)
– Reserved properties: hasType, hasValue,...
– Let’s not reinvent the wheel!

OPM SPECIALIZATIONS

Concept of a Profile
– A specialisation of an OPM graph for a specific domain or to handle a specific problem
– Profile definitions are welcome!
– Note: profile multiplicity challenges inter-operability
– A profile has a unique identity
– Defines vocabulary, guidelines, expansion guidance, serialisation format

Profile Compliance
[Figure: a PROFILE (Id, Vocabulary, Guidance, Expansion directives, Serialisation); Profile Expansion relates a Profile Compliant Graph to a Profile-expanded Graph.]

Profile Compliance
[Figure: Profile Compliant Graph, Profile-expanded Graph, Inferred Graph 1 and Inferred Graph 2; edge label: OPM Inference.]

Emerging Profiles
– Collections
– Dublin Core
– D-Profile
Will be discussed in a separate session

OPM FORMALIZATIONS

Early Formalizations
– OPM v1.00 and OPM v1.01 contained a set-theoretic definition of OPM and permitted inferences
– Moved out of OPM v1.1 since it is difficult to keep specification and formalization in sync
– While the formalization is useful in defining OPM precisely, it does not give OPM a meaning!

Reproducibility Semantics (Moreau 2010)
Sees OPM graph as an executable program:
– Each process is associated with the name of an executable primitive
– Primitive environment maps primitive names to primitives:
  PrimitiveEnv = PrimitiveName → Primitive
  Primitive = P(RoleValue) → P(RoleValue)
– Graph factories to create new artifacts, new processes …

Reproducibility Semantics (Moreau 2010)
An execution of an OPM graph results in:
– A new OPM graph, describing re-execution
– A mapping between nodes of the original graph and the resulting graph
Execution proceeds by ordering processes (assumes acyclicity) and re-executing them, one by one; for each process executed, new process node and new output artifacts are created by factory
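The re-execution procedure can be sketched roughly as follows. This Python fragment is a simplified illustration, not the semantics as published: it assumes Python 3.9+ for `graphlib`, represents role-value sets as dictionaries, returns recomputed artifact values instead of building a full new OPM graph, and uses a primed-name convention (`P` to `P'`) as a stand-in for the graph factories. The function name `reexecute` and the `division` primitive are hypothetical.

```python
from graphlib import TopologicalSorter

def reexecute(processes, used, generated, inputs, primitive_env):
    """Re-run every process of an acyclic graph in dependency order.

    processes:     {process id: primitive name}
    used:          list of (process, role, artifact)
    generated:     list of (artifact, role, process)
    inputs:        {artifact: value} for artifacts no process generated
    primitive_env: {primitive name: function from {role: value} to {role: value}}
    Returns (values of all artifacts, mapping from old node ids to new ones).
    """
    producer = {artifact: process for artifact, _, process in generated}
    # A process depends on the processes that produced the artifacts it used.
    depends_on = {
        p: {producer[a] for q, _, a in used if q == p and a in producer}
        for p in processes
    }
    values = dict(inputs)
    mapping = {}
    # static_order raises graphlib.CycleError if the processes are cyclic.
    for p in TopologicalSorter(depends_on).static_order():
        arguments = {role: values[a] for q, role, a in used if q == p}
        results = primitive_env[processes[p]](arguments)
        mapping[p] = p + "'"  # stand-in for a process factory
        for artifact, role, q in generated:
            if q == p:
                values[artifact] = results[role]
                mapping[artifact] = artifact + "'"  # stand-in for an artifact factory
    return values, mapping

# Hypothetical primitive environment for the division example.
env = {
    "division": lambda args: {
        "quotient": args["dividend"] // args["divisor"],
        "rest": args["dividend"] % args["divisor"],
    }
}
values, mapping = reexecute(
    {"P": "division"},
    [("P", "dividend", "A1"), ("P", "divisor", "A2")],
    [("A3", "quotient", "P"), ("A4", "rest", "P")],
    {"A1": 17, "A2": 5},
    env,
)
```

Comparing the recomputed values with those recorded in the original graph is one way such a re-execution could be used to test reproducibility.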

Reproducibility Semantics (Moreau 2010)

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)
Timepoints:
– create(A): creation of artifact A
– begin(P), end(P): beginning and end of process P
– use(P,r,A): use of artifact A in role r, by process P
Temporal theory Th(G) of a graph G is a set of inequalities, e.g.:
– begin(P)≤create(A) for any generated-by edge A → P
– create(A)≤end(P) for any used edge P → A
A temporal interpretation of G is a triple (T, ≤, τ)
A temporal interpretation satisfies u≤v if τ(u) ≤ τ(v)
A temporal model of G is a temporal interpretation that satisfies all inequalities from Th(G)
Logical consequence: G ⊨ u≤v if u≤v is satisfied in every temporal model of G.
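The notions of temporal theory and temporal model can be made concrete with a small sketch. The Python fragment below builds only the two example inequalities quoted on the slide (the full theory in the cited work contains more) and represents τ as a dictionary from timepoint names to numbers; the function names are hypothetical.

```python
def temporal_theory(generated_by, used):
    """Build example inequalities (u, v), each meaning u <= v.

    generated_by: iterable of (artifact, process) edges
    used:         iterable of (process, artifact) edges
    Only the two inequality schemes shown on the slide are generated.
    """
    theory = set()
    for artifact, process in generated_by:
        theory.add((f"begin({process})", f"create({artifact})"))
    for process, artifact in used:
        theory.add((f"create({artifact})", f"end({process})"))
    return theory

def is_temporal_model(theory, tau):
    """Check that the assignment tau satisfies every inequality of the theory."""
    return all(tau[u] <= tau[v] for u, v in theory)
```

Logical consequence would then quantify over all such models, which this sketch does not attempt.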

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)
OPM Inference: G ⊢ A → P
Why this set of inference rules?
Characterization of OPM inference rules in the form of a soundness and completeness result:
– Cases not involving use-timepoints: G ⊨ begin(P)≤create(A) iff G ⊢ A → P
– Cases involving use-timepoints: G ⊨ begin(P)≤use(Q,r,A) iff G ⊢ some pattern

Temporal Semantics (Kwasnikowska, Moreau, Van den Bussche 2010)
Refinement of two OPM graphs
Let us consider two OPM graphs G and H. G is refined by H if, for any timepoints u,v of both G and H, G ⊨ u≤v implies H ⊨ u≤v

Causality Semantics (Cheney 2010)
– Exploits Halpern and Pearl’s causal theory of explanation
– The semantics of an OPM graph is a causal function, mapping graph inputs to outputs
– Provenance semantics P_f approximates locally a function f, if for any u_1, …, u_n: [[P_f(u_1, …, u_n)]]_τ = f_τ(u_1, …, u_n) for some intervention τ fixing some inputs of f

Workflow Semantics (Missier and Goble 2010)
Two functions:
– W2G: Workflow × Trace → OPM Graph
– G2W: OPM Graph → Workflow
Two properties:
– Plausible workflow: W2G(G2W(g),T)=g
– Lossless-ness: G2W(W2G(w,T))=w
Define W2G and G2W for the Taverna workflow language
Introduce annotations to be able to reconstruct Taverna iterations
In essence, provide a semantics for OPM by composing G2W and Taverna semantics

Workflow Semantics (Missier and Goble 2010)

Provenance Vocabulary Mappings (Sahoo et al 2010) OPM selected as the reference provenance model. First, because OPM is a general and broad model that encompasses many aspects of provenance. Second, it already represents a community effort that spans several years and is still ongoing, already benefiting from many discussions, practical use, and several versions. Finally, many groups are already undergoing efforts to map their vocabularies to OPM, and in addition there are already some mappings (called profiles in OPM) developed by the OPM group to some existing vocabularies.

Conclusions on OPM Semantics
– Four novel semantics of OPM published in 2010
– Deal with different subsets of OPM
– Not all fully “compatible” with OPM v1.1
– Grand theory of OPM is still an open problem

CONCLUSION AND OPEN ISSUES

Conclusions
– Over 14 teams have implemented the OPM specification for a successful inter-operability exercise (PC3)
– Open source governance model for OPM
– OPM v1.1 published and to be used in PC4
– OPM consists of a common core found in many provenance vocabularies
– What lies beyond?
  – Define useful profiles
  – Finalize semantics

Open Issues (inter-operability)
– List of technical issues: agents, annotations, time, streamed data, collections, mutable objects
– How to express queries over OPM graphs?
– Security: attribution and non-repudiation
– API for recording and querying
– How to inter-operate in a distributed system?

Open Issues (research)
– Accounts
– Relations between accounts: refinement, overlap, alternate
– Reasoning with conflicting provenance
– Reasoning with incomplete provenance
– Can we formalise profiles?