An Open Provenance Model for Scientific Workflows Professor Luc Moreau University of Southampton

Provenance & PASOA Teams
- University of Southampton: Luc Moreau, Paul Groth, Simon Miles, Victor Tan, Miguel Branco, Sofia Tsasakou, Sheng Jiang, Steve Munroe, Zheng Chen
- IBM UK (EU Project Coordinator): John Ibbotson, Neil Hardman, Alexis Biller
- University of Wales, Cardiff: Omer Rana, Arnaud Contes, Vikas Deora, Ian Wootten, Shrija Rajbhandari
- Universitat Politècnica de Catalunya (UPC): Steven Willmott, Javier Vazquez
- SZTAKI: Laszlo Varga, Arpad Andics, Tamas Kifor
- German Aerospace: Andreas Schreiber, Guy Kloss, Frank Danneman

Contents
- Motivation
- Provenance Concept Map
- Process documentation in a concrete bioinformatics application
- Conclusions

Motivation

Peer Review/Audit
- Accounting
- Banking
- Healthcare
- Academic publishing

e-Science datasets
How can e-Science results be peer-reviewed and validated?

Current Solutions
- Proprietary, monolithic silos, closed
- Do not inter-operate with other applications
- Not adaptable to new regulations

Provenance
Oxford English Dictionary:
- the fact of coming from some particular source or quarter; origin, derivation
- the history or pedigree of a work of art, manuscript, rare book, etc.; concretely, a record of the passage of an item through its various owners
Concept vs representation

Application Drivers
- Aerospace engineering: maintain a historical record of design processes, up to 99 years.
- Organ transplant management: tracking of previous decisions, crucial to maximise the efficiency in matching and recovery rate of patients.
- High Energy Physics: tracking, analysing, verifying data sets in the ATLAS Experiment of the Large Hadron Collider (CERN).
- Bioinformatics: verification and auditing of “experiments” (e.g. for drug approval).

Provenance Concept Map

[Figure: concept map. Nodes: Application, Services, Process, Data product, Provenance (concept), Provenance (representation), Provenance Query, Process Documentation, P-structure, P-assertions. Edge labels: is an execution of, produces, has, is defined as a past, is represented by, is obtained by, operates over, has a structure, consists of, contains, assert, documents.]

Making Applications Provenance Aware
[Figure: Application, Data Product, Provenance Store]
- Assert p-assertions and record them as process documentation
- Obtain the provenance of data by issuing provenance queries
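The record-then-query cycle on this slide can be pictured with a small in-memory sketch. Everything here is hypothetical and for illustration only: the class names `PAssertion` and `ProvenanceStore`, their fields, and the `record` and `query` methods are assumptions, not the interfaces of the actual provenance store, and a real provenance query does far more than filter by data identifier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PAssertion:
    """Hypothetical p-assertion: one statement made by one actor."""
    asserter: str   # actor making the assertion
    kind: str       # "interaction", "relationship" or "actor_state"
    data_id: str    # data item the assertion is about
    content: str    # free-text payload, enough for this sketch


@dataclass
class ProvenanceStore:
    """Hypothetical store: keeps p-assertions as process documentation."""
    assertions: list = field(default_factory=list)

    def record(self, p_assertion):
        # Recording only appends; the application keeps running as usual.
        self.assertions.append(p_assertion)

    def query(self, data_id):
        # Simplified "provenance query": every assertion about data_id.
        return [p for p in self.assertions if p.data_id == data_id]


store = ProvenanceStore()
store.record(PAssertion("service-A", "interaction", "M1", "I received M1"))
store.record(PAssertion("service-A", "relationship", "M3", "M3 = f1(M1)"))
store.record(PAssertion("service-A", "interaction", "M3", "I sent M3"))

about_m3 = store.query("M3")
for p in about_m3:
    print(p.kind, "-", p.content)
```

The point of the sketch is the separation of concerns: the application only asserts and records, and provenance is derived later by querying what was recorded.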

Process Documentation
[Figure: messages M1, M2, M3, M4 and functions f1, f2]
- Interaction p-assertions: I received M1, M4; I sent M2, M3
- Relationship p-assertions: M3 = f1(M1); M2 = f2(M1, M4); M2 is in reply to M1
- Service state p-assertions: I received M1 at time t; I used algorithm x.y.z

Data flow
- Interaction p-assertions allow us to specify a flow of data between services
- Relationship p-assertions allow us to characterise the flow of data “inside” a service
- Overall data flow (internal + external) constitutes a DAG, which characterises the process that led to a result
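As a minimal sketch of the DAG idea, the relationships M3 = f1(M1) and M2 = f2(M1, M4) from the process documentation example can be stored as edges and traversed backwards to find everything a result was derived from. The dictionary layout and the `provenance` function are assumptions made for this illustration, not part of the model's specification.

```python
# Relationship p-assertions as a mapping: output -> (function, inputs).
relationships = {
    "M3": ("f1", ["M1"]),
    "M2": ("f2", ["M1", "M4"]),
}


def provenance(data_id, relationships):
    """Return every data item that data_id was (transitively) derived from."""
    seen = set()
    stack = [data_id]
    while stack:
        current = stack.pop()
        # Items with no recorded relationship are inputs to the process.
        _, inputs = relationships.get(current, (None, []))
        for item in inputs:
            if item not in seen:
                seen.add(item)
                stack.append(item)
    return seen


ancestors_of_m2 = provenance("M2", relationships)
print(sorted(ancestors_of_m2))
```

Because the data flow is acyclic, the traversal terminates; the `seen` set simply avoids revisiting items that feed several results, such as M1 here.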

Process Documentation in a Concrete Bioinformatics Application

Biology
- How do protein sequences fold into a 3D structure?
- Structure of protein sequences may help to answer this question.
- Structure can be quantified by textual compressibility.
- Which amino acid groupings maximise compressibility?

Collaboration Diagram

Actual Call DAG

The P-Structure
The logical structure of a provenance store.

Interaction Record
The set of p-assertions pertaining to a given interaction (i.e., a message exchange between a sender and a receiver).

Interaction Key
A unique identifier for an interaction:
- Sender identity
- Receiver identity
- Local id

View
The set of p-assertions created by an asserter involved in an interaction (sender or receiver view).

Asserter
The identity of an asserter.

Interaction P-Assertion
An assertion of the contents of a message by an actor that has sent or received that message.

Interaction P-Assertion Content
The content of an interaction p-assertion: here, the invocation of blast (through a wrapper).

Interaction Content
Provenance-related information passed in application messages.

Actor State P-Assertion
An assertion made by an actor about its internal state in the context of a specific interaction.

Relationship P-Assertion
With respect to an interaction, a relationship p-assertion is an assertion, made by an actor, that describes how the actor obtained output data or the whole message sent in that interaction by applying some function to input data or messages from other interactions.

Subject Id
The identity of the subject of a relationship.

Object Id
The identity of the object of a relationship.
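The terms defined on the preceding slides can be pictured as nested records. The sketch below is a loose illustration, not the schema of the published specification: all class and field names are assumptions, and the actor names `workflow-enactor` and `blast-wrapper` are made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class InteractionKey:
    """Unique identifier for an interaction (frozen, so usable as a dict key)."""
    sender: str
    receiver: str
    local_id: str


@dataclass
class PAssertion:
    kind: str                         # "interaction", "actor_state" or "relationship"
    content: str
    subject_id: Optional[str] = None  # only meaningful for relationships
    object_ids: List[str] = field(default_factory=list)


@dataclass
class View:
    """P-assertions created by one asserter (the sender or the receiver)."""
    asserter: str
    p_assertions: List[PAssertion] = field(default_factory=list)


@dataclass
class InteractionRecord:
    """All p-assertions pertaining to one interaction."""
    key: InteractionKey
    sender_view: Optional[View] = None
    receiver_view: Optional[View] = None

    def all_p_assertions(self):
        result = []
        for view in (self.sender_view, self.receiver_view):
            if view is not None:
                result.extend(view.p_assertions)
        return result


key = InteractionKey("workflow-enactor", "blast-wrapper", "1")
record = InteractionRecord(
    key=key,
    sender_view=View("workflow-enactor", [
        PAssertion("interaction", "sent: invoke blast"),
    ]),
    receiver_view=View("blast-wrapper", [
        PAssertion("interaction", "received: invoke blast"),
        PAssertion("actor_state", "used algorithm x.y.z"),
    ]),
)
store = {key: record}  # a provenance store indexed by interaction key
```

Keeping the two views separate lets the sender and receiver document the same interaction independently, which matches the idea that process documentation can be produced autonomously by different components.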

Process Documentation Characteristics
- Common logical structure of the provenance store shared by all asserting and querying actors
- Can be produced autonomously, asynchronously by the different application components
- Open, extensible model, for which we are producing a public specification
- Tools can operate on it (e.g. visualisation, reasoning)

Performance (HPDC’05)

Standardisation Philosophy
- Thin layer common between systems: extensible data model
- Model can be extended for specific technologies (WS, Web, …) or application domains (Bio, Healthcare, Desktop, …)
- Service interfaces

Proposed List of Specifications
[Figure: specification documents WS-Prov-Intro, WS-Prov-Primer, WS-Prov-Glo, WS-Prov-DM, WS-Prov-Rec, WS-Prov-Query, WS-Prov-DM-Link, WS-Prov-DM-Infer, WS-Prov-DM-DS, WS-Prov-DM-Sec, WS-Prov-DM-Rel, WS-Prov-SOAP, WS-Prov-WWW. Group labels: Generic Profiles, Domain Specific Profiles, Technology Bindings.]

Conclusions

To Sum Up
Standardising the documentation of business processes
[Figure labels: Provenance Store; Record, Query, Compliance check, Rerun/Reproduce, Analyse; Provenance Architecture, Methodology, Apply; Healthcare, Distribution, Finance, Aerospace, Automobile, Pharmaceutical]
Slide from John Ibbotson

Conclusions
- Crucial topic for many applications
- Full architectural specification
- Implementation available for download
- Methodology to make applications provenance-aware
- Draft standardisation proposal to be released

Provenance Challenge
twiki.ipaw.info
Provenance Challenge Workshop at OGF18, Washington, September 11-14

Questions